These are chat archives for nextflow-io/nextflow

10th
Oct 2018
Rad Suchecki
@rsuchecki
Oct 10 2018 03:17

Can a label directive value be dynamically assigned from input? In the following example

process foo {
tag("${tool}")
label("${tool}") 
input:
  each tool from tools

the tag directive works fine but adding the label directive causes: ERROR ~ No such variable: tool

Paolo Di Tommaso
@pditommaso
Oct 10 2018 06:56
@rsuchecki weird, it should work, please report a bug
Maxime Vallée
@valleem
Oct 10 2018 07:01
Hello all! @pditommaso has this discussion nextflow-io/nextflow#828 led to a commit (besides the documentation) included in the current release (0.32.0)?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 07:07
yes, cache 'lenient'
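For reference, a minimal sketch of where the directive goes. The process name, the `reads_ch` channel and the script body are made up for illustration; only the `cache` line is the point.

```groovy
// hypothetical input channel, for illustration only
reads_ch = Channel.fromPath('data/*.fq')

process countLines {
    // 'lenient' builds the cache key from each input file's path and size,
    // leaving out the last-modified timestamp; 'deep' hashes file content instead
    cache 'lenient'

    input:
    file reads from reads_ch

    script:
    """
    wc -l ${reads}
    """
}
```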
Maxime Vallée
@valleem
Oct 10 2018 07:13
Yes, I saw the discussion, however I was not able to find the commit about it. I was looking for it in the first place because it does not seem to solve my issue (jobs relaunch when they should not) and I thought I had made a typo. In the discussion you said it is in NXF_VER=0.32.0u3-SNAPSHOT, and I use your nextflow-0.32.0-all on an offline cluster. So I was worried that the release I have does not include the cache 'lenient' commit.
Martin Proks
@matq007
Oct 10 2018 07:16
@mes5k thanks for the tip, I'll try it out
Maxime Vallée
@valleem
Oct 10 2018 07:18
There (https://github.com/nextflow-io/nextflow/commit/16a7ba36fd6a62ddfc8b4a2e3332c868286c84ac#diff-c197962302397baf3a4cc36463dce5ea) I found that NF went from 0.32.0u1-SNAPSHOT to 0.32.0, and in the discussion previously linked, you were talking about u3. Therefore I was wondering if in the 0.32.0 release, there is the lenient functionality (because I really think this functionality will save my weird resume behavior).
Paolo Di Tommaso
@pditommaso
Oct 10 2018 07:19
likely I forgot to mention that feature in the changelog for the 0.32 release, though I'm not sure it helps to solve your problem
Maxime Vallée
@valleem
Oct 10 2018 07:25
I tried cache 'deep' and it works, but it is super slow and it might die in my real world computation for my full cohort. I am trying on a new cluster to see if the problem persists. Nextflow does not throw an error or a warning when we use a non-existing cache directive?
Maxime Vallée
@valleem
Oct 10 2018 07:44
You worked on this fork https://github.com/nextflow-io/nextflow/compare/issue_828 and I downloaded the source of 0.32.0 and I cannot find your changes by grepping recursively in the subdirectories.
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:01
Morning
I'm trying to use storeDir but when I resume it is still looking at the folder structure inside work
my idea is to remove the work folder and re-run using resume. Is this the expected behaviour?
Maxime Vallée
@valleem
Oct 10 2018 08:08
@lucacozzuto yes resume will look for what has been done in work. Even if you published your data, it will not look where it is published in my experience.
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:14
@valleem but is storeDir supposed to do this (it is not publishDir)
checking in the log I have this...
[fc/bea420] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
I solved it with chmod 777, but is this what I am supposed to do, @pditommaso?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:23
not sure I understand the problem
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:25
me too :P So I used storeDir to store some outputs that I would still need after removing the work folder
then I re-ran the pipeline with resume
and it complained that it cannot access that folder
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:31
it doesn't do what you would like, but what's written here :)
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:35
The process is executed only if the files declared in the output clause do not exist in the directory specified by the storeDir directive. When the files exist the process execution is skipped and these files are used as the actual process result.
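A minimal sketch of that behaviour, assuming a made-up process and paths (`buildIndex`, `genomes_ch` and `results/index` are not from the pipeline discussed here):

```groovy
// hypothetical input channel, for illustration only
genomes_ch = Channel.fromPath('data/*.fa')

process buildIndex {
    // if results/index already contains the declared output file,
    // the task is skipped and the stored file is used as the result
    storeDir 'results/index'

    input:
    file genome from genomes_ch

    output:
    file "${genome.baseName}.idx" into index_ch

    script:
    """
    touch ${genome.baseName}.idx
    """
}
```

Unlike publishDir, the stored files are what decides whether the task runs again, so the directory has to be readable and writable by whoever runs the pipeline.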
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:37
bro, if you think it's an error, open an issue with a test case
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:39
well actually it does what I want, the problem is why the folder has to be 777
I'll make more tests...
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:51
ok so I did another test. It looks like a privileges problem
[warm up] executor > crg
[60/e2dc89] Submitted process > msconvert (44363b73-ffa3-4bd0-b1ad-1cb58b986f5b_QCS1_5ed32d2f96d9671ced7c0d26a93fce8a)
[21/950f45] Submitted process > makeblastdb (BSA)
[f7/d80183] Submitted process > makeblastdb (HELA)
[21/950f45] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
[fe/e53739] Submitted process > msconvert (9d9d9d1b-9d9d-4f1a-9d27-9d2f7635059d_QC03_b6f96ae8248181b823f5a720c09445cd)
[f7/d80183] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
[c0/1f7fe6] Submitted process > correctMzml (44363b73-ffa3-4bd0-b1ad-1cb58b986f5b_QCS1_5ed32d2f96d9671ced7c0d26a93fce8a)
but the files are copied there correctly
privileges are the same...
Luca Cozzuto
@lucacozzuto
Oct 10 2018 09:07
Alexander Peltzer
@apeltzer
Oct 10 2018 09:28
Can I define a channel dependency as input? So if params.whatever is set, take channel Y as input else take channel X ?
Already checked the patterns repository and I thought I saw something like this already but didn't recall the syntax for that
Paolo Di Tommaso
@pditommaso
Oct 10 2018 09:31
input: 
file x from ( params.whatever ? X : Y )
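Spelled out as a small self-contained sketch, with made-up channel contents and process name. It mirrors the snippet above, so `X` is used when `params.whatever` is set and `Y` otherwise; swap the two branches for the opposite choice.

```groovy
params.whatever = false

// hypothetical source channels, for illustration only
X = Channel.fromPath('data/x/*.txt')
Y = Channel.fromPath('data/y/*.txt')

process foo {
    input:
    // the ternary is evaluated once, when the pipeline script is parsed
    file x from ( params.whatever ? X : Y )

    script:
    """
    cat ${x}
    """
}
```

Running with `--whatever true` on the command line would switch the process to the other channel.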
Alexander Peltzer
@apeltzer
Oct 10 2018 09:50
Nice
That was what I was looking for :-)
Thanks @pditommaso !
Paolo Di Tommaso
@pditommaso
Oct 10 2018 09:53
:ok_hand:
Maxime Vallée
@valleem
Oct 10 2018 09:56
@pditommaso GREAT NEWS! I grabbed the source of the fork with the lenient work, compiled and packed. Tried it, and the cache 'lenient' is completely working as intended!
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:00
oh, that means it was not merged .. :shamed:
Rad Suchecki
@rsuchecki
Oct 10 2018 10:25
Bug reported @pditommaso nextflow-io/nextflow#894
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:26
tx
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:27
Hi @pditommaso
process calc_MS2_spectral_count {
    publishDir "output/spec_count"
    tag { "${sample_id}-${analysis_type}" }

    input:
    set sample_id, internal_code, analysis_type, checksum, file(qcmlfile) from qcmlfiles_for_MS2_spectral_count.mix(shot_qc4l_cid_qcmlfiles_for_MS2_spectral_count, shot_qc4l_hcd_qcmlfiles_for_MS2_spectral_count, shot_qc4l_etcid_qcmlfiles_for_MS2_spectral_count, shot_qc4l_ethcd_qcmlfiles_for_MS2_spectral_count)
    file(workflowfile) from getWFFile(baseQCPath, "MS2specCount")

    output:
    set sample_id, file("${sample_id}_QC_${Correspondence['MS2specCount'][analysis_type]}.json") into ms2_spectral_for_delivery

    script:
    def analysis_id = "${Correspondence['MS2specCount'][analysis_type]}"
    def knime = new Knime(wf:workflowfile, mem:"${task.memory.mega-5000}m", qcml:qcmlfile, qccv:"QC_${analysis_id}", qccvp:"QC_${ontology[analysis_type]}", chksum:checksum, ojid:"${sample_id}")
    knime.launch()
}
I cannot understand why this is correctly resolved: file("${sample_id}_QC_${Correspondence['MS2specCount'][analysis_type]}.json")
and this is not: def analysis_id = "${Correspondence['MS2specCount'][analysis_type]}"
i also tried with
def analysis_id = Correspondence['MS2specCount'][analysis_type]
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:33
no idea, it looks fine
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:38
I'm doing a test case
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:47
done
Luca Cozzuto
@lucacozzuto
Oct 10 2018 11:04
yep found the problem. My fault :)
Paolo Di Tommaso
@pditommaso
Oct 10 2018 11:30
something was telling me that :)
Luca Cozzuto
@lucacozzuto
Oct 10 2018 11:32
@pditommaso ehehehe
Alexander Peltzer
@apeltzer
Oct 10 2018 11:50

Input:

 set val(name), file(reads) from ch_read_files_complexity_filtering

Output should be the same set of value and filePair ... but next process complains that name is not defined...

output:
    set val(name), file("*pG.fq.gz") into ch_clipped_reads_complexity_filtered, ch_debug_complexity_filtering
    file("*.json") into ch_fastp_for_multiqc
Paolo Di Tommaso
@pditommaso
Oct 10 2018 11:54
use into (ch_clipped_reads_complexity_filtered, ch_debug_complexity_filtering)
Alexander Peltzer
@apeltzer
Oct 10 2018 12:00
Hm, that didn't help : https://github.com/nf-core/eager/blob/9e4b8411fbcebcb814c43d77f8f0a32cd4d77ef9/main.nf#L375 this is the line NXF complains about a missing name variable
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:01
it should be set val(name), file(reads) not file val(name), file(reads)
I have a friend organising NF crash courses .. :joy:
Alexander Peltzer
@apeltzer
Oct 10 2018 12:03
:-P
I should stop working with fever -.-
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:04
take a rest !
Maxime Vallée
@valleem
Oct 10 2018 12:35
@pditommaso yes, do you think you will merge the change or should I use my homemade compilation of the fork?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:37
merged a few minutes ago
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:44
@valleem thanks for finding this out
Maxime Vallée
@valleem
Oct 10 2018 12:47
@pditommaso my pleasure! I was not at all sure while navigating the whole file tree, and the idea of a lenient check was exactly what I needed.
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:47
you are at IARC, right?
Maxime Vallée
@valleem
Oct 10 2018 12:50
yes, converted to NF by @mfoll
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:50
I'm paying him a lot of money to turn people to NF :joy:
joking aside, say hello to him
Maxime Vallée
@valleem
Oct 10 2018 12:53
I was wondering how he afforded those Porsches he drives around.
:D
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:53
ahaha
Martin Proks
@matq007
Oct 10 2018 13:18
has anyone faced the issue that when a file is missing in the input, the process is totally ignored?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:19
it's not an issue, it's a feature :sunglasses:
Martin Proks
@matq007
Oct 10 2018 13:19
soo can I verify it somehow inside the input?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:19
??
Martin Proks
@matq007
Oct 10 2018 13:20
I would like the process to execute anyway because I have multiple inputs in the process and if one is missing it's fine with me. In my case the process will be just ignored :|
Martin Proks
@matq007
Oct 10 2018 13:22
you are the best :zap:
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:22
:v:
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:29
Hi, is it possible to send the content of two files generated from a process to the output channel?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:34
show me a sketch ..
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:37
process parseAfile {

    input:
    file(filename) from files

    output:
    set content("param1.txt"), content("param2.txt") into panelle4paolo

    script:
    """
        parse.py ${filename} > param1.txt
        parse2.py ${filename} > param2.txt
    """
}
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:41
no, use file instead and then
panelle4paolo.map { f1, f2 -> [ f1.text, f2.text ] }
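Put together, the suggestion would look roughly like this. The `files` channel definition is an assumption added to make the sketch self-contained; `parse.py` and `parse2.py` are the scripts from the example above.

```groovy
// hypothetical input channel, for illustration only
files = Channel.fromPath('data/*.txt')

process parseAfile {

    input:
    file(filename) from files

    output:
    // declare the two outputs as files rather than as content
    set file("param1.txt"), file("param2.txt") into panelle4paolo

    script:
    """
    parse.py ${filename} > param1.txt
    parse2.py ${filename} > param2.txt
    """
}

// read the text of each file downstream, once the task has finished
panelle4paolo
    .map { f1, f2 -> [ f1.text, f2.text ] }
    .view()
```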
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:43
thanks! However I think that having a new qualifier that does this can be a nice feature...
I'll add as a new feature request.
Karin Lagesen
@karinlag
Oct 10 2018 13:46
I am trying to understand the difference between merge and join here
not quite sure when I would use one vs the other?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:50
join requires a matching key, it's more like a sql join
merge puts together two separate emissions
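A small sketch of the difference, using made-up channel contents:

```groovy
// join: items are matched on a key (by default the first element of each tuple)
left  = Channel.from( ['X', 1], ['Y', 2], ['Z', 3], ['P', 7] )
right = Channel.from( ['Z', 6], ['Y', 5], ['X', 4] )

left
    .join(right)
    .view()     // one tuple per matching key, e.g. [X, 1, 4]; 'P' has no match and is dropped

// merge: items are paired purely by emission order, no key involved
odds  = Channel.from( 1, 3, 5, 7, 9 )
evens = Channel.from( 2, 4, 6 )

odds
    .merge(evens) { o, e -> [o, e] }
    .view()     // pairs such as [1, 2]; pairing stops when the shorter channel ends
```

So join is the safer choice whenever the two channels carry a shared identifier such as a sample id, since merge relies on both channels emitting in the same order.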
Karin Lagesen
@karinlag
Oct 10 2018 13:51
that clarifies it, thanks!
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:53
nice
Crabime
@Crabime
Oct 10 2018 15:54
Hello Paolo, recently I have been annoyed by my cluster, whose Ceph process occupies too much IO, so SGE jobs dispatched by NF execute really slowly. There is an rsem-calculate-expression task which has already been running for three days on the SGE cluster, but it takes 20 hours on a local machine. Can you give me some advice on this? Thanks very much.
I found the Ceph disk operation speed is about 11-12Mb/s, but my task gets just 10k/s, no more, so my process is always in a sleeping state waiting for IO resources.
Félix C. Morency
@fmorency
Oct 10 2018 15:59

@Crabime You might want to use

process {
    stageInMode = 'copy'
    scratch = true
}

in your nextflow.config. This will copy the input files locally, execute the job locally, storing all intermediate files in the scratch (defaults to /tmp) and upload the results to the work directory when done.

Crabime
@Crabime
Oct 10 2018 15:59
So is Ceph a suitable tool for SGE?
@fmorency wow! fantastic, I will give it a try later. But I want to ask another question: the intermediate files will still be copied to other hosts, so will my process still wait for IO resources, since it is itself a heavy IO process? If all tasks work on one host, the cluster will be useless, but I still want to use the SGE cluster.
Félix C. Morency
@fmorency
Oct 10 2018 16:34
Different tasks will be scheduled on different hosts. Why do you want to use network storage for intermediate files that won't be required? Only the work directory needs to be on the network share.
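If only one step is IO-heavy, a sketch of limiting those settings to it in nextflow.config, assuming the process is called `rsem` (a made-up name) and a Nextflow version that supports the `withName` selector:

```groovy
process {
    // apply local staging and scratch only to the hypothetical 'rsem' process
    withName: rsem {
        stageInMode = 'copy'   // copy inputs to the node instead of symlinking them
        scratch = true         // run in a node-local temporary directory
    }
}
```

Each task still runs on whichever host SGE assigns it; only the declared outputs are copied back to the shared work directory.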