These are chat archives for nextflow-io/nextflow

10th
Oct 2018
Rad Suchecki
@rsuchecki
Oct 10 2018 03:17 UTC

Can a label directive value be dynamically assigned from input? In the following example:

process foo {
    tag("${tool}")
    label("${tool}")
    input:
    each tool from tools

the tag directive works fine but adding the label directive causes: ERROR ~ No such variable: tool

Paolo Di Tommaso
@pditommaso
Oct 10 2018 06:56 UTC
@rsuchecki weird, it should work, please report a bug
Maxime Vallée
@valleem
Oct 10 2018 07:01 UTC
Hello all! @pditommaso has this discussion nextflow-io/nextflow#828 led to a commit (besides the documentation) included in the current release (0.32.0)?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 07:07 UTC
yes, cache 'lenient'
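For reference, a minimal sketch of how that directive is set on a process (process and channel names are hypothetical):

process align {
    cache 'lenient'   // hash input files by path and size only, ignoring timestamps

    input:
    file bam from bam_ch

    script:
    """
    samtools index ${bam}
    """
}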
Maxime Vallée
@valleem
Oct 10 2018 07:13 UTC
Yes, I saw the discussion, however I was not able to find the commit about it. I was looking for it in the first place because it does not seem to solve my issue (jobs relaunching when they should not) and I was thinking I had made a typo. In the discussion you said it is in NXF_VER=0.32.0u3-SNAPSHOT, and I use your nextflow-0.32.0-all on an offline cluster. So I was worried that the release I have does not include the cache 'lenient' commit.
Martin Proks
@matq007
Oct 10 2018 07:16 UTC
@mes5k thanks for the tip, I'll try it out
Maxime Vallée
@valleem
Oct 10 2018 07:18 UTC
There (https://github.com/nextflow-io/nextflow/commit/16a7ba36fd6a62ddfc8b4a2e3332c868286c84ac#diff-c197962302397baf3a4cc36463dce5ea) I found that NF went from 0.32.0u1-SNAPSHOT to 0.32.0, and in the discussion previously linked you were talking about u3. Therefore I was wondering whether the 0.32.0 release includes the lenient functionality (because I really think this functionality will fix my weird resume behaviour).
Paolo Di Tommaso
@pditommaso
Oct 10 2018 07:19 UTC
likely I missed including that feature in the changelog for the 0.32 release, tho not sure it helps to solve your problem
Maxime Vallée
@valleem
Oct 10 2018 07:25 UTC
I tried cache 'deep' and it works, but it is super slow and it might die in my real-world computation on my full cohort. I am trying on a new cluster to see if the problem persists. Doesn't Nextflow throw an error or a warning when we use a non-existing cache directive value?
Maxime Vallée
@valleem
Oct 10 2018 07:44 UTC
You worked on this fork https://github.com/nextflow-io/nextflow/compare/issue_828, and I downloaded the source of 0.32.0 but I cannot find your changes by grepping recursively through the subdirectories.
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:01 UTC
Morning
I'm trying to use storeDir but when I resume it is still looking at the folder structure inside work
my idea is to remove the work folder and re-run using resume. Is this the expected behaviour?
Maxime Vallée
@valleem
Oct 10 2018 08:08 UTC
@lucacozzuto yes, resume will look for what has been done in work. Even if you published your data, in my experience it will not look at where it was published.
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:14 UTC
@valleem but is storeDir supposed to do this? (it is not publishDir)
checking in the log I have this...
[fc/bea420] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
I solved it by chmod 777, but is this what I am supposed to do, @pditommaso?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:23 UTC
not sure to understand the problem
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:25 UTC
me too :P So I used storeDir to store some outputs I will need even after removing the work folder
then I re-run the pipeline with resume
and it complained that cannot access that folder
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:31 UTC
it doesn't do what you would like, but what is written here :)
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:35 UTC
The process is executed only if the files declared in the output clause do not exist in the directory specified by the storeDir directive. When the files exist the process execution is skipped and these files are used as the actual process result.
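A minimal sketch of that behaviour with a hypothetical path and process: on a re-run the task is skipped as long as the declared outputs already exist under the storeDir location, which therefore has to be readable by whoever resumes the pipeline.

process makeblastdb {
    storeDir '/nfs/shared/blastdb'   // outputs are kept here, independently of the work dir

    input:
    file fasta from fasta_ch

    output:
    file "${fasta.baseName}.*" into blastdb_ch

    script:
    """
    makeblastdb -in ${fasta} -dbtype nucl -out ${fasta.baseName}
    """
}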
Paolo Di Tommaso
@pditommaso
Oct 10 2018 08:37 UTC
bro, if you think it's an error, open an issue with a test case
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:39 UTC
well actually it does what I want, the problem is why the folder has to be 777
I'll make more tests...
Luca Cozzuto
@lucacozzuto
Oct 10 2018 08:51 UTC
ok so I did another test. It looks like a privileges problem
[warm up] executor > crg
[60/e2dc89] Submitted process > msconvert (44363b73-ffa3-4bd0-b1ad-1cb58b986f5b_QCS1_5ed32d2f96d9671ced7c0d26a93fce8a)
[21/950f45] Submitted process > makeblastdb (BSA)
[f7/d80183] Submitted process > makeblastdb (HELA)
[21/950f45] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
[fe/e53739] Submitted process > msconvert (9d9d9d1b-9d9d-4f1a-9d27-9d2f7635059d_QC03_b6f96ae8248181b823f5a720c09445cd)
[f7/d80183] NOTE: Cannot access folder: '/nfs/software/bi/biocore_tools/git/nextflow/Qcloud/blastdb' -- Error is ignored
[c0/1f7fe6] Submitted process > correctMzml (44363b73-ffa3-4bd0-b1ad-1cb58b986f5b_QCS1_5ed32d2f96d9671ced7c0d26a93fce8a)
but the files are copied there correctly
privileges are the same...
Alexander Peltzer
@apeltzer
Oct 10 2018 09:28 UTC
Can I define a channel dependency as input? So if params.whatever is set, take channel Y as input else take channel X ?
I already checked the patterns repository and thought I had seen something like this, but I didn't recall the syntax for it
Paolo Di Tommaso
@pditommaso
Oct 10 2018 09:31 UTC
input: 
file x from ( params.whatever ? X : Y )
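A self-contained sketch of that pattern (channel names hypothetical); the first operand of the ternary is the channel used when params.whatever is set:

ch_x = Channel.fromPath('data/x/*.fq')
ch_y = Channel.fromPath('data/y/*.fq')

process bar {
    input:
    file x from ( params.whatever ? ch_y : ch_x )   // take ch_y when params.whatever is set, ch_x otherwise

    script:
    """
    echo ${x}
    """
}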
Alexander Peltzer
@apeltzer
Oct 10 2018 09:50 UTC
Nice
That was what I was looking for :-)
Thanks @pditommaso !
Paolo Di Tommaso
@pditommaso
Oct 10 2018 09:53 UTC
:ok_hand:
Maxime Vallée
@valleem
Oct 10 2018 09:56 UTC
@pditommaso GREAT NEWS! I grabbed the source of the fork with the lenient work, compiled and packed. Tried it, and the cache 'lenient' is completely working as intended!
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:00 UTC
oh, that means it was not merged .. :shamed:
Rad Suchecki
@rsuchecki
Oct 10 2018 10:25 UTC
Bug reported @pditommaso nextflow-io/nextflow#894
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:26 UTC
tx
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:27 UTC
Hi @pditommaso
process calc_MS2_spectral_count {
    publishDir "output/spec_count"
    tag { "${sample_id}-${analysis_type}" }

    input:
    set sample_id, internal_code, analysis_type, checksum, file(qcmlfile) from qcmlfiles_for_MS2_spectral_count.mix(shot_qc4l_cid_qcmlfiles_for_MS2_spectral_count, shot_qc4l_hcd_qcmlfiles_for_MS2_spectral_count, shot_qc4l_etcid_qcmlfiles_for_MS2_spectral_count, shot_qc4l_ethcd_qcmlfiles_for_MS2_spectral_count)
    file(workflowfile) from getWFFile(baseQCPath, "MS2specCount")

    output:
    set sample_id, file("${sample_id}_QC_${Correspondence['MS2specCount'][analysis_type]}.json") into ms2_spectral_for_delivery

    script:
    def analysis_id = "${Correspondence['MS2specCount'][analysis_type]}"
    def knime = new Knime(wf:workflowfile, mem:"${task.memory.mega-5000}m", qcml:qcmlfile, qccv:"QC_${analysis_id}", qccvp:"QC_${ontology[analysis_type]}", chksum:checksum, ojid:"${sample_id}")
    knime.launch()
}
I cannot understand why this is resolved correctly: file("${sample_id}_QC_${Correspondence['MS2specCount'][analysis_type]}.json")
but this is not: def analysis_id = "${Correspondence['MS2specCount'][analysis_type]}"
I also tried with
def analysis_id = Correspondence['MS2specCount'][analysis_type]
Paolo Di Tommaso
@pditommaso
Oct 10 2018 10:33 UTC
no idea, it looks fine
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:38 UTC
I'm doing a test case
Luca Cozzuto
@lucacozzuto
Oct 10 2018 10:47 UTC
done
Luca Cozzuto
@lucacozzuto
Oct 10 2018 11:04 UTC
yep found the problem. My fault :)
Paolo Di Tommaso
@pditommaso
Oct 10 2018 11:30 UTC
something was telling me that :)
Luca Cozzuto
@lucacozzuto
Oct 10 2018 11:32 UTC
@pditommaso ehehehe
Alexander Peltzer
@apeltzer
Oct 10 2018 11:50 UTC

Input:

 set val(name), file(reads) from ch_read_files_complexity_filtering

Output should be the same set of value and filePair ... but the next process complains that name is not defined...

output:
    set val(name), file("*pG.fq.gz") into ch_clipped_reads_complexity_filtered, ch_debug_complexity_filtering
    file("*.json") into ch_fastp_for_multiqc
Paolo Di Tommaso
@pditommaso
Oct 10 2018 11:54 UTC
use into (ch_clipped_reads_complexity_filtered, ch_debug_complexity_filtering)
Alexander Peltzer
@apeltzer
Oct 10 2018 12:00 UTC
Hm, that didn't help: https://github.com/nf-core/eager/blob/9e4b8411fbcebcb814c43d77f8f0a32cd4d77ef9/main.nf#L375 this is the line where NXF complains about the missing name variable
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:01 UTC
it should be set val(name), file(reads) not file val(name), file(reads)
I have a friend organising NF crash courses .. :joy:
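Put together, a hedged sketch of the corrected declarations, reusing the channel names from above:

input:
set val(name), file(reads) from ch_read_files_complexity_filtering

output:
set val(name), file("*pG.fq.gz") into (ch_clipped_reads_complexity_filtered, ch_debug_complexity_filtering)
file("*.json") into ch_fastp_for_multiqc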
Alexander Peltzer
@apeltzer
Oct 10 2018 12:03 UTC
:-P
I should stop working with fever -.-
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:04 UTC
take a rest !
Maxime Vallée
@valleem
Oct 10 2018 12:35 UTC
@pditommaso yes, do you think you will merge the change or should I use my homemade compilation of the fork?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:37 UTC
merged a few minutes ago
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:44 UTC
@valleem thanks for finding this out
Maxime Vallée
@valleem
Oct 10 2018 12:47 UTC
@pditommaso my pleasure! I was not at all sure while navigating the whole tree of files, and the idea of a lenient check was exactly what I was after.
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:47 UTC
you are at IARC, right?
Maxime Vallée
@valleem
Oct 10 2018 12:50 UTC
yes, converted to NF by @mfoll
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:50 UTC
I'm paying him a lot of money to turn people to NF :joy:
kidding apart, say hello to him
Maxime Vallée
@valleem
Oct 10 2018 12:53 UTC
I was wondering how he afforded those Porsches he drives around.
:D
Paolo Di Tommaso
@pditommaso
Oct 10 2018 12:53 UTC
ahaha
Martin Proks
@matq007
Oct 10 2018 13:18 UTC
has anyone faced an issue where, when a file is missing from the input, the process is totally ignored?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:19 UTC
it's not an issue, it's a feature :sunglasses:
Martin Proks
@matq007
Oct 10 2018 13:19 UTC
so, can I verify it somehow inside the input?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:19 UTC
??
Martin Proks
@matq007
Oct 10 2018 13:20 UTC
I would like the process to execute anyway, because I have multiple inputs in the process and if one is missing that's fine with me. In my case the process is just ignored :|
Martin Proks
@matq007
Oct 10 2018 13:22 UTC
you are the best :zap:
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:22 UTC
:v:
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:29 UTC
Hi, is it possible to send the content of two files generated from a process to the output channel?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:34 UTC
show me a sketch ..
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:37 UTC
process parseAfile {

    input:
    file(filename) from files

    output:
    set content("param1.txt"), content("param2.txt") into panelle4paolo

    script:
    """
        parse.py ${filename} > param1.txt
        parse2.py ${filename} > param2.txt
    """
}
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:41 UTC
no, use file instead and then
panelle4paolo.map { f1, f2 -> [ f1.text, f2.text ] }
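Spelled out, a minimal sketch reusing the names from above (the parse scripts are hypothetical):

process parseAfile {

    input:
    file(filename) from files

    output:
    set file("param1.txt"), file("param2.txt") into panelle4paolo

    script:
    """
    parse.py ${filename} > param1.txt
    parse2.py ${filename} > param2.txt
    """
}

// downstream, turn the two files into their text content
panelle4paolo.map { f1, f2 -> [ f1.text, f2.text ] }.view()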
Luca Cozzuto
@lucacozzuto
Oct 10 2018 13:43 UTC
thanks! However I think that having a new qualifier that does this could be a nice feature...
I'll add it as a new feature request.
Karin Lagesen
@karinlag
Oct 10 2018 13:46 UTC
I am trying to understand the difference between merge and join here
not quite sure when I would use one vs the other?
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:50 UTC
join requires a matching key, it's more like a SQL join
merge puts together two separate emissions
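A toy sketch of the difference (values hypothetical):

ch_left  = Channel.from( ['a', 1], ['b', 2] )
ch_right = Channel.from( ['a', 'x'], ['b', 'y'] )
ch_left.join(ch_right).view()   // matches on the first element (the key): [a, 1, x] and [b, 2, y]

Channel.from( 1, 2 ).merge( Channel.from( 'x', 'y' ) ).view()   // pairs items by emission order: [1, x] and [2, y]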
Karin Lagesen
@karinlag
Oct 10 2018 13:51 UTC
that clarifies it, thanks!
Paolo Di Tommaso
@pditommaso
Oct 10 2018 13:53 UTC
nice
Crabime
@Crabime
Oct 10 2018 15:54 UTC
Hello Paolo, recently I have been annoyed by my cluster, whose Ceph processes occupy too much IO, and the SGE jobs dispatched by NF execute really slowly. There was an rsem-calculate-expression task which had already been running for three days on the SGE cluster, but it takes 20 hours on a local machine, so can you give me some advice on this? Thanks very much
I found the Ceph disk operation speed is about 11-12 Mb/s, but my task gets just 10 k/s, no more, so my process is always in a sleeping state waiting for IO
Félix C. Morency
@fmorency
Oct 10 2018 15:59 UTC

@Crabime You might want to use

process {
    stageInMode = 'copy'
    scratch = true
}

in your nextflow.config. This will copy the input files locally, execute the job locally storing all intermediate files in the scratch space (which defaults to /tmp), and upload the results to the work directory when done.
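If /tmp on the compute nodes is too small, scratch also accepts a node-local path (directory hypothetical):

process {
    stageInMode = 'copy'
    scratch = '/local/scratch'   // node-local directory where tasks run and write their temporary files
}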

Crabime
@Crabime
Oct 10 2018 15:59 UTC
so is Ceph a proper tool for SGE?
@fmorency wow! Fantastic, I will give it a try later, but I want to ask another question: the intermediate files will still be copied to other hosts, so will my process still wait for IO because it is itself a heavy IO process? If all tasks run on one host the cluster will be useless, but I still want to use the SGE cluster
Félix C. Morency
@fmorency
Oct 10 2018 16:34 UTC
Different tasks will be scheduled on different hosts. Why do you want to use network storage for intermediate files that won't be required? Only the work directory needs to be on the network share.