rithy8
@rithy8
Hello,

I am using Nextflow version 19.05.0-edge build 5097
I want to execute process X four times. However, the process only executed once.
Could someone explain what I did wrong? Thanks.

```
nextflow.preview.dsl=2

aaa = Channel.from([[9],[1],[7],[5]])

process X{
input:
val b
script:
"""
echo ${b} > hello.txt
"""
}

X(aaa)
```

Paolo Di Tommaso
@pditommaso
I got
executor >  local (4)
[43/9ff7f3] process > X (1) [100%] 4 of 4
4 of 4 ..
Stijn van Dongen
@micans
```
#!/bin/bash
set -euo pipefail
nfversion=19.05.0-edge

NXF_VER=$nfversion nextflow run - <<EOC
nextflow.preview.dsl=2
aaa = Channel.from([[9],[1],[7],[5]])
process X {
  input: val b
  script: "echo \${b} > hello.txt"
}

X(aaa)
EOC
```
same here (pleasant to see that nextflow run - <<EOC works!)
Riccardo Giannico
@giannicorik_twitter

@madkinsz I believe you're searching for this:

```
Channel.fromFilePairs("${params.infolder}/*.fastq.gz", size: -1) { file -> file.name.split(/_S\d+_L/)[0] }
        .ifEmpty { error "File ${params.infolder} not parsed properly" }
        .set { ch_fastqgz }
```

```
process mergefastq {
    tag "${sample}"
    input:
    set val(sample), file(fastqfiles) from ch_fastqgz
    script:
    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """
}
```

channel ch_fastqgz contains something like this:

```
[ [sample1, [sample1_S001_L001_R1_0001.fastq.gz, sample1_S001_L001_R2_0001.fastq.gz]],
  [sample2, [sample2_S001_L001_R1_0001.fastq.gz, sample2_S001_L001_R2_0001.fastq.gz]] ]
```
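The grouping closure above strips everything from the _S<digits>_L suffix onward to build the sample key; the same logic can be sketched in plain shell (these filenames are illustrative, not from a real run):

```shell
# Mirror the Groovy closure file.name.split(/_S\d+_L/)[0]:
# strip the _S<digits>_L... suffix to recover the sample name.
# These filenames are illustrative, not from a real run.
for f in sample1_S001_L001_R1_0001.fastq.gz \
         sample1_S001_L001_R2_0001.fastq.gz \
         sample2_S001_L001_R1_0001.fastq.gz; do
  echo "$f" | sed -E 's/_S[0-9]+_L.*//'
done
```

Both sample1 files map to the same key, which is what lets fromFilePairs group them into one tuple.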
Riccardo Giannico
@giannicorik_twitter
Ah, you also asked for your example "how to make a channel read from the fastq_files channel".
The answer is you need to use the Operators (see here: https://www.nextflow.io/docs/latest/operator.html)
For example, take your fastq list and create a new channel containing only R1 fastqs:
```
fastq_pairs.filter { it =~ /_R1_/ }.tap { fastq_R1only }
```
lauw04
@lauw04
Hello
I ran a Nextflow pipeline on a remote server (in2p3) and my job was aborted because of the memory I used. It states: "Max vmem = 20.710G
Max rss = 563.410M", so I thought I had exceeded max vmem, but actually I had only used max rss, the real RAM. I don't understand the difference between vmem and rss.
Anthony Ferrari
@af8
What is the simple syntax in Groovy for creating an empty file? Equivalent to touch process.complete in Linux. Thanks
Stijn van Dongen
@micans
@af8 stackoverflow suggests
```
def nf = new File("test.txt")
nf.createNewFile()
```
Riccardo Giannico
@giannicorik_twitter
@af8 may I ask why you need plain Groovy to create a file instead of using bash inside a Nextflow process? According to Nextflow logic, you should write files inside the "processes" using bash or any other language.
Anthony Ferrari
@af8
Thank you @micans. I was also wondering if there was a more nf-ish way of doing it, but this will be great. I was also considering something like
```
file('test.txt').text = ''
```
@giannicorik_twitter it is to use in the workflow.onComplete method
Shellfishgene
@Shellfishgene
For splitFasta, can I just use the size option or do I have to combine it with by?
Oh, I think I misunderstand the size option...
Any way of splitting the fasta file by bytes, without cutting sequences in half?
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene I'd suggest using a process.
You can run a tool (or an awk script) to split a FASTA into multiple FASTA files.
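As a rough sketch of the awk approach (the 10-byte threshold and the toy input are arbitrary): start a new chunk only at a header line, once the current chunk has grown past the size limit, so no sequence is ever cut in half.

```shell
# Toy FASTA; in practice this would be the real input file.
printf '>s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n' > in.fa

# Start a new chunk file at a '>' header once the current chunk
# exceeds max bytes; sequences are never split mid-record.
awk -v max=10 '
  /^>/ && (bytes >= max || NR == 1) { file = "chunk_" idx++ ".fa"; bytes = 0 }
  { print > file; bytes += length($0) + 1 }' in.fa
```

With the toy input above this produces chunk_0.fa (records s1 and s2) and chunk_1.fa (record s3).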
Stijn van Dongen
@micans
@AlaaBadredine_twitter I've made another implementation of the A->(B->)->C pattern: https://github.com/micans/nextflow-idioms/blob/master/ab-abc-tap.nf . I think it is the most readable one -- it uses the tap operator (which I noticed in @giannicorik_twitter 's contribution above). The core looks like this:
```
ch_dummy.flatMap().map { f -> [f.text.trim(), f] }.view()
  .tap { ch_AC }
  .until { !params.includeB }
  .set { ch_AB }

process processB {
  input:  set val(sampleid), file(thefile) from ch_AB
  output: set val(sampleid), file('out.txt') into ch_BC
  script: "(echo 'B process'; cat $thefile; md5sum $thefile) > out.txt"
}

ch_AC.until { params.includeB }.mix(ch_BC).set { ch_C }
```
@pditommaso this way there are no extra channel names ...
Shellfishgene
@Shellfishgene
@giannicorik_twitter Will do, thanks
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene :thumbsup: :smile:
Riccardo Giannico
@giannicorik_twitter
@micans seems like you liked the 'tap' trick , uh? :smile: glad to be of any help!
Stijn van Dongen
@micans
:+1:
Yasset Perez-Riverol
@ypriverol
Hi guys, where can I find the documentation for the schema of -params-file?
Michael Adkins
@madkinsz
@micans Sweet implementation of that pattern. That makes a lot of sense.
@giannicorik_twitter Thanks for the examples! That's helpful.
Does anyone have suggestions for an idiom like: C is a process that collects a set of files and creates some output. A outputs a full set of files to C, and once they have all arrived, C runs. B also outputs a full set of files to C at another time, and C runs completely independently of A's input. C creates output/A/files and output/B/files.
Stijn van Dongen
@micans
@madkinsz sounds exactly like this: https://github.com/micans/nextflow-idioms/blob/master/collectFile-tuple.nf (with many thanks to @pditommaso as always).
Riccardo Giannico
@giannicorik_twitter

It's kind of complicated, you probably want to combine multiple channels into a single one using some of the nextflow operators ( see here: https://www.nextflow.io/docs/latest/operator.html#combining-operators ) but it's not very clear to me how your 3 channels should merge from your description.

After that it will be something like this:

```
ch_mergedchannel = // some nextflow-foo starting from ch_infiles.collect(), ch_infilesFromA, ch_infilesFromB

process C {
   publishDir "output/A", pattern: "*.a.txt"
   publishDir "output/B", pattern: "*.b.txt"
   input:
   file(infiles) from ch_mergedchannel
   output:
   file("*.files.extensions") into ch_out
}
```
hydriniumh2
@hydriniumh2
So I don't know if anyone else has run into this issue, but it seems like running git repos via nextflow run doesn't parse parameters or environment variables for the config file, but it does for local .nf files
Tobias "Tobi" Schraink
@tobsecret
@hydriniumh2 which github repo?
hydriniumh2
@hydriniumh2
Any repo
Riccardo Giannico
@giannicorik_twitter

@madkinsz I think you want to combine the A and B channels to create a channel like this

```
ch_mergedchannel = [ [A, fileA1.txt], [A, fileA2.txt], [B, fileB1.txt], [B, fileB2.txt] ]
```

so you will have an instance of C with fileA1.txt and all the collected files , a second instance of C with fileA2.txt and all the collected files, and so on... right?

```
process C {
   publishDir "output/${source}", pattern: "*.${source}.txt"
   input:
   file(infiles) from ch_infiles.collect()
   set val(source), file(sourcefiles) from ch_mergedchannel
   output:
   file("*.txt") into ch_out
}
```
Tobias "Tobi" Schraink
@tobsecret
@hydriniumh2 please give an example of one that you ran and it didn't work, so we can reproduce
Tobias "Tobi" Schraink
@tobsecret
If that didn't work, all of nf-core wouldn't work and that's a huge concern
Michael Adkins
@madkinsz
@giannicorik_twitter That makes sense to me, but will C wait to run until both A and B have finished instead of running when A is done and then when B is done?
I've been looking at the channel operators, but basically I want to reuse process C multiple times which seems tricky to do.
hydriniumh2
@hydriniumh2
@tobsecret Ah ok, it's just the includeConfig and manifest statements
Tobias "Tobi" Schraink
@tobsecret
Hmmm, that's odd, because even those seem to work for me. I am testing on nf-core/rnaseq right now with the test and prince profiles, @hydriniumh2, and it completes just fine!
Caspar
@caspargross
Is it possible to specify Bind Paths for Singularity containers in the Nextflow config? I tried using the SINGULARITY_BINDPATH environment variable in the config file, but this does not work for me.
hydriniumh2
@hydriniumh2
@tobsecret I'll try and reproduce it in a repo
rithy8
@rithy8
@pditommaso
rithy8
@rithy8

First, thank you.
Second, I got the same result, but I'm unsure what it means.

What I would like is for the process to run 4 times.
When I check the work dir, I see only one.
Please explain.

```
executor >  local (4)
[01/994a1d] process > X (1) [100%] 4 of 4 ✔

>>cd work/01/994a1db445b2f324724cbcdf291d18
 ~/dev/nextflow/sandbox/work/01/994a1db445b2f324724cbcdf291d18
>>ls
total 8
-rw-r--r--  1 rithy8  staff  4 Jun  4 08:16 hello.txt
```
Stijn van Dongen
@micans
@rithy8 run with the option -with-trace trace.txt; this will give you information about all tasks (and their work directories) in the file trace.txt. The log only shows you one work directory per process; each task, however, gets its own work directory.
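For what it's worth, the trace file is a plain TSV, so the per-task work-dir hashes can be pulled out with standard tools. A sketch with made-up trace contents (the column order here is an assumption; check the header line of your own trace.txt):

```shell
# Made-up trace.txt standing in for a real Nextflow trace file;
# real traces have more columns, so verify positions against the header.
printf 'task_id\thash\tname\tstatus\n' > trace.txt
printf '1\t01/994a1d\tX (1)\tCOMPLETED\n' >> trace.txt
printf '2\t9f/8e2c11\tX (2)\tCOMPLETED\n' >> trace.txt

# Print the work-dir hash and task name for every task (skip the header).
awk -F'\t' 'NR > 1 { print $2, $3 }' trace.txt
```

Each hash corresponds to a directory under work/, so every task's output can be located even though the console log shows only one line per process.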
rithy8
@rithy8
@micans, thank you.
Stijn van Dongen
@micans
@rithy8 yw; depending on your version of NF you can also use -ansi false or -ansi-log false.
Riccardo Giannico
@giannicorik_twitter
@madkinsz Yes, a process waits for all of its input channels to be completely filled before starting.
If you want to run them separately, it's quite ugly, but I think the easiest solution is to create two C processes.
```
process C_A {
   publishDir "output/A"
   input:
   file(infiles) from ch_infiles.collect()
   file(afiles) from ch_infilesFromA
   output:
   file("*.txt") into ch_outCA
}

process C_B {
   publishDir "output/B"
   input:
   file(infiles) from ch2_infiles.collect()
   file(bfiles) from ch_infilesFromB
   output:
   file("*.txt") into ch_outCB
}
```
Michael Adkins
@madkinsz
@giannicorik_twitter Ah yeah I was hoping not to have to do that. Oh well.