Stijn van Dongen
@micans
:wave: @pditommaso ah I thought there were lots of heavy users providing feedback already! I have a shortage of pipelines to do much abstraction. Anyway, will look nevertheless!
Michael Adkins
@madkinsz
I like what you have @micans. We expect to run this process on all the fastqs in a directory. My problem is this:
Oh geez -- edited to remove unformatted code
Sorry used to markdown.
Stijn van Dongen
@micans
(sorry have to run for dinner, will check later!)
Michael Adkins
@madkinsz
fastq_gz = Channel.fromPath('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*.fastq.gz')

process unzip_fastq {
  tag "$fastq_gz"

  input:
    file fastq_gz from fastq_gz

  output:
    file '*.fastq' into fastq_files

  script:
    """
    gunzip -df $fastq_gz
    """
}
Then I want to merge the lanes using a channel such as
fastq_pairs = Channel
    .fromFilePairs('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*_L00[1-4]_R[1-2]_001.fastq', size: -1)
but I don't know how to make that channel read from the fastq_files channel
Nor do I understand how I would declare fastq_pairs in a process output directive
Stijn van Dongen
@micans
@madkinsz fromPath and fromFilePairs are channel factory methods; they are channel 'sources', you can't use them as connectors. But you can create file pairs yourself -- the following is not a great example, but it does end up with a process spitting out a pair of files -- https://github.com/cellgeni/rnaseq/blob/master/main.nf#L295 . I would try to make a small example using toy files emulating what you want to achieve.
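Since fromFilePairs can't consume an existing channel, one workaround is to rebuild the [sampleid, files] grouping yourself with operators. A minimal sketch (the regex and channel names are assumptions based on the lane/read naming above, not tested against the actual data):

```
// Group unzipped fastqs from the existing channel into
// [ sampleid, [ files... ] ] tuples, emulating fromFilePairs.
fastq_files
    .flatten()
    .map { f -> [ f.name.replaceAll(/_L00\d_R[12]_001\.fastq$/, ''), f ] }
    .groupTuple()
    .set { fastq_pairs }
```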
Michael Adkins
@madkinsz
Thanks @micans. I've just ended up pulling from the publishDir, which I don't love, but it's good enough for now since I'm just trying to explore nextflow.
rithy8
@rithy8
Hello,

I am using Nextflow version 19.05.0-edge build 5097
I want to execute process X four times. However, the process is only executed once.
Could someone explain what I did wrong? Thanks.

```
nextflow.preview.dsl=2

aaa = Channel.from([[9],[1],[7],[5]])

process X{
input:
val b
script:
"""
echo ${b} > hello.txt
"""
}

X(aaa)
```

Paolo Di Tommaso
@pditommaso
I got
executor >  local (4)
[43/9ff7f3] process > X (1) [100%] 4 of 4
4 of 4 ..
Stijn van Dongen
@micans
#!/bin/bash
set -euo pipefail
nfversion=19.05.0-edge

NXF_VER=$nfversion nextflow run - <<EOC
nextflow.preview.dsl=2
aaa = Channel.from([[9],[1],[7],[5]])
process X {
  input: val b
  script: "echo \${b} > hello.txt"
}

X(aaa)
EOC
same here (pleasant to see that nextflow run - <<EOC works!)
Riccardo Giannico
@giannicorik_twitter

@madkinsz I believe you're searching for this:

Channel.fromFilePairs("${params.infolder}/*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+_L/)[0]}
        .ifEmpty {error "File ${params.infolder} not parsed properly"}
        .set { ch_fastqgz } 

process mergefastq {
    tag "${sample}"
    input:
    set val(sample), file (fastqfiles) from ch_fastqgz  
    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """

}

channel ch_fastqgz contains something like this:

[ [sample1 , [sample1_S001_L001_R1_0001.fastq.gz , sample1_S001_L001_R2_0001.fastq.gz ]] , 
[sample2, [sample2_S001_L001_R1_0001.fastq.gz , sample2_S001_L001_R2_0001.fastq.gz] ]]
Riccardo Giannico
@giannicorik_twitter
Ah, you also asked about your example: "how to make a channel read from the fastq_files channel".
The answer is that you need to use operators (see here: https://www.nextflow.io/docs/latest/operator.html).
For example, to take your fastq list and create a new channel containing only the R1 fastqs:
fastq_pairs.filter{ it =~ /_R1_/ }.tap{fastq_R1only}
lauw04
@lauw04
Hello
I ran a nextflow pipeline on a remote server (in2p3) and my job was aborted because of the memory I used. It states: "Max vmem = 20.710G
Max rss = 563.410M", so I thought I had exceeded max vmem, but actually I hit max rss, the real RAM. I don't understand the difference between vmem and rss.
Anthony Ferrari
@af8
What is the simple syntax in Groovy for creating an empty file? Equivalent to touch process.complete in Linux. Thanks
Stijn van Dongen
@micans
@af8 stackoverflow suggests
def nf = new File("test.txt")
nf.createNewFile()
Riccardo Giannico
@giannicorik_twitter
@af8 may I ask why you need plain groovy to create a file instead of using bash inside a nextflow process? According to nextflow logic, you should write files inside processes using bash or any other language.
Anthony Ferrari
@af8
Thank you @micans. I was also wondering if there was a more nf-ish way of doing it but this will be great. I was also considering something like
file('test.txt' ).text = ''
@giannicorik_twitter it is to use in the workflow.onComplete method
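Putting those two pieces together, a minimal sketch of that use case (the marker filename and launchDir location are assumptions, not from the thread):

```
workflow.onComplete {
    // hypothetical marker path; creates (or truncates to) an empty file
    file("${workflow.launchDir}/process.complete").text = ''
}
```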
Shellfishgene
@Shellfishgene
For splitFasta, can I just use the size option or do I have to combine it with by?
Oh, I think I misunderstand the size option...
Any way of splitting the fasta file by bytes, without cutting sequences in half?
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene I'd suggest using a process.
You can run a tool (or an awk script) to split a fasta into multiple fasta files
Stijn van Dongen
@micans
@AlaaBadredine_twitter I've made another implementation of the A->(B->)->C pattern: https://github.com/micans/nextflow-idioms/blob/master/ab-abc-tap.nf . I think it is the most readable one -- it uses the tap operator (which I noticed in @giannicorik_twitter 's contribution above). The core looks like this:
ch_dummy.flatMap().map { f -> [f.text.trim(), f] }.view()
  .tap { ch_AC }
  .until { !params.includeB }
  .set { ch_AB }

process processB {
  input:  set val(sampleid), file(thefile) from ch_AB
  output: set val(sampleid), file('out.txt') into ch_BC
  script: "(echo 'B process'; cat $thefile; md5sum $thefile) > out.txt"
}

ch_AC.until { params.includeB }.mix(ch_BC).set{ ch_C }
@pditommaso this way there are no extra channel names ...
Shellfishgene
@Shellfishgene
@giannicorik_twitter Will do, thanks
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene :thumbsup: :smile:
Riccardo Giannico
@giannicorik_twitter
@micans seems like you liked the 'tap' trick , uh? :smile: glad to be of any help!
Stijn van Dongen
@micans
:+1:
Yasset Perez-Riverol
@ypriverol
Hi guys, where can I find the documentation for the schema of -params-file?
Michael Adkins
@madkinsz
@micans Sweet implementation of that pattern. That makes a lot of sense.
@giannicorik_twitter Thanks for the examples! That's helpful.
Does anyone have suggestions for an idiom like: C is a process that collects a set of files and creates some output. A outputs a full set of files to C, and once they have all arrived, C runs. B also outputs a full set of files to C at another time, and C runs completely independently of A's input. C creates output/A/files and output/B/files.
Stijn van Dongen
@micans
@madkinsz sounds exactly like this: https://github.com/micans/nextflow-idioms/blob/master/collectFile-tuple.nf (with many thanks to @pditommaso as always).
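As a toy sketch of that collectFile pattern (made-up data, not the linked script): collectFile with a grouping closure writes one file per key, so items tagged A and B end up in separate outputs:

```
// Emit [key, text] tuples; collectFile routes each to "<key>.txt".
Channel
    .from( ['A', 'contents a1'], ['A', 'contents a2'], ['B', 'contents b1'] )
    .collectFile { key, text -> [ "${key}.txt", text + '\n' ] }
    .view()
```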
Riccardo Giannico
@giannicorik_twitter

It's kind of complicated, you probably want to combine multiple channels into a single one using some of the nextflow operators ( see here: https://www.nextflow.io/docs/latest/operator.html#combining-operators ) but it's not very clear to me how your 3 channels should merge from your description.

After that it will be something like this:

ch_mergedchannel = /* some nextflow-foo starting from ch_infiles.collect() , ch_infilesFromA , ch_infilesFromB */

process C {
   publishDir "output/A", pattern: "*.a.txt"
   publishDir "output/B", pattern: "*.b.txt"

   input:
   file (infiles) from ch_mergedchannel
   output:
   file ("*.files.extensions") into ch_out
}
hydriniumh2
@hydriniumh2
So I don't know if anyone else has run into this issue, but it seems like running git repos via nextflow run doesn't parse parameters or environment variables for the config file, though it does for local .nf files
Tobias "Tobi" Schraink
@tobsecret
@hydriniumh2 which github repo?
hydriniumh2
@hydriniumh2
Any repo
Riccardo Giannico
@giannicorik_twitter

@madkinsz I think you want to combine the A and B channels to create a channel like this

ch_mergedchannel= [ [A , fileA1.txt] , [A , fileA2.txt] , [B , fileB1.txt] , [B , fileB2.txt] ]

so you will have an instance of C with fileA1.txt and all the collected files , a second instance of C with fileA2.txt and all the collected files, and so on... right?

process C {
   publishDir "output/${source}", pattern: "*.${source}.txt"

   input:
   file (allfiles) from ch_infiles.collect()
   set val(source), file(infiles) from ch_mergedchannel
   output:
   file ("*.txt") into ch_out
}
Tobias "Tobi" Schraink
@tobsecret
@hydriniumh2 please give an example of one that you ran and it didn't work, so we can reproduce
Tobias "Tobi" Schraink
@tobsecret
If that didn't work, all of nf-core wouldn't work and that's a huge concern
Michael Adkins
@madkinsz
@giannicorik_twitter That makes sense to me, but will C wait to run until both A and B have finished, instead of running once when A is done and again when B is done?