Stijn van Dongen
@micans
Cool @pditommaso I'll check it out. I think these extra names may be because of the a(b)c optional b process rather than into duplication, but will check for sure.
Paolo Di Tommaso
@pditommaso
ch1 = Channel.fromPath('*.fasta')
ch2 = Channel.fromPath('*.fasta')

process foo {
  input: 
  file x from ch1
  .. 
}

process bar {
  input: 
  file x from ch2
  .. 
}
@micans check it out! I need your feedback to move it on :D
and now :wave: :smile:
Stijn van Dongen
@micans
@madkinsz We link our fastq files in from a process; it gives a lot of control so you can do whatever you want. In our case the starting point is a sample file with IDs so we have explicit control. We then expect to find the fastq files in a directory. e.g. https://github.com/cellgeni/rnaseq/blob/master/main.nf#L269-L292
:wave: @pditommaso ah I thought there were lots of heavy users providing feedback already! I have a shortage of pipelines to do much abstraction. Anyway, will look nevertheless!
Michael Adkins
@madkinsz
I like what you have @micans. We expect to run this process on all the fastqs in a directory. My problem is this:
Oh geez -- edited to remove unformatted code
Sorry used to markdown.
Stijn van Dongen
@micans
(sorry have to run for dinner, will check later!)
Michael Adkins
@madkinsz
fastq_gz = Channel.fromPath('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*.fastq.gz')


process unzip_fastq {
  tag "$fastq_gz"

  input:
    file fastq_gz from fastq_gz

  output:
    file '*.fastq' into fastq_files

  script:
    """
    gunzip -df $fastq_gz
    """
}
Then I want to merge the lanes using a channel such as
fastq_pairs = Channel
    .fromFilePairs('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*_L00[1-4]_R[1-2]_001.fastq', size: -1)
but I don't know how to make that channel read from the fastq_files channel
Nor do I understand how I would declare fastq_pairs in a process output directive
Stijn van Dongen
@micans
@madkinsz fromPath and fromFilePairs are channel factory methods; they are channel 'sources', you can't use them as connectors. But you can create file pairs yourself -- the following is not a great example, but it does end up with a process spitting out a pair of files -- https://github.com/cellgeni/rnaseq/blob/master/main.nf#L295 . I would try to make a small example using toy files emulating what you want to achieve.
Michael Adkins
@madkinsz
Thanks @micans. I've just ended up pulling from the publishDir, which I don't love, but it's good enough for now since I'm just trying to explore Nextflow.
rithy8
@rithy8
Hello,

I am using Nextflow version 19.05.0-edge build 5097
I want to execute process X four times. However, the process is only executed once.
Could someone explain what I did wrong? Thanks.

```
nextflow.preview.dsl=2

aaa = Channel.from([[9],[1],[7],[5]])

process X {
  input:
  val b

  script:
  """
  echo ${b} > hello.txt
  """
}

X(aaa)
```

Paolo Di Tommaso
@pditommaso
I got
executor >  local (4)
[43/9ff7f3] process > X (1) [100%] 4 of 4
4 of 4 ..
Stijn van Dongen
@micans
#!/bin/bash
set -euo pipefail
nfversion=19.05.0-edge

NXF_VER=$nfversion nextflow run - <<EOC
nextflow.preview.dsl=2
aaa = Channel.from([[9],[1],[7],[5]])
process X {
  input: val b
  script: "echo \${b} > hello.txt"
}

X(aaa)
EOC
same here (pleasant to see that nextflow run - <<EOC works!)
Riccardo Giannico
@giannicorik_twitter

@madkinsz I believe you're looking for this:

Channel.fromFilePairs("${params.infolder}/*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+_L/)[0]}
        .ifEmpty {error "File ${params.infolder} not parsed properly"}
        .set { ch_fastqgz } 

process mergefastq {
    tag "${sample}"
    input:
    set val(sample), file (fastqfiles) from ch_fastqgz  
    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """

}

channel ch_fastqgz contains something like this:

[ [sample1 , [sample1_S001_L001_R1_0001.fastq.gz , sample1_S001_L001_R2_0001.fastq.gz ]] , 
[sample2, [sample2_S001_L001_R1_0001.fastq.gz , sample2_S001_L001_R2_0001.fastq.gz] ]]
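To see which files end up under which key, here is a minimal sketch of the same grouping rule in plain Python rather than Nextflow. It only mirrors the closure above (everything before the first `_S<digits>_L` marker becomes the sample key); the function names `sample_key` and `group_by_sample` are made up for illustration, and the file names are the example ones shown above.

```python
import re
from collections import defaultdict


def sample_key(filename):
    """Return the part of the file name before the first '_S<digits>_L' marker."""
    return re.split(r"_S\d+_L", filename)[0]


def group_by_sample(filenames):
    """Group file names by sample key, keeping each group sorted by name."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[sample_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "sample1_S001_L001_R1_0001.fastq.gz",
        "sample1_S001_L001_R2_0001.fastq.gz",
        "sample2_S001_L001_R1_0001.fastq.gz",
        "sample2_S001_L001_R2_0001.fastq.gz",
    ]
    for key, files in group_by_sample(names).items():
        print(key, files)
```

A file name without the `_S<digits>_L` marker is returned unchanged as its own key, so it would form a group of its own.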
Riccardo Giannico
@giannicorik_twitter
Ah, in your example you also asked "how to make a channel read from the fastq_files channel".
The answer is that you need to use the operators (see here: https://www.nextflow.io/docs/latest/operator.html).
For example, take your fastq list and create a new channel containing only R1 fastqs:
fastq_pairs.filter{ it =~ /_R1_/ }.tap{fastq_R1only}
lauw04
@lauw04
Hello
I ran a Nextflow pipeline on a remote server (in2p3) and my job was aborted because of the memory it used. It states: "Max vmem = 20.710G
Max rss = 563.410M", so I thought I had used the max vmem amount, but what I actually used is max rss, the real RAM. But I don't understand the difference between vmem and rss.
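For context on the two numbers: virtual memory (vmem) counts all the address space a process has reserved, including mapped regions it never touches, while the resident set size (rss) counts only the pages actually held in RAM. On Linux both are reported per process in `/proc/<pid>/status` as `VmSize` and `VmRSS`. The sketch below assumes that Linux format; the helper name `parse_vm_fields` is made up for illustration, and this is not how the scheduler itself measures usage.

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (both in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("VmSize", "VmRSS"):
            # The value looks like "   576932 kB"; keep the number only.
            fields[name] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    # Linux only: inspect the current Python process.
    with open("/proc/self/status") as handle:
        print(parse_vm_fields(handle.read()))
```

A large gap between the two values, as in the job report above, is common for programs that reserve or map much more memory than they actually touch.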
Anthony Ferrari
@af8
What is the simple syntax in Groovy for creating an empty file? Equivalent to touch process.complete in Linux. Thanks
Stijn van Dongen
@micans
@af8 stackoverflow suggests
def nf = new File("test.txt")
nf.createNewFile()
Riccardo Giannico
@giannicorik_twitter
@af8 may I ask why you need to use plain groovy to create a file instead of using bash inside a nextflow process? According to nextflow logic, you should write files inside of the "processes" using bash or any other language.
Anthony Ferrari
@af8
Thank you @micans. I was also wondering if there was a more nf-ish way of doing it but this will be great. I was also considering something like
file('test.txt' ).text = ''
@giannicorik_twitter it is to use in the workflow.onComplete method
Shellfishgene
@Shellfishgene
For splitFasta, can I just use the size option or do I have to combine it with by?
Oh, I think I misunderstand the size option...
Any way of splitting the fasta file by bytes, without cutting sequences in half?
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene I'd suggest using a process.
You can run a tool (or an awk script) to split a fasta into multiple fasta files.
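As a rough illustration of what such a splitting tool could do, here is a minimal Python sketch that cuts a FASTA stream into chunks of roughly `max_bytes` each, but only ever at a header line (`>`), so no sequence is cut in half. The function name `split_fasta` and the output file naming are assumptions made for this example, not part of Nextflow or of any existing tool.

```python
def split_fasta(lines, max_bytes):
    """Yield lists of lines; each list holds whole records only.

    A chunk is closed at the first header line seen once it has reached
    max_bytes, so chunks may be somewhat larger than max_bytes.
    """
    chunk, size = [], 0
    for line in lines:
        # Only start a new chunk at a record boundary (a '>' header line).
        if line.startswith(">") and chunk and size >= max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line.encode("utf-8"))
    if chunk:
        yield chunk


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python split_fasta.py input.fasta 1000000
    path, limit = sys.argv[1], int(sys.argv[2])
    with open(path) as handle:
        for index, records in enumerate(split_fasta(handle, limit), start=1):
            with open(f"chunk_{index}.fasta", "w") as out:
                out.writelines(records)
```

Inside a process, the `chunk_*.fasta` files could then be declared as the process output.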
Stijn van Dongen
@micans
@AlaaBadredine_twitter I've made another implementation of the A->(B->)->C pattern: https://github.com/micans/nextflow-idioms/blob/master/ab-abc-tap.nf . I think it is the most readable one -- it uses the tap operator (which I noticed in @giannicorik_twitter 's contribution above). The core looks like this:
ch_dummy.flatMap().map { f -> [f.text.trim(), f] }.view()
  .tap { ch_AC }
  .until { !params.includeB }
  .set { ch_AB }

process processB {
  input:  set val(sampleid), file(thefile) from ch_AB
  output: set val(sampleid), file('out.txt') into ch_BC
  script: "(echo 'B process'; cat $thefile; md5sum $thefile) > out.txt"
}

ch_AC.until { params.includeB }.mix(ch_BC).set{ ch_C }
@pditommaso this way there are no extra channel names ...
Shellfishgene
@Shellfishgene
@giannicorik_twitter Will do, thanks
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene :thumbsup: :smile:
Riccardo Giannico
@giannicorik_twitter
@micans seems like you liked the 'tap' trick, uh? :smile: glad to be of any help!
Stijn van Dongen
@micans
:+1:
Yasset Perez-Riverol
@ypriverol
Hi guys, where can I find the documentation for the schema of -params-file?
Michael Adkins
@madkinsz
@micans Sweet implementation of that pattern. That makes a lot of sense.
@giannicorik_twitter Thanks for the examples! That's helpful.
Does anyone have suggestions for an idiom like this: C is a process that collects a set of files and creates some output. A outputs a full set of files to C, and once they have all arrived C runs. B also outputs a full set of files to C at another time, and C runs completely independently of A's input. C creates output/A/files and output/B/files.
Stijn van Dongen
@micans
@madkinsz sounds exactly like this: https://github.com/micans/nextflow-idioms/blob/master/collectFile-tuple.nf (with many thanks to @pditommaso as always).
Riccardo Giannico
@giannicorik_twitter

It's kind of complicated, you probably want to combine multiple channels into a single one using some of the nextflow operators ( see here: https://www.nextflow.io/docs/latest/operator.html#combining-operators ) but it's not very clear to me how your 3 channels should merge from your description.

After that it will be something like this:

ch_mergedchannel = // some nextflow-foo starting from ch_infiles.collect(), ch_infilesFromA, ch_infilesFromB

process C {
   // directives go at the top of the process body
   publishDir "output/A", pattern: "*.a.txt"
   publishDir "output/B", pattern: "*.b.txt"

   input:
   file (infiles) from ch_mergedchannel

   output:
   file ("*.files.extensions") into ch_out
}
hydriniumh2
@hydriniumh2
So I don't know if anyone else has run into this issue, but it seems like running git repos via nextflow run doesn't parse parameters or environment variables for the config file, while it does for local .nf files.
Tobias "Tobi" Schraink
@tobsecret
@hydriniumh2 which github repo?