Michael Adkins
@madkinsz
You're right. That should be reasonable, I'll look into that. Thank you.
Paolo Di Tommaso
@pditommaso
@micans the alternative may work (I haven't tried it); the second proposal looks too creative ..
Michael Adkins
@madkinsz
Can you use the channel factory / builder in the input/output parts of a process?
The connection between those two syntactic forms is kind of unclear
Stijn van Dongen
@micans
@pditommaso the alternative works ... (pretty sure, tested it). It's not that creative ... it unleashes huge possibilities :grin: ... I found the need to introduce extra channel names a little bit annoying ... so I was thinking about ways to get an implicit channel into 'into'.
Paolo Di Tommaso
@pditommaso
you can create as many Channel.fromPath('foo*.fastq') as you need
I found the need to introduce extra channel names a little bit annoying
I understand, but DSL-2 will no longer require creating channel duplicates
Michael Adkins
@madkinsz
@pditommaso but how do you use that within a process rather than at the head of a .nf file?
Stijn van Dongen
@micans
Cool @pditommaso I'll check it out. I think these extra names may be because of the a(b)c optional b process rather than into duplication, but will check for sure.
Paolo Di Tommaso
@pditommaso
ch1 = Channel.fromPath('*.fasta')
ch2 = Channel.fromPath('*.fasta')

process foo {
  input: 
  file x from ch1
  .. 
}

process bar {
  input: 
  file x from ch2
  .. 
}
@micans check it out! I need your feedback to move it on :D
and now :wave: :smile:
Stijn van Dongen
@micans
@madkinsz We link our fastq files in from a process; it gives a lot of control so you can do whatever you want. In our case the starting point is a sample file with IDs so we have explicit control. We then expect to find the fastq files in a directory. e.g. https://github.com/cellgeni/rnaseq/blob/master/main.nf#L269-L292
:wave: @pditommaso ah I thought there were lots of heavy users providing feedback already! I have a shortage of pipelines to do much abstraction. Anyway, will look nevertheless!
Michael Adkins
@madkinsz
I like what you have @micans. We expect to run this process on all the fastqs in a directory. My problem is this:
Oh geez -- edited to remove unformatted code
Sorry, I'm used to markdown.
Stijn van Dongen
@micans
(sorry have to run for dinner, will check later!)
Michael Adkins
@madkinsz
fastq_gz = Channel.fromPath('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*.fastq.gz')


process unzip_fastq {
  tag "$fastq_gz"

  input:
    file fastq_gz from fastq_gz

  output:
    file '*.fastq' into fastq_files

  script:
    """
    gunzip -df $fastq_gz
    """
}
Then I want to merge the lanes using a channel such as
fastq_pairs = Channel
    .fromFilePairs('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*_L00[1-4]_R[1-2]_001.fastq', size: -1)
but I don't know how to make that channel read from the fastq_files channel
Nor do I understand how I would declare fastq_pairs in a process output directive
Stijn van Dongen
@micans
@madkinsz fromPath and fromFilePairs are channel factory methods; they are channel 'sources', you can't use them as connectors. But you can create file pairs yourself -- the following is not a great example, but it does end up with a process spitting out a pair of files -- https://github.com/cellgeni/rnaseq/blob/master/main.nf#L295 . I would try to make a small example using toy files emulating what you want to achieve.
Michael Adkins
@madkinsz
Thanks @micans. I've just ended up pulling from the publishDir, which I don't love, but it's good enough for now since I'm just trying to explore Nextflow.
rithy8
@rithy8
Hello,

I am using Nextflow version 19.05.0-edge build 5097
I want to execute process X four times. However, the process is only executed once.
Could someone explain what I did wrong? Thanks.

```
nextflow.preview.dsl=2

aaa = Channel.from([[9],[1],[7],[5]])

process X{
input:
val b
script:
"""
echo ${b} > hello.txt
"""
}

X(aaa)
```
Paolo Di Tommaso
@pditommaso
I got
executor >  local (4)
[43/9ff7f3] process > X (1) [100%] 4 of 4
4 of 4 ..
Stijn van Dongen
@micans
#!/bin/bash
set -euo pipefail
nfversion=19.05.0-edge

NXF_VER=$nfversion nextflow run - <<EOC
nextflow.preview.dsl=2
aaa = Channel.from([[9],[1],[7],[5]])
process X {
  input: val b
  script: "echo \${b} > hello.txt"
}

X(aaa)
EOC
same here (pleasant to see that nextflow run - <<EOC works!)
Riccardo Giannico
@giannicorik_twitter

@madkinsz I believe you're looking for this:

Channel.fromFilePairs("${params.infolder}/*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+_L/)[0]}
        .ifEmpty {error "File ${params.infolder} not parsed properly"}
        .set { ch_fastqgz } 

process mergefastq {
    tag "${sample}"
    input:
    set val(sample), file (fastqfiles) from ch_fastqgz  
    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """

}

channel ch_fastqgz contains something like this:

[ [sample1 , [sample1_S001_L001_R1_0001.fastq.gz , sample1_S001_L001_R2_0001.fastq.gz ]] , 
[sample2, [sample2_S001_L001_R1_0001.fastq.gz , sample2_S001_L001_R2_0001.fastq.gz] ]]
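To make the grouping key concrete, here is a small Python sketch (not Nextflow code) that mimics what the closure passed to fromFilePairs above does: it takes everything before the first `_S<digits>_L` marker in the file name and uses it as the sample ID. The function name `group_by_sample` and the file names are made up for illustration; this only emulates the key, not the channel itself.

```python
import re
from collections import defaultdict


def group_by_sample(filenames):
    """Group file names by the text before the first '_S<digits>_L' marker.

    Mirrors the Groovy expression file.name.split(/_S\\d+_L/)[0].
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        key = re.split(r"_S\d+_L", name)[0]
        groups[key].append(name)
    # Return [sample, [files...]] entries, sorted by sample ID.
    return [[sample, files] for sample, files in sorted(groups.items())]


# Hypothetical file names following the pattern shown above.
names = [
    "sample1_S001_L001_R1_0001.fastq.gz",
    "sample1_S001_L001_R2_0001.fastq.gz",
    "sample2_S001_L001_R1_0001.fastq.gz",
    "sample2_S001_L001_R2_0001.fastq.gz",
]

for sample, files in group_by_sample(names):
    print(sample, files)
```

A file name that does not contain the marker is used whole as its own key, so it is worth checking the keys on a few real names before relying on the pattern.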
Riccardo Giannico
@giannicorik_twitter
Ah, you also asked in your example "how to make a channel read from the fastq_files channel"
The answer is you need to use the Operators (see here: https://www.nextflow.io/docs/latest/operator.html)
for example, take your fastq list and create a new channel containing only R1 fastqs:
fastq_pairs.filter{ it =~ /_R1_/ }.tap{fastq_R1only}
lauw04
@lauw04
Hello
I ran a Nextflow pipeline on a remote server (in2p3) and my job was aborted because of the memory it used. It states: "Max vmem = 20.710G
Max rss = 563.410M", so I thought I had used max vmem, but what I actually used was max rss, the real RAM. But I don't understand the difference between vmem and rss.
Anthony Ferrari
@af8
What is the simplest syntax in Groovy for creating an empty file? Equivalent to touch process.complete in Linux. Thanks
Stijn van Dongen
@micans
@af8 stackoverflow suggests
def nf = new File("test.txt")
nf.createNewFile()
Riccardo Giannico
@giannicorik_twitter
@af8 may I ask why you need to use plain groovy to create a file instead of using bash inside a nextflow process? According to nextflow logic, you should write files inside of the "processes" using bash or any other language.
Anthony Ferrari
@af8
Thank you @micans. I was also wondering if there was a more nf-ish way of doing it but this will be great. I was also considering something like
file('test.txt' ).text = ''
@giannicorik_twitter it is to use in the workflow.onComplete method
Shellfishgene
@Shellfishgene
For splitFasta, can I just use the size option or do I have to combine it with by?
Oh, I think I misunderstood the size option...
Any way of splitting the fasta file by bytes, without cutting sequences in half?
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene I'd suggest using a process.
You can run a tool (or an awk script) to split a fasta into multiple fasta files
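As a rough idea of what such a tool could look like, here is a minimal Python sketch that splits FASTA lines into chunks of roughly `max_bytes` each while only cutting at record boundaries (a line starting with `>`). The names `read_records` and `split_fasta` are hypothetical, and a single record larger than `max_bytes` simply ends up in a chunk of its own. In a process you would write each chunk to its own output file.

```python
def read_records(lines):
    """Yield FASTA records as lists of lines (header plus sequence lines)."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def split_fasta(lines, max_bytes):
    """Yield chunks (lists of lines) made of whole records only.

    A chunk is closed before adding a record that would push it past
    max_bytes, so no sequence is ever cut in half.
    """
    chunk, size = [], 0
    for record in read_records(lines):
        record_size = sum(len(line.encode()) for line in record)
        if chunk and size + record_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.extend(record)
        size += record_size
    if chunk:
        yield chunk


# Toy input: three records of 8, 9 and 5 bytes.
example = [">a\n", "ACGT\n", ">b\n", "GG\n", "TT\n", ">c\n", "A\n"]
chunks = list(split_fasta(example, max_bytes=15))
for i, chunk in enumerate(chunks):
    print(i, "".join(chunk).replace("\n", " "))
```

Passing an open file handle as `lines` should work the same way, since the functions only iterate over lines.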
Stijn van Dongen
@micans
@AlaaBadredine_twitter I've made another implementation of the A->(B->)->C pattern: https://github.com/micans/nextflow-idioms/blob/master/ab-abc-tap.nf . I think it is the most readable one -- it uses the tap operator (which I noticed in @giannicorik_twitter 's contribution above). The core looks like this:
ch_dummy.flatMap().map { f -> [f.text.trim(), f] }.view()
  .tap { ch_AC }
  .until { !params.includeB }
  .set { ch_AB }

process processB {
  input:  set val(sampleid), file(thefile) from ch_AB
  output: set val(sampleid), file('out.txt') into ch_BC
  script: "(echo 'B process'; cat $thefile; md5sum $thefile) > out.txt"
}

ch_AC.until { params.includeB }.mix(ch_BC).set{ ch_C }
@pditommaso this way there are no extra channel names ...
Shellfishgene
@Shellfishgene
@giannicorik_twitter Will do, thanks
Riccardo Giannico
@giannicorik_twitter
@Shellfishgene :thumbsup: :smile:
Riccardo Giannico
@giannicorik_twitter
@micans seems like you liked the 'tap' trick, huh? :smile: glad to be of help!
Stijn van Dongen
@micans
:+1: