Michael Adkins
@madkinsz
Does Nextflow attempt to ignore input files when capturing output files? e.g. I'm getting an error: Missing output file(s) *.fastq expected by process merge_nextseq_lanes when calling a script that renames files in place. There are many .fastq files in the working directory, but it cannot find any?
Paolo Di Tommaso
@pditommaso
input file names are not captured by globs
Michael Adkins
@madkinsz
Is there a way to make them captured?
Or is calling a script to combine/rename some of the files bad practice? I want to take a collection, rename a small subset or combine some, then pass all the resulting files as a new channel
Paolo Di Tommaso
@pditommaso
not a good idea, a task should produce its own outputs
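A minimal sketch of the point above, with hypothetical process and channel names (DSL-1 style like the rest of the thread): the output glob only matches files the task itself writes, so renaming inputs in place leaves nothing for *.fastq to capture, while writing to a fresh name does.

```
// Hypothetical sketch: outputs must be files written by the task itself.
process merge_lanes {
  input:
  file reads from reads_ch                 // reads_ch is a hypothetical input channel

  output:
  file 'merged_*.fastq' into merged_ch     // only matches files this task creates

  script:
  """
  cat ${reads} > merged_sample.fastq
  """
}
```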
Michael Adkins
@madkinsz
Okay, I don't understand how preprocessing tasks are supposed to work then. I have a tool that needs to operate on all of the fastq files but some of the fastqs require preprocessing first.
Paolo Di Tommaso
@pditommaso
you can have the pre-proc task take some of the fastqs, and another task take all the fastqs + the output of the pre-proc
makes sense?
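A rough sketch of that layout, with hypothetical channel and process names: the pre-proc task takes only the fastqs that need work, and the downstream task sees the union of the untouched fastqs and the pre-proc output.

```
// Hypothetical sketch: mix untouched fastqs with the pre-processed ones.
plain_ch   = Channel.fromPath('data/plain/*.fastq')       // need no preprocessing
to_prep_ch = Channel.fromPath('data/needs_prep/*.fastq')  // need preprocessing

process preproc {
  input:
  file fq from to_prep_ch

  output:
  file 'prepped_*.fastq' into prepped_ch

  script:
  """
  cp ${fq} prepped_${fq}
  """
}

all_fastqs = plain_ch.mix(prepped_ch)   // the downstream task reads from all_fastqs
```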
Michael Adkins
@madkinsz
That does make sense but I don't know how to exclude the ones that would be preprocessed from the all fastqs.
Since preprocessing requires all the fastqs to be collected so that some can be merged
Paolo Di Tommaso
@pditommaso
glob pattern? csv file? you should have a criteria to express that
Michael Adkins
@madkinsz
You're right. That should be reasonable, I'll look into that. Thank you.
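One way to express such a criterion, sketched with a hypothetical filename convention: duplicate the source channel with into, then filter each copy, sending files that match the pattern to preprocessing and everything else straight downstream.

```
// Hypothetical sketch: split a file set by a name-based criterion.
Channel.fromPath('data/*.fastq').into { for_prep_check; for_pass_check }

needs_prep  = for_prep_check.filter { it.name =~ /_nextseq_/ }      // goes to the pre-proc task
passthrough = for_pass_check.filter { !(it.name =~ /_nextseq_/) }   // bypasses preprocessing
```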
Paolo Di Tommaso
@pditommaso
@micans the alternative may work (haven't tried), the second proposal looks too creative ..
Michael Adkins
@madkinsz
Can you use the channel factory / builder in the input/output parts of a process?
The connection between those two syntactic forms is kind of unclear
Stijn van Dongen
@micans
@pditommaso the alternative works ... (pretty sure, tested it). It's not that creative ... it unleashes huge possibilities :grin: ... I found the need to introduce extra channel names a little bit annoying ... so I was thinking about ways to get an implicit channel into `into`.
Paolo Di Tommaso
@pditommaso
you can create as many Channel.fromPath('foo*.fastq') as you need
I found the need to introduce extra channel names a little bit annoying
I understand, but DSL-2 won't require creating channel duplicates anymore
Michael Adkins
@madkinsz
@pditommaso but how do you use that within a process rather than at the head of a .nf file?
Stijn van Dongen
@micans
Cool @pditommaso I'll check it out. I think these extra names may be because of the a(b)c optional b process rather than into duplication, but will check for sure.
Paolo Di Tommaso
@pditommaso
ch1 = Channel.fromPath('*.fasta')
ch2 = Channel.fromPath('*.fasta')

process foo {
  input: 
  file x from ch1
  .. 
}

process bar {
  input: 
  file x from ch2
  .. 
}
@micans check it out! I need your feedback to move it on :D
and now :wave: :smile:
Stijn van Dongen
@micans
@madkinsz We link our fastq files in from a process; it gives a lot of control so you can do whatever you want. In our case the starting point is a sample file with IDs so we have explicit control. We then expect to find the fastq files in a directory. e.g. https://github.com/cellgeni/rnaseq/blob/master/main.nf#L269-L292
:wave: @pditommaso ah I thought there were lots of heavy users providing feedback already! I have a shortage of pipelines to do much abstraction. Anyway, will look nevertheless!
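Not the code at that link, just a rough sketch of the pattern described: sample IDs from a sample file drive the pipeline, and a task stages the matching fastqs from a known directory (params.samplefile and params.fastqdir are hypothetical names).

```
// Hypothetical sketch: sample IDs drive the run; fastqs are picked up by ID.
sample_ids = Channel.fromPath(params.samplefile).splitText().map { it.trim() }

process get_fastqs {
  tag "$id"

  input:
  val id from sample_ids

  output:
  set val(id), file('*.fastq.gz') into fastq_ch

  script:
  """
  ln -s ${params.fastqdir}/${id}*.fastq.gz .
  """
}
```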
Michael Adkins
@madkinsz
I like what you have @micans. We expect to run this process on all the fastqs in a directory. My problem is this:
Oh geez -- edited to remove unformatted code.
Sorry, I'm used to markdown.
Stijn van Dongen
@micans
(sorry have to run for dinner, will check later!)
Michael Adkins
@madkinsz
fastq_gz = Channel.fromPath('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*.fastq.gz')


process unzip_fastq {
  tag "$fastq_gz"

  input:
    file fastq_gz from fastq_gz

  output:
    file '*.fastq' into fastq_files

  script:
    """
    gunzip -df $fastq_gz
    """
}
Then I want to merge the lanes using a channel such as
fastq_pairs = Channel
    .fromFilePairs('/cb/boostershot-basic/Data/Intensities/BaseCalls/CBTEST_Project_0091/*_L00[1-4]_R[1-2]_001.fastq', size: -1)
but I don't know how to make that channel read from the fastq_files channel
Nor do I understand how I would declare fastq_pairs in a process output directive
Stijn van Dongen
@micans
@madkinsz fromPath and fromFilePairs are channel factory methods; they are channel 'sources', you can't use them as connectors. But you can create file pairs yourself -- the following is not a great example, but it does end up with a process spitting out a pair of files -- https://github.com/cellgeni/rnaseq/blob/master/main.nf#L295 . I would try to make a small example using toy files emulating what you want to achieve.
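A hedged sketch of "create the pairs yourself" downstream of a process: derive a sample key from each filename, then group, which emulates fromFilePairs on an existing channel (the regex is an assumption based on the filenames above).

```
// Hypothetical sketch: rebuild (sample, [files]) tuples from the unzip output.
fastq_pairs = fastq_files
    .flatten()
    .map { f -> [ f.name.replaceAll(/_L00\d_R[12]_001\.fastq$/, ''), f ] }
    .groupTuple()
```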
Michael Adkins
@madkinsz
Thanks @micans. I've just ended up pulling from the publishDir, which I don't love, but it's good enough for now since I'm just trying to explore Nextflow.
rithy8
@rithy8
Hello,

I am using Nextflow version 19.05.0-edge build 5097
I want to execute process X four times. However, the process only executed once.
Could someone explain what I did wrong? Thanks.

```
nextflow.preview.dsl=2

aaa = Channel.from([[9],[1],[7],[5]])

process X {
  input:
  val b

  script:
  """
  echo ${b} > hello.txt
  """
}

X(aaa)
```

Paolo Di Tommaso
@pditommaso
I got
executor >  local (4)
[43/9ff7f3] process > X (1) [100%] 4 of 4
4 of 4 ..
Stijn van Dongen
@micans
#!/bin/bash
set -euo pipefail
nfversion=19.05.0-edge

NXF_VER=$nfversion nextflow run - <<EOC
nextflow.preview.dsl=2
aaa = Channel.from([[9],[1],[7],[5]])
process X {
  input: val b
  script: "echo \${b} > hello.txt"
}

X(aaa)
EOC
same here (pleasant to see that nextflow run - <<EOC works!)
Riccardo Giannico
@giannicorik_twitter

@madkinsz I believe you're searching for this:

Channel.fromFilePairs("${params.infolder}/*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+_L/)[0]}
        .ifEmpty {error "File ${params.infolder} not parsed properly"}
        .set { ch_fastqgz } 

process mergefastq {
    tag "${sample}"
    input:
    set val(sample), file (fastqfiles) from ch_fastqgz  
    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """

}

The channel ch_fastqgz contains something like this:

[ [sample1 , [sample1_S001_L001_R1_0001.fastq.gz , sample1_S001_L001_R2_0001.fastq.gz ]] , 
[sample2, [sample2_S001_L001_R1_0001.fastq.gz , sample2_S001_L001_R2_0001.fastq.gz] ]]
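As posted, mergefastq declares no output, so nothing downstream can consume the merged files; a hedged variant with an output block (ch_merged is a hypothetical name) might look like:

```
process mergefastq {
    tag "${sample}"

    input:
    set val(sample), file(fastqfiles) from ch_fastqgz

    output:
    set val(sample), file("${sample}.R*.fastq") into ch_merged

    """
    ls ${sample}_S*_R1_*.fastq.gz | xargs zcat > ${sample}.R1.fastq
    ls ${sample}_S*_R2_*.fastq.gz | xargs zcat > ${sample}.R2.fastq
    """
}
```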
Riccardo Giannico
@giannicorik_twitter
Ah, you also asked, for your example, "how to make a channel read from the fastq_files channel".
The answer is that you need to use operators (see here: https://www.nextflow.io/docs/latest/operator.html).
For example, take your fastq list and create a new channel containing only R1 fastqs:
fastq_pairs.filter{ it =~ /_R1_/ }.tap{fastq_R1only}
lauw04
@lauw04
Hello
I ran a Nextflow pipeline on a remote server (in2p3) and my job was aborted because of the memory it used. It states: "Max vmem = 20.710G, Max rss = 563.410M", so I thought I had used the max vmem, but what I actually used was the max rss, the real RAM. But I don't understand the difference between vmem and rss.
Anthony Ferrari
@af8
What is the simple syntax in Groovy for creating an empty file? Equivalent to touch process.complete in Linux. Thanks
Stijn van Dongen
@micans
@af8 stackoverflow suggests
def nf = new File("test.txt")
nf.createNewFile()
Riccardo Giannico
@giannicorik_twitter
@af8 may I ask why you need to use plain Groovy to create a file instead of using bash inside a Nextflow process? According to Nextflow logic, you should write files inside the processes, using bash or any other language.
Anthony Ferrari
@af8
Thank you @micans. I was also wondering if there was a more nf-ish way of doing it, but this will be great. I was also considering something like
file('test.txt').text = ''
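Both forms create an empty file from a Nextflow script; a tiny sketch (the file name is just an example):

```
// Plain Groovy java.io.File:
new File('process.complete').createNewFile()

// Nextflow's file() helper returns a Path; assigning empty text also creates the file:
file('process.complete').text = ''
```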