These are chat archives for nextflow-io/nextflow

17th
Apr 2018
Paolo Di Tommaso
@pditommaso
Apr 17 2018 06:33 UTC
You have to give that script execute permission and invoke it directly, i.e. without using the Rscript command
Vladimir Kiselev
@wikiselev
Apr 17 2018 08:08 UTC
great, many thanks, it worked
Pierre Lindenbaum
@lindenb
Apr 17 2018 10:17 UTC

is it possible to capture stdout in a set? I'd like to create a set [sample-name, bam-path]. My process:

inputBams=[ file("in1.bam"), file("in2.bam") ]


process sample_and_bam {
    tag "sample for ${bam}"
    echo true

    input:
        file bam from inputBams
    output:
        set stdout,bam into sample_bam
    script:

    """
    samtools view -H "${bam}" |\\
        grep '^@RG'  |\\
        tr "\\t" "\\n" |\\
        grep '^SM\\:'  |\\
        cut -d ':' -f 2
    """
}

sample_bam.subscribe { println "I say..  $it" }

the sample names remain empty.
thanks

Evan Floden
@evanfloden
Apr 17 2018 10:32 UTC
First step would be to specify that the output set should contain a value and a file. Try:
 output:
        set val(stdout), file(bam) into sample_bam
Pierre Lindenbaum
@lindenb
Apr 17 2018 10:38 UTC
@skptic thanks ! my bad, samtools was not in my path... :-S
tbugfinder
@tbugfinder
Apr 17 2018 15:49 UTC
Hi, my software creates dynamic output filenames which are written to stdout along with other information. Do you have a recommendation or an example of how to post-process this through the stdout channel (if I read the docs properly)?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 15:51 UTC
You can use a pure bash snippet, capture the stdout with a redirect, and post-process the file with the usual tools, i.e. sed, grep, etc.
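A minimal sketch of that idea (the tool name my_tool and the OUTPUT: log prefix are made-up assumptions, not anything from the thread):

```nextflow
// capture everything the tool prints, then extract the dynamic file
// names from the captured log with grep/sed
process run_tool {
    input:
    file x from input_ch

    output:
    file 'outputs.txt' into names_ch

    """
    my_tool ${x} > log.txt
    grep '^OUTPUT:' log.txt | sed 's/^OUTPUT: *//' > outputs.txt
    """
}
```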
Evan Floden
@evanfloden
Apr 17 2018 15:57 UTC
You may also be able to use a glob pattern to capture the files themselves and then perform a map on the channel outside the process to post-process the names etc.
Phil Ewels
@ewels
Apr 17 2018 16:00 UTC
Yes, usually something like *.txt works fine, depending on what else is produced
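Roughly like this (my_tool is a placeholder; the glob and the baseName-based id are just one possible choice):

```nextflow
process run_tool {
    input:
    file x from input_ch

    output:
    file '*.txt' into results_ch    // glob captures whatever the tool wrote

    """
    my_tool ${x}
    """
}

// post-process the names outside the process, e.g. pair each file
// with an id derived from its name
results_ch
    .flatten()
    .map { f -> [ f.baseName, f ] }
    .subscribe { println it }
```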
Grafitos
@grafitos
Apr 17 2018 16:25 UTC
Hi, what strategy do you use to test processes individually? Are there some best practices/methods already implemented?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:29 UTC
The best practice is to test the whole pipeline under different conditions using a CI service
If you want to unit test your tasks, externalise each command in a script file ie. Bash, Perl, etc
and test it independently
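For example (count_lines.sh is a hypothetical script; Nextflow automatically adds the pipeline's bin/ folder to the PATH, so the task body reduces to a single call that can also be run and tested standalone):

```nextflow
// main.nf -- the task body is just a call to an external script
// kept under bin/count_lines.sh in the pipeline repository
process count_lines {
    input:
    file x from input_ch

    output:
    stdout into counts_ch

    """
    count_lines.sh ${x}
    """
}
```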
Grafitos
@grafitos
Apr 17 2018 16:33 UTC
Ok I see, thank you Paolo
Phil Ewels
@ewels
Apr 17 2018 16:37 UTC
@wikiselev - course material looks great! I would add with the conda Dockerfile that you should have a conda clean command - it deletes all of the downloaded tar files and cuts about 1/3 to 1/2 of the container filesize.
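A sketch of that tip in a Dockerfile (base image and package are placeholders):

```dockerfile
FROM continuumio/miniconda3
# install tools, then run conda clean in the same RUN layer so the
# deleted tarballs and caches don't remain baked into the image
RUN conda install -y -c bioconda samtools && \
    conda clean --all --yes
```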
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:38 UTC
Nice tip
Phil Ewels
@ewels
Apr 17 2018 16:41 UTC
I can’t take credit for it, it was @marcelm who suggested it originally.
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:42 UTC
:clap:
Jason Yamada-Hanff
@yamad
Apr 17 2018 18:34 UTC
when I use a .groupBy operator on a channel it returns a Map, but I don't understand how Maps are processed through channels. How are the values in a Map delivered as input to a process?
tbugfinder
@tbugfinder
Apr 17 2018 18:53 UTC
Will S3 storage encryption also be added for the S3 workdir?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:39 UTC
@jbyars
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:39 UTC
ok, for the record, I figured out that groupBy emits a single value with the whole Map. So for now, I've done a messy thing: creating a new channel and pushing values onto it in the subscribe block of the groupBy channel.
grouped = Channel.create()
Channel
  .fromPath('fastq/*')
  .groupBy { file -> (file.fileName =~ /matchypattern/)[0][1] }
  .subscribe {
    it.each { key, value -> grouped << [key, value] }
    grouped.close()
  }
Is there a cleaner way to do this? fromFilePairs looks right but in the real script the files come on an output channel from a process, and fromFilePairs looks like it expects 1) to find files on a normal path, and 2) doesn't provide regex matching
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:42 UTC
groupBy never found a real use case; likely it will be deprecated
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:44 UTC
yeah, so maybe i've gone at this the wrong way. I'm using bcl2fastq. It takes an output directory as a parameter and then a bunch of files (with names tied to sample names) are dumped in that directory.
namely, 4 gzipped fastqs per sample. I'm trying to aggregate those 4 fastqs and then do sample-wise processing.
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:49 UTC
Declare the directory content as output
Then, use a map to transform the file name to pair (sampleId, file)
Then, use groupTuple with size:4
Done
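The three steps above might look roughly like this (the sample-id extraction via tokenize is an assumption about Illumina-style file names such as sampleA_S1_R1_001.fastq.gz, and the bcl2fastq invocation omits all of its other options):

```nextflow
process demux {
    output:
    file 'outdir/*.fastq.gz' into fastqs_ch   // declare directory content as output

    """
    bcl2fastq --output-dir outdir
    """
}

fastqs_ch
    .flatten()                                    // one file per emission
    .map { f -> [ f.name.tokenize('_')[0], f ] }  // file name -> (sampleId, file)
    .groupTuple(size: 4)                          // 4 fastqs per sample
    .set { per_sample_ch }
```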
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:58 UTC
@pditommaso thanks
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:59 UTC
:+1:
tbugfinder
@tbugfinder
Apr 17 2018 20:36 UTC
My AWS Batch AMI includes an EFS mount which is passed to Docker containers via job-definition mount options. How do I get the Nextflow job definitions to apply mount options?