These are chat archives for nextflow-io/nextflow

17th
Apr 2018
Paolo Di Tommaso
@pditommaso
Apr 17 2018 06:33
You have to give that script execute permission and invoke it directly, i.e. without using the Rscript command
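For illustration, a minimal sketch of that setup (the script name my_script.R, the bin/ layout and the channel name are assumptions for the example, not from the question): the R script lives in the pipeline's bin/ directory, starts with #!/usr/bin/env Rscript, and is made executable once with chmod +x bin/my_script.R, so a process can invoke it by name:

process run_r {
    // my_script.R is assumed to sit in the pipeline's bin/ directory,
    // which Nextflow adds to the PATH of every task
    input:
        file x from input_ch

    script:
    """
    my_script.R ${x}
    """
}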
Vladimir Kiselev
@wikiselev
Apr 17 2018 08:08
great, many thanks, it worked
Pierre Lindenbaum
@lindenb
Apr 17 2018 10:17

is it possible to capture stdout in a set? I'd like to create a set [sample-name, bam-path]. My process:

inputBams=[ file("in1.bam"), file("in2.bam") ]


process sample_and_bam {
    tag "sample for ${bam}"
    echo true

    input:
        file bam from inputBams
    output:
        set stdout,bam into sample_bam
    script:

    """
    samtools view -H "${bam}" |\\
        grep '^@RG'  |\\
        tr "\\t" "\\n" |\\
        grep '^SM\\:'  |\\
        cut -d ':' -f 2
    """
    }

sample_bam.subscribe { println "I say..  $it" }

the sample names remain empty.
thanks

Evan Floden
@evanfloden
Apr 17 2018 10:32
First step would be to specify that the output set should contain a value and a file. Try:
 output:
        set val(stdout), file(bam) into sample_bam
Pierre Lindenbaum
@lindenb
Apr 17 2018 10:38
@skptic thanks ! my bad, samtools was not in my path... :-S
tbugfinder
@tbugfinder
Apr 17 2018 15:49
Hi, my software creates dynamic output filenames, which are written to stdout mixed in with other information. Do you have a recommendation or an example of how to post-process this through the stdout channel (if I read the docs properly)?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 15:51
You can use a pure bash snippet: capture the stdout with a redirect and post-process the file with the usual tools, i.e. sed, grep, etc.
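A rough sketch of that idea (the tool name my_tool, its 'output file: <name>' log line and the channel names are invented for the example):

process run_tool {
    input:
        file x from input_ch

    output:
        file 'result.out' into results_ch

    script:
    """
    my_tool ${x} > tool.log 2>&1
    # fish the dynamic file name out of the captured stdout
    out=\$(grep '^output file:' tool.log | cut -d ' ' -f 3)
    # rename it to something the output declaration can match
    mv "\$out" result.out
    """
}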
Evan Floden
@evanfloden
Apr 17 2018 15:57
You may also be able to use a glob pattern to capture the files themselves and then perform a map on the channel outside the process to post-process names etc.
Phil Ewels
@ewels
Apr 17 2018 16:00
Yes, usually something like *.txt works fine, depending on what else is produced
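Something like this, as a hedged sketch (my_tool, the *.txt pattern and the mapping logic are placeholders):

process run_tool {
    input:
        file x from input_ch

    output:
        file '*.txt' into out_ch mode flatten

    script:
    """
    my_tool ${x}
    """
}

// post-process the names outside the process,
// e.g. derive an ID from each file name
out_ch.map { f -> [ f.baseName, f ] }
      .subscribe { println it }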
Grafitos
@grafitos
Apr 17 2018 16:25
Hi, what strategy do you use to test processes individually? Are there best practices/methods already implemented?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:29
The best practice is to test the whole pipeline under different conditions using a CI service
If you want to unit test your tasks, externalise each command in a script file, i.e. Bash, Perl, etc.
and test it independently
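As a sketch of that layout (the script name count_reads.sh is an assumption): keep the command in bin/, so the process body stays a one-liner and the very same script can be run on its own in a unit test or CI job, e.g. ./bin/count_reads.sh test.fastq:

process count_reads {
    // bin/count_reads.sh is on the PATH Nextflow sets up for each task,
    // and can equally be exercised standalone outside the pipeline
    input:
        file fq from reads_ch

    output:
        stdout into counts_ch

    script:
    """
    count_reads.sh ${fq}
    """
}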
Grafitos
@grafitos
Apr 17 2018 16:33
Ok I see, thank you Paolo
Phil Ewels
@ewels
Apr 17 2018 16:37
@wikiselev - course material looks great! I would add, for the conda Dockerfile, that you should have a conda clean command - it deletes all of the downloaded tarballs and cuts the container file size by about 1/3 to 1/2.
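For reference, the kind of Dockerfile line being suggested (base image and package are just an example, not from the course material):

FROM continuumio/miniconda3
RUN conda install -y -c bioconda samtools \
    && conda clean -a -y   # remove downloaded package tarballs and caches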
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:38
Nice tip
Phil Ewels
@ewels
Apr 17 2018 16:41
I can’t take credit for it, it was @marcelm who suggested it originally..
Paolo Di Tommaso
@pditommaso
Apr 17 2018 16:42
:clap:
Jason Yamada-Hanff
@yamad
Apr 17 2018 18:34
when I use a .groupBy operator on a channel it returns a Map, but I don't understand how Maps are processed through channels. How are the values in a Map delivered as input to a process?
tbugfinder
@tbugfinder
Apr 17 2018 18:53
Will S3 storage encryption also be added for the S3 workdir?
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:39
@jbyars
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:39
ok, for the record, I figured out that groupBy emits a single value with the whole Map. So for now, I've done a messy thing: created a new channel and pushed values onto it in the subscribe block of the groupBy channel.
grouped = Channel.create()
Channel
  .fromPath('fastq/*')
  .groupBy { file -> (file.fileName =~ /matchypattern/)[0][1] }  // key each file by a regex capture group
  .subscribe {
    // groupBy emits one Map, so re-emit each entry as a [key, files] pair
    it.each { key, value -> grouped << [key, value] }
    grouped.close()
  }
Is there a cleaner way to do this? fromFilePairs looks right, but in the real script the files come on an output channel from a process, and fromFilePairs looks like it 1) expects to find files on a normal path and 2) doesn't provide regex matching
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:42
groupBy never found a real use case; it will likely be deprecated
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:44
yeah, so maybe I've gone at this the wrong way. I'm using bcl2fastq. It takes an output directory as a parameter, and then a bunch of files (with names tied to sample names) are dumped in that directory.
namely, 4 gzipped fastqs per sample. I'm trying to aggregate those 4 fastqs and then do sample-wise processing.
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:49
Declare the directory content as output
Then, use a map to transform the file name to a pair (sampleId, file)
Then, use groupTuple with size:4
Done
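Roughly, that recipe could look like the following sketch (the output glob, the bcl2fastq options, the name-parsing regex and the channel names are assumptions, to be adapted to the real file names):

process bcl2fastq {
    input:
        file rundir from runs_ch

    output:
        file 'outdir/*.fastq.gz' into fastqs_ch mode flatten

    script:
    """
    bcl2fastq --runfolder-dir ${rundir} --output-dir outdir
    """
}

fastqs_ch
    .map { f -> [ (f.name =~ /^(.+?)_S\d+/)[0][1], f ] }   // (sampleId, file)
    .groupTuple(size: 4)   // emit each sample once its 4 fastqs have arrived
    .set { sample_fastqs }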
Jason Yamada-Hanff
@yamad
Apr 17 2018 19:58
@pditommaso thanks
Paolo Di Tommaso
@pditommaso
Apr 17 2018 19:59
:+1:
tbugfinder
@tbugfinder
Apr 17 2018 20:36
My AWS Batch AMI includes an EFS mount, which is passed to Docker containers via mount options in the job definition. How do I get the job definitions Nextflow creates to apply those mount options?