These are chat archives for nextflow-io/nextflow

10th
Aug 2017
marchoeppner
@marchoeppner
Aug 10 2017 06:55 UTC
Good morning. I was thinking about writing a WGS pipeline, and one thing that would potentially be fun is to split the paired-end FASTQ files into chunks so trimming/alignment can run in parallel. But I am not sure how best to achieve this. The built-in channel operator "splitFastq" does not seem to be the ideal option (no PE support?), so maybe some external tool?
It would have to be reasonably fast for there to be an actual speed gain. I found fastqutils, but it is shockingly slow, so there's really no point in using it.
Paolo Di Tommaso
@pditommaso
Aug 10 2017 08:38 UTC
you can use two splitFastq operators, one for each file of the pair, to split the read pairs; there was a similar question a few weeks ago
let me see if I can find it
see the comment :point_up: June 28, 2017 8:23 AM
however, you must guarantee that both files of the pair contain the same number of reads, otherwise the chunks get out of sync
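The constraint Paolo mentions is the crux: chunk k of R1 must hold exactly the same reads as chunk k of R2. The plain-Groovy sketch below illustrates that outside Nextflow with a hypothetical `chunkPairs` helper working on in-memory FASTQ lines; it is not how `splitFastq` is implemented, and it assumes well-formed 4-line records in the same order in both files.

```groovy
// Hypothetical helper (not a Nextflow API): split paired FASTQ content into
// chunks of `readsPerChunk` read pairs, keeping R1 and R2 in lockstep.
// Each input is a list of FASTQ lines; a record is exactly 4 lines
// (header, sequence, '+', qualities).
List chunkPairs(List<String> r1Lines, List<String> r2Lines, int readsPerChunk) {
    def r1Recs = r1Lines.collate(4)
    def r2Recs = r2Lines.collate(4)
    // The chunks only line up if both mates hold the same number of reads
    if (r1Recs.size() != r2Recs.size()) {
        throw new IllegalArgumentException(
            "R1 has ${r1Recs.size()} reads but R2 has ${r2Recs.size()}")
    }
    // Group records into chunks, then flatten each chunk back to plain lines
    def r1Chunks = r1Recs.collate(readsPerChunk).collect { it.flatten() }
    def r2Chunks = r2Recs.collate(readsPerChunk).collect { it.flatten() }
    // Pair chunk k of R1 with chunk k of R2: [[r1Chunk0, r2Chunk0], ...]
    return [r1Chunks, r2Chunks].transpose()
}

def r1 = ['@read1/1', 'ACGT', '+', 'IIII', '@read2/1', 'GGCC', '+', 'IIII', '@read3/1', 'TTAA', '+', 'IIII']
def r2 = ['@read1/2', 'ACGT', '+', 'IIII', '@read2/2', 'GGCC', '+', 'IIII', '@read3/2', 'TTAA', '+', 'IIII']
chunkPairs(r1, r2, 2).eachWithIndex { pair, i ->
    println "chunk $i: ${pair[0].size().intdiv(4)} read pairs"
}
```

Checking the record counts up front is the cheap way to enforce "same number of reads"; if the counts differ, zipping the chunks would silently drop or misalign reads.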
Maxime Garcia
@MaxUlysse
Aug 10 2017 08:59 UTC
@pditommaso I encountered a small, strange bug and opened an issue for it
cf #428
Paolo Di Tommaso
@pditommaso
Aug 10 2017 09:21 UTC
weird!
Tobias Neumann
@t-neumann
Aug 10 2017 09:30 UTC

Hi, I have this array of input files that I want to pass as a space-separated list to a GATK call:

knownIndels = ["Mills_and_1000G_gold_standard.indels.hg38.vcf.gz","Homo_sapiens_assembly38.known_indels.vcf.gz"]

Further down I want to make this call:

java -jar /biosw/generic-x86_64/gatk/3.11/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R !{genome} \
    -I !{name}.snt.bam \
    -known !{knownIndels} \
    -o !{name}.target_intervals.list \
    -nt !{params.threads}

Now the knownIndels variable should simply expand to the two files separated by a space, as happens with e.g. the fromFilePairs channel.

What actually happens is that the array is put into the command as-is (from .command.sh):

java -jar /biosw/generic-x86_64/gatk/3.11/GenomeAnalysisTK.jar     -T RealignerTargetCreator     -R Homo_sapiens_assembly38.fasta     -I test.snt.bam     -known [Mills_and_1000G_gold_standard.indels.hg38.vcf.gz, Homo_sapiens_assembly38.known_indels.vcf.gz]     -o test.target_intervals.list     -nt 10

Does anyone know how to resolve this?

Maxime Garcia
@MaxUlysse
Aug 10 2017 09:34 UTC
@t-neumann try something like known = knownIndels.collect{"-known $it"}.join(' ')
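Both the brackets and Maxime's fix can be checked in plain Groovy: interpolating a list inside a string uses its `toString()`, which renders as `[a, b]`. The sketch below (file names from Tobias's example) contrasts that with `join(' ')`, which gives a plain space-separated list, and with Maxime's `collect`/`join`, which repeats the flag for each file. As far as I know, GATK 3's `-known` takes one file per flag and is repeated for several files, so the last form is the one to put in the command.

```groovy
def knownIndels = ["Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
                   "Homo_sapiens_assembly38.known_indels.vcf.gz"]

// Direct interpolation uses the list's toString(), hence the brackets and comma
def raw = "-known ${knownIndels}"
// Plain space-separated list of the files
def spaced = knownIndels.join(' ')
// Maxime's suggestion: one -known flag per file
def knownArgs = knownIndels.collect { "-known $it" }.join(' ')

println raw
println spaced
println knownArgs
```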
@pditommaso very weird indeed, but always happy to find such bugs
Tobias Neumann
@t-neumann
Aug 10 2017 09:36 UTC
OK, that could work. So there's no way to unwrap arrays "in place" into space-separated file names in the script?
Simone Baffelli
@baffelli
Aug 10 2017 09:43 UTC
Morning; is it possible that the when clause in a process definition cannot cope with sets?
I have the following process (I haven't pasted the shell block because it is too long):
process stack{

    errorStrategy 'ignore'

    validExitStatus 0, 255, 127 // Ignore warning of reference outside

    publishDir "${params.results}/stacked/${stack_name}/${n_stack}/${used_method}"

    input:
        set file(unw_ls:"*.diff"), file(off_ls:"*.off"), val(master_id), val(slave_id), val(bl), val(method) from to_stack
        each ref_pt from ref_pix_stack // repeat it continuously because ref_pix_stack is only sent once
        each ref_mli from ref_mli_stack
        each n_stack from 5..unw_ls

    when:
        (unw_ls as List).size() < n_stack

    output:
        set file(off_par), file(rate_m), file(sig_rate_m), file(sig_ph),
            val(stack_id), val(used_method), val(n_stack), val(av_time) into stacked
        set file('rate.bmp'), file('rate_std.bmp'), file('ph_std.bmp') into rate_ras
    shell:
and I get the following error: ERROR ~ No such variable: unw_ls
Simone Baffelli
@baffelli
Aug 10 2017 10:01 UTC
I've tested it; the problem seems to be caused by file(unw_ls:"*.diff")
Simone Baffelli
@baffelli
Aug 10 2017 10:25 UTC
Forget about it, it was another mistake
Shellfishgene
@Shellfishgene
Aug 10 2017 11:37 UTC
I don't understand why this produces the error "ERROR ~ No such variable: pair_id" for the output line:
Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process bbduk {
    publishDir "./result/bbduk"

    input:
    set pair_id, file(reads) from read_pairs

    output:
    set pair_id, file "*_R2.clip.fastq.gz" into cleaned_reads

    """
    ~/programs/bbmap/bbduk.sh -Xmx2000m t=6 in1=${reads[0]} in2=${reads[1]} out1=${pair_id}_R1.clip.fastq.gz out2=${pair_id}_R2.clip.fastq.gz ref=~/programs/bbmap/resources/adapters.fa ref=~/programs/bbmap/resources/phix174_ill.ref.fa.gz ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10
    """
}
Simone Baffelli
@baffelli
Aug 10 2017 11:39 UTC
did you try specifying the type of inputs?
Shellfishgene
@Shellfishgene
Aug 10 2017 11:39 UTC
I'm not sure what you mean?
Paolo Di Tommaso
@pditommaso
Aug 10 2017 11:42 UTC
Um, use parentheses around "*_R2.clip.fastq.gz", i.e. file("*_R2.clip.fastq.gz")
Shellfishgene
@Shellfishgene
Aug 10 2017 11:44 UTC
@pditommaso Is that necessary because it contains wildcards?
Paolo Di Tommaso
@pditommaso
Aug 10 2017 11:52 UTC
No, it's necessary because the parser is a bit rusty... actually, it's not a real parser
Shellfishgene
@Shellfishgene
Aug 10 2017 11:55 UTC
I copied it from the RNASeq example on the website...
Paolo Di Tommaso
@pditommaso
Aug 10 2017 11:55 UTC
Oh!
But does it work with parentheses?
Shellfishgene
@Shellfishgene
Aug 10 2017 11:56 UTC
yes!
Paolo Di Tommaso
@pditommaso
Aug 10 2017 11:56 UTC
Could you provide the link to the example without them?
Shellfishgene
@Shellfishgene
Aug 10 2017 11:58 UTC
No, it seems I lied, sorry. The file actually isn't there in the example... https://www.nextflow.io/example4.html
Too much copying and pasting around
Paolo Di Tommaso
@pditommaso
Aug 10 2017 11:58 UTC
Ah! :)
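For anyone hitting the same error, a sketch of the `output:` block from the question with Paolo's fix applied. This is a Nextflow (DSL 1) fragment, not a standalone script; the commented-out broader glob is an assumption about what a downstream step might need, not something from the thread.

```groovy
    output:
    // Parentheses are required: written as `file "..."` the declaration
    // fails with "No such variable: pair_id" (see the thread above)
    set pair_id, file("*_R2.clip.fastq.gz") into cleaned_reads

    // Assumption: if both cleaned mates are needed downstream, a glob that
    // matches *_R1.clip.fastq.gz and *_R2.clip.fastq.gz emits both files
    // set pair_id, file("*.clip.fastq.gz") into cleaned_reads
```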
Simone Baffelli
@baffelli
Aug 10 2017 12:19 UTC
:)
I'm rusty as well... I'm having problems understanding the behavior of the when clause
Must be the rain
Simone Baffelli
@baffelli
Aug 10 2017 12:54 UTC
It seems that whenever I use `shell` and have some Groovy code before the command, that code is executed even when the `when` condition of the process is not true
Paolo Di Tommaso
@pditommaso
Aug 10 2017 12:57 UTC
there's a 0.25.6-SNAPSHOT version that fixes this
Simone Baffelli
@baffelli
Aug 10 2017 12:58 UTC
Ah, that's a known bug
tried it, does not seem to work
Paolo Di Tommaso
@pditommaso
Aug 10 2017 13:05 UTC
:(
open an issue if so, with a test case to reproduce it
Simone Baffelli
@baffelli
Aug 10 2017 13:06 UTC
let me try again
Simone Baffelli
@baffelli
Aug 10 2017 13:29 UTC
Found it, it was a logic error on my side
Maxime Garcia
@MaxUlysse
Aug 10 2017 13:31 UTC
@t-neumann Dunno, maybe something like this could work:
java -jar /biosw/generic-x86_64/gatk/3.11/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R !{genome} \
    -I !{name}.snt.bam \
    !{knownIndels.collect{"-known $it"}.join(' ')} \
    -o !{name}.target_intervals.list \
    -nt !{params.threads}
Evan Floden
@evanfloden
Aug 10 2017 17:14 UTC

Is there a way to combine the 'retry' and 'finish' in the errorStrategy directive?

Ideally, after the final retry I would like a graceful exit that lets the running jobs complete.

Jean-Christophe Houde
@jchoude
Aug 10 2017 17:38 UTC
Hi @pditommaso, I have a quick question. I want to bind some variables so that I can print a help message. One of the variables I want to access is process.publishDir. However, the help is printed almost at the beginning of my main.nf, and at that point I am not in a process scope, so I can't access it. Is there a workaround for this? My colleague (@fmorency) seems to remember something, but is now on vacation... Alternatively, if there is a clean way to do this, I would gladly take it :)
Mike Smoot
@mes5k
Aug 10 2017 17:45 UTC
@skptic great idea!
Paolo Di Tommaso
@pditommaso
Aug 10 2017 19:34 UTC
@skptic Yes, you can define something like this: errorStrategy { task.attempt < X ? 'retry' : 'finish' }
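A slightly fuller sketch of that directive, assuming three attempts in total (X = 3). The process name `align` and the command are hypothetical placeholders. As far as I know, `maxRetries` allows only one retry by default and caps how often 'retry' is honoured, so it is raised to match the closure.

```groovy
process align {
    // Attempts 1 and 2 are retried on failure; when attempt 3 fails,
    // 'finish' stops submitting new tasks but lets running ones complete
    errorStrategy { task.attempt < 3 ? 'retry' : 'finish' }
    // Only one retry is allowed by default, so allow two here
    maxRetries 2

    script:
    """
    run_alignment.sh   # hypothetical placeholder command
    """
}
```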
Evan Floden
@evanfloden
Aug 10 2017 19:34 UTC
Thanks mate! You just made my night
Paolo Di Tommaso
@pditommaso
Aug 10 2017 19:35 UTC
ahaha
not so hot ! :)
@jchoude you can define the path you want to use as a parameter, print it in the help/log, and then use it in publishDir, e.g.
params.outdir = '/some/path'
println "My out dir: $params.outdir"
// ...
process foo {
  publishDir params.outdir
  // ...
}
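Building on that, a sketch of a help message that reports the publish directory before any process runs. The `params.help` flag and the usage text are illustrative assumptions; `log.info` and `exit` are available in Nextflow scripts. This is a pipeline fragment, not a standalone script.

```groovy
// Defaults; both can be overridden on the command line (--outdir, --help)
params.outdir = 'results'
params.help = false

if (params.help) {
    log.info """
        Usage: nextflow run main.nf [--outdir DIR]

          --outdir DIR   directory where results are published (currently: ${params.outdir})
        """.stripIndent()
    exit 0
}

process foo {
    // Same value the help message reports
    publishDir params.outdir

    // ...
}
```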