These are chat archives for nextflow-io/nextflow

15th
Mar 2018
Shellfishgene
@Shellfishgene
Mar 15 2018 08:52
I'm still having trouble with picard and temp files. At the beginning of the error report from nf there is a line Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp. Is that from picard tools or nf?
Paolo Di Tommaso
@pditommaso
Mar 15 2018 09:45
@Shellfishgene that's produced by any JVM instance, so it can be either NF or Picard
where is the _JAVA_OPTIONS variable defined?
Shellfishgene
@Shellfishgene
Mar 15 2018 09:48
Hmm, it does not appear when I run the picard command just on the console.
It seems to override the setting from the actual command: when I run picard -XX:ParallelGCThreads=4 -Xmx32g -Djava.io.tmpdir=. MarkDuplicates TMP_DIR=. INPUT= etc., it reports Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp despite me setting it differently in the command. Picard then actually uses /tmp instead of ., but seems to get confused because it then can't find its own temp files at the end of the run and generates an error.
Paolo Di Tommaso
@pditommaso
Mar 15 2018 09:50
could it be set in the compute node env?
Shellfishgene
@Shellfishgene
Mar 15 2018 09:53
Maybe, I'll have to check. Also I wonder what the picard TMP_DIR option actually does if the java one is used in the end. Or maybe there are even different types of temp files...
Paolo Di Tommaso
@pditommaso
Mar 15 2018 09:54
frankly I don't know. Not a picard guru
Shellfishgene
@Shellfishgene
Mar 15 2018 09:55
Maybe I should just switch to samtools ;)
Paolo Di Tommaso
@pditommaso
Mar 15 2018 09:57
also have a look at sambamba
Alexander Peltzer
@apeltzer
Mar 15 2018 09:57
@Shellfishgene In terms of usability yes - but IIRC, the samtools rmdup method has not been updated for a while ;-) Could use sambamba maybe for some stuff too
Oh Paolo just mentioned it already ^^
broadinstitute/picard#215
Shellfishgene
@Shellfishgene
Mar 15 2018 09:59
Yes, I know samtools rmdup does not do the same thing as picard. Thanks for the sambamba hint, didn't know about that before.
Hmm, but I am setting TMP_DIR
Shellfishgene
@Shellfishgene
Mar 15 2018 10:06
If I run echo $_JAVA_OPTIONS on our cluster I get nothing, no idea where -Djava.io.tmpdir is actually set.
Paolo Di Tommaso
@pditommaso
Mar 15 2018 10:06
could it be set in the nextflow config?
Shellfishgene
@Shellfishgene
Mar 15 2018 10:08
God damn it
It is, right at the top. I copied that from some months ago, apparently I had put that there.
Sorry for wasting your time, I really should use nextflow for everything to really get used to it...
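For readers hitting the same symptom, here is a sketch of how such a setting can sit in a pipeline configuration. Nextflow's `env` config scope exports variables into the environment of the tasks, which would be consistent with `echo $_JAVA_OPTIONS` printing nothing in a login shell. The actual config line was never posted in the chat, so the fragment below is an assumed reconstruction, not the original file.

```groovy
// nextflow.config (assumed reconstruction; the real file was not shown)
env {
    // Variables in this scope are exported into the task environment.
    // A JVM started by a task then prints "Picked up _JAVA_OPTIONS: ..."
    // and, as observed above, this value won over the
    // -Djava.io.tmpdir=. given on the picard command line.
    _JAVA_OPTIONS = '-Djava.io.tmpdir=/tmp'
}
```

Removing the entry, or pointing it at the intended directory, should make the command-line setting effective again.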
Maxime Garcia
@MaxUlysse
Mar 15 2018 10:17
Making a toy project really helped me with some simple stuff
Shellfishgene
@Shellfishgene
Mar 15 2018 10:24
Sambamba doesn't work for my bam file, too many reference sequences.
Michael L Heuer
@heuermh
Mar 15 2018 13:13
Our markdup isn't as fast as sambamba single node, but scales nicely.
#!/usr/bin/env nextflow

params.srcDir = "$HOME/data"
params.sparkOpts = "--master local[*]"

bamFiles = "${params.srcDir}/**.bam"
bams = Channel.fromPath(bamFiles).map { path -> tuple(path.baseName, path) }

process markdup {
  tag { sample }
  container "quay.io/biocontainers/adam:0.23.0--0"

  input:
    set sample, file(bam) from bams
  output:
    set sample, file("${sample}.mkdup.bam") into markdups

  """
  adam-submit \
    ${params.sparkOpts} \
    -- \
    transformAlignments \
    -single \
    ${bam} \
    ${sample}.mkdup.bam
  """
}

markdups.subscribe{
  println "Transformed ${it.get(0)} alignments into ${it.get(1)} with ADAM mark duplicates."
}
Paolo Di Tommaso
@pditommaso
Mar 15 2018 13:25
:+1:
Michael L Heuer
@heuermh
Mar 15 2018 13:25
Dang it! Can't edit the above any more, but without the -mark_duplicate_reads it ain't gonna mark anything. ;)
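A sketch of the script above with the missing flag added. The flag name is taken from the message above; placing it next to `-single`, before the input path, is an assumption about the ADAM command line that the chat does not confirm.

```groovy
#!/usr/bin/env nextflow

params.srcDir = "$HOME/data"
params.sparkOpts = "--master local[*]"

// Pair each BAM file with its base name, as in the original script.
bamFiles = "${params.srcDir}/**.bam"
bams = Channel.fromPath(bamFiles).map { path -> tuple(path.baseName, path) }

process markdup {
  tag { sample }
  container "quay.io/biocontainers/adam:0.23.0--0"

  input:
    set sample, file(bam) from bams
  output:
    set sample, file("${sample}.mkdup.bam") into markdups

  // Same command as above, plus -mark_duplicate_reads
  // (flag position is an assumption).
  """
  adam-submit \
    ${params.sparkOpts} \
    -- \
    transformAlignments \
    -mark_duplicate_reads \
    -single \
    ${bam} \
    ${sample}.mkdup.bam
  """
}
```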
Paolo Di Tommaso
@pditommaso
Mar 15 2018 13:26
too late ! :)
Michael L Heuer
@heuermh
Mar 15 2018 13:26
should'a used gist
@pditommaso How about a nextflow runner that runs directly from gitter?
Paolo Di Tommaso
@pditommaso
Mar 15 2018 13:28
:joy:
Michael L Heuer
@heuermh
Mar 15 2018 13:28
All UX problems solved
Phil Ewels
@ewels
Mar 15 2018 14:07
bringing new meaning to the term "push button bioinformatics"..!
Simone Baffelli
@baffelli
Mar 15 2018 16:11
Good afternoon
Again a complicated question...suppose I want to capture multiple output files in a file(pattern) expression. How do I enforce a specific order of the captured files?
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:19
you can't
Simone Baffelli
@baffelli
Mar 15 2018 16:21
That's a pity
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:21
just a second
Simone Baffelli
@baffelli
Mar 15 2018 16:21
But can I specify several files to be captured and emitted as a set?
I remember we had a similar conversation a while ago but I can't find it
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:23
the point is that they are ordered implicitly
Simone Baffelli
@baffelli
Mar 15 2018 16:23
Ok, I guess I found it... with the new version I can pre-define the names in a variable and use it in the output
like that:
process polarimetricAnalysis{

    publishDir './results', pattern: "*.pdf"

    input:
        set val(dateId), val(channelId), val(rxId),
        file(slcPar:"??.slc.par"), file(slc:"??.slc") from sMatrixForPolAnalysis
        each file(roi) from roisForPolAnalysis
        val rlks from params.rlks
        val azlks from params.azlks
    output:
        file(outAnalysis) into polarimetricAnalysisOutput
        set val(dateId), val(rxId), 
        val(outPlotsNames), file(outPlots) into rgbForAnimation
    when:
        rxId == "u"
    shell:
        HH = slc[channelId.findIndexOf({it=="HH"})]
        HV = slc[channelId.findIndexOf({it=="HV"})]
        VH = slc[channelId.findIndexOf({it=="VH"})]
        VV = slc[channelId.findIndexOf({it=="VV"})]
        hPlot = "${dateId}_H.pdf"
        pauliPlot = "${dateId}_pauli.pdf"
        alphaPlot = "${dateId}_alpha.pdf"
        legendPlot = "legend.pdf"
        outPlots = [pauliPlot, hPlot, alphaPlot, legendPlot]
        outPlotsNames = ["pauli", "entropy", "alpha", "legend"]
        '''
        polarimetric_analysis.py scattering !{slcPar[0]} !{HH} !{HV} !{VH} !{VV} !{rlks} !{azlks} !{roi} outAnalysis !{outPlots.join(" ")}
        '''
}
It seems to work
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:26
wow
Simone Baffelli
@baffelli
Mar 15 2018 16:26
what?
surprised by your own software?? :)
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:26
interesting code :)
Simone Baffelli
@baffelli
Mar 15 2018 16:26
bad?
I hope I'm not committing any major groovy/nf blunders
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:29
no no, it's fine to the extent it works
Simone Baffelli
@baffelli
Mar 15 2018 16:30
it does, and I'm trying to make it as neat and clear as possible
perhaps a bit idiosyncratic
Caspar
@caspargross
Mar 15 2018 16:32
suppose I have the following set: set id, lr1, lr2, [file1, file2, file3 ....] with a variable number of files.
How can I turn this into a channel with distinct objects for each file, like so:
set id, lr1, lr2, file1
set id, lr1, lr2, file2
set id, lr1, lr2, file3
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:33
@baffelli has the answer for you :)
Simone Baffelli
@baffelli
Mar 15 2018 16:34
just use the above code and transpose afterwards
I do these kinds of things all the time
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:34
nearly !
Simone Baffelli
@baffelli
Mar 15 2018 16:34
I just tried to guess
Because I'm doing exactly this
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:35
@caspargross use the transpose operator (that @baffelli suggested to implement)
Simone Baffelli
@baffelli
Mar 15 2018 16:35
in order to produce a video for each of the output pdf
Caspar
@caspargross
Mar 15 2018 16:35
thanks i will take a look!
Simone Baffelli
@baffelli
Mar 15 2018 16:35
(now fighting with ffmpeg)
Caspar
@caspargross
Mar 15 2018 16:38
it works! just added .transpose() to the input of the next process :D Sometimes it's easy
Paolo Di Tommaso
@pditommaso
Mar 15 2018 16:39
:v:
Simone Baffelli
@baffelli
Mar 15 2018 16:42
The transpose() operator really is a lifesaver
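A minimal sketch of the `transpose` usage discussed above, for a tuple that carries a variable-length list of files. The ids and file names are hypothetical placeholders, and plain strings stand in for file objects so the snippet can be run on its own.

```groovy
// Hypothetical tuples shaped like: id, lr1, lr2, [file1, file2, ...]
Channel
    .from(
        ['id1', 'lr1', 'lr2', ['file1', 'file2', 'file3']],
        ['id2', 'lr1', 'lr2', ['file4']]
    )
    // Emit one tuple per element of the nested list,
    // repeating the non-list elements each time.
    .transpose()
    .view()

// Should print:
// [id1, lr1, lr2, file1]
// [id1, lr1, lr2, file2]
// [id1, lr1, lr2, file3]
// [id2, lr1, lr2, file4]
```

`Channel.from` matches the Nextflow syntax used elsewhere in this chat; more recent releases also offer `Channel.of` for the same purpose. In a pipeline the operator can be applied directly where the channel is consumed, as in the `.transpose()` call added to the input of the next process above.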