These are chat archives for nextflow-io/nextflow

18th
Jan 2018
Jimmy Breen
@jimmybgammyknee
Jan 18 2018 01:27
Hi all, I've just started testing the CRG-CNAG/CalliNGS-NF Nextflow pipeline for calling RNA-seq variants and I'm having some issues with hanging mapping processes. I was running 8 samples in parallel and they just won't finish and move on to the next step.
Here's the Nextflow log:
Jan-18 11:31:35.582 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
Jan-18 11:36:35.677 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
Jan-18 11:41:35.753 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
Jan-18 11:46:35.842 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
Jan-18 11:51:35.927 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
Jan-18 11:56:35.931 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 8 -- first: TaskHandler[id: 5; name: 2_rnaseq_mapping_star (4733_ACAGTG_L006); status: RUNNING; exit: -; workDir: /home/jimmyb/Projects/nextflow_gatk_test/work/83/76de6b684925b76bc20c18bef53cfb]
everything seems to have finished fine but the pipeline won't stop hanging (sorry if this is the wrong place to be asking this btw)
Mike Smoot
@mes5k
Jan 18 2018 01:30
Do you see the .exitcode file in the work dir for that process?
Jimmy Breen
@jimmybgammyknee
Jan 18 2018 01:36
yeah I do. .exitcode for every process
find . -name ".exitcode"
./work/3c/daa3833540d064b83ed6ebe32accdb/.exitcode
./work/cd/2eddc2fe916766f61b75a5080b03f1/.exitcode
./work/ae/b6485fbe3748c67075177cad2df70c/.exitcode
./work/cb/afcbe8762529d5ee73813f019ea3f8/.exitcode
./work/b6/b0cb5a5a5d636c15e9299a86ead16b/.exitcode
./work/2f/290aab3d16bd473d3f61bcede20a16/.exitcode
./work/aa/6f832a52a0321fe7a34cf0fde63c91/.exitcode
./work/52/f136bd137f25b608098e2e903fe5aa/.exitcode
./work/97/7bb0cc6e6b100326b90a5f1825db9e/.exitcode
./work/0b/5d12b404578e60bf90b84e7d856952/.exitcode
./work/8e/53f6c828758c79f6a44b435816f33f/.exitcode
./work/2b/b284ade02d02a6f16bf0250ff2cd4c/.exitcode
./work/21/8b49d4d8dd8c88d271b08c4dda2ffa/.exitcode
./work/41/9502efa1bff5f7b0ab3283a44d3ccf/.exitcode
./work/c1/13d6cf9da95f7a2756cff27f9118a8/.exitcode
./work/0c/a20f2692734235e33c2ad808706e19/.exitcode
./work/9d/931b585ee4019580a7031a8e42464a/.exitcode
Jimmy Breen
@jimmybgammyknee
Jan 18 2018 01:41
actually I lie, for that particular process I don't
is it possible to coerce the pipeline to kick off the next steps by adding an exitcode to that directory?
Mike Smoot
@mes5k
Jan 18 2018 03:09
Well, if there's no exit code, then I'd guess that process is still running. If you're sure that the process isn't running, then you can just kill the pipeline and restart it with the -resume flag and it should pick up where it left off.
Jimmy Breen
@jimmybgammyknee
Jan 18 2018 03:30
great thanks Mike, will see how it goes
Paolo Di Tommaso
@pditommaso
Jan 18 2018 08:21
that's a very memory and time consuming pipeline, make sure to have enough resources
Mamana
@mypandos
Jan 18 2018 08:25
@pditommaso With NF 0.27.0 I am getting this: ERROR ~ Cannot cast object 'null' with class 'null' to class 'long'. Try 'java.lang.Long' instead. Has anyone else experienced this?
Paolo Di Tommaso
@pditommaso
Jan 18 2018 08:26
can you post the full error stack trace from the log file? even better, can you open an issue on GH for that?
Mamana
@mypandos
Jan 18 2018 08:28
ok
Alexander Peltzer
@apeltzer
Jan 18 2018 09:05

I'll ask here and it's probably a dumb question but hey ;-)

/*
 * STEP 2 - Map with BWA Mem
 */

process bwamem {
    tag "$name"

    input:
    set val(name), file(reads) from trimmed_reads
    file(bwa_index) from bwa_index

    output:
    set val(name), file("${name}_bwa.sam") into raw_aln_sam
    file '.command.log' into bwa_stdout


    script:
    rg="\'@RG\\tID:${params.run_id}\\tSM:${params.run_id}\\tPL:illumina\'"
    """
    bwa mem \\
    -R $rg \\
    -t ${task.cpus} \\
    -k 2 \\
    $params.gfasta \\
    $reads \\
    > ${name}_bwa.sam
    # Print version number to standard out
    echo "BWA Version:"\$(bwa 2>&1)
    """
}

Problem: The channel contains multiple files (name, FP_Read1, FP_Read2) and should map these with BWAMem. It works fine, but if I have a large cluster, it should ideally take all the input files (say [n] samples), and should map these in parallel, e.g. on different cluster nodes, which does not happen yet. Code for that is here: https://github.com/apeltzer/NGI-ExoSeq/blob/ab510cf8ccc0c4c85fdf19e14a7ff2a3b2f68484/PairedSingleSampleWF.nf#L312

Might be something very easy, but I'm kind of clueless right now
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:08
> which does not happen yet
and what happens instead ?
Alexander Peltzer
@apeltzer
Jan 18 2018 09:10
It maps some files with BWAMem once (in random order, but I guess that the input channel does not sort input files and that's absolutely fine and okay with us), but not all of them
So if i have two samples, I might have one sample running through the entire pipeline - whereas the second sample runs through FastQC, Trimming and then stops running
Which is unintended
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:11
I think, I found it
this is a classic mistake
instead of Channel.fromPath("${params.bwa_index}") use file("${params.bwa_index}")
Channel.fromPath creates a multi-value channel, hence the process will expect more than one index ..
Alexander Peltzer
@apeltzer
Jan 18 2018 09:13
Oh my oh my
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:13
but only one file is provided, therefore only one bwamem process is executed
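For anyone reading along, the difference can be sketched roughly as below. This is a fragment in the same DSL style as the bwamem snippet above, not a complete pipeline, and the variable names `bwa_index_queue` and `bwa_index_value` are made up for illustration.

```groovy
// Queue channel: emits each matching path once and is then exhausted.
// A process that takes one item from it per task stops after the first
// sample when only one index file matches.
bwa_index_queue = Channel.fromPath("${params.bwa_index}")

// Plain file object: when used as a process input it behaves like a
// single value that can be read again for every task, so the number of
// tasks is driven by the other input channel (here trimmed_reads).
bwa_index_value = file("${params.bwa_index}")
```

With the second form, an input line such as `file(bwa_index) from bwa_index_value` should let bwamem run once per entry in trimmed_reads.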
Alexander Peltzer
@apeltzer
Jan 18 2018 09:13
I owe you another beer
Makes absolute sense
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:14
can't wait for :beers: :)
I need to write a big section about that in the doc
Maxime Garcia
@MaxUlysse
Jan 18 2018 09:17
@apeltzer You seem to owe beers to lots of people ;-)
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:19
LOL
Alexander Peltzer
@apeltzer
Jan 18 2018 09:21
@MaxUlysse Haha ;-) I'm German so that's fine :-P
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:22
invite us at the oktoberfest :satisfied:
Alexander Peltzer
@apeltzer
Jan 18 2018 09:22
:-P
Alexander Peltzer
@apeltzer
Jan 18 2018 09:39
Can you copy the .command.log file with a prefix / name instead of just using the same name into an output channel?
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:39
?
Alexander Peltzer
@apeltzer
Jan 18 2018 09:40
All samples run through now, so your suggestion fixed that issue ;-) but at the end I get a name collision in MultiQC between multiple files with the same name
Some tools produce output on STDOUT, which I copy to a channel for MultiQC at the end... renaming the multiple .command.log files to something like $name.command.log so they don't collide would be good
It's going down the same road as this one here: nextflow-io/nextflow#516
Paolo Di Tommaso
@pditommaso
Jan 18 2018 09:44
(tel)
Alexander Peltzer
@apeltzer
Jan 18 2018 09:46
No worries, I'm using your suggestion from that thread. Will reply if that works ;-)
Simone Baffelli
@baffelli
Jan 18 2018 10:16
Hello
Paolo Di Tommaso
@pditommaso
Jan 18 2018 10:16
welcome back !
:)
Simone Baffelli
@baffelli
Jan 18 2018 10:16
Happy belated 2018
i'm back with my crazy questions
Paolo Di Tommaso
@pditommaso
Jan 18 2018 10:16
long holidays ! ;)
Simone Baffelli
@baffelli
Jan 18 2018 10:16
can I use "errorStrategy" to set the value of an output channel?
not really, busy with conference abstracts
did not have lots of time to write or change my pipelines
Paolo Di Tommaso
@pditommaso
Jan 18 2018 10:17
> can I use "errorStrategy" to set the value of an output channel?
would say no
Simone Baffelli
@baffelli
Jan 18 2018 10:18
I want to somehow tag the failing channel values
and send them to a list of files known to be causing failure
that way I can use the list to cleanup our data server a bit
turns out we have plenty of empty images
Paolo Di Tommaso
@pditommaso
Jan 18 2018 10:22
failing processes do not produce any output ..
Simone Baffelli
@baffelli
Jan 18 2018 10:23
then maybe I should make it look like it does not fail
Paolo Di Tommaso
@pditommaso
Jan 18 2018 10:25
well yes, maybe there should be a new feature to handle this
Simone Baffelli
@baffelli
Jan 18 2018 10:25
it would be quite convenient
to allow processes to produce some sort of extra output for example
an "error" channel
that always returns an output no matter whether the process fails or not
that could allow limited "on the fly" pipeline reconfiguration
Simone Baffelli
@baffelli
Jan 18 2018 10:32
can I access the process exitstatus?
from the shell or script directive?
ok I found that out ;)
Phil Ewels
@ewels
Jan 18 2018 13:45
We were talking about the same thing the other day internally.. I suggested using .exitcode as an output and using one of the NF configs (I forget the name) to allow failure exitcodes.
Then you should be able to filter the output channels by the exitcode if they’re grouped
Would be interested to hear if you came up with a different solution!
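One way to make a task "look like it does not fail" can be sketched as follows, under stated assumptions: the script records the tool's exit status in a file instead of letting the task fail, and that file is emitted together with the sample id. The names `some_tool`, `images`, `check_image`, `status.txt` and `failed_images.txt` are hypothetical, and this is not necessarily the approach anyone in this chat ended up using.

```groovy
process check_image {
    tag "$id"

    input:
    set val(id), file(img) from images   // 'images' is a hypothetical channel of [id, file] pairs

    output:
    set val(id), file('status.txt') into checked

    script:
    """
    # Task scripts run under bash -ue by default, so capture the status
    # explicitly instead of letting a non-zero exit abort the task.
    status=0
    some_tool $img || status=\$?
    echo \$status > status.txt
    """
}

// Keep only the ids whose recorded status is non-zero and collect them into one list file.
checked
    .filter { id, status_file -> status_file.text.trim() != '0' }
    .map { id, status_file -> id }
    .collectFile(name: 'failed_images.txt', newLine: true)
```

Because every task exits with 0, downstream channels always receive a value, and the failures can be separated afterwards by looking at the recorded status.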
Paolo Di Tommaso
@pditommaso
Jan 18 2018 13:47
yes it should be possible to output the task exitStatus
I need to check how to do it
Paolo Di Tommaso
@pditommaso
Jan 18 2018 13:57
ok, I've found a way ..
Paolo Di Tommaso
@pditommaso
Jan 18 2018 14:10
:v:
Michael L Heuer
@heuermh
Jan 18 2018 14:52
@pditommaso There might be enough folks here at https://www.open-bio.org/wiki/Codefest_Winter_2018 interested in Spark on CWL & WDL to nail something down; I of course am more interested in Nextflow ;)
Paolo Di Tommaso
@pditommaso
Jan 18 2018 14:54
unfortunately I won't join that event
keep the NF flag high ! :)
Michael L Heuer
@heuermh
Jan 18 2018 14:55
Will do!
Albert Vernon Smith
@avsmith
Jan 18 2018 21:34
If I have an associative array, how can I randomly select a subset grouped by one of the items? For example, in the following, how could I select 2 entries for each id?
assocarray = [ ["id":1, "num":1], ["id":1, "num":2], ["id":1, "num":3], ["id":1, "num":4], ["id":1, "num":5], 
    ["id":2, "num":1], ["id":2, "num":2], ["id":2, "num":3], ["id":2, "num":4], ["id":2, "num":5]]
    .groupBy {it.id}
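A minimal sketch of one possible approach in plain Groovy, assuming it is enough to shuffle each group and keep the first two entries. The names `records` and `picked` are made up for illustration.

```groovy
// Same records as in the question, built with ranges for brevity.
def records = (1..2).collectMany { id ->
    (1..5).collect { num -> [id: id, num: num] }
}

def picked = records
    .groupBy { it.id }
    .collectEntries { id, items ->
        def shuffled = new ArrayList(items)   // copy, so the grouped list is left untouched
        Collections.shuffle(shuffled)         // random order within this id
        [(id): shuffled.take(2)]              // keep the first two after shuffling
    }

println picked
```

The result is a map from each id to two randomly chosen records, and the selection changes from run to run.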
Michael L Heuer
@heuermh
Jan 18 2018 21:56
Completely off-topic, does anyone know how to write a bash completion script?
Chris Fields
@cjfields
Jan 18 2018 22:06
@heuermh I thought I saw you on the codefest gitter