These are chat archives for nextflow-io/nextflow

8th
Jun 2017
Karin Lagesen
@karinlag
Jun 08 2017 07:30
...is there any way to get the SLURM job ID as a tag for the process?
Paolo Di Tommaso
@pditommaso
Jun 08 2017 08:11
unfortunately no
that info is only available in the trace report,
in the native_id column
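Below is a minimal sketch of a nextflow.config trace block that keeps the native_id column. For grid executors such as SLURM, that column holds the scheduler's job ID. The file name and the other fields are illustrative choices. The same report can also be enabled from the command line with -with-trace.

```groovy
// nextflow.config (sketch): write a trace report that includes the scheduler job ID
trace {
    enabled = true
    file    = 'trace.txt'                           // illustrative file name
    fields  = 'task_id,name,tag,status,native_id'   // native_id = job ID assigned by SLURM
}
```

The tag column sits next to native_id in each row, so the two can be matched up after the run.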
Thanh Lê
@thanhleviet
Jun 08 2017 08:14
hi, I want to use a specific version of Java (say Java 7) within a nextflow process. Is it correct to configure it like this: env.JAVA = "/path/to/java7:$PATH"
Paolo Di Tommaso
@pditommaso
Jun 08 2017 08:15
for tasks executed by NF, not Nextflow itself, right?
Thanh Lê
@thanhleviet
Jun 08 2017 08:15
yes, not by nextflow
Paolo Di Tommaso
@pditommaso
Jun 08 2017 08:16
ok, yes, that works
Thanh Lê
@thanhleviet
Jun 08 2017 08:16
ok, so when I run a tool like java -jar picard.jar blah blah, it should work?
Paolo Di Tommaso
@pditommaso
Jun 08 2017 08:17
yes, it should
Thanh Lê
@thanhleviet
Jun 08 2017 08:17
great, thanks @pditommaso
Paolo Di Tommaso
@pditommaso
Jun 08 2017 08:17
:+1:
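The shell resolves the java in java -jar picard.jar through PATH. Given that, a minimal nextflow.config sketch could prepend the Java 7 bin directory to PATH in the env scope. The path is a placeholder. In a Nextflow config file, a name like $PATH that is not defined in the config is taken from the environment Nextflow is launched from.

```groovy
// nextflow.config (sketch): variables in the env scope are exported to the task environment
env {
    // Placeholder path: prepending it makes `java` in task scripts resolve to Java 7.
    // A variable named JAVA on its own would not change which `java` the shell runs
    // unless the task script references $JAVA explicitly.
    PATH = "/path/to/java7/bin:$PATH"
}
```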
Mamana
@mypandos
Jun 08 2017 13:43
Hi @pditommaso
I have a process where inputs are taken from a list
input:
val 'GROUP_POP' from GROUP_POPS_ALL
Paolo Di Tommaso
@pditommaso
Jun 08 2017 13:44
val needs a variable identifier, e.g.
input:
val GROUP_POP from GROUP_POPS_ALL
i.e. GROUP_POP, not 'GROUP_POP'
Mamana
@mypandos
Jun 08 2017 13:56
@pditommaso Ok. Every time the list is changed, the process reruns even for old values. Is that how nextflow works?
Phil Ewels
@ewels
Jun 08 2017 13:57
Yes, GROUP_POPS_ALL is an input - if the input changes (eg. different items in the list), then the process will rerun
Mamana
@mypandos
Jun 08 2017 14:00
Any strategy to force the process to only rerun on new values?
Paolo Di Tommaso
@pditommaso
Jun 08 2017 14:02
what does your process look like exactly?
Mamana
@mypandos
Jun 08 2017 14:06
input:
        val GROUP_POP from GROUP_POPS_ALL
output:
        set val(GROUP_POP), file("${GROUP_POP}.vcf.gz"), file("${GROUP_POP}.vcf.gz.tbi") into merge_pop
script:
        files = file("${params.work_dir}/data/POP_ANN/{${params.GROUP_POPS[GROUP_POP].join(',')}}_consensus_snpeff_dbsnp.vcf.gz").join(' ')
        """
        bcftools merge ${files} | bgzip -c > ${GROUP_POP}.vcf.gz
        bcftools index --tbi ${GROUP_POP}.vcf.gz
        """
params.GROUP_POPS_ALL is something like
params.GROUP_POPS_ALL = [ AFR: ['YRI', 'LWK'], EUR: ['CEU', 'TSI'] ]
Phil Ewels
@ewels
Jun 08 2017 14:12
It would probably be better to create an input channel for the files to be merged instead of defining the filenames inside the script like that
then your desired behaviour of only rerunning for changed inputs will work
you should be able to use pretty similar logic inside a channel factory (I'll leave it to @pditommaso to provide a kickass example, I still have to spend a while playing with trial and error for these)
Paolo Di Tommaso
@pditommaso
Jun 08 2017 14:23
exactly, this is not the way it is supposed to be used
a process should receive the data it needs through its inputs; you should not create file paths inside the script like that
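Following that advice, here is a hedged sketch in the DSL1 syntax used above. It builds the per-group file lists in a channel and declares them as process inputs, so each task's cache key depends on the files it actually merges. With -resume, a group whose files have not changed should then be reused. The index files are staged too, for two reasons: bcftools merge normally expects indexed inputs, and only declared inputs are linked into the task work directory. The names annDir, group_vcfs and merge_group_pops are hypothetical. The sketch also assumes params.GROUP_POPS is the group-to-populations map that the original script indexes.

```groovy
// Sketch only: assumes params.GROUP_POPS maps each group to its population codes,
// e.g. params.GROUP_POPS = [ AFR: ['YRI', 'LWK'], EUR: ['CEU', 'TSI'] ]
def annDir = "${params.work_dir}/data/POP_ANN"

// One item per group: [ group, [vcf files], [matching .tbi index files] ]
group_vcfs = Channel.from(
    params.GROUP_POPS.collect { group, pops ->
        def vcfs = pops.collect { pop -> file("${annDir}/${pop}_consensus_snpeff_dbsnp.vcf.gz") }
        def tbis = vcfs.collect { vcf -> file("${vcf}.tbi") }
        [ group, vcfs, tbis ]
    }
)

process merge_group_pops {
    input:
    // the index files are declared only so they get staged next to the VCFs
    set val(GROUP_POP), file(vcfs), file(tbis) from group_vcfs

    output:
    set val(GROUP_POP), file("${GROUP_POP}.vcf.gz"), file("${GROUP_POP}.vcf.gz.tbi") into merge_pop

    script:
    """
    bcftools merge ${vcfs} | bgzip -c > ${GROUP_POP}.vcf.gz
    bcftools index --tbi ${GROUP_POP}.vcf.gz
    """
}
```

Adding a new group to the map then adds one new task, and the existing groups keep their inputs unchanged.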
Mike Smoot
@mes5k
Jun 08 2017 15:59

hi @pditommaso I'm seeing a debug message every 5 minutes in .nextflow.log in a pipeline that I believe should be complete, but isn't:

DEBUG n.processor.TaskPollingMonitor - !! executor slurm > tasks to be completed: 0 -- first: null

Am I right in thinking that there's probably a channel still open?

Mike Smoot
@mes5k
Jun 08 2017 16:15
Actually, I take that back - the pipeline should definitely not be complete, but it's stalled nonetheless and I don't see any errors or exceptions in .nextflow.log. Hmmm.
Paolo Di Tommaso
@pditommaso
Jun 08 2017 18:41
yes, an open channel is a possible cause
Mike Smoot
@mes5k
Jun 08 2017 18:45
Do you know any way of seeing which channels may be open?
Paolo Di Tommaso
@pditommaso
Jun 08 2017 18:46
struggling with that right now
anyhow, is this error deterministic?
Mike Smoot
@mes5k
Jun 08 2017 19:02
It's a big pipeline and a large data set, so I haven't tried again and I'm not sure how deterministic it is. Incidentally, this is the same pipeline that was exhausting the thread pool. In this case, though, I'm executing on a slurm cluster rather than locally, so I'm not sure the problems are related.
Paolo Di Tommaso
@pditommaso
Jun 08 2017 19:04
in my experience this kind of problem is deterministic, so you should be able to replicate it locally
Mike Smoot
@mes5k
Jun 08 2017 19:06
Good to know. I'll keep debugging as I'm able and let you know if I find anything interesting. Thanks!
Paolo Di Tommaso
@pditommaso
Jun 08 2017 19:07
once you are able to replicate it, you can try to bisect the execution by putting a return statement in the script and checking whether the run completes
a bit rustic but it could help to identify the problem
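Below is a hedged sketch of that bisection trick in DSL1 syntax. The two processes and their channels are hypothetical placeholders. A top-level return stops the script, so none of the processes defined after it are created. If the run now terminates, the stall is likely in the part that was cut off. Moving the return up or down narrows it down.

```groovy
// Hypothetical two-step pipeline used only to illustrate bisection with `return`
process step_one {
    output:
    file 'one.txt' into step_one_out

    """
    echo one > one.txt
    """
}

// Bisection point: the script stops here, so step_two below is never created.
return

process step_two {
    input:
    file x from step_one_out

    output:
    file 'two.txt' into step_two_out

    """
    cat ${x} > two.txt
    """
}
```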
Mamana
@mypandos
Jun 08 2017 22:19
@pditommaso Will try that. thanks.
amacbride
@amacbride
Jun 08 2017 23:17
Hi @pditommaso, I'm noticing that if I launch several nextflow processes quickly, I see file locking errors on the history file. I know that the history command is deprecated, but is there a way to disable it?
(My workaround is to artificially slow the rate at which I'm launching NF processes, but it's not a great solution for me.)
(It's on NFS3, so file locking is already problematic.)