These are chat archives for nextflow-io/nextflow

9th
Dec 2016
Johan Viklund
@viklund
Dec 09 2016 07:18
I have a problem that might be related to nextflow-io/nextflow#261
"Task scheduling may hang while checking for job completion"
but I don't have that many files in my task directory nor do I have that log line in my .nextflow.log
All processes have finished, there's nothing in the queue and the tracefile says that everything is COMPLETED
To me it feels like an NFS problem. The resultDir does not contain any output files (even though steps that depend on files supposed to be in the resultDir are run)
And, oh, I have copy for my result files
Version 0.22.6 build 4116
Johan Viklund
@viklund
Dec 09 2016 07:45
and all the work directories look fine to me
Johan Viklund
@viklund
Dec 09 2016 07:53
it might also be interesting that the work directory and the results directory are on different filesystems, both mounted over NFS everywhere though, but it can happen that they exhibit different performance and/or problems
(and we do have problems with some of our NFS volumes sometimes as they tend to fill up)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 07:54
but the execution hangs?
Johan Viklund
@viklund
Dec 09 2016 07:54
well, nextflow never exits
it continues to log information about the batch queue in the .nextflow.log
so it's doing something
(I have plenty of n.processor.TaskPollingMonitor - !! executor slurm > tasks to be completed: 0 -- first: null lines in the log)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 07:56
The resultDir does not contain any output files
are they supposed to be copied by a publishDir?
Johan Viklund
@viklund
Dec 09 2016 07:56
yes
(misremembered the name of the directive)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 07:57
and should the files be copied or symlinked?
Johan Viklund
@viklund
Dec 09 2016 07:57
This doesn't happen always, it's only sometimes, and it seems to be more common when I have short jobs (but that might just be that I run more short jobs)
exactly, that is the behaviour I want
Paolo Di Tommaso
@pditommaso
Dec 09 2016 07:59
wait, what's the publishDir mode? symlink (default) or copy
Johan Viklund
@viklund
Dec 09 2016 07:59
copy
Paolo Di Tommaso
@pditommaso
Dec 09 2016 08:02
could you make a test using the default i.e. symlinks ?
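For reference, the two publishDir modes being compared look roughly like this. It is only a minimal sketch: the process name, the output file and the `results` path are made up for illustration and are not taken from the pipeline under discussion.

```groovy
// Hypothetical process; only the publishDir directive matters here.
process example {
    // mode: 'copy' duplicates the declared outputs into the results directory
    publishDir 'results', mode: 'copy'

    // The default (no mode given) publishes symlinks instead:
    // publishDir 'results'

    output:
    file 'out.txt' into ch_out

    """
    echo hello > out.txt
    """
}
```

Switching between the two lines is enough to test whether the copy step is involved in the hang.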
Johan Viklund
@viklund
Dec 09 2016 08:03
Sure, I can do that
I'm doing a different test right now though, but I will do that during the day
Paolo Di Tommaso
@pditommaso
Dec 09 2016 08:04
ok
when you run it add -trace nextflow to create a detailed log
nextflow -trace nextflow run .. etc
Johan Viklund
@viklund
Dec 09 2016 08:05
oh, yes, I also added all the fields to the trace :)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 08:05
great
Allen Kao
@shkao
Dec 09 2016 11:04
Hi guys! Can I ask two questions regarding running nextflow with sge as executor?
  1. Is there a way to change how nextflow generates the .command.run for qsub? I need to change -l slots=4 to -pe serial 4 whenever I assign cpus 4;
  2. Would it be possible to control the number of concurrent jobs while running with sge as executor?
    Thanks in advance!...
Paolo Di Tommaso
@pditommaso
Dec 09 2016 11:08
About 1, I think you need to specify penv 'serial'
About 2 set the queue size by using -qs n on the command line
Allen Kao
@shkao
Dec 09 2016 11:11
Got it, thanks a lot! :+1:
Paolo Di Tommaso
@pditommaso
Dec 09 2016 11:13
Not 100% sure about the first, let me know if it doesn't work
Allen Kao
@shkao
Dec 09 2016 11:16
Yes it works, by setting penv 'serial', it became -pe serial n in .command.run.
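Put together, the two answers could be expressed in a nextflow.config along these lines. This is a sketch under assumptions: the values 4 and 10 are arbitrary examples, and the queue size is set in the config here rather than with -qs on the command line.

```groovy
// nextflow.config (sketch; values are illustrative)
process {
    executor = 'sge'
    penv     = 'serial'   // parallel environment, so cpus is requested as "-pe serial <cpus>"
    cpus     = 4
}

executor {
    queueSize = 10        // upper bound on jobs queued/running at once; 10 is an arbitrary example
}
```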
Paolo Di Tommaso
@pditommaso
Dec 09 2016 11:17
Awesome
Allen Kao
@shkao
Dec 09 2016 11:17
Thank you!
Paolo Di Tommaso
@pditommaso
Dec 09 2016 11:18
Welcome
Johan Viklund
@viklund
Dec 09 2016 13:34
It looks like all output files are actually created (that was a bug on my part) but the nextflow process never exits
and the trace file lists all processes as COMPLETED
so maybe not NFS after all
I'm trying the symlink option anyway
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:35
can you share the log file ?
Johan Viklund
@viklund
Dec 09 2016 13:35
the .nextflow.log?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:35
yep
Johan Viklund
@viklund
Dec 09 2016 13:35
just a moment
(I increased the logging frequency from the queue earlier when trying to figure out what was happening)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:38
I need some time to analyze it, if you can re-run with the trace option that would be great
you can use -resume for that
Johan Viklund
@viklund
Dec 09 2016 13:39
is that different from enabling tracing in the nextflow.config?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:39
yes
Johan Viklund
@viklund
Dec 09 2016 13:40
I didn't know that
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:40
the trace in the config produces the job execution log (the trace file)
nextflow -trace nextflow run ..
produces a more detailed Nextflow log file
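To make the distinction concrete, the config-based trace is a scope in nextflow.config, while -trace is a command-line option that raises the logging level for a package. A sketch, where the file name and the field selection are examples only:

```groovy
// nextflow.config: per-task execution trace (one row per job)
trace {
    enabled = true
    file    = 'trace.txt'                        // example file name
    fields  = 'task_id,name,status,exit,duration,realtime'
}

// By contrast, trace-level logging for the "nextflow" package is
// requested on the command line and ends up in .nextflow.log:
//   nextflow -trace nextflow run .. etc
```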
Johan Viklund
@viklund
Dec 09 2016 13:41
that's the deep debugging option then
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:41
yep
Johan Viklund
@viklund
Dec 09 2016 13:41
I wondered where all those log.trace commands went...
but I can't resume since I started a different test after the one above (unless it's possible to resume a specific run)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:42
ok, not a problem
Johan Viklund
@viklund
Dec 09 2016 13:43
but I'll run it again like that
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:43
could you point out which pipeline produced that log ?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:43
ok, I will let u know
Allen Kao
@shkao
Dec 09 2016 13:44
Sorry @pditommaso! Another question similar to 1 earlier. Is there a way to generate h_vmem=10G instead of virtual_free=10G while using sge as executor and assigning memory 10GB? (virtual_free seems requestable but not consumable here).
Johan Viklund
@viklund
Dec 09 2016 13:45
Oh, didn't realise that nextflow was important in -trace nextflow :)
(I just thought it was a filename)
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:46
@shkao for advanced options you will need to use clusterOptions instead of memory
Allen Kao
@shkao
Dec 09 2016 13:47
I see, can I also assign dynamic variable in the '' for clusterOptions?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 13:47
@viklund nope, that enables trace-level logging for the specified class/package names
yes, like other directives
but you will need to use SGE native flags, i.e. virtual_free=.. etc
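A sketch of what that could look like for the h_vmem case. The process name and the `params.mem_gb` parameter are hypothetical. Note that Groovy only interpolates variables inside double-quoted strings, so the `''` form would pass the text through literally.

```groovy
// Hypothetical parameter holding the memory request in gigabytes
params.mem_gb = 10

process example {
    // No memory directive here, so the request is expressed purely
    // through the native SGE flag. Double quotes are needed for ${...}.
    clusterOptions "-l h_vmem=${params.mem_gb}G"

    """
    echo running
    """
}
```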
Allen Kao
@shkao
Dec 09 2016 13:49
OK, thanks again!
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:02
@viklund in the log I see a process named normalize_vcf (possibly causing the problem)
but I'm unable to find it here https://github.com/NBISweden/wgs-structvar
?
Johan Viklund
@viklund
Dec 09 2016 14:04
ahh
that was a recent refactoring I did
not pushed
sorry about that
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:04
no pb
Johan Viklund
@viklund
Dec 09 2016 14:04
I'll create a branch for the current state
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:04
ok
that's the current code
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:13
I guess the problem is this branch
when workflowSteps does NOT contain normalize
ch_normalize_vcf won't produce any content, thus normalize_vcf will never be executed . .
you should refactor like this
Johan Viklund
@viklund
Dec 09 2016 14:15
I need to close the channel?
Oh
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:15

if ( 'normalize' in workflowSteps ) {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_normalize_vcf }
}
else {
    ch_masked_vcfs_vep.mix( ch_intersections ).set { ch_annotate }
    Channel.empty().set { ch_normalize_vcf }
}
or simply
ch_normalize_vcf = Channel.empty()
in place of
Channel.empty().set { ch_normalize_vcf }
Johan Viklund
@viklund
Dec 09 2016 14:20
Thanks
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:20
:v:
Johan Viklund
@viklund
Dec 09 2016 14:27
There's no close() op on channels?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 14:28
yes
Félix C. Morency
@fmorency
Dec 09 2016 15:19
im sure you already told me, but is there a .join-ish operator?
Félix C. Morency
@fmorency
Dec 09 2016 15:23
mmm not really. something that would wait for the output of all processes of the same name and emit the resulting list
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:24
the resulting list of what ?
all outputs ?
Félix C. Morency
@fmorency
Dec 09 2016 15:26
process A {
  ...
  input:
  set id, "bla" from ...

  output:
  "${id}_someFile.txt" into out_a
  ...
}

out_a.join().into{all_out_a}
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:27
umm
Félix C. Morency
@fmorency
Dec 09 2016 15:27
say I call A with id 0..10. I would like all_out_a to contain [0_someFile.txt, 1_someFile.txt, 2_someFile.txt, ..., 10_someFile.txt]
Oh yea
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:27
yep
Félix C. Morency
@fmorency
Dec 09 2016 15:27
Testing
would "${id}_someFile.txt" into out_a.toList() work?
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:30
that's supposed to be an output ?
Félix C. Morency
@fmorency
Dec 09 2016 15:30
yes
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:30
no
I mean, what would use the result of toList() ?
Félix C. Morency
@fmorency
Dec 09 2016 15:32
ok ill create a new channel
yup, perfect
thanks @pditommaso
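The approach that worked, applying toList() to the output channel and binding the result to a new channel, might look like the sketch below. The input channel and the `touch` command are assumptions added to make the snippet complete; the elided parts of the original process are not reproduced.

```groovy
process A {
    input:
    val id from Channel.from(0..10)   // assumed input: ids 0 to 10

    output:
    file "${id}_someFile.txt" into out_a

    """
    touch ${id}_someFile.txt
    """
}

// toList() waits until out_a is complete, then emits one item:
// a list holding every file produced by the tasks of process A.
out_a.toList().set { all_out_a }

all_out_a.subscribe { println it }
```

The list follows the order in which tasks finish, so it is not necessarily sorted by id.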
Paolo Di Tommaso
@pditommaso
Dec 09 2016 15:34
welcome