These are chat archives for nextflow-io/nextflow

28th
Feb 2019
lastwon1216
@lastwon1216
Feb 28 00:53
is it possible to run nextflow with multiple processes on same node?
using sge as executor
Alexander Peltzer
@apeltzer
Feb 28 05:09
Yes
lastwon1216
@lastwon1216
Feb 28 05:58
@apeltzer could you please tell me how you run it?
Alexander Peltzer
@apeltzer
Feb 28 06:16
Well we don’t have a shared memory process - just as you would submit normal jobs
lastwon1216
@lastwon1216
Feb 28 07:22
@apeltzer i see, how do you run multiple processes on same node??
Paolo Di Tommaso
@pditommaso
Feb 28 07:37
job allocation is delegated to the batch scheduler you are using
(provided you are using one)
Alexander Peltzer
@apeltzer
Feb 28 08:18
^^ As Paolo already mentioned it
Tim Dudgeon
@tdudgeon
Feb 28 08:49
@pditommaso What is the expected behaviour when submitting multiple workflows using the ignite executor?
I'm finding that if I submit a single workflow it completes fine, but while that is running if I start a second identical but independent workflow from the same head node I find that it fails to run. The initial process appears to run and get executed (e.g. the result files are created) but the nextflow process does not seem to complete and move on to the next steps.
Jonathan Manning
@pinin4fjords
Feb 28 09:25
Is it possible to dynamically load configuration produced by a process (e.g. things in the params scope) into the global namespace? Or is that just plain dirty?
Paolo Di Tommaso
@pditommaso
Feb 28 10:12
@tdudgeon there's no resource accounting, therefore each daemon will try to use all cpu/mem.
also you should provide a unique cluster seed to avoid different instances mixing together
Daniel E Cook
@danielecook
Feb 28 10:16
I am going to be giving a brief tutoria
Whoops.
Paolo Di Tommaso
@pditommaso
Feb 28 10:16
wrong channel? :D
Daniel E Cook
@danielecook
Feb 28 10:17
Actually right channel but did not mean to send that yet. I will follow up soon!
Paolo Di Tommaso
@pditommaso
Feb 28 10:18
ahah, can't wait now! :satisfied:
Daniel E Cook
@danielecook
Feb 28 10:42
Ok! As I was going to say but figured I'd wait until I got into work... I'm going to be giving a tutorial to my colleagues on Nextflow. Does anyone have any teaching slides they would care to share? Also, is there a central location for that kind of thing (e.g. dropbox folder)?
Paolo Di Tommaso
@pditommaso
Feb 28 10:43
ahah nice
umm, no there's no central location, but there's some material in the GitHub repos for the events we organised in the past
I know already that the first question is going to be: are there modules in NF ?
worth mentioning this nextflow-io/nextflow#984
Daniel E Cook
@danielecook
Feb 28 10:46
Yeah I saw that looking forward to when it is ready!
Paolo Di Tommaso
@pditommaso
Feb 28 10:46
cool
just to be sure :)
Daniel E Cook
@danielecook
Feb 28 10:47
Will there be another hackathon this fall?
I know there is the NF-core one in April
I can't make that unfortunately
Paolo Di Tommaso
@pditommaso
Feb 28 10:47
yes, course 17-18 and the main event on 19-20 Sept
we are going to put it out in a couple of weeks
Daniel E Cook
@danielecook
Feb 28 10:48
Great! Thanks
Paolo Di Tommaso
@pditommaso
Feb 28 10:49
:+1:
@KevinSayers you can't miss this year ;)
Kevin Sayers
@KevinSayers
Feb 28 12:19
@pditommaso :thumbsup: will try my best
Tim Dudgeon
@tdudgeon
Feb 28 12:35

@pditommaso I'm not sure I fully understand. You say that "each daemon will try to use all cpu/mem". By daemon I assume you mean the nodes where the nextflow background process is running (using the nextflow node -bg ... option). In which case, good, I want it to use all available cpu/mem.
The issue I'm having is that if I have 2 workflows to execute from the head node, then if I submit one, wait for it to finish, and then submit the second, it works fine. And the daemon nodes process all the tasks OK.
But if I start the second workflow while the first one is running then it starts the first process, it generates the output files, but that process never finishes and hands over to the next process in the workflow. Even after the first workflow has completely finished. The nextflow log file contains this:

Feb-28 12:29:28.852 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor ignite > tasks to be completed: 1 -- pending tasks are shown below
~> TaskHandler[id: 1; name: sdsplit; status: SUBMITTED; exit: -; error: -; workDir:

The cluster seed you mention refers to starting the daemon processes?

Michael L Heuer
@heuermh
Feb 28 13:49
And I imagine there will be some folks from here at the BOSC hackathon, 26-27 July?
Alexander Peltzer
@apeltzer
Feb 28 13:52
Certainly me
Ido Tamir
@idot
Feb 28 14:02
hello! I tried to use a module that loads conda and then use anaconda in the same process. It did not work.
withName: 'trim' {
    module = [ 'anaconda2/5.1.0' ]
    conda = ['cutadapt=1.18-1']
  }

 bash: conda: command not found
What is the best way to load this module and then use conda in the process?
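[Editor's note: no answer appears in the log. One hedged pointer for readers: the conda directive is resolved by the Nextflow runtime itself, so conda must be visible to the shell that launches Nextflow, not just inside the task. A possible alternative, not confirmed in this chat, is to skip the conda directive and activate a pre-built environment from beforeScript; the environment name cutadapt-env below is hypothetical:]

```groovy
// Sketch only (assumption, not from the chat): load the module that provides
// conda and activate a pre-created environment before each task runs, instead
// of the 'conda' directive, which needs conda on Nextflow's own PATH.
process {
    withName: 'trim' {
        beforeScript = 'module load anaconda2/5.1.0 && source activate cutadapt-env'
    }
}
```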
micans
@micans
Feb 28 15:50
I have a pipeline that was rendered very poorly with -with-dag, to my surprise (https://micans.org/kade/guitar.png). Our bigger pipeline was rendered very nicely (https://github.com/cellgeni/rnaseq - bottom), but this one looks bruised and beaten (with bits missing). Any buttons I can push?
Oleksandr Moskalenko
@moskalenko
Feb 28 15:57
Hi. What's the status of CWL support in Nextflow? The only things that seem to come up from a search are a July 2017 blog post about cwl2nxf and the github page for the cwl2nxf tool, which says it's no longer developed.
Paolo Di Tommaso
@pditommaso
Feb 28 16:01
that post was a call for contributions from the CWL community, but there has been no reply
Oleksandr Moskalenko
@moskalenko
Feb 28 16:02
Thanks. So, currently there is no support? I'm asking to figure out what to recommend to a user. I usually recommend nextflow for singularity and SLURM support i.e. it runs nicely on our cluster, but a new user asked for something that will run CWL spec, so I started searching.
Paolo Di Tommaso
@pditommaso
Feb 28 16:05
there are plenty of CWL runners, therefore there's no need for NF to be yet another one
but frankly I have no suggestion on which one to use
Michael L Heuer
@heuermh
Feb 28 16:15
New user? Perhaps you can talk them out of CWL then ;)
Paolo Di Tommaso
@pditommaso
Feb 28 16:16
ahah, I'm not that bad :joy:
everyone is responsible for their own pains :satisfied:
Oleksandr Moskalenko
@moskalenko
Feb 28 16:26
:)
Paolo Di Tommaso
@pditommaso
Feb 28 16:29
@moskalenko I remember that you wrote about a SIGBUS error once?
Oleksandr Moskalenko
@moskalenko
Feb 28 16:44
right
Paolo Di Tommaso
@pditommaso
Feb 28 16:44
solved the problem ?
Oleksandr Moskalenko
@moskalenko
Feb 28 16:44
I think so.
Paolo Di Tommaso
@pditommaso
Feb 28 16:44
what was it ?
Oleksandr Moskalenko
@moskalenko
Feb 28 16:46
I'd need to dig back into the records since it was so long ago. Sorry. I think Nextflow works well at this point. I have multiple customers who are successfully running Nextflow under SLURM on our cluster either with our environment modules or by managing dependencies with Singularity. I'm trying to learn some nextflow to automate some of our internal reference data management workflows.
It's a fantastic tool. Thank you for making it.
Paolo Di Tommaso
@pditommaso
Feb 28 16:47
ok, no pb
just curiosity
lastwon1216
@lastwon1216
Feb 28 16:47

@pditommaso sorry for beginner question, so if my current command line I am using for nextflow is

nextflow run rnaseqpipe.nf --genometype 'Mouse' --reads 'test3.fastq.gz' --genomeDir 'reference_genome' --singleEnd

and in order to run this script on same node would be something like this?

qsub -l h=<node number> nextflow run rnaseqpipe.nf --genometype 'Mouse' --reads 'test3.fastq.gz' --genomeDir 'reference_genome' --singleEnd
Paolo Di Tommaso
@pditommaso
Feb 28 16:47
thanks a lot !
lastwon1216
@lastwon1216
Feb 28 16:52
yes, I was able to look at that page, so I added executor 'sge' under process, plus cpus and mem. But since I am using nextflow run rnaseqpipe.nf to run Nextflow, processes within the script run on different nodes.
Paolo Di Tommaso
@pditommaso
Feb 28 16:53
NF has no special feature to bind execution to a specific node
what you can do is add the -l h=<node number> option in the clusterOptions directive for that process
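[Editor's note: Paolo's suggestion could be sketched in the config like this; the process name align and the host name mynode01 are placeholders, not values from this chat:]

```groovy
// Sketch: pin one process to a specific SGE host via clusterOptions.
// 'align' and 'mynode01' are hypothetical placeholder names.
process {
    executor = 'sge'
    withName: 'align' {
        clusterOptions = '-l h=mynode01'
    }
}
```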
Evan Floden
@evanfloden
Feb 28 16:55
I think the question is more basic. You want something like this maybe?:
nextflow run rnaseqpipe.nf --genometype 'Mouse' --reads 'test3.fastq.gz' --genomeDir 'reference_genome' --singleEnd -process.executor=sge
lastwon1216
@lastwon1216
Feb 28 17:11
ah I think i got it, thank you!
Michael Chimenti
@mchimenti
Feb 28 18:04
Hello NF folks...could anyone please point me to an example of a template SGE submission script for a NF pipeline?
Evan Floden
@evanfloden
Feb 28 18:07
try:
process {
    executor = 'sge'
    queue = 'your_queue'
    memory = 12.GB
    time = '12h'
}
Michael Chimenti
@mchimenti
Feb 28 18:09
that's the NF config, correct?
I guess I was asking for a what the actual qsub script might look like if you want parallel execution on an SGE cluster
Evan Floden
@evanfloden
Feb 28 18:10
Yeah. That is the part. Nextflow does the qsub itself and automagically parallelises the jobs.

The best way to handle this is with profiles. So:

profiles {
    your_cluster {
        process {
            executor = 'sge'
            queue = 'your_queue'
            memory = 12.GB
            time = '12h'
        }
    }
}

then:

nextflow run <your_pipeline> -profile your_cluster
Michael Chimenti
@mchimenti
Feb 28 18:13
ah, that's where I was confused. So I never actually run 'qsub' myself. But doesn't this mean that the main NF job actually runs on the head node? Isn't that a "no-no"?
plz forgive my noob questions
Jonathan Manning
@pinin4fjords
Feb 28 18:14
You can qsub the master job, assuming the nodes are allowed to submit jobs
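[Editor's note: a minimal sketch of what qsub-ing the master job might look like, if your site prefers that; the queue name, memory value, and pipeline path are placeholders, not values from this chat:]

```shell
#!/bin/bash
# Hypothetical SGE wrapper for the master Nextflow process.
# 'your_queue', the memory request, and the pipeline name are placeholders.
#$ -N nf-master
#$ -q your_queue
#$ -l h_vmem=4G
#$ -cwd
#$ -j y

# Cap the JVM heap so the master process stays small on the execution host.
export NXF_OPTS='-Xms512m -Xmx2G'

nextflow run your_pipeline.nf -profile your_cluster
```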
Evan Floden
@evanfloden
Feb 28 18:16
No worries! I run Nextflow on the head node and it is generally not a problem as it is just orchestrating tasks. But you can also submit it if need be.
micans
@micans
Feb 28 18:16
Perhaps it's also possible to get an interactive session on a worker node? (that's what I usually do with the LSF executor). You still need the nodes to be able to submit jobs.
Michael Chimenti
@mchimenti
Feb 28 18:16
well, I'll try it your way and see if HPC complains, haha
@micans, that seems a bit of a hassle every time you want to run a pipeline
Jonathan Manning
@pinin4fjords
Feb 28 18:18
The memory usage can actually get quite high if you have 100s of 1000s of jobs, as I do sometimes. Best to run on a worker node rather than getting in the habit of using the head
micans
@micans
Feb 28 18:18
well I type 'fash' and get a node (fash is a custom function) ... I can manage that. For robust pipelines we will probably move to more automatic/batch systems, but for developing pipelines I find it quite handy.
Michael Chimenti
@mchimenti
Feb 28 18:20
yeah, well I was doing some dev work on an ATAC-seq pipeline on the head node, running with a very small toy dataset, and I got an "exit status 137" during peak calling, so it doesn't take much to get your job killed for memory consumption on our HPC. That makes me concerned that running main NF jobs from the head node will probably cause issues
micans
@micans
Feb 28 18:21
agree with @pinin4fjords ... there is safety on a worker node, both for you and for colleagues depending on head nodes.
I usually ask for 4 or 5 gig memory, although I never really measured anything.
Evan Floden
@evanfloden
Feb 28 18:22
Exit status of a task and returned via NF, or NF died?
Michael Chimenti
@mchimenti
Feb 28 18:24
of a task
Feb-27 16:08:44.925 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process genrich (1) terminated with an error exit status (137)
Evan Floden
@evanfloden
Feb 28 18:26
Ok, well good luck with the tests and drop back here if you have any other questions!
Michael Chimenti
@mchimenti
Feb 28 18:26
yeah, thanks all. I'm sure I'll figure it out, just going through the "noob" learning curve
micans
@micans
Feb 28 18:30
I prefer novice ... we're all novices one way or another. By the way I will soon start using nf-core atacseq on LSF, I may have questions for you :-)
Michael Chimenti
@mchimenti
Feb 28 18:33
thanks @micans, I feel better now :) Well I'm trying to write my own pipeline, but of course I'll help if I can.
micans
@micans
Feb 28 18:34
:+1:
Michael Chimenti
@mchimenti
Feb 28 20:01
Evan, the approach you mentioned above worked for me. A "java" process was created on the head node that used very little CPU, but allocated itself a huge share of virtual memory (37GB, I think). The job completed without being killed.
Tobias "Tobi" Schraink
@tobsecret
Feb 28 20:04
You can limit the amount of memory it allocates:
NXF_OPTS='-Xms512m -Xmx2G' nextflow run
Michael Chimenti
@mchimenti
Feb 28 20:08
:+1: