These are chat archives for nextflow-io/nextflow

12th Oct 2015
Sascha Steinbiss
@satta
Oct 12 2015 08:47
hey hey
has anyone seen this before: "Cannot obtain the semaphore to fork operator's body"
I've had a NF run fail out of the blue with this error...
Paolo Di Tommaso
@pditommaso
Oct 12 2015 08:51
@satta Hi, I've seen that in rare circumstances. However, as far as I remember, that should not be the real cause of the problem. Likely the pipeline stopped for another reason (a failed job)
Could you share the log file so I can take a look at the stack trace?
Sascha Steinbiss
@satta
Oct 12 2015 08:52
let me dig it up, it's a run submitted by a user of the web service
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:02
um, is that a single-CPU machine?
Sascha Steinbiss
@satta
Oct 12 2015 09:26
two
but in a VM
since this machine runs both Nextflow and the web interface to accept/prepare/present data, I have limited the number of simultaneous nextflow processes to one
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:27
and I guess that is your huge pipeline?
Sascha Steinbiss
@satta
Oct 12 2015 09:27
yes
I wanted it to run sequentially
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:28
how are you limiting the number of simultaneous nextflow processes to one?
because in the log I'm reading
DEBUG nextflow.Session - Executor pool size: 2
Sascha Steinbiss
@satta
Oct 12 2015 09:29
executor {
    name = 'local'
    queueSize = 1
    pollInterval = '3sec'
}
interesting...
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:29
ah
let me think
I would try the following config
process {
  executor = 'local'
  maxForks = 1 
}
executor {
    pollInterval = '3sec'
}
Sascha Steinbiss
@satta
Oct 12 2015 09:32
ah
ok
out of curiosity, can you briefly explain the difference?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:33
yes
with maxForks = 1, each process can run only one task at a time
Sascha Steinbiss
@satta
Oct 12 2015 09:34
and queueSize affects only the number of processes, not how many tasks they can start in parallel?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:35
whereas queueSize defines the overall number of tasks that can be queued, and hence executed, in parallel
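To make the distinction concrete, here is a toy model of the two limits as described above: queueSize as a global cap on tasks running at once, and maxForks as a per-process cap. It is a hypothetical sketch written for illustration only, not Nextflow's actual scheduler, and the function name `runnable` and the process names are made up.

```python
# Toy model (hypothetical, NOT Nextflow's scheduler): how a global
# queueSize cap and a per-process maxForks cap jointly limit which
# pending tasks may start at the same time.
from collections import Counter


def runnable(pending, queue_size, max_forks=None):
    """Return the tasks that may start in one scheduling round.

    pending:    list of process names, one entry per pending task
    queue_size: global cap on tasks running at once (like executor.queueSize)
    max_forks:  per-process cap (like process.maxForks), None for no cap
    """
    started = []
    per_process = Counter()
    for proc in pending:
        if len(started) >= queue_size:
            break  # global limit reached: nothing else may start
        if max_forks is not None and per_process[proc] >= max_forks:
            continue  # this process already runs its maximum number of tasks
        per_process[proc] += 1
        started.append(proc)
    return started


tasks = ["align", "align", "align", "index", "index"]
print(runnable(tasks, queue_size=1))               # ['align']
print(runnable(tasks, queue_size=10, max_forks=1))  # ['align', 'index']
```

In this model queueSize = 1 alone already serializes all tasks, while maxForks = 1 alone still lets tasks from different processes run side by side.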
Sascha Steinbiss
@satta
Oct 12 2015 09:35
hm k
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:35
yep
Sascha Steinbiss
@satta
Oct 12 2015 09:36
ok, I changed that
Paolo Di Tommaso
@pditommaso
Oct 12 2015 09:39
alternatively, you may also try
process {
  executor = 'local'
  maxForks = 1 
}
executor {
    pollInterval = '3sec'
    queueSize = 1 
}
but launching it with the command-line option -pool-size 5
that increases the number of threads in the executor pool to 5
Michael L Heuer
@heuermh
Oct 12 2015 19:27
how much interest might there be in a Nextflow executor based on YARN? I know little of either but now have access to a large YARN cluster
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:11
@heuermh At CRG we have neither YARN nor a Spark cluster, so I don't have an urgent interest in that; however, I think it could be an interesting integration
Some time ago we spoke about that and I made some tests to support the HDFS file system. That would be quite straightforward thanks to this project https://github.com/damiencarol/jsr203-hadoop
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:17
An executor for YARN would not be much different from the Ignite one, but it should take advantage of the data locality concept provided by HDFS
Michael L Heuer
@heuermh
Oct 12 2015 21:24
I'm looking at YARN docs now and I don't see a general-purpose submit job command, only something to submit a job via a jar file
good lord, I can't keep all this stuff straight. Ignite is like Tachyon http://tachyon-project.org/ ?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:27
Actually, it's Tachyon that is like Ignite .. :)
Ignite is what was GridGain; it's an impressive framework
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:35
Is YarnClient the API you are looking at?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:37
Actually, there's a Java API that allows you to submit jobs
Have a look at this book, chapter 10
Michael L Heuer
@heuermh
Oct 12 2015 21:39
thanks, am looking
Michael L Heuer
@heuermh
Oct 12 2015 21:50
so to run an arbitrary bash script like generated by Nextflow, there'd need to be a YarnClient class with a main that puts the bash script command into the ContainerLaunchContext, then built into a jar and submitted . . . ?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:51
it looks so
actually I was hoping it would be easier ..
otherwise it could be possible to add a nextflow command that, given a script file, creates an uberjar containing the Groovy runtime and the compiled script class
and launch it as any other Hadoop application
Paolo Di Tommaso
@pditommaso
Oct 12 2015 21:58
doing that, the client side should not be required
Michael L Heuer
@heuermh
Oct 12 2015 22:04
idk, maybe the jar containing the YarnClient can always be the same, and the script is loaded via LocalResources . . . ?
Paolo Di Tommaso
@pditommaso
Oct 12 2015 22:05
Actually I don't know, I'm a bit confused by this API
here there are some interesting links
Paolo Di Tommaso
@pditommaso
Oct 12 2015 22:16
this looks nice
Michael L Heuer
@heuermh
Oct 12 2015 22:17
was looking at twill, yeah, still not sure where the System.exec goes :)
Paolo Di Tommaso
@pditommaso
Oct 12 2015 22:17
:)
Michael L Heuer
@heuermh
Oct 12 2015 22:17
then while the interface here makes me want to throw up a little bit (xml & el, ick) maybe some of the underlying code might be useful http://oozie.apache.org/docs/4.2.0/DG_ShellActionExtension.html
Paolo Di Tommaso
@pditommaso
Oct 12 2015 22:24
Here's a minimal app that executes a system command
Michael L Heuer
@heuermh
Oct 12 2015 22:35
good find
Paolo Di Tommaso
@pditommaso
Oct 12 2015 22:37
what looks weird is that ContainerLaunchContext is able to run a Unix command
I thought it was a Java-oriented API ..
ok, I need to leave.
bye
Michael L Heuer
@heuermh
Oct 12 2015 22:39
sure, thanks!