These are chat archives for nextflow-io/nextflow

30th
Nov 2015
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:14
anybody here using nextflow docker support on a remote docker machine/cluster? I ran into this issue: nextflow-io/nextflow#73
I think nextflow now assumes that you can mount your home folder into a docker container, which is not the case when you use a remote docker engine via docker machine.
which is a pity, I think nextflow could be very useful here.
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:16
To use Docker on a remote machine you will need a shared file system
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:16
do you have any recommendations?
I was looking into volume sharing on an openstack cluster, but that became a bit of a dead end
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:17
It depends, are you using a grid engine/resource manager ?
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:17
we could do that if we want to :)
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:18
well, but is it a cloud environment or an on-premises cluster?
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:19
both, i’m looking into a solution where we are platform agnostic and switch our pipeline between platforms
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:20
ok
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:20
I was looking into this one
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:21
well, there's a debate around these tools
in my opinion they are excellent for service orchestration but almost useless for workflow task scheduling
in my opinion the best solution in a local cluster is to use a batch scheduler + a shared file system
for example Linux SLURM + NFS
In the cloud you can use EC2 instances + EFS (Elastic File System, for the lucky people that have access to the beta)
in all cases you will need to install a docker engine on each cluster node
and nextflow can manage the task executions through docker
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:25
yes for now we assume we will start and stop our own instances with docker installed.
unfortunately no access to EFS
SLURM is new for me, still have to look into that
I couldn’t find a way to specify custom mount points for docker containers using nextflow, am I missing a piece of documentation?
so when you say services orchestration I think about docker-compose? what do you mean actually with these tools? rexray, netflox or something else
*nextflow
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:28
yes, docker compose, swarm, kubernetes, etc
these are tools designed to keep services up and running
I don't find them useful for workflow executions
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:29
exactly. that is why I look into nextflow
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:30
regarding docker, custom mounts are not supported (by design)
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:30
so how do you get your data in?
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:30
nextflow infers the required mounts based on the inputs you declared in your processes
so there's no special configuration when using containers
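
A minimal sketch of the idea of inferring mounts from declared inputs. This is an illustration only, not nextflow's actual implementation: the function names `infer_mounts` and `docker_command`, the image name and all paths are hypothetical, and it assumes the input and work directories sit on a file system that the docker host can see.

```python
import os
import shlex


def infer_mounts(input_paths, work_dir):
    """Return the host directories to mount so every declared input
    (and the task work directory) is visible inside the container."""
    dirs = {os.path.dirname(os.path.abspath(p)) for p in input_paths}
    dirs.add(os.path.abspath(work_dir))
    mounts = []
    # Sorted order puts a parent directory before its children.
    for d in sorted(dirs):
        covered = any(
            d == m or d.startswith(m.rstrip(os.sep) + os.sep) for m in mounts
        )
        if not covered:
            mounts.append(d)
    return mounts


def docker_command(image, task_command, input_paths, work_dir):
    """Build a `docker run` argument list with one -v flag per inferred mount."""
    args = ["docker", "run", "--rm"]
    for d in infer_mounts(input_paths, work_dir):
        # Mount each directory at the same path inside the container.
        args += ["-v", f"{d}:{d}"]
    args += ["-w", os.path.abspath(work_dir), image, "bash", "-c", task_command]
    return args


if __name__ == "__main__":
    # Hypothetical inputs on a shared file system.
    cmd = docker_command(
        "ubuntu:14.04",
        "wc -l /shared/data/sample.fq",
        ["/shared/data/sample.fq", "/shared/data/ref/genome.fa"],
        "/shared/work/ab/12cd",
    )
    print(shlex.join(cmd))
```

Because the mounts are derived from host paths, this only works when the docker engine runs on a host that has those paths available.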
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:31
ok nice. But that assumes you can access the filesystem on the system running the docker engine right?
the nice thing with rexray is that you can directly mount remote block devices or images into the container, without the need for mounting them on the host manually. But that is maybe just a bit more comfy
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:35
Yes, the host should mount the shared file system.
This rexray feature sounds interesting, I will take a look
Currently with nextflow it is possible to specify some extra options to configure the engine or the run command line. Look at runOptions and engineOptions at this link
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:41
rexray seems cool in principle, however the only storage that I can see being useful (at least for me) is AWS S3.
Anyway when using S3 you will never have efficient random access, because it is implemented over HTTP
thus it is not really different from copying the data from S3 to the node local storage, executing your task and copying the result back to S3.
a model that is already implemented with nextflow without using extra layers
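
A rough sketch of that copy-in, execute, copy-back model. It is not how nextflow implements it: a plain local directory stands in for the S3 bucket (a real version would use an S3 client instead of `shutil`), and the function name `run_staged` and the file names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(remote_dir, input_names, command, output_names):
    """Copy inputs to node-local scratch, run the task there, copy results back.

    `remote_dir` is a local directory standing in for the object store.
    """
    remote = Path(remote_dir)
    with tempfile.TemporaryDirectory() as scratch_name:
        scratch = Path(scratch_name)
        # Stage in: one whole-file copy per input instead of random access.
        for name in input_names:
            shutil.copy(remote / name, scratch / name)
        # Execute the task against the local copies.
        subprocess.run(command, shell=True, cwd=scratch, check=True)
        # Stage out: copy only the declared outputs back.
        for name in output_names:
            shutil.copy(scratch / name, remote / name)


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp())
    (store / "in.txt").write_text("a\nb\n")
    run_staged(store, ["in.txt"], "wc -l < in.txt > count.txt", ["count.txt"])
    print((store / "count.txt").read_text().strip())
```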
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:44
yeah, the thing for us is that we will most likely use openstack and opennebula
where we can fire up our own network storage or use existing block storage solutions
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:45
for which there's rexray support?
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:45
but I don’t know what the future will look like, so I prefer to keep that undecided
openstack, you can directly attach network block storage devices to docker containers which is pretty need
using docker volumes plugins
neat*
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:47
that sounds somewhat like mounting Amazon EBS volumes to docker containers, is that right?
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:47
could be! never done that, let me read about that
Paolo Di Tommaso
@pditommaso
Nov 30 2015 09:49
anyway it could be something that could be integrated into nextflow
You may also be interested in ElastiCluster
Gijs Molenaar
@gijzelaerr
Nov 30 2015 09:59
ah cool, thx, didn’t know about that one.
it is a bit unfortunate that the biggest cluster we can use runs opennebula, which has minimal support
from most 3rd party tools
Gijs Molenaar
@gijzelaerr
Nov 30 2015 10:06
anyway, thanks for your time, i have some new tools and ideas to play with
Paolo Di Tommaso
@pditommaso
Nov 30 2015 10:06
:)
that's good, happy hacking !
Paolo Di Tommaso
@pditommaso
Nov 30 2015 16:11
Nextflow 0.16.3 is out.
Thanks to @robsyme for spotting a nasty issue in the launcher script nextflow-io/nextflow#90.