These are chat archives for nextflow-io/nextflow

28th
May 2016
Anthony Underwood
@aunderwo
May 28 2016 06:45
Loving what I see about nextflow. We are looking for a platform that can scale to running tens of thousands of workflows for pathogen genomics. Ideally we'd like to use something that is OpenStack-aware, since we have a very large OpenStack infrastructure. Can nextflow use Docker with OpenStack? Are there any examples of groups using this kind of setup?
Paolo Di Tommaso
@pditommaso
May 28 2016 07:31
@aunderwo I'm not aware of any, but I don't see any problem with that. As long as Docker is available on every compute node, it works out of the box.
What is important to know is which job scheduler and, even more, which shared storage you are using (or planning to use) in your infrastructure, if any.
Anthony Underwood
@aunderwo
May 28 2016 10:28
@pditommaso we currently have a UGE cluster that connects to a Lustre shared parallel file system. However, with a mind to the future, we are building a large OpenStack virtual environment. We have already proved that we can build an OpenStack image for a compute node and add it to the UGE cluster. In this environment I guess we can add nodes manually on demand and run nextflow docker instances on either raw tin or virtual nodes. I was wondering if anybody has used nextflow with a virtual environment such as OpenStack to instantiate an instance and run a nextflow workflow on it. As you say, it's all about scheduling. In a totally virtual environment there needs to be some process to manage instances. In your opinion, would instantiating a separate VM for each nextflow workflow be too costly?
@pditommaso would it be more efficient to have VMs running all the time
Anthony Underwood
@aunderwo
May 28 2016 10:36
and then use nextflow to submit docker jobs to these VMs? I've not used Docker before and am not aware of how you would manage a queue of nextflow 'submissions' when only a finite infrastructure is available. E.g. 500 nextflow workflows to run but only enough resources to process 100 at a time. Any advice you can give would be greatly appreciated.
Paolo Di Tommaso
@pditommaso
May 28 2016 10:41
nextflow + UGE works pretty well; it's actually the main platform on which nextflow has been developed.
Spawning VM nodes on demand is something nextflow doesn't directly support at this time, but I guess it can be managed indirectly by the UGE scheduler.
Instantiating a separate VM per workflow is something that can be done, but whether it's worth it depends on the granularity of the jobs in your workflow.
For short-lived tasks it's surely not a good idea.
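As a rough illustration of the queueing question above (500 workflows to run, capacity for only 100 at a time), here is a minimal Python sketch of a bounded worker pool. It is not nextflow or UGE functionality; in the setup discussed here the UGE scheduler would handle the queueing. `run_workflow` is a hypothetical placeholder for launching one workflow, and the numbers are simply those from the question.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 100   # assumed capacity: workflows that fit at once
TOTAL_WORKFLOWS = 500  # assumed backlog of workflows waiting to run

_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of workflows observed running together


def run_workflow(workflow_id):
    """Hypothetical stand-in for launching one workflow and waiting for it."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.01)  # placeholder for the real work
    with _lock:
        _running -= 1
    return workflow_id


def run_all(total=TOTAL_WORKFLOWS, limit=MAX_CONCURRENT):
    # The pool never runs more than `limit` workflows at once;
    # the remaining submissions wait in its internal queue.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(run_workflow, range(total)))


results = run_all()
```

The point of the sketch is only that all 500 submissions are accepted up front while at most 100 run at any moment, which is the behaviour a cluster scheduler gives you for free.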
Paolo Di Tommaso
@pditommaso
May 28 2016 10:51
You may be interested in this white paper we have published with Univa
Anthony Underwood
@aunderwo
May 28 2016 17:51
Thanks for the info. Will have a good read of the white paper. We're looking for something that will help address our future software and infrastructure architecture needs. Nextflow looks like it may solve some of our future use cases.
Paolo Di Tommaso
@pditommaso
May 28 2016 17:55
I hope so. Don't hesitate to ask for more info if needed.