These are chat archives for nextflow-io/nextflow

11th
May 2018
Radoslaw Suchecki
@bioinforad_twitter
May 11 2018 03:17

Following @pditommaso's screencast, I am using nextflow cloud create to set up a cluster on AWS EC2. So far so good: several worker nodes are instantiated along with the master node, and there is no problem ssh-ing into the master node. However, when I run a test ./nextflow run examples/blast.nf -with-docker I get the following error from ignite:

N E X T F L O W  ~  version 0.29.0
Pulling nextflow-io/examples ...
 downloaded from https://github.com/nextflow-io/examples.git
Launching `nextflow-io/examples` [curious_goldberg] - revision: 27afa1c086 [master]
[warm up] executor > ignite
ERROR ~ org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder.setShared(Z)V

 -- Check script 'blast.nf' at line: 52 or see '.nextflow.log' file for more details

The config used for setting up the cluster:

cloud {
  imageId = 'ami-054c4e0bad8549c37' //a clone of the AMI used in the screencast, to have it available in local aws region
  subnetId = 'subnet-57eba230'  
  sharedStorageId = 'fs-d21be5eb' //EFS volume
  sharedStorageMount = '/mnt/efs'
  instanceType = 't2.micro'
  userName = 'radsuchecki'
}
Paolo Di Tommaso
@pditommaso
May 11 2018 05:56
are you using java 8 or another version ?
Radoslaw Suchecki
@bioinforad_twitter
May 11 2018 06:09
yep, Java 8. In fact, as mentioned above, the AMI on which this occurs is a clone of the one you made available in AWS eu-west-1
Paolo Di Tommaso
@pditommaso
May 11 2018 06:13
weird, could you open an issue including the complete .nextflow.log file?
Radoslaw Suchecki
@bioinforad_twitter
May 11 2018 06:15
will do
Paolo Di Tommaso
@pditommaso
May 11 2018 06:17
thanks
Radoslaw Suchecki
@bioinforad_twitter
May 11 2018 06:19
In the meantime I have switched to trying the k8s setup, so expect more questions soon :-)
Paolo Di Tommaso
@pditommaso
May 11 2018 06:19
:+1:
cloud or local cluster ?
Radoslaw Suchecki
@bioinforad_twitter
May 11 2018 06:38
k8s? gcloud
Vladimir Kiselev
@wikiselev
May 11 2018 15:18
so, have we agreed on using one large Docker image vs multiple Docker images, one per process?
I am thinking about k8s implementation, where the Docker image has to be present in every pod
I assume k8s should be able to reuse already pulled images, but not sure...
so, I am advised to have many small Docker images, one per process
Venkat Malladi
@vsmalladi
May 11 2018 15:35
That is what I am leaning towards
@wikiselev what I am moving towards making: https://github.com/medforomics/udaws-containers
Mike Smoot
@mes5k
May 11 2018 15:59
@wikiselev our group quite happily uses different containers per process. Our main use case for docker was to isolate dependencies. We install so many tools on our internal systems that dependencies between tools inevitably conflict, so using docker to manage this has been a fairly successful approach for us. Given our many, many standalone containers, simply dropping them into nextflow processes as needed has worked really well for us. YMMV!
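
For illustration, the one-container-per-process approach described above could be expressed in a nextflow.config roughly as follows. This is only a sketch: it assumes a Nextflow version that supports the withName process selector, and the process names and image names are hypothetical, not taken from anyone's actual pipeline.

```groovy
// nextflow.config -- sketch only; process and image names are hypothetical
docker.enabled = true

process {
    // fallback image for any process without a more specific setting
    container = 'ubuntu:16.04'

    // each process gets its own image with only the tools it needs
    withName: blast {
        container = 'example/blast:1.0'      // hypothetical image
    }
    withName: extract {
        container = 'example/seqtools:1.0'   // hypothetical image
    }
}
```

Keeping the mapping in the config rather than in the script means images can be swapped without touching the pipeline code.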
Vladimir Kiselev
@wikiselev
May 11 2018 16:01
Thanks @vsmalladi and @mes5k! So, do you use biocontainers or FROM archlinux to make those images as small as possible?
Mike Smoot
@mes5k
May 11 2018 16:10
We build our own images, mostly from ubuntu:16.04. Image size is only a concern for me once images get over ~2GB in size (e.g. antismash). When I've gone to any trouble minimizing image size it just causes more problems for me. I've found it easier to periodically remove old images or just allocate more space to /var/lib/docker. We use biocontainers in some places, but there are issues (nextflow-io/nextflow#499) that preclude their (easy) use in nextflow.
Venkat Malladi
@vsmalladi
May 11 2018 16:17
We build our own images from ubuntu:18.04
Not so much worried about size at this point
Mike Smoot
@mes5k
May 11 2018 16:21
My sense is that image size might matter at places like Google where they deal with tremendous loads and it makes a difference how fast containers can be spun up and moved around, but for our much (much) smaller operation it hasn't become an issue.
That being said, it's probably not a good idea to embed your 20GB database in your image, as convenient as that would be.
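
One possible alternative to baking a large database into the image is to keep it on the host (or on shared storage) and mount it into the container at run time. A minimal sketch for the Docker executor, assuming a hypothetical host directory /data/blastdb; this does not apply as-is to a k8s setup, where volumes are declared differently:

```groovy
// nextflow.config -- sketch only; the host path and mount point are hypothetical
docker.enabled = true

// pass extra options to `docker run`: mount the database directory read-only
docker.runOptions = '-v /data/blastdb:/db:ro'
```

Processes would then read the database from /db inside the container, and the image itself stays small.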
Venkat Malladi
@vsmalladi
May 11 2018 19:40
agreed
Vladimir Kiselev
@wikiselev
May 11 2018 21:45
@mes5k great, many thanks for sharing your experience!
thanks @vsmalladi, too!