These are chat archives for nextflow-io/nextflow

21st
Oct 2016
Mike Smoot
@mes5k
Oct 21 2016 17:15
@pditommaso Hi Paolo, I'm wondering if I should make the cloud config option bootStorageSize something like 50GB to account for docker images and containers in /var/lib/docker? Have you run into problems with docker consuming too much space in your cloud instances?
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:16
you have docker images of 50 GB?!
Mike Smoot
@mes5k
Oct 21 2016 17:18
No! But we have many containers, with some pipelines using as many as 10 different ones.
Run multiple pipelines and it's easy to see us consuming that space quickly.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:19
in that case yes
Mike Smoot
@mes5k
Oct 21 2016 17:19
great, just making sure I'm thinking along the correct lines.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:20
we are talking about the space to store the docker image not the container runs, right?
Mike Smoot
@mes5k
Oct 21 2016 17:21
/var/lib/docker usually has both, as far as I know
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:21
has both
what ?
Mike Smoot
@mes5k
Oct 21 2016 17:22
/var/lib/docker/containers and /var/lib/docker/images
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:23
ok yes, but the containers are removed on task termination
Mike Smoot
@mes5k
Oct 21 2016 17:23
right
Given our large number of dockerized tools, it's not hard for us to run out of space because of new images and new versions of images
Paolo Di Tommaso
@pditommaso
Oct 21 2016 17:25
makes sense, larger is better than running out of space ;)
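A rough way to reason about the bootStorageSize figure discussed above is sketched below. This is only an illustration: the function name, the number of image versions kept, the base OS footprint, and the headroom factor are all assumptions, not values from the conversation. It also ignores layer sharing between images, so it tends to overestimate.

```python
import math


def estimate_boot_storage_gb(image_sizes_gb, versions_kept=2,
                             base_os_gb=8.0, headroom=0.25):
    """Return a rough boot volume size in GB (hypothetical helper).

    image_sizes_gb: approximate on-disk size of each docker image.
    versions_kept:  how many versions of each image may pile up (assumed).
    base_os_gb:     space taken by the OS and tools (assumed).
    headroom:       extra fraction reserved for logs and temp files (assumed).
    """
    images_gb = sum(image_sizes_gb) * versions_kept
    return math.ceil((base_os_gb + images_gb) * (1 + headroom))
```

For instance, ten images of about 2 GB each with two versions kept comes to roughly 60 GB under these assumptions, which is in the same range as the 50 GB mentioned above.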
amacbride
@amacbride
Oct 21 2016 19:04
@pditommaso The alignment output is used by three different processes.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:05
ok, so each of them uses a different output file
amacbride
@amacbride
Oct 21 2016 19:10
Yup. There's a later step that merges the BAM files, and then it branches those (duplicate) channels to 3 parallel processes.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:11
:+1:
amacbride
@amacbride
Oct 21 2016 19:11
(btw, I love the -with-dag output -- I had been doing flow diagrams by hand in GraphViz, and this is much easier when I'm making changes.)
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:11
great, I'm happy you find it useful
Mike Smoot
@mes5k
Oct 21 2016 19:43
Hi Paolo, I'm trying to get a pipeline running with nextflow cloud and apart from a few hacks needed because of our VPC, things are working. At the moment I can start a pipeline on the master node, but no tasks seem to be propagating to the workers. When I look in .nextflow.log I see Ignite thinks that it has 4 cpus, when it should have 12 (I'm starting 3 m4.xlarge). I see that ignite sees both workers and I see a message about joining the workers with the master, but then nothing obvious indicating whether joining worked or failed.
I see one unknown host exception from the master node, but that's before the ignite stuff, so I'm not sure if it's impacting anything.
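The CPU arithmetic behind that observation can be sketched as follows. The helper names are hypothetical; the only hard fact used is that an m4.xlarge instance has 4 vCPUs.

```python
# vCPUs per instance type; m4.xlarge has 4 vCPUs.
VCPUS_PER_INSTANCE = {"m4.xlarge": 4}


def expected_cluster_cpus(instance_type, node_count):
    """Total CPUs the cluster should report if every node has joined."""
    return VCPUS_PER_INSTANCE[instance_type] * node_count


def nodes_accounted_for(reported_cpus, instance_type):
    """How many whole nodes the reported CPU count corresponds to."""
    return reported_cpus // VCPUS_PER_INSTANCE[instance_type]
```

With 3 m4.xlarge nodes the expected total is 12, while a reported count of 4 corresponds to a single node. That would be consistent with the master counting only its own CPUs, though the log alone does not confirm it.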
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:46
um, can you share the log file ?
Mike Smoot
@mes5k
Oct 21 2016 19:46
Sure, one sec
Mike Smoot
@mes5k
Oct 21 2016 19:53
The main problem I've been dealing with in our VPC is that amazonaws.com domain names don't resolve because of how the VPC is configured. So, for example, to mount EFS, I added the mount point to /etc/fstab in the AMI because the way you (and Amazon) recommend discovering the hostname of the EFS server doesn't work for me. So far I've been able to get everything working apart from ignite.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:55
it cannot resolve the local address !?
java.net.UnknownHostException: ip-172-19-0-23: ip-172-19-0-23: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
    at nextflow.daemon.IgGridFactory.getLocalAddress(IgGridFactory.groovy:341)
    at nextflow.daemon.IgGridFactory.findCloudIpAddresses(IgGridFactory.groovy:353)
    at nextflow.daemon.IgGridFactory.discoveryConfig(IgGridFactory.groovy:320)
    at nextflow.daemon.IgGridFactory.config(IgGridFactory.groovy:129)
    at nextflow.daemon.IgGridFactory.start(IgGridFactory.groovy:115)
    at nextflow.file.igfs.IgFileSystemProvider.getGridFor(IgFileSystemProvider.groovy:150)
    at nextflow.file.igfs.IgFileSystemProvider.getIgniteFileSystem(IgFileSystemProvider.groovy:172)
    at nextflow.file.igfs.IgFileSystemProvider.newFileSystem(IgFileSystemProvider.groovy:126)
    at nextflow.file.igfs.IgFileSystemProvider.newFileSystem(IgFileSystemProvider.groovy)
    at nextflow.file.FileHelper$_getOrCreateFileSystemFor_closure7.doCall(FileHelper.groovy:574)
Mike Smoot
@mes5k
Oct 21 2016 19:55
Right, I think that's because of the VPC configuration.
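The stack trace above comes from java.net.InetAddress.getLocalHost, which looks up the machine's own hostname and then resolves it to an address. A minimal sketch of the same check outside the JVM is shown below. The function name and the injectable resolver parameter are assumptions made for illustration and testing.

```python
import socket


def check_local_hostname(hostname=None, resolver=socket.gethostbyname):
    """Return (hostname, ip), with ip set to None if the name does not resolve.

    Roughly mirrors what InetAddress.getLocalHost does: take the local
    hostname and try to resolve it to an address.
    """
    name = hostname or socket.gethostname()
    try:
        return name, resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return name, None
```

Calling check_local_hostname() on a node and getting None for the address would reproduce the "Name or service not known" condition. One common workaround, not confirmed in this conversation, is an /etc/hosts entry mapping the node's private IP to its hostname.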
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:56
weird
Mike Smoot
@mes5k
Oct 21 2016 19:56
high security
and high irritation! :)
Paolo Di Tommaso
@pditommaso
Oct 21 2016 19:56
let me check my config
have you created a security group ?
Mike Smoot
@mes5k
Oct 21 2016 20:03
the cluster nodes appear to have a default security group
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:04
something like this?
Screen Shot 2016-10-21 at 22.04.45.png
the important rule should be allowing all inbound traffic within the same security group
Mike Smoot
@mes5k
Oct 21 2016 20:07
Yes, for the cluster it's "All Traffic" for both inbound and outbound
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:10
does it specify a VPN ID ?
Mike Smoot
@mes5k
Oct 21 2016 20:10
Do you mean VPC ID?
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:10
yes sorry
Mike Smoot
@mes5k
Oct 21 2016 20:11
Yes it does. Let me make sure they're all the same.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:11
also make sure you are using a subnet in that VPC
Mike Smoot
@mes5k
Oct 21 2016 20:12
ok
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:13
do you mean the config is fine ?
Mike Smoot
@mes5k
Oct 21 2016 20:14
Sorry, same vpc with the same subnet and the subnet is in the vpc
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:16
umm bad
I would suggest launching a single instance manually to verify why it can't mount EFS
Mike Smoot
@mes5k
Oct 21 2016 20:18
I'm fairly certain that the inability to mount is because I can't resolve the host name. This command fails for me: sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone).fs-67ea11ce.efs.us-west-2.amazonaws.com:/ efs
whereas this one sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 172.19.0.61:/ efs works fine
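The two mount commands above differ only in the host part. A small sketch of building the EFS DNS name and falling back to a known IP when the name does not resolve is given below. The function names, the resolver parameter, and the availability zone value in the usage note are assumptions; the file system ID, region, IP, and NFS options are taken from the messages above.

```python
import socket

# NFS options copied from the mount commands in the conversation.
NFS_OPTS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def efs_dns_name(availability_zone, fs_id, region):
    """Build the per-AZ EFS hostname used in the first command."""
    return f"{availability_zone}.{fs_id}.efs.{region}.amazonaws.com"


def efs_mount_command(availability_zone, fs_id, region, fallback_ip,
                      mount_point="efs", resolver=socket.gethostbyname):
    """Return the mount command as an argument list.

    Uses the DNS name when it resolves, otherwise the fallback IP.
    """
    host = efs_dns_name(availability_zone, fs_id, region)
    try:
        resolver(host)
    except OSError:  # name does not resolve, e.g. inside a restricted VPC
        host = fallback_ip
    return ["sudo", "mount", "-t", "nfs4", "-o", NFS_OPTS,
            f"{host}:/", mount_point]
```

For example, efs_mount_command("us-west-2a", "fs-67ea11ce", "us-west-2", "172.19.0.61") would produce the IP-based command in a VPC where the amazonaws.com name cannot be resolved ("us-west-2a" is a made-up zone here). The resulting list could be passed to subprocess.run.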
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:19
that must work
Mike Smoot
@mes5k
Oct 21 2016 20:20
Anyway, if you don't see anything obviously wrong, then I'm going to guess that this is related to how our VPC is configured and talk to the people who set that up.
A separate question: how do you handle root privileges for docker? Right now I do a sudo chmod o+rw /var/run/docker.sock to allow the default user to run docker, but I needed to do this manually. Did you automate this in some way?
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:22
um, what AMI are you using ?
Mike Smoot
@mes5k
Oct 21 2016 20:22
One I created.
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:23
I mean what distro?
Mike Smoot
@mes5k
Oct 21 2016 20:23
centos
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:23
the docker installer should create a docker user group
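Membership in the docker group is what lets a non-root user talk to /var/run/docker.sock without loosening its permissions. A minimal sketch of checking that membership is shown below. The function name and the injectable lookup parameter are assumptions, and the grp module is only available on Unix-like systems.

```python
import grp


def user_in_group(user, group="docker", lookup=grp.getgrnam):
    """Return True if `user` is listed as a member of `group`.

    Only supplementary members are listed in gr_mem, so a user whose
    primary group is `group` would not be detected by this check.
    """
    try:
        return user in lookup(group).gr_mem
    except KeyError:  # the group does not exist on this machine
        return False
```

If the check returns False, the usual fix is to add the user with sudo usermod -aG docker followed by the user name, then log in again so the new group takes effect.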
Mike Smoot
@mes5k
Oct 21 2016 20:25
Hmmm. Didn't follow those instructions! I just yum installed a docker-engine package.
Let me give that a try.
Much nicer approach!
Now I just need to wait another 2 hours to create the AMI... :)
Paolo Di Tommaso
@pditommaso
Oct 21 2016 20:28
two hours!? why so long ?
Mike Smoot
@mes5k
Oct 21 2016 20:40
Not sure, but creating the last one took that long.