These are chat archives for nextflow-io/nextflow

12th Nov 2016
Jason Byars
@jbyars
Nov 12 2016 06:35
@pditommaso Amazon cloud setup and teardown worked just fine. I only ran into a couple of minor glitches with the Amazon autoscaling. In one situation with spot instances, 19 instances were allocated when maxInstances was set to 11. The other issue occurred when I specified the subnetId and securityGroup: the cluster would create fine, but would throw an EC2Instances error complaining about GroupName when autoscaling triggered. I looked into it a little and the usual culprit is providing the security group name instead of the id. We provide the id, so this seems a bit strange. I have the logs to go over later. The real issue I ran into was the I/O cap on EFS for more than a handful of instances. What was the idea to get around that?
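For reference, a minimal sketch of the kind of cloud configuration being described; all IDs and values are placeholders, not the actual config from the run above:
```groovy
// nextflow.config -- illustrative sketch only, every id below is a placeholder
cloud {
    imageId       = 'ami-xxxxxxxx'
    instanceType  = 'm4.large'
    subnetId      = 'subnet-xxxxxxxx'      // the subnet id, not its name
    securityGroup = 'sg-xxxxxxxx'          // the security group id, not the GroupName
    autoscale {
        enabled      = true
        spotPrice    = 0.10                // spot instances, as in the scenario above
        maxInstances = 11                  // the cap that was exceeded (19 allocated)
    }
}
```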
Paolo Di Tommaso
@pditommaso
Nov 12 2016 09:37
@jbyars If you can upload the logs somewhere (perhaps as a GH issue) I will take a look at the autoscaling problem.
The "EC2Instances error complaining about GroupName" issue looks more a documentation problem, doesn't it?
Paolo Di Tommaso
@pditommaso
Nov 12 2016 09:43
Regarding the EFS I/O limit, I haven't run a real benchmark yet, but it seems quite improbable that it performs worse than an NFS server configured locally in the cloud cluster.
That said, my advice is to choose the Max I/O performance mode with the Nextflow scratch directive set to true, so that NF jobs work on the instance's local storage instead of the shared file system.
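(The EFS performance mode is chosen on the AWS side when the file system is created; the scratch directive is the Nextflow part.) A minimal sketch of that advice, with purely illustrative process, channel and command names:
```groovy
process splitBam {
    scratch true                     // run the task in a node-local temp dir, not on EFS

    input:
    file bam from bam_ch

    output:
    file 'chunks/*' into chunks_ch

    """
    mkdir chunks
    # hypothetical split-by-read-name step, purely illustrative
    split_by_read_name.py ${bam} chunks/
    """
}
```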
Paolo Di Tommaso
@pditommaso
Nov 12 2016 10:26
@jbyars Just ran into https://github.com/google/cadvisor
[screenshots attached: cadvisor1.png, cadvisor2.png]
Jason Byars
@jbyars
Nov 12 2016 14:47
oh I agree EFS shouldn't perform any worse than NFS. This was a worst-case scenario test: splitting 100 BAM files by read name, so almost all I/O. In the past, the problem I ran into with the scratch directive was that input files from S3 buckets were still copied to the shared NFS space by the master node. Is there a clean way around that now, so large input files can be pulled directly to worker scratch space without ever touching EFS?
Paolo Di Tommaso
@pditommaso
Nov 12 2016 14:54
I see your point, unfortunately it's not possible at this time. It could be a nice improvement
Jason Byars
@jbyars
Nov 12 2016 14:54
@pditommaso My only idea is to have the channel with the large files just hold the S3 URLs, and make the first step in the script section grab a copy of the files. Thank you for pointing out cadvisor.
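A rough sketch of that idea, assuming the AWS CLI is available on the workers; the bucket, file and process names are made up:
```groovy
// The channel carries plain S3 URLs as strings, so the master never stages
// the files onto EFS; each task pulls its own copy into local scratch space.
bam_urls = Channel.from('s3://my-bucket/sample1.bam', 's3://my-bucket/sample2.bam')

process fetchAndProcess {
    scratch true                     // work on the instance's local storage

    input:
    val url from bam_urls

    """
    aws s3 cp ${url} local.bam       # fetch straight onto the worker
    # ... downstream processing of local.bam goes here ...
    """
}
```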
Paolo Di Tommaso
@pditommaso
Nov 12 2016 14:55
there's a lot of room for improvement in handling large files, this is a good suggestion
yeah, cadvisor looks cool
Jason Byars
@jbyars
Nov 12 2016 14:57
I'll play around with that idea. This might not make scripts that much more complicated. Just remind me, what do I need to set to have the scratch folder automatically cleaned up on the workers at the end of each job?
Paolo Di Tommaso
@pditommaso
Nov 12 2016 14:58
just process.scratch = true
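That is, in nextflow.config:
```groovy
// nextflow.config
process.scratch = true   // tasks run in a node-local temp dir that is cleaned up when the task completes
```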
Jason Byars
@jbyars
Nov 12 2016 14:58
great, thanks! I'll get you the logs a little later and let you know how it goes.
Paolo Di Tommaso
@pditommaso
Nov 12 2016 14:59
great, I want to investigate what went wrong with the autoscaling
Jason Byars
@jbyars
Nov 12 2016 19:59
if scratch = true, is there a way to get the scratch directory before the script section runs? My idea was to use beforeScript to fetch big files directly to worker scratch directories before running the script section. Unfortunately, beforeScript appears to run in the normal shared work directory, not the scratch dir. This is really only an issue because I'm using an executable container and need to fetch before the container runs.
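A sketch of the pattern being described, with placeholder container image, bucket and command names; the observation above is that the beforeScript line executes in the shared work directory on EFS, outside the container, rather than in the per-task scratch directory:
```groovy
process fetchBeforeContainer {
    scratch true
    container 'my-org/exec-container'                         // executable container, placeholder
    beforeScript 'aws s3 cp s3://my-bucket/big.bam big.bam'   // runs before the container starts, but in the shared work dir

    """
    run_tool big.bam
    """
}
```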