These are chat archives for nextflow-io/nextflow

12th
Dec 2016
Denis Moreno
@Galithil
Dec 12 2016 13:02
Does the amazon cloud part of nextflow rely on amazon SQS ? If not, how are the jobs transferred between instances ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:03
Nope
NF deploys its own clustering engine
Denis Moreno
@Galithil
Dec 12 2016 13:05
I read that, but I was curious as to the implementation
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:05
ok, it's based on the Ignite clustering engine
Denis Moreno
@Galithil
Dec 12 2016 13:05
thanks
I'm wondering how hard it would be to be able to specify instances sizes depending on which process is being run
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:06
each node runs an NF process, which runs an Ignite daemon managing job deployment
Denis Moreno
@Galithil
Dec 12 2016 13:07
so the nodes need to be there before the jobs are submitted in the general case, but the autoscale lets you spawn more nodes, right ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:07
yep
by instance sizes you mean instance types ?
Denis Moreno
@Galithil
Dec 12 2016 13:08
and the current approach lets you specify one kind of instance to scale to, it can't depend on which process you try to run
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:08
yeah, I see your point
Denis Moreno
@Galithil
Dec 12 2016 13:08
it's not a big deal
big instances can run small jobs anyway
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:09
it shouldn't be too difficult, and actually it's a planned improvement
Denis Moreno
@Galithil
Dec 12 2016 13:09
nice
I'll just go with the standard approach, and update eventually
Denis Moreno
@Galithil
Dec 12 2016 13:34
can I specify a profile or a special configuration file in the scope nextflow cloud ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:38
yes -p
actually I've just noticed that it's not printed in the cloud help
you can even use a different config file with nextflow -c <config file> cloud
Denis Moreno
@Galithil
Dec 12 2016 13:40
in nextflow cloud create, -c is used to specify the number of instances. I guess positional arguments matter
positions *
you get the idea
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:41
yes, it does
Denis Moreno
@Galithil
Dec 12 2016 13:44
interesting, ERROR ~ Not a valid config attribute: keyFile
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:50
too bad
I need to fix that
you can use keyHash in place of that
Denis Moreno
@Galithil
Dec 12 2016 13:51
sure thing
Denis Moreno
@Galithil
Dec 12 2016 13:59
Just to confirm, I'm supposed to copy the content of my .pem file in keyHash, right ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:59
yes
Denis Moreno
@Galithil
Dec 12 2016 13:59
I get a key mismatch, so I'm obviously doing something wrong
Paolo Di Tommaso
@pditommaso
Dec 12 2016 13:59
but if you are using the key given by AWS it's not needed
Denis Moreno
@Galithil
Dec 12 2016 14:00
you mean the secret key of the IAM users ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:00
nope
ec2 keys
the easiest way is to not specify key and user at all
Denis Moreno
@Galithil
Dec 12 2016 14:04
how do you restrict access to your instances, then ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:05
it creates a user account with the same name as your local user and copies your default ssh public key
so only you will be able to access
Denis Moreno
@Galithil
Dec 12 2016 14:06
but you need to ssh once to make that happen, don't you ? or is this part of the setup ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:07
it's managed by the NF deployment
you won't need to do anything
Denis Moreno
@Galithil
Dec 12 2016 14:07
you still need credentials to spawn ec2 instances
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:08
AWS accessKey and secretKey of course
Denis Moreno
@Galithil
Dec 12 2016 14:09
I have those, but I get 403 forbidden. That's probably a problem with my aws setup rather than with nextflow then
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:09
likely
you will need ec2 and S3 full permissions
Denis Moreno
@Galithil
Dec 12 2016 14:10
s3 as well ? I was planning to use efs instead
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:11
well, if you don't need to access any s3 bucket it's not needed
Denis Moreno
@Galithil
Dec 12 2016 14:11
ok
Denis Moreno
@Galithil
Dec 12 2016 14:28
seems to be working now
nice
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:28
:+1:
Denis Moreno
@Galithil
Dec 12 2016 14:51
how does nextflow mount the efs filesystem ? is it a mount call, or is it done through something else ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:52
yes it is
Denis Moreno
@Galithil
Dec 12 2016 14:52
I can see that it's mounted with the dns
I believe that causes amazon to charge for the data transfer as if it were between regions
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:53
yes, it's mounted as specified by the AWS documentation
ah
Denis Moreno
@Galithil
Dec 12 2016 14:53
I got caught as well
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:54
interesting point
Denis Moreno
@Galithil
Dec 12 2016 14:54
Mounted via the dns, and then got charged for inter-region transfer, although they were both in ireland
googled it, and the main answer is "mount with ip and you'll be fine"
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:54
evil !
Denis Moreno
@Galithil
Dec 12 2016 14:54
I don't know how much of it is true yet
Well, both are covered in the aws doc
dns is first, though ;)
around 15% of my total costs were data going in and out of the efs, more than the actual storage
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:55
it needs to be investigated
Denis Moreno
@Galithil
Dec 12 2016 14:56
I'll get to it soon enough, if you're fine with waiting a few days
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:56
sure
Denis Moreno
@Galithil
Dec 12 2016 14:56
I can just remount my efs with the ip and see if I still get charged
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:56
you may want to open an issue for that
that would be great
Denis Moreno
@Galithil
Dec 12 2016 14:57
I'll do it when I have concrete stuff, for now it's just "I have these weird charges on my bill"
Paolo Di Tommaso
@pditommaso
Dec 12 2016 14:57
nice
Denis Moreno
@Galithil
Dec 12 2016 15:21
How does apache ignite handle jobs that rely on environment variables ? One of the workers is missing an env variable :x
Paolo Di Tommaso
@pditommaso
Dec 12 2016 15:23
you should define env vars in the nextflow.config file
then it will propagate them in the tasks
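As a rough illustration of why declaring the variable in one central place helps, here is a minimal Python sketch (not Nextflow or Ignite code). A launcher merges an explicitly declared set of variables into each task's environment, so a worker no longer depends on whatever its own shell happened to export. `DECLARED_ENV`, `run_task` and `MY_TOOL_HOME` are hypothetical names used only for this example.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for variables declared once in a central config file.
DECLARED_ENV = {"MY_TOOL_HOME": "/opt/mytool"}


def run_task(code, declared=DECLARED_ENV):
    """Run a Python snippet as a child task with the declared variables set."""
    env = dict(os.environ)  # start from the worker's own environment
    env.update(declared)    # declared variables override local values
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child sees the variable even if the worker's shell never exported it.
    print(run_task("import os; print(os.environ['MY_TOOL_HOME'])"))
```

The point of the design is that the task environment is built from the declaration rather than inherited by accident, which is what makes every worker behave the same way.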
Denis Moreno
@Galithil
Dec 12 2016 15:24
I can try that
thanks
Denis Moreno
@Galithil
Dec 12 2016 15:56
I'm trying to use a script from the bin folder of our main nextflow folder (in aws cloud setup), but it looks like the workers do not see it. Is this related to the fact that I call nextflow on my main.nf instead of letting nextflow download the whole thing ?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 18:55
yes
otherwise you need to make your project available on a shared path
amacbride
@amacbride
Dec 12 2016 19:42

Hi all, I was wondering if anyone had experimented with the AWS x1.32xlarge instances? I'm seeing a weird Nextflow error that I hadn't seen with other (smaller) instance types. During an initial pipeline step, when NF is downloading data from S3, I'm seeing "Timeout waiting for connection from pool" errors from the underlying Java SDK.

Is there anything in the S3 implementation in NF that scales with the number of available processors? I was wondering if with 128 processors, it might be hitting limits that it wouldn't have with fewer processors. If someone could point me towards the relevant code I can go exploring myself.

Paolo Di Tommaso
@pditommaso
Dec 12 2016 19:44
um, weird
could you include the complete .nextflow.log or at least the complete stack trace ?
(by using pastebin or a similar service)
amacbride
@amacbride
Dec 12 2016 20:22
I will next time I try running it -- at $13/hr, I didn't leave it running after it errored out.
Paolo Di Tommaso
@pditommaso
Dec 12 2016 20:23
I guess so
amacbride
@amacbride
Dec 12 2016 21:02
Hmm, an unrelated error. Any reason why NF might be having trouble downloading dependencies?
CAPSULE: Downloading dependency io.nextflow:nxf-httpfs:pom:0.22.6
CAPSULE: Transfer failed: capsule.org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact io.nextflow:nxf-httpfs:pom:0.22.6 in central (https://repo1.maven.org/maven2/) (for stack trace, run with -Dcapsule.log=verbose)
Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:03
uh this is bad
let me check
amacbride
@amacbride
Dec 12 2016 21:05
I wasn't sure how to pass Java -D vars to nextflow, is there an obvious way to do it?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:05
-D ..
the same
amacbride
@amacbride
Dec 12 2016 21:12

Stack trace with debug turned on (same as above, basically)

http://pastebin.com/FCjbd4rQ

Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:14
yes thanks, I'm fixing that . .
you can roll back to 0.22.5 in the meantime
NXF_VER=0.22.5 nextflow run .. etc
amacbride
@amacbride
Dec 12 2016 21:16
I needed the bugfix for #259, so I have to stick with 0.22.6 for now
Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:17
ok, should be fine now
amacbride
@amacbride
Dec 12 2016 21:33
It worked once I deleted the existing .nextflow directory.
Now onto the original problem :)
Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:34
sure, sorry I didn't tell you .. :/
ok
amacbride
@amacbride
Dec 12 2016 21:34
no worries, I'm just thrilled that you're so responsive and helpful. it's much appreciated!
Paolo Di Tommaso
@pditommaso
Dec 12 2016 21:34
:)
yes, I'm used to working between one interruption and the next ;)
amacbride
@amacbride
Dec 12 2016 22:10

OK, it's consistent, at least -- it's failed every time, and works fine on a smaller instance. It looks like ~60 file copy requests were executing in parallel when it died.

http://pastebin.com/kyYt2nNS

Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:12
it looks like a problem with the aws sdk
could you also include the header of the log file ?
amacbride
@amacbride
Dec 12 2016 22:16
Where in the source do you actually interface with S3? I saw a reference to s3fs, and it seems to be dying here, in newInputStream, right before it dives into the SDK: https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/blob/master/src/main/java/com/upplication/s3fs/S3FileSystemProvider.java
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:19
the S3 sdk is wrapped as a java file system
java.nio.file.Files.newInputStream(Files.java:108)
amacbride
@amacbride
Dec 12 2016 22:20
What info would be helpful from the header? I can't paste the whole thing (some proprietary info), but I should be able to mask out things you don't need.
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:21
in the first lines there is some info about the hardware
I want to use it to open an issue on the aws-java-sdk
amacbride
@amacbride
Dec 12 2016 22:21
Dec-12 21:34:02.767 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1.9 TB; capacity=200; pollInterval=100ms; dumpInterval=5m
Dec-12 21:34:02.860 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 200; pollInterval: 1s; dumpInterval: 5m
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:22
nope
something like
Dec-12 23:05:06.538 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 0.22.6 build 4117
  Modified: 12-12-2016 22:04 UTC (23:04 CEST)
  System: Mac OS X 10.11.5
  Runtime: Groovy 2.4.7 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_80-b15
  Encoding: UTF-8 (UTF-8)
  Process: 15669@shiny.local [192.168.1.46]
  CPUs: 8 - Mem: 16 GB (2 GB) - Swap: 1 GB (897 MB)
at the very beginning
amacbride
@amacbride
Dec 12 2016 22:23
(how do you get the code block? I thought it was backticks, but apparently not)
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:24
``` at the beginning and the end
:)
on the bottom right there's a help link
amacbride
@amacbride
Dec 12 2016 22:25
Dec-12 21:34:01.398 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 0.22.6 build 4116
  Modified: 04-12-2016 21:50 UTC 
  System: Linux 3.19.0-74-generic
  Runtime: Groovy 2.4.7 on OpenJDK 64-Bit Server VM 1.7.0_121-b00
  Encoding: UTF-8 (UTF-8)
  Process: 36056@ip-10-101-9-244 [10.101.9.244]
  CPUs: 128 - Mem: 1.9 TB (1.9 TB) - Swap: 0 (0)
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:25
:+1:
amacbride
@amacbride
Dec 12 2016 22:25
facepalm
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:25
ahaha
there should be an emoticon for that
It's my go-to.
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:27
:)
amacbride
@amacbride
Dec 12 2016 22:28
So as a practical matter, I suppose I could use maxForks or something similar to limit the number of concurrent S3 file transfers, though that seems inelegant.
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:28
yes
it could be a workaround
amacbride
@amacbride
Dec 12 2016 22:30
Each of these particular tasks requires 8 FASTQ files (4 lanes, bidirectional), and 60 seems to be the magic connection pool number, so I'll try limiting to 6 or 7 and see what happens.
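The arithmetic behind "6 or 7" can be sketched as below. This assumes, as a simplification, that every file being staged holds exactly one pooled connection for the whole transfer, and that 60 really is the pool limit (it is only the number observed in the log, not a confirmed setting). `max_concurrent_tasks` is a hypothetical helper, not part of Nextflow.

```python
def max_concurrent_tasks(pool_size, files_per_task):
    """Largest number of tasks whose simultaneous downloads fit in the pool."""
    if files_per_task <= 0:
        raise ValueError("files_per_task must be positive")
    return pool_size // files_per_task


if __name__ == "__main__":
    # 60 is the observed limit and 8 the FASTQ files per task, both from the chat.
    limit = max_concurrent_tasks(pool_size=60, files_per_task=8)
    print(limit)      # tasks that fit
    print(limit * 8)  # connections in use at that limit
```

Under those assumptions 7 tasks use 56 connections, while an 8th would need 64, which is why a cap of 7 (or 6 for some headroom) is the natural value to try for `maxForks`.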
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:31
I see, I will let you know if I find something
amacbride
@amacbride
Dec 12 2016 22:31
Does NF obey the maxForks directive when using SLURM or another resource manager?
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:32
yes
amacbride
@amacbride
Dec 12 2016 22:32
groovy!
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:32
of course!
amacbride
@amacbride
Dec 12 2016 22:32
:)
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:32
NF is full groovy inside ;)
amacbride
@amacbride
Dec 12 2016 22:33
both groovy and far out
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:43
gotcha!
I think I've found the problem
Paolo Di Tommaso
@pditommaso
Dec 12 2016 22:53
it was missing a connection close
you may want to give this snapshot a try
NXF_VER=0.23.0-SNAPSHOT nextflow run .. etc
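For readers wondering how a missing close can lead to "Timeout waiting for connection from pool", here is a toy Python model. It is not the AWS SDK or the actual Nextflow fix, which is not shown in this chat; `ConnectionPool` and `download_all` are hypothetical names. Each read borrows a slot from a fixed-size pool, and if the slot is never given back, the pool runs dry after `size` reads.

```python
import threading


class ConnectionPool:
    """Toy fixed-size pool; acquire() times out when no slot is free."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self, timeout=0.05):
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("Timeout waiting for connection from pool")

    def release(self):
        self._slots.release()


def download_all(pool, n_files, close_connections):
    """Simulate n_files sequential reads; return how many succeeded."""
    done = 0
    for _ in range(n_files):
        try:
            pool.acquire()
        except TimeoutError:
            break  # pool exhausted: no further reads are possible
        try:
            done += 1  # stand-in for actually reading the stream
        finally:
            if close_connections:
                pool.release()  # the "close" that returns the slot
    return done


if __name__ == "__main__":
    print(download_all(ConnectionPool(3), 10, close_connections=True))
    print(download_all(ConnectionPool(3), 10, close_connections=False))
```

With the release in place all 10 reads succeed on a pool of 3. Without it only the first 3 do. In this model, a leak of this kind would stay hidden on small runs and only show up once the number of transfers exceeds the pool size.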
amacbride
@amacbride
Dec 12 2016 23:34
Cool, I will give it a try. (Limiting the connections to 7 seems to be working at the moment, but it's not giving me any throughput benefit of the larger instance.)