These are chat archives for nextflow-io/nextflow

17th
Mar 2017
Karin Lagesen
@karinlag
Mar 17 2017 08:26
got it to run, now I just need to figure out how to interact with slurm...
I'm testing out what it says here: https://hpc.nih.gov/apps/nextflow.html
here they say the following: The master process submitting jobs should be run either as a batch job or on an interactive node - not on the biowulf login node.
it seems a bit counter-intuitive, running nextflow from an interactive node
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:41
I didn't know that page
Karin Lagesen
@karinlag
Mar 17 2017 08:42
it looked useful, so I thought I'd test it out
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:43
we launch NF on the login node, but in some systems this is not allowed
Karin Lagesen
@karinlag
Mar 17 2017 08:44
but would launching from an interactive node actually work....? or from a sbatch script?
I could see an sbatch script working; you can kickstart other slurm scripts from one, as with arrayrun for instance
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:45
but would launching from an interactive node actually work....? or from a sbatch script?
it depends on your cluster. are you using the same system as @huguesfontenelle ?
Karin Lagesen
@karinlag
Mar 17 2017 08:45
sort of
I am using the system that afaik Hugues' system mirrors
so I just fired off an email a few minutes ago to figure out how he does things
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:47
well, are you able to submit a slurm job from an interactive node ?
Karin Lagesen
@karinlag
Mar 17 2017 08:47
heh, haven't tried :)
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:47
if yes you can run NF from there
Karin Lagesen
@karinlag
Mar 17 2017 08:47
the thought never struck me
I'll try that out a bit later today (meetings first though)
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:48
however, when you set the cluster executor, what happens is that NF launches tasks by using the sbatch command
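A minimal sketch of that setting (the partition name is hypothetical):

```groovy
// nextflow.config -- with this in place, NF wraps each task in a
// .command.run script and submits it via `sbatch` on your behalf
process {
    executor = 'slurm'
    queue    = 'normal'   // hypothetical partition name (maps to sbatch -p)
}
```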
Karin Lagesen
@karinlag
Mar 17 2017 08:48
exactly
so I need to figure out where it is available
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:49
thus, it needs to launch on a node where this is allowed, usually the login node
Karin Lagesen
@karinlag
Mar 17 2017 08:49
but, if I understand you correctly: as long as sbatch is available, I should be good?
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:49
yes
Karin Lagesen
@karinlag
Mar 17 2017 08:49
ok, thanks, will try things out a bit later today then.
Paolo Di Tommaso
@pditommaso
Mar 17 2017 08:50
then you can choose: login node, interactive node, or a slurm job itself
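The last option means wrapping the launcher itself in a batch job; a sketch with hypothetical resource values:

```bash
#!/bin/bash
#SBATCH --job-name=nf-master
#SBATCH --time=24:00:00       # hypothetical walltime for the master process
#SBATCH --mem=4G              # the master process itself is lightweight
#SBATCH --cpus-per-task=1

# works as long as sbatch is callable from the node this job lands on
nextflow -C nextflow.conf run hello
```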
Karin Lagesen
@karinlag
Mar 17 2017 12:08
ok, got things to run with slurm
but: I get
Mar-17 13:06:30.477 [main] DEBUG nextflow.Session - Work-dir: /cluster/home/karinlag/tmp/work [fhgfs]
Mar-17 13:06:30.477 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /cluster/home/karinlag/tmp/bin
the code is as I found earlier on the NIH site
hmmm
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:36
try a simpler thing
nextflow run hello
Karin Lagesen
@karinlag
Mar 17 2017 12:38
runs nicely when I use the local executor, but fails when I run it with slurm
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:39
what's the error message?
Karin Lagesen
@karinlag
Mar 17 2017 12:39

[karinlag@abel tmp]$ nextflow -C nextflow.conf run hello
N E X T F L O W ~ version 0.23.4
Launching nextflow-io/hello [amazing_perlman] - revision: 6b9515aba6 [master]
[warm up] executor > slurm
ERROR ~ Error executing process > 'sayHello (3)'

Caused by:
Failed to submit job to grid scheduler for execution

Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:39
umm Failed to submit job to grid scheduler for execution
what's the content of the .nextflow.log file ?
Karin Lagesen
@karinlag
Mar 17 2017 12:40
Mar-17 13:38:16.871 [main] DEBUG nextflow.cli.Launcher - $> /work/projects/nn9305k/bin/nextflow -C nextflow.conf run hello
Mar-17 13:38:17.059 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 0.23.4
Mar-17 13:38:17.237 [main] DEBUG nextflow.scm.AssetManager - Listing projects in folder: /usit/abel/u1/karinlag/.nextflow/assets
Mar-17 13:38:17.263 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.292 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] INFO nextflow.cli.CmdRun - Launching nextflow-io/hello [amazing_perlman] - revision: 6b9515aba6 [master]
Mar-17 13:38:17.513 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /cluster/home/karinlag/tmp/nextflow.conf
Mar-17 13:38:18.128 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
Mar-17 13:38:18.231 [main] DEBUG nextflow.Session - Session uuid: fbf45a20-233c-4532-9945-65aa727fe6a3
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Run name: amazing_perlman
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Executor pool size: 32
Mar-17 13:38:18.257 [main] DEBUG nextflow.cli.CmdRun -
Version: 0.23.4 build 4170
Modified: 24-02-2017 09:38 UTC (10:38 CEST)
System: Linux 2.6.32-642.15.1.el6.x86_64
Runtime: Groovy 2.4.9 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13
Encoding: UTF-8 (UTF-8)
Process: 17605@login-0-1.local [10.110.21.7]
CPUs: 32 - Mem: 63 GB (24.5 GB) - Swap: 31.2 GB (31.2 GB)
Mar-17 13:38:18.275 [main] DEBUG nextflow.Session - Work-dir: /cluster/home/karinlag/tmp/work [fhgfs]
Mar-17 13:38:18.276 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/bin
(how can I get a dark background on stuff like this?)
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:41
use triple ` at the beginning and the end
Karin Lagesen
@karinlag
Mar 17 2017 12:41
testing
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:42
triple ` and new line
Karin Lagesen
@karinlag
Mar 17 2017 12:43
test again
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:43
good
is that all the content of the log file ?
Karin Lagesen
@karinlag
Mar 17 2017 12:43
```
[karinlag@abel tmp]$ cat .nextflow.log
Mar-17 13:38:16.871 [main] DEBUG nextflow.cli.Launcher - $> /work/projects/nn9305k/bin/nextflow -C nextflow.conf run hello
Mar-17 13:38:17.059 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.23.4
Mar-17 13:38:17.237 [main] DEBUG nextflow.scm.AssetManager - Listing projects in folder: /usit/abel/u1/karinlag/.nextflow/assets
Mar-17 13:38:17.263 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.292 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] INFO  nextflow.cli.CmdRun - Launching `nextflow-io/hello` [amazing_perlman] - revision: 6b9515aba6 [master]
Mar-17 13:38:17.513 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /cluster/home/karinlag/tmp/nextflow.conf
Mar-17 13:38:18.128 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
Mar-17 13:38:18.231 [main] DEBUG nextflow.Session - Session uuid: fbf45a20-233c-4532-9945-65aa727fe6a3
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Run name: amazing_perlman
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Executor pool size: 32
Mar-17 13:38:18.257 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 0.23.4 build 4170
  Modified: 24-02-2017 09:38 UTC (10:38 CEST)
  System: Linux 2.6.32-642.15.1.el6.x86_64
  Runtime: Groovy 2.4.9 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13
  Encoding: UTF-8 (UTF-8)
  Process: 17605@login-0-1.local [10.110.21.7]
  CPUs: 32 - Mem: 63 GB (24.5 GB) - Swap: 31.2 GB (31.2 GB)
Mar-17 13:38:18.275 [main] DEBUG nextflow.Session - Work-dir: /cluster/home/karinlag/tmp/work [fhgfs]
Mar-17 13:38:18.276 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/bin
Mar-17 13:38:18.717 [main] DEBUG nextflow.Session - Session start invoked
Mar-17 13:38:18.724 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Mar-17 13:38:18.725 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Mar-17 13:38:18.949 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-17 13:38:19.144 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
Mar-17 13:38:19.144 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
Mar-17 13:38:19.162 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
Mar-17 13:38:19.165 [main] INFO  nextflow.executor.Executor - [warm up] executor > slurm
Mar-17 13:38:19.173 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m 
Mar-17 13:38:19.176 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: TaskPollingMonitor
Mar-17 13:38:19.176 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: slurm)
Mar-17 13:38:19.205 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: slurm
Mar-17 13:38:19.206 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
Mar-17 13:38:19.279 [main] DEBUG nextflow.Session - >>> barrier register (process: sayHello)
Mar-17 13:38:19.301 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > sayHello -- maxForks: 32
Mar-17 13:38:19.352 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Mar-17 13:38:19.353 [main] DEBUG nextflow.Session - Session await
Mar-17 13:38:19.379 [Actor Thread 1] DEBUG nextflow.processor.TaskProcessor - <sayHello> Poison pill arrived
Mar-17 13:38:
```
a bit more readable
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:44
still missing something it seems
Karin Lagesen
@karinlag
Mar 17 2017 12:45
but, but ...grfm
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:45
you may want to use pastebin.com to upload the log
Karin Lagesen
@karinlag
Mar 17 2017 12:46
```
[karinlag@abel tmp]$ cat .nextflow.log
Mar-17 13:38:16.871 [main] DEBUG nextflow.cli.Launcher - $> /work/projects/nn9305k/bin/nextflow -C nextflow.conf run hello
Mar-17 13:38:17.059 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.23.4
Mar-17 13:38:17.237 [main] DEBUG nextflow.scm.AssetManager - Listing projects in folder: /usit/abel/u1/karinlag/.nextflow/assets
Mar-17 13:38:17.263 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.292 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] DEBUG nextflow.scm.AssetManager - Git config: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Mar-17 13:38:17.496 [main] INFO  nextflow.cli.CmdRun - Launching `nextflow-io/hello` [amazing_perlman] - revision: 6b9515aba6 [master]
Mar-17 13:38:17.513 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /cluster/home/karinlag/tmp/nextflow.conf
Mar-17 13:38:18.128 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
Mar-17 13:38:18.231 [main] DEBUG nextflow.Session - Session uuid: fbf45a20-233c-4532-9945-65aa727fe6a3
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Run name: amazing_perlman
Mar-17 13:38:18.232 [main] DEBUG nextflow.Session - Executor pool size: 32
Mar-17 13:38:18.257 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 0.23.4 build 4170
  Modified: 24-02-2017 09:38 UTC (10:38 CEST)
  System: Linux 2.6.32-642.15.1.el6.x86_64
  Runtime: Groovy 2.4.9 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13
  Encoding: UTF-8 (UTF-8)
  Process: 17605@login-0-1.local [10.110.21.7]
  CPUs: 32 - Mem: 63 GB (24.5 GB) - Swap: 31.2 GB (31.2 GB)
Mar-17 13:38:18.275 [main] DEBUG nextflow.Session - Work-dir: /cluster/home/karinlag/tmp/work [fhgfs]
Mar-17 13:38:18.276 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /usit/abel/u1/karinlag/.nextflow/assets/nextflow-io/hello/bin
Mar-17 13:38:18.717 [main] DEBUG nextflow.Session - Session start invoked
Mar-17 13:38:18.724 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Mar-17 13:38:18.725 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Mar-17 13:38:18.949 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-17 13:38:19.144 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
Mar-17 13:38:19.144 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
Mar-17 13:38:19.162 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
Mar-17 13:38:19.165 [main] INFO  nextflow.executor.Executor - [warm up] executor > slurm
Mar-17 13:38:19.173 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m 
Mar-17 13:38:19.176 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: TaskPollingMonitor
Mar-17 13:38:19.176 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: slurm)
Mar-17 13:38:19.205 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: slurm
Mar-17 13:38:19.206 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
Mar-17 13:38:19.279 [main] DEBUG nextflow.Session - >>> barrier register (process: sayHello)
Mar-17 13:38:19.301 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > sayHello -- maxForks: 32
Mar-17 13:38:19.352 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Mar-17 13:38:19.353 [main] DEBUG nextflow.Session - Session await
Mar-17 13:38:19.379 [Actor Thread 1] DEBUG nextflow.processor.TaskProcessor - <sayHello> Poison pill arrived
Mar-17 13:38:19.386 [Actor Thread 2] DEBUG nextflow.processor.StateObj - <sayHello> State before poison: StateObj[submitted: 4; completed: 0; poisoned: false ]
Mar-17 13:38:19.484 [Actor Thread 5] DEBUG nextflow.executor.GridTaskHandler - Launching process > sayHello (3) -- work folder: /cluster/home/karinlag/tmp/work/80/4072dc35358c152697722b81f7428f
Mar-17 13:38:19.719 [Actor Thread 6] DEBUG nextflow.executor.GridTaskHandler - Launching process > sayHello (4) -- work folder: /cluster/home/karinlag/tmp/work/6f/4133b41eabd6e6c832bec8666fa500
Mar-17 13:38:19.736 [Actor Thread 5] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /cluster/home/karinlag/tmp/work/80/4072dc35358c152697722b81f7428f/.command.log
Mar-17 13:38:19.737 [Actor Thread 5] ERROR nextflow.processor.TaskProcessor - Error executing process > 'sayHello (3)'

Caused by:
  Failed to submit job to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  (empty)

Work dir:
  /cluster/home/karinlag/tmp/work/80/4072dc35358c152697722b81f7428f

Tip: view the complete command output by changing to the process work dir and entering the command: 'cat .command.out'
Mar-17 13:38:19.741 [Actor Thread 5] DEBUG nextflow.Session - Session aborted -- Cause: Error submitting process 'sayHello (3)' for execution
Mar-17 13:38:19.749 [Thread-3] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm)
Mar-17 13:38:19.752 [main] DEBUG nextflow.Session - Session await > all process finished
Mar-17 13:38:19.752 [main] DEBUG nextflow.Session - Session await > all barriers passed
Mar-17 13:38:19.755 [Actor Thread 1] DEBUG nextflow.processor.TaskProcessor - <sayHello> After stop
Mar-17 13:38:19.810 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
[karinlag@abel tmp]$
```
there
apparently there's a limit to how much I can paste in one go
Paolo Di Tommaso
@pditommaso
Mar 17 2017 12:47
:)
ok, for some reason sbatch is failing
what you can do is change into /cluster/home/karinlag/tmp/work/80/4072dc35358c152697722b81f7428f
and launch it manually to troubleshoot it
sbatch .command.run
if you open the .command.run you will find at the top of the file the SLURM directives
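The header of a SLURM .command.run looks roughly like this (exact directives vary by NF version and configuration; the lines below are illustrative):

```bash
#!/bin/bash
#SBATCH -D /cluster/home/karinlag/tmp/work/80/4072dc35358c152697722b81f7428f
#SBATCH -J nf-sayHello_3    # job name derived from the process name
#SBATCH -o .command.log     # where the task's stdout/stderr land
```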
Karin Lagesen
@karinlag
Mar 17 2017 12:49
ok
I'm doing this now with hello
the cluster specifies that we should source this one script before any tasks we run
i.e. source file; whatever commands we're running with sbatch
I had that included with the other script, the one from NIH
but that might not matter at all yet, since from what I understand I'm not even managing to call sbatch?
Karin Lagesen
@karinlag
Mar 17 2017 12:58
had messed up my cluster options, now got it running :)
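For reference, both of those site requirements can be expressed in the config; a sketch with hypothetical values (the account and setup-script path are made up):

```groovy
// nextflow.config
process {
    executor       = 'slurm'
    clusterOptions = '--account=nn9305k'             // hypothetical sbatch flags
    beforeScript   = 'source /cluster/bin/jobsetup'  // hypothetical path to the
                                                     // site script tasks must source
}
```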
Paolo Di Tommaso
@pditommaso
Mar 17 2017 13:04
:v:
amacbride
@amacbride
Mar 17 2017 18:42
Can anyone help me brainstorm a solution to a problem I'm having? Essentially, I need to rate-limit the foreign-file download step, as no amount of parameter tweaking seems to be helping with the S3 connection pool timeout issue I'm seeing. (It keeps trying to download 128 FASTQ files from S3 at the same time, which is just not going to happen.) It seems to me that the buffer operator might help, but I'm not quite sure how to fit it in.
My initial channel takes a directory, transforms the file names there and extracts attributes, then passes a list of tuples to my alignment step that (implicitly) pulls the files from S3 using the file() operator.
Mike Smoot
@mes5k
Mar 17 2017 18:45
@amacbride when you say "alignment step" that's a process that attempts to use the file, right?
amacbride
@amacbride
Mar 17 2017 18:46
What would be ideal would be to take the tuples from that channel and pass them into the alignment step in groups of 8 -- that would limit the simultaneous downloads (I think), as well as improve my overall throughput (since it will be taking in groups of 8 that are all to be processed together.)
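A sketch of that "groups of 8" idea using the buffer operator (channel names as described above):

```groovy
// emit the per-lane tuples in batches of 8, i.e. one sample per batch;
// the downstream process then receives a list of 8 tuples at a time
rawfastqs
    .buffer( size: 8 )
    .set { fastq_batches }
```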
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:47
put a maxForks on the process that downloads the files
amacbride
@amacbride
Mar 17 2017 18:47
@mes5k Yes. The input spec is:
```
input:
        set sample_name, sample_id, lane_id, file(read1), file(read2) from rawfastqs
```
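A sketch of the maxForks suggestion applied to that input (the process body is a placeholder):

```groovy
process align {
    maxForks 8   // at most 8 concurrent tasks, hence at most 8 staged S3 downloads

    input:
    set sample_name, sample_id, lane_id, file(read1), file(read2) from rawfastqs

    """
    # placeholder for the real STAR invocation
    echo aligning ${sample_name} lane ${lane_id}
    """
}
```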
Mike Smoot
@mes5k
Mar 17 2017 18:48
Yup, then I was going to suggest maxForks just like @pditommaso
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:48
otherwise, have you considered downloading them before launching the pipeline?
amacbride
@amacbride
Mar 17 2017 18:49
I thought about that, but it then limits me when I'm pulling from our local SAN. Is it possible to make maxForks conditional?
*shrug* sure, but that defeats the whole purpose of making it transparent.
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:50
maxForks can be conditional, but I would define it in a profile-specific config file
amacbride
@amacbride
Mar 17 2017 18:50
(as in, I then have to special-case all sort of things based on the origin of the files)
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:51
indeed, instead of having conditional logic, define a separate config file or profile
out of curiosity have you tried to decrease S3 maxConnections ?
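Both knobs sketched profile-style (profile names, process name, and numbers are hypothetical; assumes the aws.client.maxConnections setting):

```groovy
// nextflow.config
profiles {
    cloud {
        process.$align.maxForks = 8        // throttle concurrent S3 staging
        aws.client.maxConnections = 16     // shrink the S3 transfer pool
    }
    onprem {
        process.$align.maxForks = 15       // the local SAN tolerates more
    }
}
```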
amacbride
@amacbride
Mar 17 2017 18:54
^^ yes, it doesn't help. I was still seeing NF try to download everything at once, and was still getting the connection pool timeout.
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:55
I see, I need to improve that part
amacbride
@amacbride
Mar 17 2017 18:55
I will explore maxForks a bit, I guess I can live with slightly reduced throughput for greater reliability.
Paolo Di Tommaso
@pditommaso
Mar 17 2017 18:56
sorry if I ask again, but why is downloading them not an option?
amacbride
@amacbride
Mar 17 2017 18:56
(Even a large local or AWS instance can't handle more than 14-15 simultaneous alignments, so setting maxForks in that vicinity is probably a reasonable compromise.)
@pditommaso I'm just trying to avoid it because it trades a nice clean solution (where NF handles it all transparently) for one with several janky special cases (various combinations of AWS vs local compute, local vs remote storage, explicitly running aws s3 sync, having to clean up temp files afterwards, worrying about retries, etc.)
Mike Smoot
@mes5k
Mar 17 2017 19:00
How are you ending up with 100+ simultaneous downloads if you're only running 14-15 alignment processes concurrently?
Paolo Di Tommaso
@pditommaso
Mar 17 2017 19:06
I understand that. But as before, it would not require a different implementation; you could just specify the input reads path as a command line option or in a config file
amacbride
@amacbride
Mar 17 2017 19:09
I'm trying to figure that one out myself. I have a channel that emits 120 tuples (the contents of the directory, transformed into tuples of the form (sample_name, sample_id, lane_id, read1_path, read2_path))
That's read by an alignment process that outputs aligned bams to an alignment channel.
Next, the alignment channel runs into a merge node that does a sorted groupTuple of size 4 to merge the 4 lanes for each individual sample.
It seems as though when I increased the parallelism of the alignment step (which I did recently), I'm hitting this implicit limit on the number of simultaneous S3 downloads. So I'm trying to mitigate it without having to rework it.
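The merge step described there, sketched with groupTuple (names are hypothetical):

```groovy
// gather the 4 lane-level BAMs belonging to each sample before merging;
// `size: 4` lets a group be emitted as soon as its fourth BAM arrives
aligned_bams
    .groupTuple( by: 0, size: 4, sort: true )
    .set { sample_bams }

process mergeLanes {
    input:
    set sample_name, file(bams) from sample_bams

    """
    samtools merge ${sample_name}.merged.bam ${bams}
    """
}
```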
Paolo Di Tommaso
@pditommaso
Mar 17 2017 19:12
anyhow I'm planning to re-implement the S3 file system library, it could make sense to add a throttle mechanism
amacbride
@amacbride
Mar 17 2017 19:13
I suppose I could go back to putting maxForks on these nodes, but it sort of goes against (in my case) letting SLURM worry about scheduling.
That would be cool. I will look into workarounds in the short term.
Mike Smoot
@mes5k
Mar 17 2017 19:14
I'd sort out why so many are running concurrently. Maybe use the cpus directive to give each process more cpu, which can have the effect of limiting the number of concurrent processes.
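A sketch of that trick (numbers hypothetical; 32-core nodes as in the log above):

```groovy
process align {
    cpus 8          // on a 32-core node at most 4 such tasks fit at once
    memory '32 GB'  // hypothetical figure; the point is STAR is memory-bound

    input:
    set sample_name, sample_id, lane_id, file(read1), file(read2) from rawfastqs

    """
    # genome index and output options elided
    STAR --runThreadN ${task.cpus} --readFilesIn ${read1} ${read2}
    """
}
```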
amacbride
@amacbride
Mar 17 2017 19:17
@mes5k Righto -- but that's relying on a side-effect, and then doesn't describe the system accurately. (It would certainly work, as I've used similar tricks elsewhere, but it goes against the spirit of declaring the actual resources used and letting the scheduler execute things efficiently.)
Mike Smoot
@mes5k
Mar 17 2017 19:21
I guess it depends on how your aligner runs. We tend to allocate lots of cpu to bwa
And I guess it matters what you're aligning
amacbride
@amacbride
Mar 17 2017 19:23
We're using STAR, and more CPU helps, but it's not as critical as memory.
RNAseq
My ulterior motive in getting an ordered stream is also that once they're downloaded and aligned (with each sample taking 8 files, which maps to 4 alignment steps) they're merged into sample-oriented BAM files. So it doesn't help me overall to download and align a bunch of files from different samples, if one of the downstream steps blocks on having all files for a particular sample available.
amacbride
@amacbride
Mar 17 2017 19:29
If I can do some traffic shaping upstream (download/align in blocks corresponding to the files required by a sample), I can send things into the later pipeline steps more quickly, for an overall efficiency gain.
(But since it's in a commercial setting, I have to guarantee reliability to the greatest extent I can.)