These are chat archives for nextflow-io/nextflow

30th
Apr 2015
Andrew Stewart
@andrewcstewart
Apr 30 2015 00:10
hey @pditommaso if I wanted to stash the nextflow log files into a database automatically after every run, where would be the best place to do that?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 12:18
Hi @andrewcstewart, I would wrap it in a bash script that imports the log into the db. Is that too rustic?
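
A minimal sketch of the import step such a wrapper could call after each run, written in Python rather than bash for brevity. It assumes the file being stored is a tab-separated trace file with a header row (such as the one `-with-trace`, mentioned later in this conversation, is expected to produce) and that a local SQLite file is an acceptable database. The function name `import_trace`, the table name `trace`, and the `run_name` column are hypothetical, chosen only for illustration.

```python
import csv
import sqlite3


def import_trace(trace_path, db_path, run_name):
    """Load a tab-separated trace file into a SQLite table, one row per task.

    Assumption: the first line of the file is a header naming the columns.
    Returns the number of task rows inserted.
    """
    with open(trace_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Skip malformed lines whose field count does not match the header.
        rows = [row for row in reader if len(row) == len(header)]

    # Quote column names because headers may contain characters such as '%'.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    placeholders = ", ".join("?" for _ in range(len(header) + 1))

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS trace (run_name TEXT, {})".format(columns)
            )
            conn.executemany(
                "INSERT INTO trace VALUES ({})".format(placeholders),
                [[run_name] + row for row in rows],
            )
    finally:
        conn.close()
    return len(rows)
```

A wrapper script would run the pipeline first and then call something like `import_trace("trace.txt", "runs.db", "run-2015-04-30")`; those file names and the run label are placeholders.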
Michael L Heuer
@heuermh
Apr 30 2015 16:21

I read a file into a variable to be shared across several processes, ref = file("${params.reference}"), where it defaults to a file in the git repo, params.reference = "${baseDir}/tutorial/ref/chr6-ex.fa". Thus it ends up in a directory /home/nextflow/.nextflow/assets/git/repo. Unfortunately this isn't accessible to Docker. Other files in that directory do end up accessible to Docker as symlinks in the Nextflow working directory, since they come in via the input: section of Nextflow processes.

Is this an error or do I need to put that file into a channel so that it can also be symlinked and come into Nextflow processes via input:? If so I'll need some way of having a channel that emits the same value as many times as necessary.

See nmdp-bioinformatics/flow#18
Paolo Di Tommaso
@pditommaso
Apr 30 2015 16:27
you don't need a channel, but you do need to declare it as a file in the input: section
for example:
process foo {
  input: 
  file ref 

  """
  ... 
  """
}
Michael L Heuer
@heuermh
Apr 30 2015 16:32
ok and I can do that in multiple processes?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 16:32
as long as it is not a channel, yes
Michael L Heuer
@heuermh
Apr 30 2015 16:32
awesome, thank you!
Paolo Di Tommaso
@pditommaso
Apr 30 2015 16:33
did you manage to solve that problem you were reporting ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 16:42
@pditommaso That works. I was just wondering if there was already some kind of hook space for that sort of thing.
Michael L Heuer
@heuermh
Apr 30 2015 16:42
not exactly, we've been pushing ahead with the slurm executor across multiple nodes and are having moderate success so far. I still want to circle back and try to find a test case that reproduces the issue later.
Paolo Di Tommaso
@pditommaso
Apr 30 2015 17:40
@andrewcstewart Yes, you are right. That could be useful also for some kind of notification
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:28
hey @pditommaso, that also reminds me.. cleaning up docker containers might be another step that would be useful to put there
although that could probably live in a process's own hook script
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:29
do you mean docker rm <container>
?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:29
yeah
does that seem like the right place to put it? in the process after-script?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:29
it's already done by default
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:29
ah
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:30
👍
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:30
Ive been running into a ton of docker related issues with disk space and inodes lately
not nextflow specific problems, but probably worth warning people about
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:30
using nxf ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:30
(yeah, but outside of nxf too.. it's all related to docker's devicemapper)
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:31
it sounds strange
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:31
I just wrapped up my final semester's work, so Im gunna try to find some time to write up that blog post for you
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:31
nextflow appends a docker rm at the end of the task script
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:31
so I can add some warnings about disk issues there
you're right, I see that now in the .command.run
the problems Im experiencing aren't necessarily unexpected
im running 100's of samples in parallel.. though regulated by SGE
but im running some pretty disk-intensive tasks within process containers
so its not surprising that I'm eating up all the inodes :D
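
For anyone wanting to catch this kind of exhaustion before it kills a run, here is a minimal Python sketch of a pre-flight check. It relies on `os.statvfs`, which is available on Unix-like systems only. The function names and the 0.9 threshold are hypothetical, and the path to check (for example wherever Docker keeps its data on a given host) is an assumption the caller has to supply.

```python
import os


def inode_usage(path):
    """Return the fraction of inodes in use on the filesystem holding path.

    Returns None when the filesystem does not report a fixed inode count.
    """
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return None
    return (stats.f_files - stats.f_ffree) / stats.f_files


def block_usage(path):
    """Return the fraction of disk blocks in use on the filesystem holding path."""
    stats = os.statvfs(path)
    if stats.f_blocks == 0:
        return None
    return (stats.f_blocks - stats.f_bfree) / stats.f_blocks


def is_nearly_full(path, threshold=0.9):
    """True if either inode or block usage is at or above the threshold."""
    usages = [inode_usage(path), block_usage(path)]
    return any(u is not None and u >= threshold for u in usages)


if __name__ == "__main__":
    # "." is a placeholder; point this at the volume the containers write to.
    if is_nearly_full("."):
        print("warning: filesystem is close to running out of space or inodes")
```

Checking inodes separately from blocks matters here because many small files can exhaust inodes while plenty of byte capacity is still free.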
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:33
what error does docker report ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:36
well there are several different ones ive been battling through
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:37
We have experienced kernel crashes with Docker > 1.0
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:39
I just updated: Docker version 1.5.0, build a8a31ef/1.5.0
to incorporate some of the fixes mentioned in that issue and others
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:40
what kernel are you using (< 3.0 could be a problem) ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:41
2.6.32-504.1.3.el6.x86_64
centos 6.5
old, I know. IT insists. :/
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:42
same problem here :(
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:43
Ive been simultaneously testing on both local hardware and on ec2
so I can test in more recent kernels
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:43
what kind of testing ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:44
my nxf pipeline
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:44
Are you experiencing problems also with ec2 ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:44
totally different ones
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:45
strange, on ec2 it works smoothly in my case
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:47
on the docker side, I believe our problems are more related to what we're trying to run within docker
most bioinformatics software is crap
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:47
:)
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:47
(except nextflow of course :D)
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:47
obviously!
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:47
but alignment and variant calling tools..
absolutely no consideration for resource utilization
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:48
have you tried to run some perf benchmarks native vs docker ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:48
to an extent
for some of these tools I just end up running them outside of docker
which makes the pipeline not as portable
but docker is still very young technology
so we're playing with fire :)
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:50
yep, that's also the reason why it's hot!
:)
I'm starting to have a look at rkt
it seems promising
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:50
rkt?
ohhhh rocket
coreos's thing
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:51
yep
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:51
yeah that looks cool
I need to hire some more engineers so we can try everything.
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:52
:)
Have you ever considered opening an office in Bcn ?
:)
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:53
I hear its a beautiful city
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:54
I can confirm that
Andrew Stewart
@andrewcstewart
Apr 30 2015 18:55
I want to go there some time and then do el Camino de Santiago.
Paolo Di Tommaso
@pditommaso
Apr 30 2015 18:56
it's very popular but to tell the truth I've never been there
Andrew Stewart
@andrewcstewart
Apr 30 2015 19:51
@pditommaso you know another idea on the logs might be to just save them to s3 or something like that
Paolo Di Tommaso
@pditommaso
Apr 30 2015 19:52
what do you mean exactly ?
Andrew Stewart
@andrewcstewart
Apr 30 2015 19:59
similar to how -w s3:// can use s3 directly for working directory
could do something like -with-trace s3://
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:00
well, by log do you mean all the intermediate files produced in the pipeline workdir?
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:00
no no sorry
just things like .nextflow.log or actually just the trace
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:01
I think the trace file could work in that way
not sure about the log, I need to check
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:01
i imagine itd be the same story with -work though right?
ie, you'd need to use the cluster manager that you're using..
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:06
no, the .nextflow.log cannot be written to s3 directly (due to internal log libraries..)
the trace file could, but I've just checked and currently it cannot.
But it's easy to patch. In the next version it will work
with a new shiny report ..
:)
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:07
shiny report?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:08
something like this
timeline.png
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:09
that
is
awesome
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:09
LOL
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:10
I know you said that the -work s3:// wouldn't work outside of the context of the cluster manager you're using..
but would you entertain a pull request if I figured out a way?
(Id have to dust off my java skillz)
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:10
of course
we need committers :)
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:23
or i could bribe my local java devs
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:24
however nextflow is mainly developed in groovy
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:25
ah
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:25
it's much more fun than Java
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:25
well then maybe I can do it
hey.. the new s3 functionality with channels.. does that apply to writing out files too?
ie can I use a channel to write directly to S3 ?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:26
yep
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:26
(rather than doing so in a process)
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:26
yes
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:26
do you have an example of what that looks like?
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:29
files_channel.subscribe { file -> file.copyTo('s3://bucket/etc/...') }
Andrew Stewart
@andrewcstewart
Apr 30 2015 20:39
awesome
I wonder if its possible to write object metadata that way too
Paolo Di Tommaso
@pditommaso
Apr 30 2015 20:40
that would be nice but currently s3 tags are not implemented in the S3 file system adaptor
Andrew Stewart
@andrewcstewart
Apr 30 2015 21:30
hm
subscribe always hangs my session