These are chat archives for nextflow-io/nextflow

12th
Aug 2016
Johan Viklund
@viklund
Aug 12 2016 06:51
Is there any way I can get the path to main.nf (the way workflow.projectDir works with the GitHub integration), but when running the workflow like this: nextflow run ../otherdir/main.nf?
Paolo Di Tommaso
@pditommaso
Aug 12 2016 07:56
yes $baseDir
Johan Viklund
@viklund
Aug 12 2016 07:56
nice, thx
Paolo Di Tommaso
@pditommaso
Aug 12 2016 07:57
welcome
Johan Viklund
@viklund
Aug 12 2016 07:57
didn't find that in this table https://www.nextflow.io/docs/latest/metadata.html
is there a reason to have both baseDir and projectDir?
Paolo Di Tommaso
@pditommaso
Aug 12 2016 08:00
I was wondering the same ..
Johan Viklund
@viklund
Aug 12 2016 08:00
:D
Paolo Di Tommaso
@pditommaso
Aug 12 2016 08:00
:)
I need to dig a bit in the code, but baseDir is by definition the path where the main script is located
projectDir I think is defined only when the project is cloned from a git repo
Johan Viklund
@viklund
Aug 12 2016 08:04
and in that case it is equal to baseDir
Paolo Di Tommaso
@pditommaso
Aug 12 2016 08:04
yes
Paolo Di Tommaso
@pditommaso
Aug 12 2016 08:34
I checked and yes, projectDir is meant to report the path where the project has been cloned
at the same time it's supposed to be the same as baseDir
currently projectDir reports null when running just a script instead of a project repo, but it doesn't make much sense
I've just patched it so in the next version it will report the script path in any case
anyway in your case just use baseDir
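The suggestion above can be sketched in a couple of lines of main.nf; the data/reference.fa path below is a made-up example of locating an asset shipped alongside the script:

```nextflow
// Both variables resolve relative to the main script's location
println "baseDir:    $baseDir"
println "projectDir: $workflow.projectDir"

// Typical use: find files bundled next to main.nf, wherever it was run from
ref = file("$baseDir/data/reference.fa")   // hypothetical path
```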
Paolo Di Tommaso
@pditommaso
Aug 12 2016 12:45
Nextflow Friday release is out
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 12:56
Indeed it is Friday afternoon - maybe that is why I can't cope with my problem
Paolo Di Tommaso
@pditommaso
Aug 12 2016 12:56
and middle august !
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 12:56
I have a directory with 100s of files, and a list of patterns
How can I make a process that uses only a subset of the files - those matching the current pattern - and iterate through all the patterns?
Paolo Di Tommaso
@pditommaso
Aug 12 2016 12:58
and iterate through all the patterns
what do you mean ?
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 12:58
august is fine
Paolo Di Tommaso
@pditommaso
Aug 12 2016 12:58
not if you work in front of the beach ! ;)
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 12:59
let's say [ foo, baz, bim ] is a list of patterns, and some of the files in the dir match only one of them
no beach, though we are not short of water here
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:00
What if you filter them?
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 13:04
I have a process that is expecting a list of files: I want to run my process N times, where N is the number of patterns
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:04
N times for each file?
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 13:05
the other way around
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:05
well, isn't it the same?
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 13:06
OK, maybe it's simpler if I describe the actual problem: I have VCF files, obtained by spreading the variant calling process across intervals
and now I want to merge them
I have a set of VCFs for the primary tumor, another for relapse1, yet another for relapse2...relapseN (that is unlikely, but I do not know N)
so I have to merge the primary first, then relapse1, etc
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:08
I would approach it this way: create two channels from the same list of files
the first channel emits just the files
the second channel extracts the pattern
then make a cartesian product of the two with the cross operator
the resulting channel will feed your process
does it make sense?
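The steps above can be sketched using current Nextflow operators - here combine produces the Cartesian product and groupTuple regroups by pattern; the file glob, the pattern list and the vcf-merge command are all made-up examples:

```nextflow
// Pair every pattern with every file, keep the matches, regroup by pattern
patterns = Channel.from('foo', 'baz', 'bim')
vcfs     = Channel.fromPath('results/*.vcf')

pairs = patterns
    .combine(vcfs)                          // Cartesian product: [pattern, file]
    .filter { it[1].name.contains(it[0]) }  // keep files matching the pattern
    .groupTuple()                           // -> [pattern, [matching files]]

process mergeVcf {
    input:
    set val(pattern), file(vcf_files) from pairs

    output:
    file "${pattern}.merged.vcf"

    """
    vcf-merge ${vcf_files} > ${pattern}.merged.vcf
    """
}
```

This runs the merge process once per pattern, each time receiving only the files that matched it.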
Szilveszter Juhos
@szilvajuhos
Aug 12 2016 13:13
will give it a try, it's at least similar to what I need
Mokok
@Mokok
Aug 12 2016 13:46
Hi @pditommaso
Is afterScript() executed even if the task ends in error?
What if the snippet executed in afterScript() ends in error?
Can I use task variables as snippet parameters?
Where is the snippet executed? (nextflow side or executor side?)
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:47
Is afterScript() executed even if the task ends in error?
yes
What if the snippet executed in afterScript() ends in error?
ignored
Can I use task variables as snippet parameters?
NO
Where is the snippet executed? (nextflow side or executor side?)
same as the process script
Mokok
@Mokok
Aug 12 2016 13:51
nice answers, except the one about the params :)
But... I could use 'env' and use such an environment variable in the snippet, right?
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:52
you can use script global variables, not task-local ones
Mokok
@Mokok
Aug 12 2016 13:55
not sure I understand
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:56
global_var_x = 1 

process foo {
   afterScript " your script can use $global_var_x "
   input: 
   val local_x from ...
   """
   echo $local_x
   """
}
I'm sure this clarifies it
Mokok
@Mokok
Aug 12 2016 13:57
hoooo ok, didn't even think about that ^^
was thinking about:
process foo {
   afterScript " your script can use $local_x "
   input:
   env local_x from ...
   """
   echo $local_x
   """
}
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:58
no
Mokok
@Mokok
Aug 12 2016 13:59
okay, but it's fine since I can still manage this global var
thanks
Paolo Di Tommaso
@pditommaso
Aug 12 2016 13:59
welcome !
Mike Smoot
@mes5k
Aug 12 2016 17:22
@pditommaso What do you mean by "different storage driver" ?
Paolo Di Tommaso
@pditommaso
Aug 12 2016 17:23
Docker implements different drivers to manage the container file system
it's kind of an esoteric topic; there's not much documentation about it
Mike Smoot
@mes5k
Aug 12 2016 17:39
Ah yes, I understand. We've been using devicemapper, but we haven't experimented much with the others. For the time being I think we've found a workaround using docker exec rather than docker run, but I fear that approach is going to lead to other issues... Maybe I need to explore conda as an alternative...
Paolo Di Tommaso
@pditommaso
Aug 12 2016 17:40
conda? docker exec? wasn't this problem related to nextflow?
Mike Smoot
@mes5k
Aug 12 2016 17:44
The root of the problem is that we're using docker containers to isolate different versions of tools. However, if nextflow spawns too many docker containers at once with docker run, we get strange docker errors (docker ticket linked above). So instead of using docker run, we docker exec into an already running container for each process to run its command. This seems to work, but I fear unforeseen problems. Since the root of the problem is using docker to wrap tools, I'm wondering if conda might be a better approach. That's all. No problem with nextflow, I was just curious if anyone had seen this problem.
Paolo Di Tommaso
@pditommaso
Aug 12 2016 17:46
I see, but I'm just curious: how are you docker exec'ing nextflow processes into an already running container?
Mike Smoot
@mes5k
Aug 12 2016 17:49
Well, instead of using a container directive, we call a shell script that checks for the presence of a running container, starts the container if it doesn't exist, and then runs with docker exec what we'd normally do with docker run.
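A rough sketch of such a wrapper; the container and image names are made up, and this is a hypothetical reconstruction, not the actual script:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reuse one long-lived container instead of one `docker run` per task
set -euo pipefail

CONTAINER=tools             # made-up name for the long-lived container
IMAGE=example/toolbox:1.0   # made-up image name

# Start the container if it is not already running
if [ -z "$(docker ps -q -f "name=^${CONTAINER}\$")" ]; then
    docker run -d --name "$CONTAINER" -v "$PWD:$PWD" -w "$PWD" "$IMAGE" sleep infinity
fi

# Execute the task command inside the running container
docker exec "$CONTAINER" "$@"
```

Each process then invokes this script with its command line instead of relying on the container directive.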
Paolo Di Tommaso
@pditommaso
Aug 12 2016 17:50
big hack! :)
since the issue you are reporting seems related to devicemapper, I would give the overlay or overlay2 storage driver a try instead
it's just a matter of specifying it on the docker daemon command line
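For example, the storage driver can be selected either on the daemon command line or in the daemon configuration file (paths as in a standard Docker setup):

```shell
# Start the daemon with the overlay2 storage driver
dockerd --storage-driver=overlay2

# Or persist it in /etc/docker/daemon.json and restart the daemon:
#   { "storage-driver": "overlay2" }
```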
Mike Smoot
@mes5k
Aug 12 2016 17:52
Yes, and that's what scares me! :) However, this actually solves two problems for us. First is the docker bug noted above, but second is the slow startup time for some docker containers. We've got a few large (e.g. 3GB) docker containers that take several seconds to start and/or stop. Shrinking those containers would be challenging, so here we are with docker exec.
I agree that we should try overlay/overlay2.
Paolo Di Tommaso
@pditommaso
Aug 12 2016 17:53
but are you using a different image for each process/step in your pipeline?
Mike Smoot
@mes5k
Aug 12 2016 18:00
Yes and sometimes multiple images within a process/step e.g. samtools view run in one container and some other processing run in a separate container to avoid the IO associated with giant files.
Lots of piping happening.
Paolo Di Tommaso
@pditommaso
Aug 12 2016 18:02
I see, I'm not fond of this pattern
Mike Smoot
@mes5k
Aug 12 2016 18:09
I'm not either! This is causing us huge headaches, but the alternative is creating several hundred GB temp files (e.g. the output of samtools view). I would love to hear a workaround if you've got one.
I guess the other alternative is creating fat docker images that contain whatever combinations of tools we need.
Paolo Di Tommaso
@pditommaso
Aug 12 2016 18:26
yes exactly, I prefer to create a single fat docker image containing everything needed by the pipeline
you have a little overhead downloading it, but it's much easier to keep the dependencies under control
also you said that you are running your pipeline on a single machine, so the docker image size should not be a big problem
and this approach would allow you to launch the whole pipeline execution within the container
that would solve your problem
Mike Smoot
@mes5k
Aug 12 2016 18:44
Interesting! We've been working hard at keeping docker containers minimal, but maybe we'll just create a big fat one. Thanks for the advice!