These are chat archives for nextflow-io/nextflow

4th
Mar 2015
Andrew Stewart
@andrewcstewart
Mar 04 2015 19:55
Looks like setting NXF_WORK is only possible via env variable? Is it possible to set this somehow in nextflow.config ?
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:21
No, it's not possible with the config file
however you can set it by using the -w command line option
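For example (main.nf and the path below are placeholder names), the same work directory can be set either with the environment variable or with the -w option at launch time:
$ export NXF_WORK=/nfs/nextflow/work
$ nextflow run main.nf

$ nextflow run main.nf -w /nfs/nextflow/work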
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:28
ah
Is there a list of all the command line options that aren't listed in nextflow -h ?
I'm at 0.11.2.2449.. I think that's the latest
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:38
um, I think you are looking for nextflow <command> -h
$ nextflow run -h
Launch a pipeline execution
Usage: run [options] name of pipeline to run
  Options:
    -E
       Exports all the current system environment
       Default: false
    -cache
       enable/disable processes caching
       Default: true
    -e.
       Add the specified variable to execution environment
       Syntax: -e.key=value
       Default: {}
    -h
       Print command usage
       Default: false
    -hub
       Service hub where pipeline is hosted - It can be either 'github' or
       'bitbucket'
       Default: github
    -latest
       Pull latest changes before run
       Default: false
    -lib
       Library extension path
    -process.
       Set process default options
       Syntax: -process.key=value
       Default: {}
    -qs, -queue-size
       Max number of processes that can be executed in parallel by each executor
    -resume
       Execute the script using the cached results, useful to continue
       executions that stopped by an error
    -r, -revision
       Revision of pipeline to run (either a branch, tag or commit SHA number)
    -test
       Test function with the specified name
    -user
       Private repository user name
    -with-docker
       Enable process execution in a Docker container
    -with-drmaa
       Enable DRMAA binding
    -with-trace
       Trace execution to the specified file
    -without-docker
       Disable process execution with Docker
       Default: false
    -w, -work-dir
       Directory where intermediate results are stored
       Default: work
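As an illustrative combination of those options (my-pipeline and the values below are placeholders, assuming the syntax in this help text):
$ nextflow run my-pipeline -r master -with-docker -qs 10 -w /nfs/nextflow/work -resume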
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:40
ahhhhh
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:40
no, the latest version is 0.12.3
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:41
I better upgrade!
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:41
yes!
:)
it's free!
:)
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:42
for now
:D
so when nextflow creates a new directory under work for a process, is that dir being created by the current user?
(I'm running into some umask-ish type problems while trying to relocate my nextflow work directory to a different NFS mount)
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:45
yes, of course
um
you are using docker, right?
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:46
it's not a nextflow problem, I'm just trying to understand exactly what nextflow is doing during process setup
yes
and that's where the errors are occurring
(the process inside of docker trying to write to the mounted working directory which is physically located on this nfs mount)
I'm just not sure why it didn't have the same problem when it was writing to my $HOME/work
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:48
you have to take into account that docker processes run as root
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:48
(which is also on an nfs mount, though a different drive)
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:48
thus the files it creates have root permission
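One hedged workaround, assuming the Nextflow release in use supports the docker configuration scope and its runOptions setting, is to run the container as the invoking user so files written under work are not owned by root:
// nextflow.config -- a sketch, not verified against this particular release
docker {
    enabled    = true
    // ask Docker to run the container with the calling user's UID/GID
    runOptions = '-u $(id -u):$(id -g)'
}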
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:49
the problem is with root@docker even being able to write in the first place though
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:49
and in the end you get an error?
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:50
I ran a couple of experiments and get two different errors, depending on what I'm trying
Error response from daemon: Cannot start container X: lstat /nfs/nextflow/work/60: permission denied
which just means docker can't even mount that location to a new container
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:51
I would suggest adding process.scratch = true in your config
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:51
then the other one I get will be something like "EXITING because of FATAL ERROR: could not create output file: ./Log.out" (the process I'm running generates Log.out)
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:53
when doing that the docker container is mounted to a temp directory in the node's local storage
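A minimal sketch of that suggestion in the configuration file:
// nextflow.config
process.scratch = true   // run each task in a temporary directory on the node's local storage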
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:54
I don't think that's an option. The files involved here are fairly large (and there are lots of them).. the whole point of moving work to this nfs mount was to get them off the local drive
I need a sys admin
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:55
I see, I think that
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:56
(btw, I'm hiring, if you know anyone in the US west coast :D)
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:56
I saw that :)
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:58
Ah I think I might understand what's going on.. somewhat
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:58
most of the people I know are on the EU west coast ;)
what do you mean ?
Andrew Stewart
@andrewcstewart
Mar 04 2015 20:59
so I know that I can't change permissions via sudo on these nfs mounts unless I'm on the host that's actually sharing them
and as docker is running as root, it's encountering the same problem
Paolo Di Tommaso
@pditommaso
Mar 04 2015 20:59
ah
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:00
I think at least
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:00
welcome
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:01
That's basically what you were saying before I think?
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:03
yes, though I think it should be possible to manage the docker container being able to write in the NFS
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:04
Right. And that's what's confusing me, because my homedir is also on nfs.
nextflow creates a new directory under work with 'drwxr-xr-x' .. and docker can write files to that directory
but when that happens on this other nfs mount, docker can't
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:05
I think it depends on how users/permissions are managed at the network "level"
but I'm not a sysadmin
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:06
me neither :D
ok, at least I have a better sense of what to talk to our sysadmin about
(we have one, but he can't spend a ton of time on us)
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:07
they are a limited resource
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:10
haha
I've got an idea
I'll just keep writing to my home directory until it hoses that drive
then he'll be FORCED to address my issues
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:12
LOL
it sounds like a good plan
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:13
interesting..
so I'm running this pipeline via SGE as well
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:14
what do you mean ?
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:14
I have a default all.q and I also created a small.q to limit the number of high-memory jobs at any given time
(using the SGE executor)
so ChannelA -> ProcessA @ small.q -> ChannelB -> ProcessB @ all.q
(make sense?)
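A hedged skeleton of that topology (the process names, file patterns and the align.sh / merge.sh commands are placeholders), assuming the SGE executor is configured:
ChannelA = Channel.fromPath('data/*.fastq')

process ProcessA {
    executor 'sge'
    queue 'small.q'                 // limit the high-memory jobs

    input:
    file reads from ChannelA

    output:
    file 'aligned.bam' into ChannelB

    """
    align.sh $reads > aligned.bam
    """
}

process ProcessB {
    executor 'sge'
    queue 'all.q'

    input:
    file bam from ChannelB

    """
    merge.sh $bam > merged.out
    """
}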
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:16
you can do that, if it makes sense in your pipeline
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:16
as jobs are finishing in ProcessA, I would have expected to see the outputs of those jobs appear in ProcessB
(so really this is a channels question, not an SGE question)
does ProcessB not kick off until all of ProcessA is done?
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:18
no, ProcessB is started as soon as there's a set of inputs available
if the only input is from ProcessA, it is triggered as soon as ProcessA emits an output
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:19
I wonder what the holdup is
Could it have something to do with the fact that the input for ProcessB is actually ChannelB.groupTuple() ?
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:20
ah, this changes everything
because, to group them it needs to collect all of them, thus it waits for ProcessA to terminate
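For reference, a hedged sketch of why groupTuple blocks (the sample names and files are made up): it has to see every item for a key before it can emit the grouped tuple.
// items are (sampleId, bam) pairs such as those emitted by ProcessA
ChannelB = Channel.from(
        ['sampleA', 'a_1.bam'],
        ['sampleA', 'a_2.bam'],
        ['sampleB', 'b_1.bam'] )

// groupTuple gathers all items sharing the same key (the first element),
// so it emits only after the upstream channel is complete
ChannelB.groupTuple().subscribe { println it }
// -> [sampleA, [a_1.bam, a_2.bam]]
// -> [sampleB, [b_1.bam]]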
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:24
ah dang
I wonder if there's another way of doing it
I'm running a bunch of alignments and then collecting the alignments from the same read pairs / samples
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:28
I'm thinking, not so easy ..
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:30
trying to think of what could be used as a trigger for a 'completed set'
probably the simplest thing to do would be to just lump all fastq files for a single sample into the same channel element
you lose parallelization within that set...
but you gain it outside of sample sets then at least
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:34
basically, parallelise per sample, instead of read set
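A hedged sketch of that per-sample grouping, assuming the sample ID can be derived from the file name (the glob pattern and the underscore naming convention are assumptions):
// collect all fastq files belonging to the same sample into one channel element
samples = Channel.fromPath('data/*.fastq')
                 .map { fq -> [ fq.name.tokenize('_')[0], fq ] }  // sample id taken from the file name prefix
                 .groupTuple()                                    // -> [sampleId, [fastq1, fastq2, ...]]
A downstream process reading from samples then handles one whole sample per task, which is the per-sample parallelisation described above.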
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:34
if using EC2 or something, it would at least be possible to parallelize the alignments within the SGE job too
right
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:34
it makes sense
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:46
I think what I'm looking for is like.. "Channel arrays"
or sub-Channels
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:46
!
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:48
haha
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:48
a channel can handle any object, so you can use it for a list of items
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:48
right but then I have to introduce pipeline logic into the process logic
which makes the process code not so tidy
Paolo Di Tommaso
@pditommaso
Mar 04 2015 21:50
um, why don't you share a skeleton of your pipeline, otherwise it's difficult to discuss it
Andrew Stewart
@andrewcstewart
Mar 04 2015 21:50
Right
Andrew Stewart
@andrewcstewart
Mar 04 2015 22:07
Ok