These are chat archives for nextflow-io/nextflow

12th
Oct 2016
Lukas Jelonek
@lukasjelonek
Oct 12 2016 06:30
Hey, I have a process that may or may not create an output file. Is it possible to exclude these empty results from being further processed? Currently I set the errorStrategy to ignore, but this leads to a warning and ignores more than I'd like to, especially error codes from the tool.
Paolo Di Tommaso
@pditommaso
Oct 12 2016 07:28
@lukasjelonek currently it's not possible, but we were talking yesterday about adding an optional flag
let me see what I can do
Lukas Jelonek
@lukasjelonek
Oct 12 2016 08:13
That would be great.
Mokok
@Mokok
Oct 12 2016 08:36
hi @pditommaso ! Is there an easy way to find all the processes that a nextflow run involves in case of local executor ? (is there a special tag ? a range of pid ? something to grep in a 'ps' cmd ?)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 08:40
you can use nextflow log <run name>
Mokok
@Mokok
Oct 12 2016 09:12
thx, i'm gonna take a look
Paolo Di Tommaso
@pditommaso
Oct 12 2016 09:15
it prints the process workDirs
but you can specify what data you need
eg. nextflow log last -f name
check also
nextflow log -l
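A minimal sketch of the calls above, wrapped in a dry-run helper so the script only prints the commands unless DRY_RUN=0 is set. The `run` helper is hypothetical, and the field names other than `name` (`status`, `exit`, `workdir`) are assumptions; `nextflow log -l` shows the fields a given version actually supports.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
# This helper is illustrative and not part of nextflow.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# List the fields that can be passed to -f.
run nextflow log -l

# Print selected fields for every task of the most recent run.
run nextflow log last -f name,status,exit,workdir
```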
Mokok
@Mokok
Oct 12 2016 09:22
mh but it only works after the job is done, right ? (i'd like something like a 'qstat' whenever i want while it runs)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 09:22
not at this time
contributions are welcome :)
Mokok
@Mokok
Oct 12 2016 09:23
ok :D
Paolo Di Tommaso
@pditommaso
Oct 12 2016 12:19
just released version 0.22.3
@lukasjelonek I've added an option to handle optional output files, eg.
process foo {
  output: 
  file 'missing.txt' optional true into something
  :  
}
Mokok
@Mokok
Oct 12 2016 12:28
so how does it react when a task has this "something" channel as input and the file has not been produced ??
Paolo Di Tommaso
@pditommaso
Oct 12 2016 12:29
it does not .. !
Mokok
@Mokok
Oct 12 2016 12:32
nice, you rock ! (Out of curiosity, does it work as an ignored "missing error", or are all the pieces of code depending on this optional data ignored ? is it logged ?)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 12:33
not sure I understand what you mean
the command is supposed to return a zero exit status in any case
but you have the option to ignore a non-existent file
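A sketch of what this looks like end to end, using the `optional true` syntax from the example above. The script name `optional.nf`, the `consume` process and the temporary directory are made up for illustration. `foo` exits with status 0 without writing `missing.txt`, so nothing is emitted into `something` and `consume` is never started.

```shell
#!/bin/sh
# Write a small pipeline to a scratch directory (names are illustrative).
workdir="$(mktemp -d)"

# The quoted 'EOF' keeps the shell from expanding $x inside the script.
cat > "$workdir/optional.nf" <<'EOF'
process foo {
  output:
  file 'missing.txt' optional true into something

  """
  # no file is written, yet the task exits with status 0
  true
  """
}

process consume {
  input:
  file x from something

  """
  cat $x
  """
}
EOF

echo "wrote $workdir/optional.nf"
echo "try it with: nextflow run $workdir/optional.nf"
```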
Mokok
@Mokok
Oct 12 2016 12:40

I was just curious about how you handled it.
For example:
process bar {
input:
file 'missing.txt' from something_optional
val x from an_other_channel

 """
cat missing.txt
echo $x
 """

}

is the 'cat' line not run because it depends on a channel that is tagged as optional ? (echo is run anyway)

erk sorry for the ugly rendering ^^
Mokok
@Mokok
Oct 12 2016 12:48
(it was only to feed my curiosity, you can still say it's black magic :) )
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:06
em, no you can set optional only on input files ..
Evan Floden
@evanfloden
Oct 12 2016 13:18
@pditommaso you mean output no?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:18
oops, yes!
Evan Floden
@evanfloden
Oct 12 2016 13:18
Cool, just making sure I am following!
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:19
my fault, I'm falling asleep :grin:
let me update faq
Félix C. Morency
@fmorency
Oct 12 2016 13:47
hi
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:48
hi
Félix C. Morency
@fmorency
Oct 12 2016 13:49
I have a Dockerized slurm cluster and I want to play with the Nextflow docker support. How will Nextflow handle the Docker-in-Docker thing?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:50
um, nextflow does not provide any special support for d-in-d
it only needs the docker client in the running environment
that said, as far as I know the most common way to handle d-in-d is to install the docker client in your container(s)
Félix C. Morency
@fmorency
Oct 12 2016 13:52
https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/ recommends exposing the docker socket to the container. Do you know if that would work in that context?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:52
I was referring exactly to that
never tried but it should work
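For reference, exposing the socket usually comes down to a bind mount of `/var/run/docker.sock`. A dry-run sketch, assuming a hypothetical image `my-slurm-node` that already contains the docker client; the `run` helper is illustrative and only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical image name; it must ship the docker client binary.
image="my-slurm-node"

# Bind-mount the host's Docker socket so that `docker` inside the container
# talks to the host daemon instead of a nested one.
run docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  "$image" docker ps
```

Containers started this way are siblings on the host, so any `-v` path they receive is resolved on the host, not inside the container that issued the command.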
Félix C. Morency
@fmorency
Oct 12 2016 13:53
Ok. More tests for me then :)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 13:53
enjoy ! :)
Félix C. Morency
@fmorency
Oct 12 2016 14:52
mmm do i need to share my nfs directory with the docker container used by the nextflow pipeline as well?
nextflow is complaining about a .command.sh that is not found
Paolo Di Tommaso
@pditommaso
Oct 12 2016 14:55
that's quite a complex configuration
can I see the error message?
Félix C. Morency
@fmorency
Oct 12 2016 14:56
Paolo Di Tommaso
@pditommaso
Oct 12 2016 14:57
wait, I think it's not finding /opt/imeka/dummy2.sh
Félix C. Morency
@fmorency
Oct 12 2016 14:59
mmm it works well when I use the local executor. and this script is located inside the docker container
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:00
also in the container running the slurm daemon ?
Félix C. Morency
@fmorency
Oct 12 2016 15:01
no, only in the image process.$bar.container = 'imk_dummy2'
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:03
are u adding the -with-docker option on the nextflow cmd line?
Félix C. Morency
@fmorency
Oct 12 2016 15:03
no. im using a configuration file containing docker.enabled = true
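Put together, the configuration described here might look like the sketch below. `docker.enabled` and the `process.$bar.container` line come from the messages above; `process.executor = 'slurm'` and the file location are assumptions.

```shell
#!/bin/sh
# Scratch location for the example config (illustrative).
confdir="$(mktemp -d)"

# Quoted 'EOF' so that $bar reaches the file literally.
cat > "$confdir/nextflow.config" <<'EOF'
// run every task inside a container
docker.enabled = true

// image used by the process named bar
process.$bar.container = 'imk_dummy2'

// assumption: tasks are submitted through slurm
process.executor = 'slurm'
EOF

echo "wrote $confdir/nextflow.config"
```

A file like this can be passed explicitly with `nextflow -c <file> run ...`.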
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:03
I see
what is the output of cat /slurm-data/dummy/work/ce/9fd2351df8cad63d64555e3d101ed7/.command.run
Félix C. Morency
@fmorency
Oct 12 2016 15:05
This message was deleted
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:08
it looks fine
frankly I don't know if it's a problem with the mount of /slurm-data/dummy/work/ce/9fd2351df8cad63d64555e3d101ed7/ thus it cannot find .command.sh
or if it cannot find dummy2.sh
you should be able to launch that container in interactive mode and see what's wrong
docker run -it -v /slurm-data/dummy/work/ce/9fd2351df8cad63d64555e3d101ed7:/slurm-data/dummy/work/ce/9fd2351df8cad63d64555e3d101ed7 -v "$PWD":"$PWD" -w "$PWD" 10.10.103.21:5000/imk_dummy2 bash
Félix C. Morency
@fmorency
Oct 12 2016 15:11
will try that and debug
dummy2.sh is there and i can run it
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:15
can u access .command.sh ?
Félix C. Morency
@fmorency
Oct 12 2016 15:17
interesting. the .../work/ce/9fd2351df8cad63d64555e3d101ed7 is empty in the container
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:17
so that's the problem
Félix C. Morency
@fmorency
Oct 12 2016 15:17
but it's not empty in reality.
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:17
that's a d-i-d container right?
Félix C. Morency
@fmorency
Oct 12 2016 15:18
yes. I expose the docker socket of the host on the slurm node, but not in the nextflow container
might be the problem. let me try
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:19
I guess so
Félix C. Morency
@fmorency
Oct 12 2016 15:24
still empty
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:25
I have no experience with convoy
is this supposed to be your prod configuration?
Félix C. Morency
@fmorency
Oct 12 2016 15:25
will be at some point. im currently evaluating the possibility of porting our pipeline to nextflow
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:27
d-i-d creates a lot of security concerns
Félix C. Morency
@fmorency
Oct 12 2016 15:28
even by exposing the docker socket?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:28
in particular !
looking for a blog post about that ..
Félix C. Morency
@fmorency
Oct 12 2016 15:32
it seems there are issues with NFS volumes. see docker/docker#4213
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:32
ah
Félix C. Morency
@fmorency
Oct 12 2016 15:33
i like the idea of having a dockerized cluster and having one docker image per pipeline node
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:35
I agree, the weak point IMO in this config is the convoy-nfs
under heavy load you would need a robust file system
Félix C. Morency
@fmorency
Oct 12 2016 15:36
im using this atm because im also evaluating rancher and the convoy-nfs stack came with it. easy way of sharing volumes between container stacks
this is not a final component
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:36
makes sense
Félix C. Morency
@fmorency
Oct 12 2016 15:37
im interested in your findings on d-in-d security concerns
though docker security has improved a lot recently, not sure that post is still relevant
Félix C. Morency
@fmorency
Oct 12 2016 15:39
thanks. bookmarked
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:40
welcome
Félix C. Morency
@fmorency
Oct 12 2016 15:40
do you have any fs recommendation that i should look into?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:41
give a look to Ceph
the best one is almost certainly Lustre, but it's for HPC storage, not sure it fits your deployment scenario
is it an on-premises cluster or in the cloud ?
Félix C. Morency
@fmorency
Oct 12 2016 15:45
will be in-house for the moment. cloud is too expensive for our usage
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:47
I guess so, I was asking because aws provides its own nfs solution
Félix C. Morency
@fmorency
Oct 12 2016 15:47
efs?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:47
yep
Félix C. Morency
@fmorency
Oct 12 2016 15:48
yeah it looks great.. and is supported by convoy ;)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:48
well, we are using it and the good news is that you won't need convoy nor slurm
Félix C. Morency
@fmorency
Oct 12 2016 15:50
man that looks cool
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:51
thanks!
Félix C. Morency
@fmorency
Oct 12 2016 15:51
this is what I like about nextflow - you can change the underlying technology (fs, scheduler, etc) with very minor modification
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:52
yes, this is a core idea
surely it's not as optimised as an MPI app can be, but you gain a lot in portability
Félix C. Morency
@fmorency
Oct 12 2016 15:58
just tried running my nested container using --volumes-from and i can see the files
Paolo Di Tommaso
@pditommaso
Oct 12 2016 15:59
I see
you can add that by using the runOptions setting
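A sketch of that setting, written out as a config file. `docker.runOptions` is passed through to the `docker run` command line of each task; the container name `work_data` is hypothetical, and `--volumes-from` needs the name or id of an existing container.

```shell
#!/bin/sh
# Scratch location for the example config (illustrative).
confdir="$(mktemp -d)"

cat > "$confdir/nextflow.config" <<'EOF'
docker.enabled = true

// extra arguments added to the docker run command of each task;
// work_data is a hypothetical container that owns the shared volume
docker.runOptions = '--volumes-from work_data'
EOF

echo "wrote $confdir/nextflow.config"
```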
Félix C. Morency
@fmorency
Oct 12 2016 16:01
it requires the container id as a parameter
Paolo Di Tommaso
@pditommaso
Oct 12 2016 16:02
um, not so good
Félix C. Morency
@fmorency
Oct 12 2016 17:15
is there a way to avoid outputting the -v path/to/work/id/:path/to/work/id/ in the .command.run?
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:17
um no, that's needed to mount the task work dir
Félix C. Morency
@fmorency
Oct 12 2016 17:17
i would like to use --volumes-from instead but giving it as a runOptions doesn't work per se. i think the -v overrides it ^
i did a dummy "data container" and would like to try the --volumes-from alternative to -v. it works in interactive mode so far
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:19
I don't think the -v overrides it, are you sure?
it should be possible to mount both data and host volumes
Félix C. Morency
@fmorency
Oct 12 2016 17:21
you're right
the -v "$PWD":"$PWD" is the culprit
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:23
ah
but without that you won't be able to access the task work dir
Félix C. Morency
@fmorency
Oct 12 2016 17:26
it should be, through the --volumes-from
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:28
but that means using a data volume instead of host mount
would that be a convoy volume?
Félix C. Morency
@fmorency
Oct 12 2016 17:29
correct. shared data is mounted via data volume (where I can see my files) instead of host mount (where nothing works)
yes, it is
i created a third container with only the convoy volume and i can mount this correctly in my dind containers using --volumes-from
it's the only way i can see my files so far
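The setup described here can be sketched roughly as follows. All names (`shared_work`, `work_data`, the `busybox` helper image, the `/slurm-data` mount point) are assumptions, and a convoy-backed volume would be created through its volume driver rather than as a plain named volume. The `run` helper is illustrative and only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical names used only for this sketch.
volume="shared_work"
data_container="work_data"
image="imk_dummy2"

# 1. A container whose only job is to hold the volume; it never has to run.
run docker create --name "$data_container" -v "$volume:/slurm-data" busybox true

# 2. A task container that inherits every mount of the data container.
run docker run --rm --volumes-from "$data_container" "$image" ls /slurm-data
```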
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:32
this is a corner case, and I'm a bit skeptical about the scalability of this approach
Félix C. Morency
@fmorency
Oct 12 2016 17:32
you are probably right. i just wanted to test this to see if it works :)
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:32
however if you can detail exactly what docker cmd line you are expecting I can try to figure out how to integrate it
BTW I've noticed that in the docker images you are using there was a gluster config
Félix C. Morency
@fmorency
Oct 12 2016 17:35
yes but its disabled
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:36
why not re-enable it?
IMO it would be a better alternative to docker volumes
Félix C. Morency
@fmorency
Oct 12 2016 17:36
cool. i guess i could go read up on this. never played with glusterfs
thanks for your time
Paolo Di Tommaso
@pditommaso
Oct 12 2016 17:38
welcome
Félix C. Morency
@fmorency
Oct 12 2016 20:09
it also works using named volume instead of inline mount points
Paolo Di Tommaso
@pditommaso
Oct 12 2016 20:10
um, but named volumes cannot be mounted across multiple hosts, right?
Félix C. Morency
@fmorency
Oct 12 2016 20:11
i just tested and i can see the files
ah you mean hosts... umm
Paolo Di Tommaso
@pditommaso
Oct 12 2016 20:12
yep
Félix C. Morency
@fmorency
Oct 12 2016 20:22
because i use convoy-nfs it is accessible across all my hosts. an agent runs on each host and i can see the mountpoints for the same named volume on my two hosts
Félix C. Morency
@fmorency
Oct 12 2016 20:31
it makes sense in the convoy world