These are chat archives for nextflow-io/nextflow

22nd
Jul 2016
Mokok
@Mokok
Jul 22 2016 08:29

Hello

i'd like to know what the Nextflow good practices are in terms of user policy... because using Torque, i can't use qsub as root. But using Nextflow, i get { ERROR ~ .nextflow.history (Permission denied) } when not using root/sudo

(i moved the nextflow runnable into the user/bin dir)
(and i'm launching the cmd through ssh as a sudo-granted user)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 09:01
@Mokok nextflow does not need any special permission to submit jobs
simply use your user as usual
Mokok
@Mokok
Jul 22 2016 09:24
still get this permission denied on .nextflow.history
don't know why
Paolo Di Tommaso
@pditommaso
Jul 22 2016 09:25
can you open the .nextflow.log file and copy&paste the error stack trace?
Lukas Jelonek
@lukasjelonek
Jul 22 2016 09:30
@Mokok Do you have write permission in the directory where you are executing the pipeline, i.e. your $PWD?
Mokok
@Mokok
Jul 22 2016 09:31
i'm wondering whether nextflow tries to write where the nextflow runnable is located
checking
Paolo Di Tommaso
@pditommaso
Jul 22 2016 09:32
yes, it does
Mokok
@Mokok
Jul 22 2016 09:33
then that's it, i moved the runnable into user/bin instead of adding /pathToNextFlow to the $PATH
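The fix is just a matter of putting the launcher's own directory on $PATH instead of relocating the file. A minimal sketch, assuming the launcher was downloaded into a user-owned directory (the path and the stub script are illustrative assumptions, not the real launcher):

```shell
# keep the nextflow launcher in a directory the user owns
# (a stub stands in for the real launcher downloaded from nextflow.io)
mkdir -p "$HOME/tools/nextflow"
printf '#!/bin/sh\necho nextflow-stub\n' > "$HOME/tools/nextflow/nextflow"
chmod +x "$HOME/tools/nextflow/nextflow"

# add that directory to PATH rather than moving the file to a system dir
export PATH="$HOME/tools/nextflow:$PATH"
command -v nextflow   # now resolves inside the user-owned directory
```

This way the launcher stays writable by the unprivileged user, and no sudo is needed to run it.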
Paolo Di Tommaso
@pditommaso
Jul 22 2016 09:33
:+1:
Mokok
@Mokok
Jul 22 2016 10:10
i may have broken something on my VM, gonna build it up again, it shouldn't take long
Mokok
@Mokok
Jul 22 2016 12:02
well
the windows debugger works on linux too
reboot and everything is ok
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:03
are you running it into a Linux VM ?
Mokok
@Mokok
Jul 22 2016 12:03
yeup
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:04
with a local installation of Torque ?
Mokok
@Mokok
Jul 22 2016 12:04
y
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:04
ok
Mokok
@Mokok
Jul 22 2016 12:05
everything is run locally on one linux VM, but managed with ssh (mobaxterm)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:05
ok, good for testing purpose
Mokok
@Mokok
Jul 22 2016 12:05
that's the purpose idd
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:06
are you testing also the containerisation with Docker ?
Mokok
@Mokok
Jul 22 2016 12:06
i've no idea where the bug came from but it works fine now
i did with one of your provided examples
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:07
:+1:
Mokok
@Mokok
Jul 22 2016 12:10
i read about the package-like managing of the scripts, and i'm really interested in it. One of the needs is to have some versioning behavior for the scripts, and to find out whether a script-to-modify is used in another script. Does your system already provide such a behavior ?
(the high end is to be able to know how changes can impact scripts, check that everything stays stable, and alert the user if it doesn't)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:12
not sure I have understood
however you can track any change in your scripts with Git
nextflow allows you to specify which (git) revision of your project you want to execute
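For example, pinning a run to a specific revision looks something like this (the `nextflow-io/hello` project is real; the tag and commit id are placeholders):

```shell
# run a specific git revision (tag, branch or commit id) of a pipeline
nextflow run nextflow-io/hello -r v1.1
nextflow run nextflow-io/hello -r 6b9515a
```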
Mokok
@Mokok
Jul 22 2016 12:13

script A is used by script B as
B {
something
do A
something
}

A is edited
i've to create a new version, and check that all the scripts using A remain stable

Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:13
yes
and?
Mokok
@Mokok
Jul 22 2016 12:14
kind of package dependencies
does NextFlow already provide it ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 12:14
you can use Docker to track your package dependencies
or Environment modules
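A sketch of what that looks like in a config file (the image name and module version are assumptions):

```groovy
// hypothetical nextflow.config: pin each task's software environment
process {
    container = 'ubuntu:16.04'      // used when docker.enabled = true
    // module = 'blast/2.2.31'      // alternative: Environment Modules
}
docker {
    enabled = true
}
```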
Mokok
@Mokok
Jul 22 2016 12:17
mmm ok !
thks
Mokok
@Mokok
Jul 22 2016 13:37
re
i'd have some more questions for you ;)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 13:38
you are welcome
Mokok
@Mokok
Jul 22 2016 13:40
first one :
how do the data flows work ? i mean, what happens in terms of copying and transfer if there is a central data store somewhere, the Nextflow scheduler somewhere else, and nodes somewhere else again (by somewhere i mean network-related)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 13:42
dataflow is an in-memory data structure, it's not distributed in the cluster
basically your nextflow script works as a driver application in the launching machine orchestrating the task execution locally - or - distributed in the cluster when using PBS (or other executors)
/end
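Switching the same script from local to cluster execution is essentially a config change, along these lines (the queue name is an assumption):

```groovy
// hypothetical nextflow.config: submit each task via qsub
// instead of running it on the launching machine
process {
    executor = 'pbs'
    queue    = 'batch'
}
```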
Mokok
@Mokok
Jul 22 2016 13:49

I'm not sure I understand.
Let's give an example:
a script that is not intended to be run locally - intended to be run on another machine (using a remote PBS for example) - needs a file located on a central server.

NextFlow will send the script to the targeted executor, but what about the file ?

Paolo Di Tommaso
@pditommaso
Jul 22 2016 13:50
nextflow executes PBS jobs exactly in the same manner as you would do by using PBS qsub command
thus you will need to launch your nextflow app on the cluster head/login node where qsub is available
nextflow launches a qsub command for each task that needs to be executed
required files (scripts, data, etc) are distributed using the shared file system, which is supposed to be available
Mokok
@Mokok
Jul 22 2016 13:56

ok, i wanted to ask about that, because some workflow management solutions provide a temporary working dir allocated specially for the task to be executed, where all the data are copied from/to. And it's a pain in terms of network load when large files are used.

in case the file is needed by several tasks/scripts in the workflow, does Nextflow notice it and create an implicit dependency/concurrency ?

or place a lock on the file
Paolo Di Tommaso
@pditommaso
Jul 22 2016 13:58
yes, most of grid engines create a local scratch dir where the computing is supposed to be executed
by default nextflow does not use it, but you can enable it by adding process.scratch=true to your config file
it copies the task output files from the node-local storage to the shared working directory
currently inputs are not copied to the node local storage but there's a request for that
nextflow-io/nextflow#197
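The setting mentioned above is a one-liner:

```groovy
// hypothetical nextflow.config: run each task in a node-local scratch
// dir; outputs are copied back to the shared work directory afterwards
process.scratch = true
```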
Mokok
@Mokok
Jul 22 2016 14:03
ok thks !
Mokok
@Mokok
Jul 22 2016 14:10
let's go for the second question :smile:
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:10
:)
Mokok
@Mokok
Jul 22 2016 14:10
what can you tell me about optional inputs ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:11
it depends what you need exactly
formally they are not supported, because by definition a task is triggered when all its inputs are delivered
btw some tricks are possible depending on the specific use case
Mokok
@Mokok
Jul 22 2016 14:14
task T needs input A and optional B
if A doesn't exist then error
if B doesn't exist then use A and do X
if B exists, use both A and B and do Y
for example*
is it possible, and how does the workflow scheduling react to such a case (if possible)?
(or is it simple like "get B; if B==null then ...")
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:17
well, a way could be using a composite input type that holds both A and B values
Mokok
@Mokok
Jul 22 2016 14:18
(the purpose is not to wait for optional inputs if they are not existing when the time comes to run the task)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:18
thus in the task you would declare a single input, but that will receive one or both of them
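A sketch of that composite-input trick in the DSL of the time; the channel names, the `NO_FILE` placeholder, and `run_tool` are all illustrative assumptions:

```groovy
// a single input slot carries A plus an optional B, using a dummy
// placeholder file when B is absent, so the task always triggers
optional_b = file(params.b ?: 'NO_FILE')

process T {
    input:
    file a from a_ch
    file b from Channel.value(optional_b)

    script:
    def b_arg = b.name != 'NO_FILE' ? "--b-input $b" : ''
    """
    run_tool --a-input $a $b_arg
    """
}
```

The task body then branches on whether the placeholder or a real file arrived, which matches the "do X / do Y" cases above.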
Mokok
@Mokok
Jul 22 2016 14:18
idd, it may be a good way
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:19
it sounds interesting, it could be an idea for a feature to implement
Mokok
@Mokok
Jul 22 2016 14:21
btw, couldn't it be achieved by a default-value mechanism ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:22
yes, I was thinking something like that
Mokok
@Mokok
Jul 22 2016 14:22
brb in minutes
sry
Mokok
@Mokok
Jul 22 2016 14:37
re
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:43
?
Mokok
@Mokok
Jul 22 2016 14:43
didn't find anything about default input value
+
if a channel doesn't contain data, isn't it .empty() ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:45
the ifEmpty could be used for that
you need to think channel as streams
so it's empty when the only item emitted is the closing signal
(that's managed automatically by the framework)
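A small sketch of that operator (the default value is an assumption):

```groovy
// ifEmpty injects a default value when the source channel
// closes without emitting anything
Channel.from()
       .ifEmpty('default-value')
       .subscribe { println it }
```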
Mokok
@Mokok
Jul 22 2016 14:47
yes, "contains" wasn't a well chosen word
but in case the previous task ended in error (and we dumbly forgot to manage it with errorStrategy), doesn't it throw an exception as the channel is in error/doesn't exist ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:48
not clear
Mokok
@Mokok
Jul 22 2016 14:49
i mean...
task A output X, task B wait for X.
task A ends in error
what happens to B and X ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:51
by default when a task returns an error pipeline execution is interrupted
Mokok
@Mokok
Jul 22 2016 14:51
nevermind, the process will end immediately
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:51
exactly
Mokok
@Mokok
Jul 22 2016 14:51
just found it, sorry ^^
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:51
you can choose to ignore the error
but in any case the failing task won't produce any output, thus downstream tasks will not be executed
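That choice is again a config setting:

```groovy
// hypothetical nextflow.config: keep the pipeline going when a task
// fails; the failed task's downstream tasks are simply never triggered
process.errorStrategy = 'ignore'
```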
Mokok
@Mokok
Jul 22 2016 14:52

may be interesting in some case

third and last question for now that is quite related

Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:53
ok
Mokok
@Mokok
Jul 22 2016 14:53
if C waits for A and B, how does C know about the execution status of A and B ?
i suppose we can send it through a channel, but maybe there is an existing mechanism ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:54
that's how dataflow works
C will wait for some inputs, it doesn't know about A and B
when it receives all the expected inputs its execution is triggered
behind this there's a complex mechanism based on asynchronous queues and java threads
Mokok
@Mokok
Jul 22 2016 14:57
so the only ways of reacting to a given execution status are to manage it from the task with validExitStatus+errorStrategy OR to send the execution status through a channel for the next task to know about it, am i right ?
Paolo Di Tommaso
@pditommaso
Jul 22 2016 14:58
task execution status is not propagated to downstream tasks
you can just send to a task a set of inputs
these inputs can be other task outputs
or you can create some channels that feed a process as you need
not sure it's clear
Mokok
@Mokok
Jul 22 2016 15:01
yep, that was what i meant by "sending through a channel" (ie. store it as a variable, output it, and make it an input of the next task)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:01
yes
Mokok
@Mokok
Jul 22 2016 15:03
I'm glad we understand each other ^^
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:03
ahaha
me too
Mokok
@Mokok
Jul 22 2016 15:04
thanks for these answers and the time you spent answering my questions
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:04
you are welcome
hope you will find nextflow useful
Mokok
@Mokok
Jul 22 2016 15:06
i analyzed several solutions, and yours is the best so far in terms of fitting my requirements
so, i do find it useful
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:06
great
what other tools have you tried ?
Mokok
@Mokok
Jul 22 2016 15:07
Luigi, Proactive Scheduler (and not yet SnakeMake)
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:07
I don't know Proactive Scheduler
about Snakemake you may find this useful
Mokok
@Mokok
Jul 22 2016 15:09
Activeeon Proactive Workflow : http://www.activeeon.com/
thanks for sharing
i will give it a look....next week :D
Paolo Di Tommaso
@pditommaso
Jul 22 2016 15:10
enjoy the week-end
Mokok
@Mokok
Jul 22 2016 15:10
thanks, you too