These are chat archives for nextflow-io/nextflow

21st
Jul 2016
Mokok
@Mokok
Jul 21 2016 08:52 UTC

Hi there !
I just found NextFlow, and after reading the docs, it seems to be a really promising product.
However some questions still remain...
1-How are tasks handled? Does it inspect the inputs/outputs and then work out whether a task needs a fork or a join with other tasks?
2-In case of fork/join with Torque/PBS, is the complete workflow sent to Torque/PBS (so that Torque manages it, including the needed fork/join, or is it launched as one giant task?), OR does the fork/join occur on the NextFlow side, so that each task can be assigned to any node (local, remote SSH, Torque/PBS, ...)?
3-How does the script building process work? Is there a way to import already written, adaptable (parameterised) scripts? (The purpose is to build specific workflows from several existing "templates".)
4-Is there a way to add information to the script (metadata about what the script deals with: information about in/out variables, process name, main project the process is part of, ...)?
5-Is there a way to put conditions on in/out?
I mean... a task needs inputs such as "A&B or C and D and maybe E", like "if A is provided, B must be provided too, not C, and D must be of ThisSpecificType and E is optional; otherwise C is provided, A and B mustn't be, D is ThisOtherType and E takes its default value".

Thanks!
Regards

Paolo Di Tommaso
@pditommaso
Jul 21 2016 09:32 UTC
@Mokok A lot of questions! Thanks for your interest
1) Yes, the input/output declarations of each task define its dependencies, so synchronisation is implicitly managed by the framework
2) No, each task is submitted independently
3) Yes, see the template feature
4) Currently no, but other users are asking for that. It could be easily added
Paolo Di Tommaso
@pditommaso
Jul 21 2016 09:38 UTC
5) No, conditional inputs are not supported by design. Nextflow is based on the dataflow programming model, which is inspired by a functional approach
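For the template feature mentioned in answer 3, a minimal sketch could look like the following. The process name `sayHello`, the input values and the file name `my_script.sh` are hypothetical, and the sketch assumes the script is stored in a `templates` directory next to the pipeline script.

```groovy
// Hypothetical process: runs an external, parameterised script once per input value.
process sayHello {
  input:
  val STR from 'this', 'that'

  script:
  // Looked up as templates/my_script.sh; inside that file, $STR is
  // replaced with the value of the input declared above.
  template 'my_script.sh'
}
```

The same template file can then be reused by several processes or pipelines, each supplying its own inputs.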
Mokok
@Mokok
Jul 21 2016 09:57 UTC

You're welcome, the interest is deserved

  • couldn't expect a quicker answer :o
    Thanks a lot

You've been so efficient I can't help asking some more questions (only one for now):
About the GPL licensing (which I admit I'm not familiar with yet): if I understand correctly, if I use NextFlow as part of a bigger solution, I can sell my solution without restriction (including being free to give you part of the profits), but if I modify NextFlow itself (regardless of the wrapping solution) I have to share this new NextFlow version under GPL (without needing to publish the wrapping solution).

(Sorry if my english isn't really fluent )

Paolo Di Tommaso
@pditommaso
Jul 21 2016 10:01 UTC
GPL requires that if you modify nextflow and distribute it, your version needs to be GPL too, and you are obliged to release the source code along with it.
You are free to sell and make money with it
If you modify it and keep it for yourself or your company you don't have any restriction.
Mokok
@Mokok
Jul 21 2016 10:05 UTC
Ok, the main "threat" was about licence "contamination" (there is none).
Thanks for clarifying
Paolo Di Tommaso
@pditommaso
Jul 21 2016 10:06 UTC
Actually we are thinking of changing to LGPL, which is more permissive
Mokok
@Mokok
Jul 21 2016 13:10 UTC
hello again
I'm testing several little things to evaluate how nextflow fits my requirements, and I'm getting stuck now that I've reached the use of PBS.
Is there an example script that uses Torque available? (btw I can't find the nextflow.config file)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 13:14 UTC
create a file named nextflow.config with this content
process {
  executor = 'pbs'
  queue = 'the-name-of-the-queue-you-want-to-use' 
}
that's all
Mokok
@Mokok
Jul 21 2016 13:16 UTC
damn, i feel ashamed not to have tried it earlier, you guys made it too easy :)
thanks
Paolo Di Tommaso
@pditommaso
Jul 21 2016 13:17 UTC
yes, nextflow development was driven by a rebellion against complexity ;)
Mokok
@Mokok
Jul 21 2016 13:21 UTC
well done then
Mokok
@Mokok
Jul 21 2016 13:52 UTC
it works fine, I just ran the multiple hello-world script, and now a new question comes up:
as you helped me to use Torque with NextFlow, I would like to know if there is a way to specify multiple executors; and if it is possible, how does nextflow choose between several available executors
Paolo Di Tommaso
@pditommaso
Jul 21 2016 13:55 UTC
yes, a nextflow pipeline is composed of several processes
Mokok
@Mokok
Jul 21 2016 13:55 UTC
any kind of "best computation power", "closer through network", "closest fit of the tasks needs"
?
Paolo Di Tommaso
@pditommaso
Jul 21 2016 13:55 UTC
you can specify a different executor for each process
let's say you have a pipeline with two processes: A and B
in your config file you can write
process {
  $A {
    executor = 'local'
  }
  $B {
    executor = 'pbs'
  }
}
(better like this)
@Mokok regarding: any kind of "best computation power", "closer through network", "closest fit of the tasks needs"?
what do you mean?
Mokok
@Mokok
Jul 21 2016 13:58 UTC

mh yes, i saw this in your doc,

I mean: can there be a dynamic process-to-executor assignment,

where the NextFlow resource manager chooses the best executor for a given task?
Paolo Di Tommaso
@pditommaso
Jul 21 2016 13:59 UTC
yes, but you will need to write the logic to assign them dynamically at runtime
Mokok
@Mokok
Jul 21 2016 13:59 UTC
really interesting :)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:00 UTC
an interesting one is the ability to define task resource requirements dynamically at runtime
Mokok
@Mokok
Jul 21 2016 14:01 UTC
yep, saw this too, even handling errors to increase the disk space request when retrying, for example
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:01 UTC
it's very common that the same task can have very different resource requirements, e.g. memory, time, etc
thus you can define them depending on the actual task input
exactly
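The retry case discussed above can be sketched roughly as follows. The process name `bigTask`, the resource figures and the command are placeholders, not taken from the conversation; the sketch assumes that directive values written as closures are evaluated for each task run and that `task.attempt` counts the submission attempts.

```groovy
// Hypothetical process: asks for more memory and time on each retry.
process bigTask {
  // Closures are evaluated per task run; task.attempt is 1 on the
  // first submission, 2 on the first retry, and so on.
  memory { 2.GB * task.attempt }
  time { 1.hour * task.attempt }

  // Re-submit a failed task, at most three times.
  errorStrategy 'retry'
  maxRetries 3

  script:
  """
  your_command --here
  """
}
```

The same closure form can also inspect the task inputs, which is how a requirement can depend on the actual data being processed.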
Mokok
@Mokok
Jul 21 2016 14:02 UTC
the work you accomplished making NextFlow is just amazing
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:02 UTC
thank you :)
BTW what's your organisation (if I may ask) ?
Mokok
@Mokok
Jul 21 2016 14:05 UTC
as I'm on an internship working on a pre-tender (no pun), I'm not sure revealing the company name without telling my "referent" first is a good idea
he's busy today, but he'll hear what i think about NextFlow ;)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:06 UTC
OK, no problem
Mokok
@Mokok
Jul 21 2016 14:06 UTC
(and may contact you)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:09 UTC
but please support the project by starring it, if you find it useful
Tx!
Mokok
@Mokok
Jul 21 2016 14:12 UTC
You got my star ;)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 14:12 UTC
well done, thanks
Raymond Lim
@raylim
Jul 21 2016 14:56 UTC
@mes5k that seems like a reasonable solution, thanks
Raymond Lim
@raylim
Jul 21 2016 15:11 UTC
is it possible to get the number of elements emitted by a channel within a process?
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:13 UTC
do you need just the number of elements, or both, i.e. the number and the elements?
Raymond Lim
@raylim
Jul 21 2016 15:13 UTC
the number
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:13 UTC
ok, you can do something like this
Raymond Lim
@raylim
Jul 21 2016 15:14 UTC
sorry, actually both
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:14 UTC
process foo {
  input: 
  val c from yourChannel.count()
  """
  echo $c
  """
}
if so, it's a bit more complex, something like this should work
process foo {
  input: 
  val c from yourChannel.tap{ items }.count()
  val i from items
  """
  echo item: $i
  echo count: $c
  """ 
}
(wait there was a typo)
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:19 UTC
so, what is happening is that with tap you create a copy of yourChannel that is used as a second input
then count is applied to the original channel
Raymond Lim
@raylim
Jul 21 2016 15:26 UTC
a little complicated, especially since for me yourChannel emits a set (a value + x number of files)
maybe I should just stick to my bash solution
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:27 UTC
does not change
the only difference is that you will need to use set instead of val in the input declaration
wait
yourChannel is supposed to bring files ?
Raymond Lim
@raylim
Jul 21 2016 15:28 UTC
an ID and files
set val(pairId), file(bams) from groupedBams
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:29 UTC
ok perfect
but you still want to execute as many jobs as there are pairs, right?
Raymond Lim
@raylim
Jul 21 2016 15:30 UTC
yes, one job for each set
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:31 UTC
process foo {
  input: 
  val c from groupedBams.tap{ items }.count()
  set val(pairId), file(bams) from items
  """
  your code here 
  """ 
}
so you have groupedBams
tap copies it into a new channel, items, which you use in the same manner
Raymond Lim
@raylim
Jul 21 2016 15:32 UTC
that count includes the pairId?
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:32 UTC
count will report the number of <pairId, file> pairs
Raymond Lim
@raylim
Jul 21 2016 15:34 UTC
sorry, I meant that I want the count of just the files in the groupedBams
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:34 UTC
ah
much easier
process foo {
  input: 
  set val(pairId), file(bams) from groupedBams
  """
  echo the count is ${bams.size()}
  """ 
}
Raymond Lim
@raylim
Jul 21 2016 15:36 UTC
ah, ok that is much easier
thanks
Paolo Di Tommaso
@pditommaso
Jul 21 2016 15:36 UTC
welcome
Raymond Lim
@raylim
Jul 21 2016 16:15 UTC
getting an error from that actually
java.nio.file.NoSuchFileException: sorted.bam
which is the element in bams
maybe it's because bams is a single file in this particular case
Paolo Di Tommaso
@pditommaso
Jul 21 2016 16:18 UTC
It could be
You will need a bit more code if so
I can send an example later
Raymond Lim
@raylim
Jul 21 2016 16:22 UTC
ok
also, I think this is a bug but if you attempt to merge two channels where one is empty, it throws an error
Paolo Di Tommaso
@pditommaso
Jul 21 2016 16:55 UTC
@raylim define in your script a function like the following
def count( files ) {
  files instanceof Path ? 1 : files.size()
}
then use count(bams) instead of bams.size()
however that exception is a bit weird, not sure that the single file is the problem
regarding "I think this is a bug but if you attempt to merge two channels where one is empty, it throws an error":
can you provide an example showing that problem?
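Putting the helper together with the earlier process, a sketch could look like this. It assumes `groupedBams` is defined elsewhere in the script and emits an ID together with one or more files; the echo line is only illustrative.

```groovy
import java.nio.file.Path  // made explicit here so the sketch does not rely on default imports

// Returns 1 when a single file is staged, otherwise the size of the file list.
def count( files ) {
  files instanceof Path ? 1 : files.size()
}

process foo {
  input:
  // groupedBams is assumed to exist and to emit (pairId, files) sets
  set val(pairId), file(bams) from groupedBams

  """
  echo "pair ${pairId} has ${count(bams)} file(s)"
  """
}
```

The helper only guards against the single-file case; as noted above, it may not be the cause of the NoSuchFileException.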
Raymond Lim
@raylim
Jul 21 2016 18:08 UTC
actually, I think it was because I was using merge instead of mix
Paolo Di Tommaso
@pditommaso
Jul 21 2016 18:10 UTC
ok
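For reference, a small sketch of mix with an empty source. The channel names and values are made up for illustration; the point is only that mix forwards whatever its sources emit into a single channel, so an empty source contributes nothing.

```groovy
// Hypothetical channels: one with items, one that emits nothing.
nonEmpty = Channel.from('a.bam', 'b.bam')
empty    = Channel.empty()

// mix emits every item from both sources in one output channel.
nonEmpty
  .mix(empty)
  .subscribe { println it }
```

merge, by contrast, combines items from its source channels with each other, so it is not a drop-in replacement when one of them may be empty.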