These are chat archives for nextflow-io/nextflow

9th
Nov 2016
Johan Viklund
@viklund
Nov 09 2016 08:15
...
Paolo Di Tommaso
@pditommaso
Nov 09 2016 08:17
trump.jpg
good luck guys
Phil Ewels
@ewels
Nov 09 2016 08:32
Good luck all of us 😫
Phil Ewels
@ewels
Nov 09 2016 11:32
Is it possible to have conditional directives? I'd like to set publishDir to save results only if params.saveResults is set...
Paolo Di Tommaso
@pditommaso
Nov 09 2016 11:35
umh, a bit tricky but it should be possible
publishDir {  params.saveResults ? 'target_path' : false }
I didn't try but should work
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:36
Nice
Phil Ewels
@ewels
Nov 09 2016 11:44
Brilliant, thanks!
Lots of stuff to test in this change anyway, so we'll try it
(this is actually for auto-generating a reference index, saving for future use)
This ok?
publishDir { params.saveReference ? "${params.outdir}/reference_genome", mode: 'copy' : false }
Needs extra squiggly brackets for the mode bit maybe
publishDir { params.saveReference ? { "${params.outdir}/reference_genome", mode: 'copy' } : false }
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:47
I would try that:
publishDir { params.saveReference ? "$params.outdir/reference_genome", mode: 'copy' : false }
Phil Ewels
@ewels
Nov 09 2016 11:55
How is that different to the first one I posted?
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:56
Removed the {} around params.outdir
Johan Viklund
@viklund
Nov 09 2016 11:56
I have a very strange nextflow error currently:
WARN: Duplicate output channel name: '/pica/h1/rajohvik/Work/wgs-structvar/test_data/CEP-1-7.clean.dedup.recal.chr21-2.fq.gz' in the script context -- it's worth to rename it to avoid possible conflicts
I have no channels called that (that's one of the input filenames)
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:56
@viklund If you update nextflow you'll get a better error message
Johan Viklund
@viklund
Nov 09 2016 11:56
ah, thx
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:57
Most likely you have a channel that is used twice
Johan Viklund
@viklund
Nov 09 2016 11:57
(everything works, so I've just left it for a while)
Yes, but I thought I had removed all that
Maxime Garcia
@MaxUlysse
Nov 09 2016 11:58
I had exactly the same error, and the better error message helped me to catch the culprit
Johan Viklund
@viklund
Nov 09 2016 12:02
it was a local variable in a map
I thought the map blocks had their own scope
ch.map {
    stuff = it[0]
    [it[0], stuff, it[1]]
}.set { outch }
didn't expect the stuff var to leak to the global scope
Maxime Garcia
@MaxUlysse
Nov 09 2016 12:05
ok
Johan Viklund
@viklund
Nov 09 2016 12:11
but that's symmetrical with the set {} syntax
so I guess I should not be surprised
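The scoping behaviour described above can be reproduced in a plain Groovy script. This is only a sketch: it assumes Nextflow's map closures resolve undeclared names the same way ordinary Groovy script closures do, which was not checked against the Nextflow source. Here collect stands in for the channel map operator, and rows, leaky, scoped and first are illustrative names.

```groovy
// Plain Groovy script, not a Nextflow pipeline; all names are illustrative.
def rows = [['a', 1], ['b', 2]]

// No `def`: the assignment to `stuff` is routed to the closure's owner (the
// script), so the name ends up in the script binding and outlives the closure.
def leaky = rows.collect {
    stuff = it[0]
    [it[0], stuff, it[1]]
}
assert binding.hasVariable('stuff')

// With `def`: `first` is local to the closure and never reaches the binding.
def scoped = rows.collect {
    def first = it[0]
    [it[0], first, it[1]]
}
assert !binding.hasVariable('first')
```

Declaring closure variables with def is the usual way to keep them out of the script scope.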
Johan Viklund
@viklund
Nov 09 2016 12:47
Uhm, I think I've found a small bug, when using groupBy, depending on whether I have val or file the group gets "flattened".
no, it's no bug
hmm
Paolo Di Tommaso
@pditommaso
Nov 09 2016 13:04
@ewels I've just tried it: false does not work, use null instead
  publishDir { params.saveReference ? 'results' : null }
if you want to specify the copy mode
Phil Ewels
@ewels
Nov 09 2016 13:06
ok, thanks :+1: - haven't managed to test it yet, other changes I did broke lots of stuff so working on that first
Paolo Di Tommaso
@pditommaso
Nov 09 2016 13:06
  publishDir path: {params.saveReference ? 'results' : null}, mode: 'copy'
ok
Phil Ewels
@ewels
Nov 09 2016 13:07
Perfect, thanks!
Tiffany Delhomme
@tdelhomme
Nov 09 2016 13:10

Hi Paolo,
I have this process:

str = Channel.from('hello', 'hola')

num = Channel.from(1,2,3)

process printHello {

   input:
   val str
   val num

   output:
   stdout into result

   """
   echo $str"_"$num
   """
}

result.println()

and would like to print in the output each possible combination of the str and num channels, i.e. hello_1, hello_2, hello_3, hola_1, and so on (no order needed).
Thanks!

Johan Viklund
@viklund
Nov 09 2016 13:13
Just shooting from the hip here:
    each val str
    each val num
in the input declaration
Tiffany Delhomme
@tdelhomme
Nov 09 2016 13:16

With each val str and each val num I get

Launching toy_multiple_channels.nf
ERROR ~ No such variable: str

But with each str and each num it seems to work well, but with a warning

WARN: Using queue channel on each parameter declaration should be avoided -- take in consideration to change declaration for each: 'str' parameter

Thanks @viklund , will test cross to avoid the warning!

Johan Viklund
@viklund
Nov 09 2016 13:18
you could just use arrays directly to avoid that (but I guess you have channels in your real example)
Paolo Di Tommaso
@pditommaso
Nov 09 2016 13:32
str = Channel.from('hello', 'hola')

num = Channel.from(1,2,3)

process printHello {

   input:
   each x from str
   each y from num

   output:
   stdout into result

   """
   echo ${x}_${y}
   """
}

result.println()
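What the two each inputs enumerate is the Cartesian product of the two collections. A minimal plain-Groovy sketch of that product, using the GDK combinations() method rather than a Nextflow process (labels is an illustrative name):

```groovy
// Plain Groovy, not Nextflow: build every str/num pairing from two lists.
def str = ['hello', 'hola']
def num = [1, 2, 3]

// combinations() on a list of lists yields one [s, n] pair per combination.
def labels = [str, num].combinations().collect { it[0] + '_' + it[1] }

// One label per pairing; the order of the pairs is not relied upon here.
assert labels.size() == str.size() * num.size()
println labels
```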
Tiffany Delhomme
@tdelhomme
Nov 09 2016 16:00

Thanks both!
Here is my script at the moment:

str = Channel.from('hello', 'hola')

num = Channel.from(1,2,3)

process printHello {

   tag { tag }

   input:
   each x from str
   each y from num

   output:
   set val(tag), file("*.txt") into result

   shell:
   tag = x
   """
   if [ "$x" == "hello" ]
   then
    sleep 5
   fi
   echo $x" ... "$y > ${x}${y}.txt
   """
}


process mergeHello {

  tag { tag }

  input:
  set val(tag), file(z) from result.groupTuple(size: 3)

  output:
  file("*.txt") into resultM

  shell:
  tag = tag
  """
  echo ${z} > ${tag}_res.txt
  """
}

But the warning is still here, also do you have any idea of how to get the length of my num channel to give it to groupTuple? Here I put 3 but want to compute it from the channel...

If I replace 3 by num.count() I have the following
ERROR ~ Channel `num` has been used as an input by more than a process or an operator
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:02
ok, you don't need channels for num and str, just use a plain list
str = ['hello', 'hola']

num = [1,2,3]
then you can use num.size() as many times as you need
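With plain lists the expected group size is simply num.size(), and the list can be read any number of times. The sketch below is plain Groovy and only emulates grouping by tag with groupBy; it does not call Nextflow's groupTuple, and emitted and grouped are hypothetical names.

```groovy
// Plain Groovy, not Nextflow: emulate "group the outputs by tag".
def str = ['hello', 'hola']
def num = [1, 2, 3]

// One (tag, filename) pair per combination, mirroring what printHello emits.
def emitted = [str, num].combinations().collect { [it[0], it[0] + it[1] + '.txt'] }

// Collect the filenames under their tag.
def grouped = emitted.groupBy { it[0] }.collectEntries { tag, pairs ->
    [(tag): pairs.collect { it[1] }]
}

// Every tag should end up with exactly num.size() files.
assert grouped.keySet() == str.toSet()
assert grouped.values().every { it.size() == num.size() }
// A list is not consumed by reading it, so num.size() can be called again.
assert num.size() == 3
```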
Kyle Hernandez
@kmhernan
Nov 09 2016 16:09
I have a very basic question. How do you make modular tools with nextflow? What I mean is, all the examples look like "scripts" where nothing is really modularized. Is it possible to modularize nextflow (e.g., could I import something from another nextflow module file)? Related, how do you make processes into "functions" that you can conditionally call? Even if you just point me to repos that do this I would probably be ok, but right now it's hurting my brain.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:10
this is planned, see nextflow-io/nextflow#238
currently you can use templates, and invoke other NF pipelines from a process like any other tool
Kyle Hernandez
@kmhernan
Nov 09 2016 16:12
ok thank you @pditommaso
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:12
but there isn't yet a native sub-workflows import mechanism
Kyle Hernandez
@kmhernan
Nov 09 2016 16:12
I thought I was losing my mind
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:12
:+1:
Kyle Hernandez
@kmhernan
Nov 09 2016 16:12
like I was just not "getting" it or something
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:12
for so little :)
anyhow, keep in mind that in NF the main abstraction for reusability is the process, which allows you to invoke external tools
Kyle Hernandez
@kmhernan
Nov 09 2016 16:16
thank you kind sir :+1:
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:19
welcome !
:ok_hand:
Félix C. Morency
@fmorency
Nov 09 2016 16:20
@pditommaso how well does the built-in ignite feature scale? what's the biggest cluster you tested against?
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:22
still under test, I've tested with ~ 50 nodes
feedback is welcome on larger ones
Félix C. Morency
@fmorency
Nov 09 2016 16:23
how many tasks under heavy load?
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:24
~1 million
Félix C. Morency
@fmorency
Nov 09 2016 16:25
much, much more than what luigi is able to handle
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:25
really ?
Félix C. Morency
@fmorency
Nov 09 2016 16:25
luigi is not made to handle a large number of tasks
this is a documented design limitation
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:26
you may want to comment on this post if so
"The assumption is that each task is a sizable chunk of work. While you can probably schedule a few thousand jobs, it’s not meant to scale beyond tens of thousands."
Paolo Di Tommaso
@pditommaso
Nov 09 2016 16:27
interesting
Mike Smoot
@mes5k
Nov 09 2016 17:15
After a terrible night, some good news: I finally got one of our production pipelines running with nextflow cloud!
Paolo Di Tommaso
@pditommaso
Nov 09 2016 17:16
Cool!
What was wrong?
Mike Smoot
@mes5k
Nov 09 2016 17:18
Turns out it was this line:
yum install -y nfs-utils
We have a yum mirror set up, but we didn't have a latest release target, so that command was failing and therefore the rest of the part-002 boothook was failing, which is where the mount happens.
I actually think relying on yum is a bug here because I'm pretty sure it would fail with Ubuntu. I think a better approach would just be to require an image with nfs-utils installed along with Java and Docker.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 17:19
I see
So that's something that needs to be improved in the NF launcher
Tiffany Delhomme
@tdelhomme
Nov 09 2016 17:21
@pditommaso actually this is a toy example and in reality num and str will be true channels from previous processes
Paolo Di Tommaso
@pditommaso
Nov 09 2016 17:22
This is a bit complicated, please post on the mailing list
Mike Smoot
@mes5k
Nov 09 2016 17:24

One other minor issue I ran into is that when I run nextflow cloud create ... I see this as the output:

...
Login in the master node using the following command: 
  ssh -i <path to msmoot-aws key file> ec2-user@

The problem is that because we're in a private VPC, we don't have a public DNS name for the server. It would be good to fall back to the IP address here.

Paolo Di Tommaso
@pditommaso
Nov 09 2016 17:51
ah
@mes5k could you open an issue for that, please ?
Mike Smoot
@mes5k
Nov 09 2016 18:01
Certainly! Should I open one for the use of yum as well or do you not think that's an issue?
Paolo Di Tommaso
@pditommaso
Nov 09 2016 18:01
yes thanks!
Paolo Di Tommaso
@pditommaso
Nov 09 2016 21:41
@mes5k ok, I've uploaded 0.22.5-SNAPSHOT fixing both #239 and #240, you may want to give it a try
Mike Smoot
@mes5k
Nov 09 2016 22:03
Thanks will take a look!
Jason Byars
@jbyars
Nov 09 2016 22:04
Is the nextflow ami available in us-east-1?
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:04
out of curiosity, how many nodes have u launched ?
"Is the nextflow ami available in us-east-1?"
oh I was fearing that .. :)
Jason Byars
@jbyars
Nov 09 2016 22:05
I'm just getting around to playing with that
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:06
you should be able to copy AMI ami-43f49030 from the eu-west-1 region
I would like to avoid maintaining a set of AMIs for each region
Mike Smoot
@mes5k
Nov 09 2016 22:07
I ran a small cluster of 3 m4.xlarge machines. However, it was a very encouraging test. The pipeline took 1h 4min in AWS while getting all inputs from S3 and storing all results back there, whereas it took 56 minutes on a 64 cpu local machine writing to local disk.
Jason Byars
@jbyars
Nov 09 2016 22:07
NP, I see it now, eu-west-1
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:07
@jbyars ok, if you have problems let me know
@mes5k well, m4.xlarge is a 4-CPU instance
interesting
Mike Smoot
@mes5k
Nov 09 2016 22:11
Yeah, but not like the r3.4xlarge, which are similar to the nodes in our existing cluster!
My tentative plan is to set up a cluster with autoscaling enabled where the head node also runs a celery worker that will trigger new nextflow jobs as requests appear on the celery queue.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:12
!
Mike Smoot
@mes5k
Nov 09 2016 22:12
So we can blast against NR in a sane amount of time.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:13
keep in mind that the NF cluster is not meant to work as a resource manager
Mike Smoot
@mes5k
Nov 09 2016 22:14
Can you elaborate on that a bit? I'm not totally sure I follow.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:15
meaning that each run should launch its own cluster
Mike Smoot
@mes5k
Nov 09 2016 22:17
I've been imagining that the cluster would stay up, but would run pipelines one at a time.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:17
one at a time, yes
I understood that you were planning to run multiple pipelines in parallel
Mike Smoot
@mes5k
Nov 09 2016 22:19
Gonna try and start simple! :)
Which resource manager would you recommend if I did want to run pipelines in parallel? Assuming AWS, of course.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:20
well, there's @jbyars, who reported good feedback with CfnCluster
Jason Byars
@jbyars
Nov 09 2016 22:21
last cfncluster release I actually had some problems. I moved on to Alces Flight with good results
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:21
ah
Jason Byars
@jbyars
Nov 09 2016 22:22
it's grid engine based by default and the template is quite nice.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:22
using NF with it ?
Jason Byars
@jbyars
Nov 09 2016 22:23
yes, it was a fairly easy autoscaling solution
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:23
does it support EFS ?
Jason Byars
@jbyars
Nov 09 2016 22:24
that part I haven't tested, but I don't see why it wouldn't. You can put just about anything you want in the node startup script.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:25
well, does it install a NFS server if so? or any other shared FS ?
Jason Byars
@jbyars
Nov 09 2016 22:26
it provided its own NFS, before EFS launched.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:26
I see
Mike Smoot
@mes5k
Nov 09 2016 22:27
I'd been thinking that I should look at CfnCluster, but I'll check out Alces Flight too. Looks very interesting.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:27
I need to give it a try
Jason Byars
@jbyars
Nov 09 2016 22:27
checking the updates on CloudFormation console, give me a minute
I don't see an immediate EFS option instead of the default NFS on the Alces CloudFormation template, but it wouldn't be too hard to add. It would probably be quicker to email them or post on their forums. They respond pretty quickly.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:32
I would be interested in your feedback on the NF cluster on AWS
let me know if you try it
Jason Byars
@jbyars
Nov 09 2016 22:32
pending....
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:32
:+1:
Jason Byars
@jbyars
Nov 09 2016 22:32
wait... the image finished copying
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:33
good, almost there :)
Jason Byars
@jbyars
Nov 09 2016 22:34
you might want to post a note for those of us in the wrong region:
aws ec2 copy-image --source-region eu-west-1 --source-image-id ami-43f49030 --name nextflow-cn-linuxv-2.3
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:35
I will, though the main point is that you can use any image that has the JVM + Docker
Jason Byars
@jbyars
Nov 09 2016 22:35
cool, I will have to try with my images.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:37
(currently there's a small bug that limits usage to Red Hat-like distros, i.e. those with the yum package manager; I need to fix it to support apt-get as well)
Jason Byars
@jbyars
Nov 09 2016 22:38
That's ok, at this point I'm trying to move to 1 pipeline = 1 docker container.
It's not the issue it used to be
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:39
I agree
Mike Smoot
@mes5k
Nov 09 2016 22:41
When you say 1 pipeline = 1 docker container do you actually mean only 1 container running for all of your processes?
i.e. the container persists for the life of the pipeline?
Jason Byars
@jbyars
Nov 09 2016 22:42
sorry, I just mean I try to make sure the container for a given pipeline contains every tool that pipeline needs,
I don't try to persist them
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:42
I mean the same
Mike Smoot
@mes5k
Nov 09 2016 22:46
That's what I thought. We're having all kinds of problems with docker daemon choking when we start too many containers so if there was a way to run with just one container per node in a cluster, that would be a big help. Running nextflow in the container is one option, but then I think I'd need to create a cluster manually.
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:47
we are also testing Singularity
so far it looks promising
Mike Smoot
@mes5k
Nov 09 2016 22:54
Singularity looks pretty interesting
Paolo Di Tommaso
@pditommaso
Nov 09 2016 22:54
yep
I will benchmark it soon
Jason Byars
@jbyars
Nov 09 2016 23:36
ami appears to be working fine. It will be a couple hours before I have all the data loaded up on the EFS volume for a real test. I'll let you know how it goes.