These are chat archives for nextflow-io/nextflow

25th
Oct 2018
Maxime HEBRARD
@mhebrard
Oct 25 2018 02:53
hello here. I was thinking I could declare my processes in any order and Nextflow would manage the execution graph by inspecting the channels / inputs / outputs... but I have an issue
chA = Channel.from('A')
chB = Channel.from('B')
// if I declare process here is ok
process proc {
  input: 
  val x from chA

  output:
  stdout chA2

  """
  echo $x
  """
}

chC = Channel.empty().mix(chA2, chB)

// If I declare process here
// I get an error "no such variable chA2"
Maxime HEBRARD
@mhebrard
Oct 25 2018 02:58
is there a nice way to have
// all my channel declaration logic here

// all my processes there
Rad Suchecki
@rsuchecki
Oct 25 2018 05:29
Can you provide full example to reproduce the error?
Maxime HEBRARD
@mhebrard
Oct 25 2018 05:57

full example

#!/usr/bin/env nextflow

/* Def channel then proc then channel */

chA = Channel.from('A')
chB = Channel.from('B')

process procAtoA2 {
  input: val x from chA
  output: stdout chA2
  "echo $x$x"
}

chC = Channel.empty().mix(chA2, chB)
  .subscribe {println "channel C out: $it"}

///* Def channel then proc */
//chE = Channel.from('E')
//chF = Channel.from('F')
//chG = Channel.empty().mix(chE2, chF)
//  .subscribe {println "channel G out: $it"}
//
//process procEtoE2 {
//  input: val x from chE
//  output: stdout chE2
//  "echo $x$x"
//}

As is, the chC print works... if you uncomment the second section of the code, an error pops up: "No such variable: chE2"

Rad Suchecki
@rsuchecki
Oct 25 2018 06:06
Right, I see. You mix(), i.e. operate, on a channel which is only declared further down, inside procEtoE2.
Maxime HEBRARD
@mhebrard
Oct 25 2018 06:19
oh... so I cannot "use" channels before they are declared
but even subscribe() fires the error :-/
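
For reference, a minimal sketch of one way around this, using the same DSL1 syntax as the example above: declaring the process before the operator chain means chE2 already exists when mix() is evaluated, so the "No such variable" error goes away.

chE = Channel.from('E')
chF = Channel.from('F')

// the process is declared first, so it creates chE2 before anything uses it
process procEtoE2 {
  input: val x from chE
  output: stdout chE2
  "echo $x$x"
}

// chE2 now exists, so the operator chain below can resolve it
chG = Channel.empty().mix(chE2, chF)
  .subscribe {println "channel G out: $it"}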
Maxime HEBRARD
@mhebrard
Oct 25 2018 06:58
question: can I generate a folder in a process and output: file "my folder" into chPath ?
Thomas Van Parys
@thpar
Oct 25 2018 08:51
Is there an Emacs mode for Nextflow?
Or a way to make groovy-mode play nice with Nextflow indentation?
Paolo Di Tommaso
@pditommaso
Oct 25 2018 08:53
unfortunately no, on Emacs the best approximation is groovy
@mhebrard "can I generate a folder in a process" => yes
Thomas Van Parys
@thpar
Oct 25 2018 08:54
@pditommaso : ah, too bad.
Paolo Di Tommaso
@pditommaso
Oct 25 2018 08:56
we need an Emacs guru to contribute one, starting from the Nextflow syntax for Atom
micans
@micans
Oct 25 2018 10:07
Resuming a NF run on 7K samples, where the last two steps failed (because the memory limit was exceeded), took a few hours, at least two. Given how NF tracks the state of everything so nicely, is there any chance for a feature where one could specify to 'jump' to a certain process execution? Or is it the case that the internal state needs to be built up by replaying the entire sequence of events? I fear the latter... can't complain if that's the case.
@pditommaso
Paolo Di Tommaso
@pditommaso
Oct 25 2018 10:09
it will be possible by #452
micans
@micans
Oct 25 2018 10:20
That's pretty cool!
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 15:34
That would be huge! I am currently running into file storage issues, which force me to combine several processes into one rather than keeping them as individual consecutive processes (easier for debugging, resource allocation, etc.).
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 16:18
So if I add rm -r $someinputfile as an afterScript, Nextflow will rerun the upstream task even when I use the -resume flag?
Paolo Di Tommaso
@pditommaso
Oct 25 2018 16:31
yes
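
A hedged sketch of that kind of cleanup (heavyStep, big_input.dat and chBigFiles are illustrative names): the afterScript directive runs a bash snippet once the task has completed. If the deletion ends up removing the upstream task's actual output file (rather than just a staged link or copy), -resume can no longer find that cached result and will re-run the upstream task, which is the behaviour confirmed above.

chBigFiles = Channel.fromPath('data/*.dat')   // hypothetical input files

process heavyStep {
  // delete the staged input once the task has finished, to free disk space
  afterScript 'rm -rf big_input.dat'

  input:
  file 'big_input.dat' from chBigFiles

  output:
  file 'result.txt' into chResults

  """
  wc -c big_input.dat > result.txt
  """
}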
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 18:52
Thanks @pditommaso! also major congrats on the new release and seqeralabs!
Tobias Neumann
@t-neumann
Oct 25 2018 21:11

Hi everyone,

I got some play-money to run some TCGA analysis on AWS :)

So after doing quite some reading on AWS manuals and also the Nextflow documentation, I have some very basic beginner questions - I hope somebody might be kind enough to help out an absolute AWS novice.

The major difference between setting up a cluster on AWS and AWS Batch is that with AWS Batch you predefine your setup (number of EC2 instances + resources) beforehand and can launch pipelines locally, whereas with nextflow cloud you would dynamically launch instances and run pipelines on the master node?

Next: the typical workflow is to put input files into S3 -> then launch the pipeline, which loads any required files (containers + input files from S3) onto the EBS of the respective EC2 instance -> do the computation -> copy the results back to S3.
Is that correct?
I was just wondering how to size the EBS of the custom image: do I need to account for all image files, input files, intermediate files and output files, or will input be read directly from S3 and output be written directly to S3?
The documentation says "the pipeline execution must specifies a S3 bucket where jobs intermediate results are stored with the -bucket-dir command line options.", so it's not clear to me where intermediate files are stored and why they would be stored both on EBS and on S3.

Finally: the documentation says that the AMI needs the AWS CLI available. However, it also says to create the AMI first and only afterwards install the AWS CLI. Shouldn't it be installed when creating the AMI so that it's available within each AMI?

Any input on that would be highly appreciated, both to understand the whole process better and to save money. Thanks in advance.
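
For what it's worth, a minimal nextflow.config sketch for the AWS Batch route (the queue name, bucket, region and CLI path below are placeholders, not anything from this chat). As far as I understand it, each task stages its inputs from S3 onto the instance's local storage, runs there, and copies its outputs back to the S3 work directory, so the EBS volume only needs to hold the files of the tasks currently running on that instance.

// nextflow.config (AWS Batch executor)
process.executor  = 'awsbatch'
process.queue     = 'my-batch-queue'   // placeholder Batch job queue
process.container = 'ubuntu:18.04'     // every process needs a container image

workDir = 's3://my-bucket/work'        // placeholder bucket for intermediate task files

aws.region = 'eu-west-1'
// path of the AWS CLI installed on the custom AMI used by the compute environment
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'

The same S3 work directory can alternatively be passed on the command line with -bucket-dir instead of setting workDir in the config.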