These are chat archives for nextflow-io/nextflow

25th
Oct 2018
Maxime HEBRARD
@mhebrard
Oct 25 2018 02:53 UTC
Hello. I thought I could declare my processes in any order and Nextflow would manage the execution graph by inspecting channels/inputs/outputs... but I have an issue:
chA = Channel.from('A')
chB = Channel.from('B')
// if I declare the process here, it works
process proc {
  input: 
  val x from chA

  output:
  stdout chA2

  """
  echo $x
  """
}

chC = Channel.empty().mix(chA2, chB)

// If I declare the process here instead,
// I get the error "no such variable chA2"
Maxime HEBRARD
@mhebrard
Oct 25 2018 02:58 UTC
is there a nice way to have
// all my channels declaration logic here

// all my processes there
Rad Suchecki
@rsuchecki
Oct 25 2018 05:29 UTC
Can you provide a full example to reproduce the error?
Maxime HEBRARD
@mhebrard
Oct 25 2018 05:57 UTC

full example

#!/usr/bin/env nextflow

/* Def channel then proc then channel */

chA = Channel.from('A')
chB = Channel.from('B')

process procAtoA2 {
  input: val x from chA
  output: stdout chA2
  "echo $x$x"
}

chC = Channel.empty().mix(chA2, chB)
  .subscribe {println "channel C out: $it"}

///* Def channel then proc */
//chE = Channel.from('E')
//chF = Channel.from('F')
//chG = Channel.empty().mix(chE2, chF)
//  .subscribe {println "channel G out: $it"}
//
//process procEtoE2 {
//  input: val x from chE
//  output: stdout chE2
//  "echo $x$x"
//}

As is, the chC print works... but if you uncomment the second section of the code, an error pops up: "No such variable: chE2"

Rad Suchecki
@rsuchecki
Oct 25 2018 06:06 UTC
Right, I see. You call mix() on, and so operate on, a channel that is only declared later, in procEtoE2.
Maxime HEBRARD
@mhebrard
Oct 25 2018 06:19 UTC
oh... so I cannot "use" channels before the declaration
but even subscribe() fires the error :-/
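Going by the first half of the example, which works, a minimal sketch of the second case reordered so that the process comes before mix(). It keeps the DSL1-style syntax used in this thread and assumes nothing beyond what the working chA/chC case already shows:

```groovy
#!/usr/bin/env nextflow

// Source channels can be declared up front.
chE = Channel.from('E')
chF = Channel.from('F')

// The process has to be evaluated before chE2 is referenced:
// its output declaration is what makes chE2 available.
process procEtoE2 {
  input: val x from chE
  output: stdout chE2
  "echo $x$x"
}

// Channel logic that consumes chE2 goes after the process.
chG = Channel.empty().mix(chE2, chF)
chG.subscribe { println "channel G out: $it" }
```

The script body is read top to bottom, so the ordering constraint applies to any statement that names a process output channel, whether it is mix() or subscribe().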
Maxime HEBRARD
@mhebrard
Oct 25 2018 06:58 UTC
Question: can I generate a folder in a process and declare it as output: file "my folder" into chPath?
Thomas Van Parys
@thpar
Oct 25 2018 08:51 UTC
Is there an Emacs mode for Nextflow?
Or a way to make groovy-mode play nice with Nextflow indentation?
Paolo Di Tommaso
@pditommaso
Oct 25 2018 08:53 UTC
unfortunately no, on Emacs the best approximation is groovy
@mhebrard "can I generate a folder in a process" => yes
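A minimal sketch of what that could look like, in the DSL1-style syntax used above. The process name `makeFolder`, the folder name `my_folder` and the file inside it are hypothetical, chosen only for illustration:

```groovy
#!/usr/bin/env nextflow

process makeFolder {
  output:
  // A directory is declared the same way as a regular file output.
  file 'my_folder' into chPath

  """
  mkdir my_folder
  echo hello > my_folder/hello.txt
  """
}

// Each item emitted on chPath is the path to the folder in the task work directory.
chPath.subscribe { println "folder: $it" }
```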
Thomas Van Parys
@thpar
Oct 25 2018 08:54 UTC
@pditommaso : ah, too bad.
Paolo Di Tommaso
@pditommaso
Oct 25 2018 08:56 UTC
We need an Emacs guru to contribute that, starting from the Nextflow syntax for Atom.
Stijn van Dongen
@micans
Oct 25 2018 10:07 UTC
Resuming an NF run on 7K samples, where the last two steps failed (because the memory limit was exceeded), took a few hours, at least two. Given how nicely NF tracks the state of everything, is there any chance of a feature where one could specify to 'jump' to a certain process execution? Or is it the case that the internal state needs to be built up by replaying the entire sequence of events? I fear the latter... can't complain if that's the case.
@pditommaso
Paolo Di Tommaso
@pditommaso
Oct 25 2018 10:09 UTC
It will be possible with #452
Stijn van Dongen
@micans
Oct 25 2018 10:20 UTC
That's pretty cool!
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 15:34 UTC
That would be huge! I am running into file storage issues currently which forces me to combine several processes into one, rather than keeping them as individual consecutive processes (easier for debugging, resource allocation, etc).
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 16:18 UTC
So if I add rm -r $someinputfile as an afterScript, Nextflow will rerun the upstream task even when I use the -resume flag?
Paolo Di Tommaso
@pditommaso
Oct 25 2018 16:31 UTC
yes
Tobias "Tobi" Schraink
@tobsecret
Oct 25 2018 18:52 UTC
Thanks @pditommaso! also major congrats on the new release and seqeralabs!
Tobias Neumann
@t-neumann
Oct 25 2018 21:11 UTC

Hi everyone,

I got some play-money to run some TCGA analysis on AWS :)

So after doing quite some reading on AWS manuals and also the Nextflow documentation, I have some very basic beginner questions - I hope somebody might be kind enough to help out an absolute AWS novice.

Is the major difference between setting up a cluster on AWS and using AWS Batch that, for AWS Batch, you define your setup (number of EC2 instances + resources) beforehand and can launch pipelines locally, whereas with nextflow cloud you dynamically launch instances and run pipelines on the master node?

Next: the typical workflow is to put input files into S3 -> then you launch the pipeline, which will load any required files (containers + input files from S3) into the EBS volume of the respective EC2 instance -> do the computation -> copy results back to S3.
Is that correct?
I was just wondering how to size the EBS volume of the custom image: do I need to account for all image files, input files, intermediate files and output files, or will input be read directly from S3 and output written directly to S3?
The documentation says "the pipeline execution must specifies a S3 bucket where jobs intermediate results are stored with the -bucket-dir command line options.", so it's not clear to me where intermediate files are stored and why they would be stored both on EBS and S3.

Finally: the documentation says that the AMI needs the AWS CLI available. However, it also says to create the AMI and only afterwards install the AWS CLI. Should it not be installed when creating the AMI, so that it's available within each AMI?

Any input would be highly appreciated; it would help me understand the whole process better and also save money. Thanks in advance.
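For orientation, a minimal sketch of the Nextflow side of an AWS Batch setup, assuming the `awsbatch` executor. The queue name, region and bucket path are hypothetical placeholders, and the sketch does not settle the EBS sizing or AMI questions above:

```groovy
// nextflow.config (sketch; all values are hypothetical placeholders)
process.executor = 'awsbatch'        // submit each task as an AWS Batch job
process.queue    = 'my-batch-queue'  // an existing AWS Batch job queue
aws.region       = 'eu-west-1'       // region of the queue and the bucket

// Launched from a local machine, passing the S3 location that the
// documentation quote above refers to with -bucket-dir:
//   nextflow run <your-pipeline> -bucket-dir s3://my-bucket/work
```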