These are chat archives for nextflow-io/nextflow

31st
Jan 2018
Stephen Zhang
@zsteve
Jan 31 2018 04:22

Hey all, I'm pretty new to Groovy/Nextflow (used to C/C++!). I'm trying to define a class to encapsulate data. Not sure if this goes against the design objectives of Nextflow, but I'm getting some errors:

class Sample{
  def name
  //String[] fastqFiles // paths to the original fastq files

  Sample(name){
    this.name = name
  //  this.fastqFiles = fastqFiles
  }
}

a = new Sample("z")

Running this gives me:

Launching `atac_pipeline.nf` [nice_varahamihira] - revision: 5c738dbcc0
ERROR ~ No signature of method: Sample.init() is applicable for argument types: (java.util.ArrayList) values: [[]]
Possible solutions: wait(), find(), wait(long), any(), print(java.io.PrintWriter), print(java.lang.Object)

 -- Check script 'atac_pipeline.nf' at line: 11 or see '.nextflow.log' file for more details
I get good results when I run the same code snippet on https://groovy-playground.appspot.com/
Any help would be appreciated :)
Ashley S Doane
@DoaneAS
Jan 31 2018 04:35

Hi Paola and all, I have a question about a process and why it always seems to run serially, 1 sample at a time.
I combine and group output from many earlier processes using mix, like this:

finalbamforqc.mix(nsortedbamforqc)
    .mix(broadpeakqc)
    .mix(finalbedqc)
    .mix(sortbamqc)
    .mix(insertionTrackbw)
    .mix(picardcomplexity)
    .mix(pbcqc)
    .mix(dupqc)
    .mix(frips)
    .groupTuple(sort: true)
    .view()
    .set{ qcin }

This works to group everything together and run a QC script, but for some reason it seems to run only 1 sample at a time, whereas the rest of my Nextflow pipeline runs alignment, peak calling, and so forth in parallel across many samples. I've double-checked that the resource requirements for the process that takes "qcin" as input are not high (4 CPUs). Any ideas? Thanks

Paolo Di Tommaso
@pditommaso
Jan 31 2018 09:07
@zsteve This looks like a bug, please open an issue in the GitHub project repo.
As a workaround, define that class in a separate .groovy file placed in the lib directory in the project root (i.e. where the main script is located).
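A minimal sketch of that workaround, assuming the usual layout with main.nf in the project root and the class moved to lib/Sample.groovy. The typed fields, the default value for fastqFiles (restoring the field that was commented out in the original snippet) and the toString method are illustrative additions, not part of the original code.

```groovy
// lib/Sample.groovy
// Nextflow adds Groovy classes found in the project's lib/ directory to the
// script classpath, so main.nf can call `new Sample(...)` directly.
class Sample {
    String name
    List<String> fastqFiles   // paths to the original FASTQ files

    // fastqFiles defaults to an empty list, so `new Sample("z")` still works
    Sample(String name, List<String> fastqFiles = []) {
        this.name = name
        this.fastqFiles = fastqFiles
    }

    String toString() {
        "Sample(${name}, ${fastqFiles.size()} FASTQ file(s))"
    }
}
```

With the class in lib/, the original line `a = new Sample("z")` in the main script should work unchanged.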
Paolo Di Tommaso
@pditommaso
Jan 31 2018 09:20
@DoaneAS the first bug is the spelling of my name: it's Paolo (masculine), not Paola (feminine) :smile:
I guess the problem is the groupTuple, which introduces a sync point
it can be avoided if the expected groups have a fixed size; see the size option: https://www.nextflow.io/docs/latest/operator.html#grouptuple
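A minimal sketch of the size option, using two hypothetical stand-in channels (bamqc, bedqc) in place of the ten channels mixed above. With size set, groupTuple emits each key's group as soon as it holds that many items instead of waiting for all upstream channels to finish; for the ten-channel mix above the value would be 10, assuming every channel emits exactly one item per sample. Groups that never reach the size are discarded at the end unless remainder: true is also given.

```groovy
// Hypothetical stand-ins for two of the QC channels, keyed by sample ID
bamqc = Channel.from(['s1', 'a.bam'], ['s2', 'b.bam'])
bedqc = Channel.from(['s1', 'a.bed'], ['s2', 'b.bed'])

bamqc.mix(bedqc)
     // emit a sample's group as soon as it has 2 items,
     // rather than after both input channels have closed
     .groupTuple(size: 2, sort: true)
     .view()
```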
Tim Diels
@timdiels
Jan 31 2018 10:10
Welp, I failed to convince my colleagues to replace the database that is mutated by nearly all the processes with a couple of immutable databases (some databases as input, another database as output), which would allow perfect caching :/. Instead they prefer each step to clear any changes it may have made on a previous run of the step. The problems with this are 1) changing something may cause the step not to be run at all, meaning you need to manually clear anything the step produced in the previous run, and 2) you might end up reusing something from the cache whose tables have already been cleared and regenerated. I can't come up with a likely case where the second point would happen, though.
Paolo Di Tommaso
@pditommaso
Jan 31 2018 10:13
I have too little information to give valuable feedback
my suggestion: create a prototype and run some evaluation tests
syntax highlighting for VS Code editor
Edgar
@edgano
Jan 31 2018 10:23
:clap:
Tim Diels
@timdiels
Jan 31 2018 10:41
@pditommaso Alright, we were planning to reimplement only the first steps in Nextflow at first; I suppose we could revisit the issue then, when we have more concrete cases to look at.
Alexander Peltzer
@apeltzer
Jan 31 2018 11:12
nice Paolo!
Paolo Di Tommaso
@pditommaso
Jan 31 2018 11:13
:ok_hand:
Stephen Zhang
@zsteve
Jan 31 2018 12:32

Also quick question - I have some fastq files I would like to pass from one process to FastQC and a processing pipeline in parallel (since FastQC is read-only). I understand the forking operators create separate copies for each fork, but I would like to pass a 'reference' to each branch. Any suggestions as to how to do this?

My understanding is that one is not allowed to have multiple processes feeding off the input from one original process
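A minimal sketch of the usual approach in the DSL1 syntax of the time (DSL1 has since been removed from recent Nextflow releases), assuming paired-end files matched by a hypothetical params.reads pattern. The into operator duplicates the channel, but what gets duplicated are file references, not file contents: by default each task stages its inputs into its work directory as symbolic links, so nothing is copied. The process names and the echo placeholder are hypothetical.

```groovy
// Nextflow DSL1 sketch; params.reads, the process names and the echo
// placeholder are hypothetical
params.reads = 'data/*_{1,2}.fastq.gz'

Channel
    .fromFilePairs(params.reads)
    // one channel per consumer; items are file references (paths),
    // so no file contents are duplicated
    .into { reads_for_fastqc; reads_for_align }

process fastqc {
    input:
    set sample_id, file(reads) from reads_for_fastqc

    """
    fastqc ${reads}
    """
}

process align {
    input:
    set sample_id, file(reads) from reads_for_align

    """
    echo "aligning ${sample_id}: ${reads}"
    """
}
```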

Félix C. Morency
@fmorency
Jan 31 2018 15:00
@pditommaso Thanks for the VSCode extension! Real cool
Paolo Di Tommaso
@pditommaso
Jan 31 2018 15:06
:+1:
Simone Baffelli
@baffelli
Jan 31 2018 15:53
Cool, finally a VSCode extension!

Back to my question from last week. If I use:

        file(slcPars:"HH.slc.par","HV.slc.par","VH.slc.par","VV.slc.par"), 
        file(slcCorr:"HH.slc","HV.slc","VH.slc","VV.slc") into calibratedSMatrix

to capture multiple outputs in the order I want, I get the following message:

ERROR ~ illegal colon after argument expression;
   solution: a complex label expression before a colon must be parenthesized @ line 332, column 34.
           file(slcPars:"HH.slc.par":"HV.slc.par":"VH.slc.par":"VV.slc.par"),

Any clue?

Paolo Di Tommaso
@pditommaso
Jan 31 2018 16:15
that's invalid syntax
you can instead write
file("HH.slc:HV.slc:VH.slc:VV.slc") into calibratedSMatrix
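A sketch of the colon-separated form suggested above, next to the glob alternative, in DSL1 syntax. The process name calibrate, the slcParams channel and the touch commands are hypothetical placeholders; each output declaration emits its matched files together as a single list item.

```groovy
// Nextflow DSL1 sketch; calibrate, slcParams and the touch commands
// are hypothetical placeholders
process calibrate {
    output:
    // explicit file names, separated by colons
    file("HH.slc:HV.slc:VH.slc:VV.slc") into calibratedSMatrix
    // a glob pattern capturing all parameter files
    file("*.slc.par") into slcParams

    """
    touch HH.slc HV.slc VH.slc VV.slc
    touch HH.slc.par HV.slc.par VH.slc.par VV.slc.par
    """
}

calibratedSMatrix.view()
```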
Simone Baffelli
@baffelli
Jan 31 2018 16:32
Ah, I see. Turns out I don't even need that
I could use a regular glob glob glob :chicken: pattern
Paolo Di Tommaso
@pditommaso
Jan 31 2018 16:33
:+1:
Simone Baffelli
@baffelli
Jan 31 2018 16:33
But thanks!
Paolo Di Tommaso
@pditommaso
Jan 31 2018 16:33
welcome
Ashley S Doane
@DoaneAS
Jan 31 2018 17:07
@pditommaso Thanks, I will try the size option, and sorry for misspelling your name!!
Paolo Di Tommaso
@pditommaso
Jan 31 2018 17:09
no pb
Martin Šošić
@Martinsos
Jan 31 2018 18:38
Hi, does anybody know if there is a way to use nextflow cloud shutdown <clusterName> without being asked whether we really want to shut down the cluster? Right now it asks interactively, and I would like to use it in a script, which means I can't answer questions.
Félix C. Morency
@fmorency
Jan 31 2018 18:38
Martin Šošić
@Martinsos
Jan 31 2018 18:48
Ah so obvious, thanks
Paolo Di Tommaso
@pditommaso
Jan 31 2018 20:06
@Martinsos please open an issue for that on GitHub, it would be useful to have that option
Shawn Rynearson
@srynobio
Jan 31 2018 22:36
To run Nextflow on AWS Batch, does Nextflow need to be installed in the container or in the AMI?
The reason I ask is that I have the job queue, job definitions, etc. built, and I can launch a test process, which starts the AMI and writes to S3, but the job gets stuck at the RUNNABLE step on AWS.