These are chat archives for nextflow-io/nextflow

22nd
May 2019
Laurence E. Bernstein
@lebernstein
May 22 00:42
@micans Thanks. Since I use the sample ID as the beginning of the name of every output file, is there an even easier/more compact way?
Really, what I am mostly doing is making index files as a separate step using samtools.
So the files have the same base name, i.e.
SampleN1.bam SampleN1.bai
There are other similar things, but I've named them all SampleN1.xxx or SampleN1_xxx.yyy
Laurence E. Bernstein
@lebernstein
May 22 02:28

@micans I got it mostly working, but I'm not sure what the right syntax is for creating two channels so I can use the result in two processes:

ch1.join(ch3).set{ ch4, ch5 }

does not work, nor does something like:

ch4 = Channel.create()
ch5 = Channel.create()
ch1.join(ch3).separate( ch4, ch5 )

Unless something else is wrong.

Laurence E. Bernstein
@lebernstein
May 22 06:50
@micans OK. Got it working. Thanks again. I had to dupe the previous channel and do 2 separate joins but it works.
Alaa Badredine
@AlaaBadredine_twitter
May 22 08:58
Nothing I could do about my question?
Is there a way to get the directory path of a Nextflow script and the script name? For example, given nextflow run /media/Nextflow/scripts/AwesomeScript.nf, I would like to get /media/Nextflow/scripts/ and AwesomeScript.nf
micans
@micans
May 22 09:03
@lebernstein I was off snoring ... I assume you found that you need .into for multiple channels. About sample IDs, I find it best practice to pass them around as val(sampleID) in all appropriate channels, rather than relying on implicitly carrying it along in file names, which may lead to lots of contortions and additional parsing. I use sample IDs in file names as well, but not as the primary mechanism. YMMV of course. If I recall correctly, some people pass around a Groovy dictionary between channels, which enables you to encode even more state in a very extensible manner.
@AlaaBadredine_twitter (1) is "$baseDir"
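
A minimal sketch of the .into suggestion above, assuming the DSL 1 syntax in use at the time of this chat. The names ch1, ch3, ch4 and ch5 come from the question; the channel contents are made up for illustration. Each item carries the sample ID as its first element, which is what join matches on:

```groovy
// Hypothetical inputs: [sampleID, file name] pairs (plain strings, not real files).
ch1 = Channel.from( ['SampleN1', 'SampleN1.bam'], ['SampleN2', 'SampleN2.bam'] )
ch3 = Channel.from( ['SampleN1', 'SampleN1.bai'], ['SampleN2', 'SampleN2.bai'] )

// join matches items on their first element (the sample ID) by default;
// into then copies the joined stream into two new channels, ch4 and ch5.
ch1.join(ch3).into { ch4; ch5 }

// Each channel can now feed a different consumer.
ch4.view { "ch4: $it" }
ch5.view { "ch5: $it" }
```

Keeping the sample ID as an explicit first element, rather than parsing it back out of file names, is the val(sampleID) practice described above.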
Alaa Badredine
@AlaaBadredine_twitter
May 22 09:06
thanks @micans
Luca Cozzuto
@lucacozzuto
May 22 09:07

Hi all, I'm using this configuration

  withLabel: increase_mem {
       errorStrategy = 'retry'
       memory = {10.GB * task.attempt}
       cpus = 1
       time = {6h * task.attempt}
       maxRetries = 3
  }

and I got this error:

Exception in thread "Task submitter" groovy.lang.MissingMethodException: No signature of method: groovy.util.ConfigObject.multiply() is applicable for argument types: (Integer) values: [1]
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:72)
...
using this it is ok (maybe)
  withLabel: increase_mem {
       errorStrategy = 'retry'
       memory = {10.GB * task.attempt}
       cpus = 1
       time = {6.h * task.attempt}
       maxRetries = 3
  }
Paolo Di Tommaso
@pditommaso
May 22 09:11
6h -> 6.h
:punch:
Luca Cozzuto
@lucacozzuto
May 22 09:12
well I'm starting to understand the logic (eheheh)
Paolo Di Tommaso
@pditommaso
May 22 09:13
@AlaaBadredine_twitter use $baseDir variable in your script
Alaa Badredine
@AlaaBadredine_twitter
May 22 09:13
thank you @pditommaso ! and to get the script name ?
workflow.scriptName
actually you can use directly workflow.scriptFile
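
A short sketch pulling together the three values mentioned in this exchange. The example paths in the comments are the ones from the question; nothing else is assumed:

```groovy
// baseDir: directory containing the main script (e.g. /media/Nextflow/scripts)
println "Script directory: ${baseDir}"

// workflow.scriptName: file name of the script (e.g. AwesomeScript.nf)
println "Script name     : ${workflow.scriptName}"

// workflow.scriptFile: full path of the script
println "Script file     : ${workflow.scriptFile}"
```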
Alaa Badredine
@AlaaBadredine_twitter
May 22 09:18
ahhh nice ! Thanks for the link
micans
@micans
May 22 10:44

@pditommaso I'm going through my snippets of nf idioms, and found this one:

.collectFile { file -> file.collect{ it.toString() }.join('\n') + '\n' }

I seem to remember there is a more natural way of doing this (sticking things in a file, one per line) .. can you help :-?

sureshhewa
@sureshhewabi
May 22 10:44
Hi all. I want to run a pipeline in Nextflow. As the first process, I want to copy files from our NFS file system, i.e. path/to/the/archive/<accession>/data. The next process needs the files inside the data folder to extract some data from them. My question is about the first process. I will pass the <accession> to it as params.accession; it will call my Python container and pass the accession so the data can be copied. My problem is that the Docker container doesn't know about the path/to/the/archive/<accession>/data location unless I mount it (like -v path/to/the/archive/<accession>/data:data). If my files were in the place where I run Docker, I could copy them into the container with the COPY command before executing the Python command, but my data is in another location, and I cannot use an absolute path to copy files because Docker does not allow that. What can I do to make my path available in the Docker container inside Nextflow? Is there a way to specify a volume mount in Nextflow?
I am using Singularity to run this as a job on the NFS file system
Paolo Di Tommaso
@pditommaso
May 22 10:49
@micans can't think of a better one at the moment; actually it deserves to be mentioned as a pattern
micans
@micans
May 22 10:50
thanks Paolo -- there was a recent one where it seemed to me that the newlines were handled better (no need to add the trailing one), but it was not a huge difference as I recall.

@sureshhewabi you can use a scope like this:

singularity {
  enabled     = true
  autoMounts  = true

  cacheDir = '/another/path/'
  runOptions = '-B /a/path/'
}

and see https://www.nextflow.io/docs/latest/singularity.html

@pditommaso my plan is to make my own pattern repository, but then I'll try to make a PR to include some in the main pattern repo. Right now I'm just wading through a chaos of 60 .nf files :grimacing:
Paolo Di Tommaso
@pditommaso
May 22 10:56
:D
maybe you were referring to the newLine option
micans
@micans
May 22 11:00
@pditommaso perfect that's it! Off to lunch now, celebrating.
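
A minimal sketch of the newLine option just mentioned, assuming the goal is simply one entry per line in a single file. The item values and the file name items.txt are made up for illustration:

```groovy
Channel
    .from('alpha', 'beta', 'gamma')                 // hypothetical items
    .collectFile(name: 'items.txt', newLine: true)  // newline appended after each entry
    .view { "Collected into: $it" }                 // prints the path of the resulting file
```

With newLine: true there should be no need to join the entries by hand or to add the trailing '\n' as in the earlier snippet.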
Paolo Di Tommaso
@pditommaso
May 22 11:16
:+1:
sureshhewa
@sureshhewabi
May 22 13:20
Thanks @pditommaso I am testing the solution you proposed and it should work
sureshhewa
@sureshhewabi
May 22 14:19
Thanks @pditommaso It works!!!
micans
@micans
May 22 14:20
:+1:
Paolo Di Tommaso
@pditommaso
May 22 14:29
Blogged about the state of modules
(is it me, or does Gitter not show tweet/image previews any more?)
micans
@micans
May 22 14:31
same here. Nice blog!
Paolo Di Tommaso
@pditommaso
May 22 14:31
thanks
micans
@micans
May 22 14:32
now towards onError! :-)
Paolo Di Tommaso
@pditommaso
May 22 14:32
ahah .. true!
micans
@micans
May 22 14:32
woohoo .. I felt a bit rude there. When you are ready!
Paolo Di Tommaso
@pditommaso
May 22 14:32
no pb! :D
Luca Cozzuto
@lucacozzuto
May 22 14:39
lovely new features... my pipelines will be shrunk by 60% :)
Laurence E. Bernstein
@lebernstein
May 22 16:37
Very excited to try out DSL 2. I have 8 different pipelines all of which use the same pieces. This will be a HUGE improvement to my design/usage.
Paolo Di Tommaso
@pditommaso
May 22 16:37
nice
Michael L Heuer
@heuermh
May 22 16:57
Very cool! Raises some questions...
Is it possible to have required and optional module input/outputs?
How are module inputs/outputs mapped to stdin/stdout for the pipe syntax?
Can module inputs/outputs be documented (javadoc)? If so, how would one generate docs and how should they be hosted?
Can a module be extended (say to add or override container or tags or publishDir)?
micans
@micans
May 22 17:03
@heuermh interesting idea, overriding publishDir (and others)!
Paolo Di Tommaso
@pditommaso
May 22 17:16
How are module inputs/outputs mapped to stdin/stdout for the pipe syntax?
too much excitement! :satisfied:
so far it is just syntactic sugar on top of existing processes, therefore no
but it could be done
Can module inputs/outputs be documented (javadoc)? If so, how would one generate docs and how should they be hosted?
well, not really the same concept of extension, but the config file allows you to override directives
inputs no, output yes
Can module inputs/outputs be documented (javadoc)? If so, how would one generate docs and how should they be hosted?
nice idea that could be implemented
micans
@micans
May 22 17:20
(quoted the same thing twice @pditommaso) -- so publishDir can be overridden but in config file? Including a saveAs closure?
Paolo Di Tommaso
@pditommaso
May 22 17:22
yes, you can do this now ..
with plain nf
micans
@micans
May 22 17:22
:+1: :beers:
Paolo Di Tommaso
@pditommaso
May 22 17:23
with dsl 2, you can include with an alias, and use the alias name to override the publishDir
never tried but hopefully should work .. :joy:
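
A sketch of what overriding publishDir from the config file could look like, using a withName selector. The process name align, the output path and the .bam filter are all hypothetical. Per the message above, using a DSL 2 include alias as the selector name was untested at the time:

```groovy
// nextflow.config
process {
    // 'align' is a hypothetical process name.
    withName: align {
        publishDir = [
            path: 'results/align',
            mode: 'copy',
            // saveAs returns the name to publish under, or null to skip the file.
            saveAs: { filename -> filename.endsWith('.bam') ? filename : null }
        ]
    }
}
```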
Dan Fornika
@dfornika
May 22 18:01
Is there any way to install the 19.05.0-edge release using the nextflow self-update command?
Paolo Di Tommaso
@pditommaso
May 22 18:02
the simplest way is just to set export NXF_VER=19.05.0-edge
that's all
Laurence E. Bernstein
@lebernstein
May 22 20:42

Lately I seem to get this error a lot more:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:297: getting the final child's pid from pipe caused \"EOF\"": unknown.
time="2019-05-22T20:34:29Z" level=error msg="error waiting for container: context canceled"

Is this me or is this maybe something from Nextflow 04?

Olga Botvinnik
@olgabot
May 22 21:20

Hello! I'm extending the nf-core/rnaseq workflow to have composable references, e.g. GRCh38,ERCC,GFP or GRCm38,Cre,Tdtom,zsGreen, whose fastas+gtfs and thus STAR/Hisat2 indices would get combined on the fly. This is the PR: czbiohub/rnaseq#16

My issue is that I keep getting "file not found" errors when the files are definitely there, and it seems like the error is somewhere else instead:

 Wed 22 May - 14:16  ~/code/rnaseq   origin ☊ olgabot/composable-references ✔ 5☀ 
  make test_biohub
nextflow run main.nf \
        --reads "s3://olgabot-maca/mini-maca/*{R1,R2}*.fastq.gz" \
        --genome GRCm38,ERCC \
        -profile czbiohub_aws \
        -resume
N E X T F L O W  ~  version 19.03.0-edge
Launching `main.nf` [special_galileo] - revision: f5899b2305
===================================================================================================
                                                                                    (((((((((((((((
  ______  ________    .______    __    ______    __    __   __    __  .______      (((,.((((((   (((
 /      ||       /    |   _  \  |  |  /  __  \  |  |  |  | |  |  |  | |   _  \    ((             ((((
|  ,----'`---/  /     |  |_)  | |  | |  |  |  | |  |__|  | |  |  |  | |  |_)  |   (((    /(((. ((((((
|  |        /  /      |   _  <  |  | |  |  |  | |   __   | |  |  |  | |   _  <    (((((((((    ((((((
|  `----.  /  /----.  |  |_)  | |  | |  `--'  | |  |  |  | |  `--'  | |  |_)  |   (((((((       (((((
 \______| /________|  |______/  |__|  \______/  |__|  |__|  \______/  |______/     ((((((.     ,((((
                                                                                    (((((((* /(((((
 czbiohub/rnaseq : RNA-Seq Best Practice v1.2
===================================================================================================
ERROR ~ Fasta file not found: [s3://czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz]

 -- Check '.nextflow.log' file for details
ERROR ~ Cannot find any reads matching: s3://olgabot-maca/mini-maca/*{R1,R2}*.fastq.gz
NB: Path needs to be enclosed in quotes!
NB: Path requires at least one * wildcard!
If this is single-end data, please specify --singleEnd on the command line.

 -- Check '.nextflow.log' file for details
ERROR ~ GTF annotation file not found: [s3://czbiohub-reference/gencode/mouse/vM19/gencode.vM19.annotation.gtf.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.gtf.gz]

 -- Check '.nextflow.log' file for details
make: *** [test_biohub] Error 1
The erroring part of the nextflow log is here:
May-22 14:19:46.920 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: process for class: _nf_script_2cf7a49f
May-22 14:19:46.964 [main] DEBUG nextflow.Session - The following nodes are still active:
  [operator] ifEmpty
  [operator] ifEmpty
  [operator] ifEmpty
  [operator] into

May-22 14:19:46.980 [Actor Thread 2] ERROR nextflow.Nextflow - Fasta file not found: [s3://czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz]
May-22 14:19:46.981 [Actor Thread 3] ERROR nextflow.Nextflow - GTF annotation file not found: [s3://czbiohub-reference/gencode/mouse/vM19/gencode.vM19.annotation.gtf.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.gtf.gz]
I thought it was the same as this error (https://groups.google.com/forum/#!topic/nextflow/DPASCrAONSY), where the output into command had to be protected by parentheses, but changing that didn't seem to fix it.

I'm confused why Channel.fromPath can't seem to find the paths since it accepts a list of paths and if I have a script with just those fastas, they get found just fine:

Channel.fromPath( ["s3://czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz", "s3://czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz"])
  .ifEmpty{ println("empty??") }
  .println()

Output:

(base)
 ✘  Wed 22 May - 14:21  ~/code/rnaseq   origin ☊ olgabot/composable-references 51● 
  nextflow run scratch.nf
N E X T F L O W  ~  version 19.03.0-edge
Launching `scratch.nf` [boring_shannon] - revision: e10d62aa32
WARN: There's no process matching config selector: makeSTARindex
/czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz
/czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz
Completed at: 22-May-2019 14:24:41
Duration    : 2.1s
CPU hours   : (a few seconds)
Succeeded   : 0
Olga Botvinnik
@olgabot
May 22 21:31

I think this is part of the problem. For each valid genome name, its fasta and gtf etc are grabbed:

  params.star_index = genome_names_valid.collect{ params.genomes[ it ].star ?: false }
  params.fasta = genome_names_valid.collect{ params.genomes[ it ].fasta ?: false }
  params.gtf = genome_names_valid.collect{ params.genomes[ it ].gtf ?: false }
  params.gff = genome_names_valid.collect{ params.genomes[ it ].gff ?: false }
  params.bed12 = genome_names_valid.collect{ params.genomes[ it ].bed12 ?: false }
  params.hisat2_index = genome_names_valid.collect{ params.genomes[ it ].hisat2_index ?: false }
}

println("params.fasta:")
println(params.fasta)
println("Fasta Channel.fromPath:")
Channel
    .fromPath(params.fasta)
    .println()
println("--- end Fasta Channel.fromPath ---")

But that Channel.fromPath doesn't print anything!!

(base)
 ✘  Wed 22 May - 14:30  ~/code/rnaseq   origin ☊ olgabot/composable-references 5☀ 1● 
  make test_biohub
nextflow run main.nf \
        --reads "s3://olgabot-maca/mini-maca/*{R1,R2}*.fastq.gz" \
        --genome GRCm38,ERCC \
        -profile czbiohub_aws \
        -resume
N E X T F L O W  ~  version 19.03.0-edge
Launching `main.nf` [gigantic_hodgkin] - revision: 9bd44453f7
params.fasta:
[s3://czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz]
Fasta Channel.fromPath:
--- end Fasta Channel.fromPath ---
===================================================================================================
                                                                                    (((((((((((((((
  ______  ________    .______    __    ______    __    __   __    __  .______      (((,.((((((   (((
 /      ||       /    |   _  \  |  |  /  __  \  |  |  |  | |  |  |  | |   _  \    ((             ((((
|  ,----'`---/  /     |  |_)  | |  | |  |  |  | |  |__|  | |  |  |  | |  |_)  |   (((    /(((. ((((((
|  |        /  /      |   _  <  |  | |  |  |  | |   __   | |  |  |  | |   _  <    (((((((((    ((((((
|  `----.  /  /----.  |  |_)  | |  | |  `--'  | |  |  |  | |  `--'  | |  |_)  |   (((((((       (((((
 \______| /________|  |______/  |__|  \______/  |__|  |__|  \______/  |______/     ((((((.     ,((((
                                                                                    (((((((* /(((((
 czbiohub/rnaseq : RNA-Seq Best Practice v1.2
===================================================================================================
ERROR ~ GTF annotation file not found: [s3://czbiohub-reference/gencode/mouse/vM19/gencode.vM19.annotation.gtf.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.gtf.gz]

 -- Check '.nextflow.log' file for details
ERROR ~ Fasta file not found: [s3://czbiohub-reference/gencode/mouse/vM19/GRCm38.p6.genome.fa.gz, s3://czbiohub-reference/transgenes/ERCC92/ERCC92.fa.gz]

 -- Check '.nextflow.log' file for details
make: *** [test_biohub] Error 1
Olga Botvinnik
@olgabot
May 22 21:58

okay I tracked it down .. the configuration parameters e.g. process.executor cannot be accessed within the workflow, and I had:

summary['Process Executor']  = process.executor
if (process.executor == "awsbatch"){
  summary['AWS Region']      = params.awsregion
  summary["AWS Batch Queue"] = params.awsqueue
}

And this caused a very cryptic error of

May-22 14:53:21.263 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: process for class: _nf_script_92d2424a

Now the summary checks params for the values:

if (params.awsregion && params.awsqueue){
  summary['AWS Region']      = params.awsregion
  summary['AWS Batch Queue'] = params.awsqueue
}

Can this error message be more obvious, e.g. variable process.executor does not exist?
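
One possible alternative, not from the chat: key the summary off workflow.profile, which is available inside the script. This is only a sketch. The false defaults and the substring test on 'aws' are assumptions for illustration, and the params names are the ones used above:

```groovy
// Hypothetical defaults so the parameters exist even when not supplied.
params.awsregion = false
params.awsqueue  = false

def summary = [:]

// workflow.profile holds the value passed to -profile (e.g. czbiohub_aws).
summary['Config Profile'] = workflow.profile
if (workflow.profile.contains('aws')) {
    summary['AWS Region']      = params.awsregion
    summary['AWS Batch Queue'] = params.awsqueue
}

// Print the summary as aligned "key: value" lines.
println summary.collect { k, v -> "${k.padRight(16)}: $v" }.join('\n')
```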

Olga Botvinnik
@olgabot
May 22 22:33

The other problem was using parens for files in processes. e.g. this errors out:

  output:
  file("${genome_name}.fa") into (ch_fasta_for_star_index, ch_fasta_for_hisat_index)

But this doesn't:

  output:
  file "${genome_name}.fa" into ch_fasta_for_star_index, ch_fasta_for_hisat_index