These are chat archives for nextflow-io/nextflow

15th
Jul 2016
Robert Syme
@robsyme
Jul 15 2016 06:29

Heya all. I've got a question about calling Groovy methods inside a process. I usually use a method to convert from numeric identifiers to a human-readable ID. Something like:

String runClassifier (String id) {
  def matcher = (id =~ /(\d+)_\d+/)
  matcher.matches()
  Integer sampleNum = matcher.group(1).toInteger()

  switch (sampleNum) {
  case 11: return "demo";
  case [33, 41, 49]: return "early";
  case [34, 42, 50]: return "medium";
  case [35, 43, 51]: return "late";
  case [40, 48, 56]: return "culture";
  default: return "unknown"
  }
}

I use this method inside a process like so (to set rg-sample and as part of rg-description):

process tophat {
  tag { sampleID }
  cache 'deep'
  cpus 4

  input:
  file reference
  set val(sampleID), file("reads.*.fastq.gz") from illuminaPairs

  output:
  set val(sampleID), file("tophat_out/accepted_hits.bam") into mappedPairs

  """
bowtie2-build $reference reference \
&& ln -s $reference reference.fa
tophat2 \
--num-threads ${task.cpus} \
--microexon-search \
--b2-very-sensitive \
--min-intron-length 5 \
--max-intron-length 200 \
--rg-id ${sampleID} \
--rg-sample '${runClassifier(sampleID)}' \
--rg-library '${sampleID}' \
--rg-description 'Illumina sequencing run ${sampleID} (${runClassifier(sampleID)})' \
--rg-platform ILLUMINA \
--library-type fr-firststrand \
reference \
reads.1.fastq.gz \
reads.2.fastq.gz
  """
}

My problem is that method calls seem to be getting mixed up, and I have .command.sh files where the results of runClassifier(sampleID) are different:

#!/bin/bash -ue
bowtie2-build Arab_Me14-0.03.fasta reference && ln -s Arab_Me14-0.03.fasta reference.fa
tophat2 --num-threads 4 --microexon-search --b2-very-sensitive --min-intron-length 5 --max-intron-length 200 --rg-id 34_50 --rg-sample 'early' --rg-library '34_50' --rg-description 'Illumina sequencing run 34_50 (medium)' --rg-platform ILLUMINA --library-type fr-firststrand reference reads.1.fastq.gz reads.2.fastq.gz
In the example above, the method call runClassifier(sampleID) returns early in the first invocation, but returns medium in the second invocation (with the same argument). Am I doing something screwy?
Is there something wrong with evaluating regular expressions concurrently?
Robert Syme
@robsyme
Jul 15 2016 06:39
For the moment, I've added a map step before this process that sets the variable and passes it in through the input set.
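Roughly, the workaround looks like the sketch below. It reuses the illuminaPairs channel and the runClassifier method from the snippets above; classifiedPairs, sampleClass and showLabel are names made up for this illustration, and the process body is reduced to an echo to keep it short.

```groovy
// Sketch only: classifiedPairs, sampleClass and showLabel are hypothetical names.
// runClassifier is called once per sample, before the process, and its result
// travels with the sample ID and the read files.
classifiedPairs = illuminaPairs.map { sampleID, reads ->
    [ sampleID, runClassifier(sampleID), reads ]
}

process showLabel {
    tag { sampleID }

    input:
    set val(sampleID), val(sampleClass), file("reads.*.fastq.gz") from classifiedPairs

    """
    echo "sample ${sampleID} is classified as ${sampleClass}"
    """
}
```

Because the label is computed once and passed in as a plain value, every place the script interpolates ${sampleClass} sees the same string.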
Robert Syme
@robsyme
Jul 15 2016 06:53
... also:
wget http://www.nextflow.io/releases/v0.21.2-SNAPSHOT/nextflow-0.21.2-SNAPSHOT-one.jar
--2016-07-15 02:49:36--  http://www.nextflow.io/releases/v0.21.2-SNAPSHOT/nextflow-0.21.2-SNAPSHOT-one.jar
Resolving www.nextflow.io (www.nextflow.io)... 54.231.134.132
Connecting to www.nextflow.io (www.nextflow.io)|54.231.134.132|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-07-15 02:49:37 ERROR 404: Not Found.
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:05
Hey, I've read about attributes at the input and output channels in the documentation. Is there some place where the available attributes are listed?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:21
@robsyme Weird! it looks fine to me. I will try to test it
That version does not exist; if I told you to use it, that was a typo. Please use 0.21.0-RC1 instead
@lukasjelonek I'm not sure what you are referring to
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:24
input:
<input qualifier> <input name> [from <source channel>] [attributes]
the attributes part
I've seen mode flatten in one of the example scripts. Are there more options?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:27
I see, actually no. It was planned to add more options but they have not been implemented yet
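For reference, a minimal sketch of the one attribute mentioned here, mode flatten, as it appears on an output declaration in example scripts. The process name splitDemo, the channel name chunks and the script are assumptions made for illustration only.

```groovy
// Sketch only: splitDemo and chunks are hypothetical names.
process splitDemo {
    output:
    // without "mode flatten" the matching files are emitted as a single list;
    // with it, each file is emitted as a separate item
    file 'chunk_*' into chunks mode flatten

    """
    printf 'abc' | split -b 1 - chunk_
    """
}

chunks.subscribe { println "got ${it.name}" }
```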
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:28
okay, thanks for the info. Maybe you can add a section on the attributes to the documentation?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:29
makes sense, thanks for the tip
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:29
What kind of not yet implemented attributes have you been thinking of?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:31
for example how to stage the input files (link or copy), or whether to include hidden files in the output, etc
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:35
These are the features that I was looking for. I gave a course on high throughput data analysis over the last four weeks and nextflow was part of the practical work. One of the students asked if it is possible to copy the files instead of symlinking them. So I looked up the documentation and ended up here ;)
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:37
I see, there's a feature request for that nextflow-io/nextflow#197
Vote for it :)
BTW where are you organising this course, if I may ask?
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:38
I'll tell the student to vote for it. For me it's not really important at the moment.
At the Justus Liebig University in Giessen/Germany for the bioinformatics and systems biology master students
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:39
Nice
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:41
I had quite good experiences with nextflow and wanted to check if it can be a good tool for teaching data analysis
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:42
Happy to know that the project is being used and taught
interesting point, and what's your feeling after the course?
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:45
Now I know a lot of things that I will change the next time, regarding my teaching. Regarding nextflow I had statements from the extremes "This tool is awesome", "This tool is useless, I can do everything it does with perl if I need to" and everything between these two opinions
Robert Syme
@robsyme
Jul 15 2016 08:46
Gotcha. Thanks @pditommaso!!
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:46
"This tool is useless, I can do everything it does with perl if I need to"
:)
tell him/her that assembler is also a good option!
Robert Syme
@robsyme
Jul 15 2016 08:47
Ha!
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:47
@robsyme Wat?
Robert Syme
@robsyme
Jul 15 2016 08:48
I'll try and nail down a minimal script that generates that error and submit it as an issue.
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:49
that would be great, it looks very strange
I don't see how it could happen
Robert Syme
@robsyme
Jul 15 2016 08:49
Yeah. I'm confused too.
Paolo Di Tommaso
@pditommaso
Jul 15 2016 08:50
you may try to declare it static, but that method has no state, so I don't think that's the problem
Lukas Jelonek
@lukasjelonek
Jul 15 2016 08:53
I told him something like that, but I can't really remember what it was
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:11
Does nextflow clean up the files when they are processed in a scratch directory?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:11
nope
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:13
Wouldn't it be good to have an option that removes the scratch dir after processing?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:13
yep
nextflow-io/nextflow#165
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:15
I'll vote for it :)
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:15
good!
btw, cluster managers generally clean up the temp dir
so in most cases it should not be a problem
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:23
With our setup there is no automatic cleanup for the scratch directory. It has to do with some very long-running tasks. So everyone is expected to clean up their own leftovers.
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:25
I see
removing just the temp directory would be easy to implement, the tricky part is to clean up unused files when scratch is not used
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:29
This should be quite simple: collect all files that are passed to the output channels and remove everything that is not in this list. Currently this should be possible with the afterScript directive, shouldn't it?
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:32
yes, if you are using scratch you can just write afterScript 'rm -rf *'
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:32
but then the directory will remain? I'll give it a try
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:33
ah yes, well in that case you could do afterScript 'rm -rf $PWD'
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:45
That works and it also removes all files that are not needed by the next steps :+1:
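For anyone reading along, a minimal sketch of the setup that worked here. The process name cleanupDemo, the channel name results and the script are made up for illustration; only the scratch and afterScript directives come from the discussion above.

```groovy
// Sketch only: cleanupDemo, results and the script body are hypothetical.
process cleanupDemo {
    scratch true                 // run the task in a scratch directory
    afterScript 'rm -rf $PWD'    // single quotes: $PWD is expanded by bash, not by Groovy

    output:
    file 'result.txt' into results

    """
    echo intermediate > tmp.txt
    echo done > result.txt
    """
}
```

This only makes sense together with scratch: without it, $PWD would be the task work directory itself, where the declared outputs live.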
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:46
cool
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:48
Having this in a simple 'cleanup' option would be great and improve readability, but this will suffice for the moment :) . Thanks
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:48
I'm looking at the code for a better solution
Lukas Jelonek
@lukasjelonek
Jul 15 2016 14:50
I'm off now. Have a nice weekend
Paolo Di Tommaso
@pditommaso
Jul 15 2016 14:50
same to you!