These are chat archives for nextflow-io/nextflow

25th
Jul 2017
Ashley S Doane
@DoaneAS
Jul 25 2017 03:29
thanks @mes5k that's very helpful, I'm trying this approach, will let you know how I make out
Simone Baffelli
@baffelli
Jul 25 2017 08:20
Good morning. Is it possible to set Nextflow's work dir to be somewhere else? I'm processing a long time series and I'm afraid I cannot fit it entirely on my disk
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:20
of course, use the -w CLI option
Simone Baffelli
@baffelli
Jul 25 2017 08:21
Ah, I see. And from nextflow.config?
Future me will be very happy to have everything stored in a config file :grimacing:
nevermind, I found it out ;)
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:22
workDir = '/some/path'
:+1:
Simone Baffelli
@baffelli
Jul 25 2017 08:24
Is there anything nextflow can't do? :clap:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:24
coffee :)
Simone Baffelli
@baffelli
Jul 25 2017 08:25
We managed to get it for free :confetti_ball:
But I'll be happy if it would cook dinner for me
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:25
LOL
Simone Baffelli
@baffelli
Jul 25 2017 08:26
Actually it could be done with some lab automation system?
controlled by nextflow
Simone Baffelli
@baffelli
Jul 25 2017 09:22
Another question: is nextflow handling of glob patterns different from the way bash treats them?
Because when using certain patterns with fromFilePairs, Nextflow does not return any files, but if I use the same pattern in my shell, I can find them
Paolo Di Tommaso
@pditommaso
Jul 25 2017 09:23
NF relies on Java glob pattern, which may not be identical to BASH
Simone Baffelli
@baffelli
Jul 25 2017 09:25
Well that explains a lot then. I guess exclude patterns are not supported by java
Simone Baffelli
@baffelli
Jul 25 2017 09:35
Or rather they work differently
Simone Baffelli
@baffelli
Jul 25 2017 10:05
I presume I cannot use a regex instead?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 10:25
yes, you can
specify a pattern object as ~/your-regex/ (if I'm not wrong) to fromPath
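A minimal sketch of one way to get bash-style exclusion, under the assumption that a broad glob plus a Groovy regex in a `filter` closure is acceptable. This is an alternative to passing a regex to fromPath, not the form suggested above. The `data/` directory, the file naming scheme and the excluded lane are made up for illustration.

```groovy
// Hypothetical layout: data/<sample>_L00<lane>_R<read>_001.fastq.gz
Channel
    .fromPath( 'data/*.fastq.gz' )               // Java-style glob: match every gzipped FASTQ
    .filter { !( it.name ==~ /.*_L003_.*/ ) }    // Groovy regex on the file name: drop lane 3
    .view()                                      // print what is left
```

The `==~` operator requires the regex to match the whole file name, hence the leading and trailing `.*`.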
Simone Baffelli
@baffelli
Jul 25 2017 11:27
That's excellent :+1:
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:35
@pditommaso Said and done! Here's the pipeline: https://github.com/oskarvid/nextflow-GermlineVarCall/blob/master/bwamem.nf
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:36
fantastic
so what's the problematic part ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:37
When MergeBamFiles takes its input files it seems to take two random files, but it needs to take the correct pairs.
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:37
let me see
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:37
so there are eight pairs, one pair per lane, one file is unmapped and the other is mapped
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:37
do you mean MergeBamAlignment ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:38
yeah
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:39
basically BwaMem_output and FastqToSam_output, right ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:39
yes
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:41
the easiest way to handle this is to keep the pair_id along with the output file
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:41
i.e. don't use pair_id and pair_id2?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:42
yes, but you need to specify in the output, for example
in the Bwa_mem process, replace
 output:
    file "bwamem.sam" into BwaMem_output
with
 output:
    set pair_id, file("bwamem.sam") into BwaMem_output
the same for the FastqToSam process
then, since the process execution is parallel, the output order is not guaranteed
hence you will need to create a channel containing the sam and bam for the same pair_id
does it make sense?
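A sketch of the pairing step described above, assuming both BwaMem_output and FastqToSam_output emit `[ pair_id, file ]` tuples after the output change. It uses the `phase` operator, which matches items on their first element by default (later Nextflow releases document a `join` operator for the same purpose). The channel name MergeBamAlignment_input and the echo command are placeholders, not taken from the pipeline.

```groovy
// Synchronise the two channels on pair_id, whatever order the tasks finished in.
BwaMem_output
    .phase( FastqToSam_output )                            // emits [ [id, sam], [id, bam] ]
    .map { sam, bam -> tuple( sam[0], sam[1], bam[1] ) }   // flatten to [ id, sam, bam ]
    .set { MergeBamAlignment_input }                       // hypothetical channel name

process MergeBamAlignment {
    input:
    set pair_id, file(sam), file(bam) from MergeBamAlignment_input

    script:
    """
    echo "merging ${sam} and ${bam} for ${pair_id}"
    """
}
```

With this shape, each MergeBamAlignment task receives the mapped and unmapped file that share one pair_id.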
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:48
to begin with, it doesn't like that I'm using the channel "reads" for both bwa and fastqtosam
that's why I made two earlier, but I don't need to? or shouldn't?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:49
do you mean this
?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:49
yes
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:50
you can't have the same channel as input for more than one process, but you can simplify that as shown below
Channel
  .fromFilePairs( "/data/workspace/data/Samples/NA12878-rep7_S7_L00*_R{1,2}_001.fastq.gz", flat: true) 
  .into { reads; reads2 }
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:59
It's still not clear to me what I need to do, but I
oops
but I'm going home now for today, will look at it tomorrow
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:59
:ok_hand:
Félix C. Morency
@fmorency
Jul 25 2017 14:58
@pditommaso do you have an ETA for 0.25.3? :D
Dani Soronellas
@dsoronellas
Jul 25 2017 14:59
Hi! I started using Nextflow, which I find amazing! I wanted to create a small cluster on AWS just for testing, using the following command: nextflow cloud create test-cluster (up to 3 t2.micro instances). I'm having problems running the command, as it complains: "ERROR ~ Cannot cast object 'null' with class 'null' to class 'int'. Try 'java.lang.Integer' instead". Any ideas how to solve this? Thanks in advance! :smile:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:00
@fmorency hopefully, thu or fri
Félix C. Morency
@fmorency
Jul 25 2017 15:00
@pditommaso awesome thanks
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:00
@dsoronellas yes, we just spotted this issue nextflow-io/nextflow#408
use the command in the last comment
Dani Soronellas
@dsoronellas
Jul 25 2017 15:01
ok! I'll go for it! Thanks for the immediate response hehe
Simone Baffelli
@baffelli
Jul 25 2017 15:06
Is it possible to clean only the older results from Nextflow's cache?
Félix C. Morency
@fmorency
Jul 25 2017 15:06
yes, see nextflow clean
Simone Baffelli
@baffelli
Jul 25 2017 15:07
yes, but does it clean only the older ones?
I'm afraid to type it :fearful:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:08
check the help
Simone Baffelli
@baffelli
Jul 25 2017 15:09
I see
;)
Félix C. Morency
@fmorency
Jul 25 2017 15:10
You can execute a dry run to see what will be removed.
It will list the files that will be removed without removing them.
Simone Baffelli
@baffelli
Jul 25 2017 15:17
fantastic
Sergey Venev
@sergpolly
Jul 25 2017 15:34

Hi,

I managed to reproduce my storeDir issue by an unfortunate coincidence... This time, the whole pipeline was terminated due to a critical failure in some process. Because of that, some upstream processes that were copying results to storeDir were also terminated or aborted. I relaunched the pipeline (actually without the -resume flag) and it took off with whatever was in the storeDir folder, regardless of whether the upstream process had been aborted or the results were incomplete. It seems that the only thing Nextflow cares about in such cases is the content of storeDir. Is this expected behavior? Is there a way to make Nextflow check whether copying to storeDir went OK?

Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:38
unfortunately this is a side effect of storeDir
do you have more than one process writing to the same storeDir ?
Sergey Venev
@sergpolly
Jul 25 2017 15:38
Yes
I mean multiple instances of the same process
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:39
you can try to mitigate this problem with errorStrategy = 'finish'
Sergey Venev
@sergpolly
Jul 25 2017 15:39
Let's say, when we are mapping chunks, the bam files go to the same storeDir
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:42
however, this is the reason why I suggest using publishDir in place of storeDir whenever possible
Sergey Venev
@sergpolly
Jul 25 2017 15:42
Can I do errorStrategy = { task.attempt>2 ? 'finish' : 'retry' } ?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:42
yes
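A sketch of how the dynamic error strategy above could look inside a process definition, where the directive is written without `=`. The process name and script are placeholders. The `maxRetries 3` line is an addition not mentioned in the conversation: as far as I know the default retry limit is low enough that a task would stop retrying before `task.attempt > 2` is ever evaluated, so treat that value as an assumption to check against the docs.

```groovy
process mapChunk {                      // hypothetical process name
    // Retry the first two failures, then let running tasks finish and stop.
    errorStrategy { task.attempt > 2 ? 'finish' : 'retry' }
    maxRetries 3                        // assumption: raise the retry limit so attempt 3 is reachable

    script:
    """
    echo "attempt ${task.attempt}"
    """
}
```

In nextflow.config the same setting would presumably be written as an assignment, e.g. `process.errorStrategy = { task.attempt > 2 ? 'finish' : 'retry' }`.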
Sergey Venev
@sergpolly
Jul 25 2017 15:43
Ok, I'll try to read about publishDir behavior, and see if we could rewrite the pipeline in accordance with the best practices ...
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:45
you should use storeDir only when you need a cache across different executions of the pipeline
Sergey Venev
@sergpolly
Jul 25 2017 15:45
Thank you!
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:45
:+1:
Sergey Venev
@sergpolly
Jul 25 2017 15:45
what do you mean like a cache for different executions?
say, if i'd want to change some parameters and relaunch?
something like that?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:46
for example you download a big database file, and you want to keep it in local storage
> if i'd want to change some parameters and relaunch?
no, in this case you won't need storeDir
just use the -resume mechanism
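A sketch contrasting the two directives as described above, assuming a pipeline with one long-lived download and one ordinary analysis step. All names, paths and the URL are placeholders. storeDir keeps the downloaded database as a permanent cache, so the task is skipped whenever the declared output file already exists there. publishDir only copies results out of the work directory, while task caching stays with `-resume`.

```groovy
params.db_url = 'http://example.com/big_database.tar.gz'   // placeholder URL

process downloadDb {
    storeDir '/data/cache/db'            // hypothetical path: long-lived cache across runs

    output:
    file 'big_database.tar.gz' into db_ch

    script:
    """
    wget -O big_database.tar.gz ${params.db_url}
    """
}

process analyse {
    publishDir 'results', mode: 'copy'   // copy outputs to ./results; caching is left to -resume

    input:
    file db from db_ch

    output:
    file 'report.txt' into report_ch

    script:
    """
    echo "analysed with ${db}" > report.txt
    """
}
```

Changing a parameter and relaunching with `-resume` would then re-run only the affected tasks, while the database download stays cached.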
Sergey Venev
@sergpolly
Jul 25 2017 15:47
ok, I need to read more about publishDir to see the difference
does -resume check the contents of the work directory?
if there is no storeDir?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:48
yes
Sergey Venev
@sergpolly
Jul 25 2017 15:49
oh! I see - i didn't know that
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:49
that's guaranteed to be consistent
Sergey Venev
@sergpolly
Jul 25 2017 15:49
and does -resume check things like .exitcode ?
like, if the process was actually successful ?
or is it just content-based?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:50
it checks the task exit code and the existence of the expected files
Sergey Venev
@sergpolly
Jul 25 2017 15:51
that sounds exactly what we'd need! I have no idea why they used so many storeDir in the pipeline to begin with ...
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:51
nice
Sergey Venev
@sergpolly
Jul 25 2017 15:51
I didn't write the pipeline from scratch
I'm just trying to run and adjust it on a cluster
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:52
which pipeline is it ?
Sergey Venev
@sergpolly
Jul 25 2017 15:52
distiller
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:53
I would keep storeDir only here
Sergey Venev
@sergpolly
Jul 25 2017 15:54
Now I see what kind of cache you meant ...
Thank you so much Paolo!
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:55
you are welcome
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:05

When I run this:
./nextflow cloud create optitype-cluster -c 4

I get the following error:

> cluster name: optitype-cluster
> instances count: 4
> Launch configuration:
 - driver: 'aws'
 - imageId: 'ami-cc2d64da'
 - instanceType: 'c4.xlarge'
 - keyName: 'xxxxxxxxx'
 - securityGroup: 'xxxxxxxxx'
 - subnetId: 'xxxxxxxxx'
 - userName: 'ubuntu'

Please confirm you really want to launch the cluster with above configuration [y/n] y
Fetching EC2 prices (it can take a few seconds depending your internet connection) ..
ERROR ~ Cannot cast object 'null' with class 'null' to class 'int'. Try 'java.lang.Integer' instead

Any thoughts on why this might be happening? This was working fine for me a couple weeks ago. Any help would be much appreciated!

Paolo Di Tommaso
@pditommaso
Jul 25 2017 16:08
it seems AWS pushed some dirty data in the price file
nextflow-io/nextflow#408
there's a temporary workaround in the last comment
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:18

@pditommaso Doing the following worked.

export NXF_VER=0.25.3-SNAPSHOT

Thanks!

Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:43

Running this command on the master node

./nextflow run nmdp-bioinformatics/flow-Optitype \
    --with-docker nmdpbioinformatics/flow-OptiType \
    --outfile hli-optitype.csv \
    --bamdir s3://bucket/s3/data \
    --datatype dna

Returns the following errors:

N E X T F L O W  ~  version 0.25.3-SNAPSHOT
Pulling nmdp-bioinformatics/flow-Optitype ...
 downloaded from https://github.com/nmdp-bioinformatics/flow-OptiType.git
Launching `nmdp-bioinformatics/flow-Optitype` [sleepy_jang] - revision: 6fcb330fe1 [master]

---------------------------------------------------------------
NEXTFLOW OPTITYPE
---------------------------------------------------------------
Input BAM folder   (--bamdir)          : s3://bucket/s3/data
Sequence data type (--datatype)        : dna
Output file name   (--outfile)         : hli-optitype.csv


[warm up] executor > ignite
ERROR ~ ip-xxxxx-xx: ip-1xxxx-xx-xx: Name or service not known

 -- Check script 'main.nf' at line: 68 or see '.nextflow.log' file for more details

What's the best way to share log data? Thanks, Mike.

Paolo Di Tommaso
@pditommaso
Jul 25 2017 16:46
open an issue on GH and upload the log file there, thanks
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:51
nextflow-io/nextflow#409