These are chat archives for nextflow-io/nextflow

25th
Jul 2017
Ashley S Doane
@DoaneAS
Jul 25 2017 03:29 UTC
thanks @mes5k that's very helpful, I'm trying this approach, will let you know how I make out
Simone Baffelli
@baffelli
Jul 25 2017 08:20 UTC
Good morning. Is it possible to set Nextflow's work dir to be somewhere else? I'm processing a long timeseries and I'm afraid I cannot fit it entirely on my disk
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:20 UTC
of course, use the -w CLI option
Simone Baffelli
@baffelli
Jul 25 2017 08:21 UTC
Ah, I see. And from nextflow.config?
Future me will be very happy to have everything stored in a config file :grimacing:
nevermind, I found it out ;)
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:22 UTC
workDir = '/some/path'
:+1:
Simone Baffelli
@baffelli
Jul 25 2017 08:24 UTC
Is there anything nextflow can't do? :clap:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:24 UTC
coffee :)
Simone Baffelli
@baffelli
Jul 25 2017 08:25 UTC
We managed to get it for free :confetti_ball:
But I'll be happy if it would cook dinner for me
Paolo Di Tommaso
@pditommaso
Jul 25 2017 08:25 UTC
LOL
Simone Baffelli
@baffelli
Jul 25 2017 08:26 UTC
Actually it could be done with some lab automation system?
controlled by nextflow
Simone Baffelli
@baffelli
Jul 25 2017 09:22 UTC
Another question: is nextflow handling of glob patterns different from the way bash treats them?
Because when using certain patterns with fromFilePairs, nextflow does not return any file, but if I use the same pattern in my shell, I can find them
Paolo Di Tommaso
@pditommaso
Jul 25 2017 09:23 UTC
NF relies on Java glob pattern, which may not be identical to BASH
Simone Baffelli
@baffelli
Jul 25 2017 09:25 UTC
Well that explains a lot then. I guess exclude patterns are not supported by java
Simone Baffelli
@baffelli
Jul 25 2017 09:35 UTC
Or rather they work differently
Simone Baffelli
@baffelli
Jul 25 2017 10:05 UTC
I presume I cannot use a regex instead?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 10:25 UTC
yes, you can
specify a pattern object as ~/your-regex/ (if I'm not wrong) to fromPath
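A hedged alternative that does not depend on regex support in fromPath: match broadly with a glob, then narrow the result with the filter operator and a Groovy regex. This is only a sketch; the directory, the file naming, and the channel name selected_files are made up for illustration.

```groovy
// Hypothetical directory and file names; adjust to your own data.
Channel
    .fromPath('/data/timeseries/*.tif')           // broad glob, resolved by Java's glob matcher
    .filter { !(it.name ==~ /.*_tmp\.tif/) }      // regex-based exclusion: drop names ending in _tmp.tif
    .set { selected_files }                       // hypothetical channel name
```

The ==~ operator requires the whole file name to match the regex, so the closure keeps every file whose name does not end in _tmp.tif.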
Simone Baffelli
@baffelli
Jul 25 2017 11:27 UTC
That's excellent :+1:
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:35 UTC
@pditommaso Said and done! Here's the pipeline: https://github.com/oskarvid/nextflow-GermlineVarCall/blob/master/bwamem.nf
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:36 UTC
fantastic
so what's the problematic part ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:37 UTC
When MergeBamFiles takes its input files it seems to take two random files, but it needs to take the correct pairs.
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:37 UTC
let me see
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:37 UTC
so there's eight pairs, one pair per lane, one file is unmapped and the other is mapped
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:37 UTC
do you mean MergeBamAlignment ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:38 UTC
yeah
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:39 UTC
basically BwaMem_output and FastqToSam_output, right ?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:39 UTC
yes
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:41 UTC
the easiest way to handle this is to keep the pair_id along with the output file
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:41 UTC
i.e. don't use pair_id and pair_id2?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:42 UTC
yes, but you need to specify in the output, for example
in the Bwa_mem process, replace
 output:
    file "bwamem.sam" into BwaMem_output
with
 output:
    set pair_id, file("bwamem.sam") into BwaMem_output
the same for the FastqToSam process
then, since process execution is parallel, the output order is not guaranteed
hence you will need to create a channel containing the sam and bam for the same pair_id
does that make sense?
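One way this matching could be sketched, assuming both processes emit `set pair_id, file(...)` as described above and that the installed Nextflow version provides the join operator (older releases offered phase for a similar purpose). The channel name MergeBam_input, the input variable names, and the echo command are placeholders, not taken from the actual pipeline.

```groovy
// Pair up the outputs of the two processes by their first element (pair_id).
// join emits items of the form [pair_id, sam_file, bam_file].
BwaMem_output
    .join(FastqToSam_output)
    .set { MergeBam_input }          // hypothetical channel name

process MergeBamAlignment {
    input:
    set pair_id, file(aligned_sam), file(unmapped_bam) from MergeBam_input

    // Placeholder command; the real merge command goes here.
    """
    echo "merging ${aligned_sam} and ${unmapped_bam} for ${pair_id}"
    """
}
```

Because the files are matched on pair_id, it no longer matters in which order the parallel tasks finish.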
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:48 UTC
to begin with, it doesn't like that I'm using the channel "reads" for both bwa and fastqtosam
that's why I made two earlier, but I don't need to? or shouldn't?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:49 UTC
do you mean this?
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:49 UTC
yes
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:50 UTC
you can't use the same channel as input for more than one process, but you can simplify that as shown below
Channel
  .fromFilePairs( "/data/workspace/data/Samples/NA12878-rep7_S7_L00*_R{1,2}_001.fastq.gz", flat: true) 
  .into { reads; reads2 }
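For illustration, with flat: true each item in these channels has the form [pair_id, read1, read2], so a process can take the pair ID and both files in one input declaration and pass the ID along in its output. The sketch below makes that assumption explicit; the glob path and the echo command are placeholders.

```groovy
// Placeholder path; flat: true yields [pair_id, read1, read2] items.
Channel
    .fromFilePairs('/path/to/Samples/*_R{1,2}_001.fastq.gz', flat: true)
    .into { reads; reads2 }

process BwaMem {
    input:
    set pair_id, file(read1), file(read2) from reads

    output:
    set pair_id, file("bwamem.sam") into BwaMem_output

    // Stand-in for the real alignment command.
    """
    echo "would align ${read1} and ${read2}" > bwamem.sam
    """
}
```

The second channel, reads2, would feed the FastqToSam process in the same way.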
Oskar Vidarsson
@oskarvid
Jul 25 2017 13:59 UTC
It's still not clear to me what I need to do, but I
oops
but I'm going home now for today, will look at it tomorrow
Paolo Di Tommaso
@pditommaso
Jul 25 2017 13:59 UTC
:ok_hand:
Félix C. Morency
@fmorency
Jul 25 2017 14:58 UTC
@pditommaso do you have an ETA for 0.25.3? :D
Dani Soronellas
@dsoronellas
Jul 25 2017 14:59 UTC
Hi! I started using Nextflow, which I find amazing! I wanted to create a small cluster on AWS just for testing, using the following command: nextflow cloud create test-cluster (up to 3 t2.micro instances). I'm having problems running the command, as it complains: "ERROR ~ Cannot cast object 'null' with class 'null' to class 'int'. Try 'java.lang.Integer' instead". Any ideas how to solve this? Thanks in advance! :smile:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:00 UTC
@fmorency hopefully, thu or fri
Félix C. Morency
@fmorency
Jul 25 2017 15:00 UTC
@pditommaso awesome thanks
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:00 UTC
@dsoronellas yes, we just spotted this issue nextflow-io/nextflow#408
use the command in the last comment
Dani Soronellas
@dsoronellas
Jul 25 2017 15:01 UTC
ok! I'll go for it! Thanks for the immediate response hehe
Simone Baffelli
@baffelli
Jul 25 2017 15:06 UTC
Is it possible to clean only the older results from Nextflow's cache?
Félix C. Morency
@fmorency
Jul 25 2017 15:06 UTC
yes, see nextflow clean
Simone Baffelli
@baffelli
Jul 25 2017 15:07 UTC
yes, but does it clean only the older ones?
I'm afraid to type it :fearful:
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:08 UTC
check the help
Simone Baffelli
@baffelli
Jul 25 2017 15:09 UTC
I see
;)
Félix C. Morency
@fmorency
Jul 25 2017 15:10 UTC
You can execute a dry run (nextflow clean -n) to see what will be removed.
It will list the files that would be removed without removing them.
Simone Baffelli
@baffelli
Jul 25 2017 15:17 UTC
fantastic
Sergey Venev
@sergpolly
Jul 25 2017 15:34 UTC

Hi,

I managed to reproduce my storeDir issue by an unfortunate coincidence... This time, the whole pipeline was terminated due to a critical failure in some process. Because of that, some upstream processes that were copying results to storeDir were also terminated or aborted. I relaunched the pipeline (actually without the -resume flag) and it took off with whatever was in the storeDir folder, regardless of whether the upstream process had been aborted or its results were incomplete. It seems that the only thing nextflow cares about in such cases is the content of storeDir. Is this expected behavior? Is there a way to make nextflow check whether copying to storeDir went ok?

Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:38 UTC
unfortunately this is a side effect of storeDir
do you have more than a process writing to the same storeDir ?
Sergey Venev
@sergpolly
Jul 25 2017 15:38 UTC
Yes
I mean multiple instances of the same process
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:39 UTC
you can try to mitigate this problem with errorStrategy = 'finish'
Sergey Venev
@sergpolly
Jul 25 2017 15:39 UTC
Let's say, when we are mapping chunks, bam files go to the same storeDir
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:42 UTC
however, this is the reason why I suggest using publishDir in place of storeDir whenever possible
Sergey Venev
@sergpolly
Jul 25 2017 15:42 UTC
Can I do errorStrategy = { task.attempt>2 ? 'finish' : 'retry' } ?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:42 UTC
yes
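A hedged sketch of how that dynamic strategy might look in nextflow.config. The maxRetries value is an assumption added here: by default only one retry is allowed, so it has to be raised for a third attempt to happen at all.

```groovy
// nextflow.config (sketch)
process {
    // Retry the first two failed attempts; on a later failure let the
    // already running tasks finish and then stop the pipeline.
    errorStrategy = { task.attempt > 2 ? 'finish' : 'retry' }

    // Assumed value, not from the discussion: allows up to three retries.
    maxRetries = 3
}
```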
Sergey Venev
@sergpolly
Jul 25 2017 15:43 UTC
Ok, I'll try to read about publishDir behavior, and see if we could rewrite the pipeline in accordance with the best practices ...
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:45 UTC
you should use storeDir only when you need a cache across different executions of the pipeline
Sergey Venev
@sergpolly
Jul 25 2017 15:45 UTC
Thank you!
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:45 UTC
:+1:
Sergey Venev
@sergpolly
Jul 25 2017 15:45 UTC
what do you mean like a cache for different executions?
say, if i'd want to change some parameters and relaunch?
something like that?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:46 UTC
for example you download a big database file, and you want to keep it in local storage
"if i'd want to change some parameters and relaunch?"
no, in this case you won't need storeDir
just use the -resume mechanism
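To make the distinction concrete, here is a minimal sketch with storeDir used only for a reusable reference download and publishDir used for per-run results. All process names, paths, file names, and commands are hypothetical stand-ins.

```groovy
// Reference data reused across pipeline executions: storeDir works as a
// permanent cache, and the task is skipped when the file already exists there.
process fetchDatabase {
    storeDir '/shared/db'                     // hypothetical path

    output:
    file 'big_db.fa' into db_ch

    // Stand-in for a real download command.
    """
    touch big_db.fa
    """
}

// Per-run results: tasks run in the work directory (so -resume can check
// them) and the outputs are copied to publishDir afterwards.
process mapChunk {
    publishDir 'results/bam', mode: 'copy'    // hypothetical path

    input:
    val chunk from Channel.from('chunk1', 'chunk2')

    output:
    file "${chunk}.bam" into bam_ch

    // Stand-in for a real mapping command.
    """
    touch ${chunk}.bam
    """
}
```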
Sergey Venev
@sergpolly
Jul 25 2017 15:47 UTC
ok, I need to read more about publishDir to see the difference
does -resume check the contents of the work directory?
if there is no storeDir?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:48 UTC
yes
Sergey Venev
@sergpolly
Jul 25 2017 15:49 UTC
oh! I see - i didn't know that
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:49 UTC
that's guaranteed to be consistent
Sergey Venev
@sergpolly
Jul 25 2017 15:49 UTC
and does -resume check things like .exitcode ?
like, if the process was actually successful ?
or is it just content-based?
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:50 UTC
it checks the task exit code and the existence of the expected files
Sergey Venev
@sergpolly
Jul 25 2017 15:51 UTC
that sounds exactly what we'd need! I have no idea why they used so many storeDir in the pipeline to begin with ...
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:51 UTC
nice
Sergey Venev
@sergpolly
Jul 25 2017 15:51 UTC
I didn't write the pipeline from scratch
I'm just trying to run and adjust it on a cluster
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:52 UTC
which is the pipeline ?
Sergey Venev
@sergpolly
Jul 25 2017 15:52 UTC
distiller
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:53 UTC
I would keep storeDir only here
Sergey Venev
@sergpolly
Jul 25 2017 15:54 UTC
Now I see what kind of cache you meant ...
Thank you so much Paolo!
Paolo Di Tommaso
@pditommaso
Jul 25 2017 15:55 UTC
you are welcome
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:05 UTC

When I run this:
./nextflow cloud create optitype-cluster -c 4

I get the following error:

> cluster name: optitype-cluster
> instances count: 4
> Launch configuration:
 - driver: 'aws'
 - imageId: 'ami-cc2d64da'
 - instanceType: 'c4.xlarge'
 - keyName: 'xxxxxxxxx'
 - securityGroup: 'xxxxxxxxx'
 - subnetId: 'xxxxxxxxx'
 - userName: 'ubuntu'

Please confirm you really want to launch the cluster with above configuration [y/n] y
Fetching EC2 prices (it can take a few seconds depending your internet connection) ..
ERROR ~ Cannot cast object 'null' with class 'null' to class 'int'. Try 'java.lang.Integer' instead

Any thoughts on why this might be happening? This was working fine for me a couple weeks ago. Any help would be much appreciated!

Paolo Di Tommaso
@pditommaso
Jul 25 2017 16:08 UTC
it seems AWS pushed some dirty data in the price file
nextflow-io/nextflow#408
there's a temporary workaround in the last comment
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:18 UTC

@pditommaso Doing the following worked.

export NXF_VER=0.25.3-SNAPSHOT

Thanks!

Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:43 UTC

Running this command on the master node

./nextflow run nmdp-bioinformatics/flow-Optitype \
    --with-docker nmdpbioinformatics/flow-OptiType \
    --outfile hli-optitype.csv \
    --bamdir s3://bucket/s3/data \
    --datatype dna

Returns the following errors:

N E X T F L O W  ~  version 0.25.3-SNAPSHOT
Pulling nmdp-bioinformatics/flow-Optitype ...
 downloaded from https://github.com/nmdp-bioinformatics/flow-OptiType.git
Launching `nmdp-bioinformatics/flow-Optitype` [sleepy_jang] - revision: 6fcb330fe1 [master]

---------------------------------------------------------------
NEXTFLOW OPTITYPE
---------------------------------------------------------------
Input BAM folder   (--bamdir)          : s3://bucket/s3/data
Sequence data type (--datatype)        : dna
Output file name   (--outfile)         : hli-optitype.csv


[warm up] executor > ignite
ERROR ~ ip-xxxxx-xx: ip-1xxxx-xx-xx: Name or service not known

 -- Check script 'main.nf' at line: 68 or see '.nextflow.log' file for more details

What's the best way to share log data? Thanks, Mike.

Paolo Di Tommaso
@pditommaso
Jul 25 2017 16:46 UTC
open an issue on GH and upload the log file there, thanks
Michael Halagan
@mhalagan-nmdp
Jul 25 2017 16:51 UTC
nextflow-io/nextflow#409