These are chat archives for nextflow-io/nextflow

25th
Aug 2017
marchoeppner
@marchoeppner
Aug 25 2017 08:22
hi - quick question: I would like to (a) split a fastq file (using a bash script..) and then (b) pass each chunk to a downstream stage for each chromosome in a list of chromosomes (basically, divide the problem of read alignment and variant calling into smaller problems). I thought something like chromosomes.combine(chunks) would work, but that doesn't seem to produce pairs of chunks and chromosomes. Any idea how to do this?
currently it matches each chromosome with all chunks
so I get something like ["chr1", file1, file2, file3], whereas what I need is ["chr1", file1], ["chr1", file2], ... and so on
marchoeppner
@marchoeppner
Aug 25 2017 08:28
ah I think the issue is that the process producing chunks for each fastq file emits the list of chunks as one object to be passed downstream, so I probably need to separate that first
marchoeppner
@marchoeppner
Aug 25 2017 08:37
(i.e. flatten()) - works now, nvm ^^
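To make the difference concrete, here is a plain-Python model (not Nextflow code) of the two operators applied to the items described above. It assumes, as the messages suggest, that `combine` forms the cartesian product of the two channels and concatenates list items into one tuple, and that `flatten` emits every element of a (possibly nested) list as a separate item. The chromosome and file names are hypothetical.

```python
from itertools import product


def as_list(item):
    # Lists/tuples are spliced into the result; scalars become one-element lists.
    return list(item) if isinstance(item, (list, tuple)) else [item]


def combine(left, right):
    # Cartesian product of the two "channels", concatenating each pair into one tuple.
    return [as_list(a) + as_list(b) for a, b in product(left, right)]


def flatten(items):
    # Recursively emit every element of nested lists/tuples as a separate item.
    out = []
    for item in items:
        if isinstance(item, (list, tuple)):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


chromosomes = ["chr1", "chr2"]
# The splitting process emits ONE item per fastq: a list holding all of its chunks.
chunks = [["file1", "file2", "file3"]]

print(combine(chromosomes, chunks))
# [['chr1', 'file1', 'file2', 'file3'], ['chr2', 'file1', 'file2', 'file3']]
print(combine(chromosomes, flatten(chunks)))
# [['chr1', 'file1'], ['chr1', 'file2'], ['chr1', 'file3'], ['chr2', 'file1'], ['chr2', 'file2'], ['chr2', 'file3']]
```

The item order shown is just Python's; in Nextflow, channel items are produced asynchronously, so downstream code should not rely on a particular order.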
Luca Cozzuto
@lucacozzuto
Aug 25 2017 08:57
hi, a related question: after doing the flatten (I suppose with flatMap()), how can we collect the results by the id coming from one of the two channels? (in the previous example, per fastq file for instance)
?
Francesco Strozzi
@fstrozzi
Aug 25 2017 09:13
hi guys, simple question: what is the maximum number of tasks NF can handle? Just curious to know the experiences of people submitting huge workloads, for example hundreds of samples to analyse. Is there a hard limit that someone has experienced, or does everything run smoothly even with 1000 samples in parallel?
Paolo Di Tommaso
@pditommaso
Aug 25 2017 12:34
There's no such hard limit (other than the JVM heap size); NF can easily handle thousands of parallel jobs. I've personally run pipelines spawning millions of tasks
@lucacozzuto it's not clear what you're trying to do
Francesco Strozzi
@fstrozzi
Aug 25 2017 12:38
:+1:
Luca Cozzuto
@lucacozzuto
Aug 25 2017 12:41
Paolo, I have the following problem:
    set val(bamid), file(bamfile) from bam_files
    each file(bedinterval) from bedintervals

    output:
    set bamid, file "${bamid}.*.g.vcf" into results
it only works if I remove bamid from the output set (set bamid, ...)
Paolo Di Tommaso
@pditommaso
Aug 25 2017 12:43
@mes5k has deployed some big pipelines, he can add some notes
@lucacozzuto use parentheses with file, i.e. file("xxx")
Luca Cozzuto
@lucacozzuto
Aug 25 2017 13:04
God bless @pditommaso! Now I still have the problem of collecting the output files per bamid.. ;)
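One common way to regroup per-id results in Nextflow is the `groupTuple` operator; that answer does not appear in this chat, so treat it as an assumption. The plain-Python model below (not Nextflow code) mimics its default behaviour: tuples are grouped on their first element and each remaining position is collected into a list. The bam ids and VCF names are hypothetical.

```python
def group_tuple(items):
    # Group tuples by their first element (the key), keeping first-seen key order,
    # and collect each remaining position into its own list.
    groups = {}
    for key, *rest in items:
        bucket = groups.setdefault(key, [[] for _ in rest])
        for i, value in enumerate(rest):
            bucket[i].append(value)
    return [[key, *cols] for key, cols in groups.items()]


# One result per (bamid, bed interval) task, emitted as [bamid, vcf].
results = [
    ["sampleA", "sampleA.chr1.g.vcf"],
    ["sampleB", "sampleB.chr1.g.vcf"],
    ["sampleA", "sampleA.chr2.g.vcf"],
]
print(group_tuple(results))
# [['sampleA', ['sampleA.chr1.g.vcf', 'sampleA.chr2.g.vcf']], ['sampleB', ['sampleB.chr1.g.vcf']]]
```

In Nextflow itself, `groupTuple` by default waits for the upstream channel to complete before emitting the groups; its `size` option can release a group earlier when the number of items per key is known in advance.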
Francesco Strozzi
@fstrozzi
Aug 25 2017 13:07
@pditommaso I’m working on the Java code for S3FS. I hope you are having a very relaxing holiday because I have a tsunami of questions for you once back :)
Paolo Di Tommaso
@pditommaso
Aug 25 2017 13:10
There's still time to join the NF workshop
:sunglasses:
Francesco Strozzi
@fstrozzi
Aug 25 2017 13:14
err... wasn't the registration deadline 15th Jun?
the form is no longer available
Paolo Di Tommaso
@pditommaso
Aug 25 2017 13:17
If you are interested, write to training@crg.eu with me in cc asap
Evan Floden
@evanfloden
Aug 25 2017 13:41
When running Singularity with docker://myimage, it appears that the PATH variable from the original container is not present.

Docker:

-bash-4.2$ docker run -ti -v $PWD:/here cbcrg/tcoffee-dpa:latest
root@0602a3b050c8:/# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/mafft/bin

Singularity:

-bash-4.2$ singularity shell cbcrg-tcoffee-dpa.img
Singularity: Invoking an interactive shell within container...

Singularity cbcrg-tcoffee-dpa.img:> echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
*reduced commands for readability
Paolo Di Tommaso
@pditommaso
Aug 25 2017 13:45
Frankly, I don't know if it's expected or a Singularity bug; you may want to check with the Singularity guys
That said, maybe just using the standard paths in the original container is an easy workaround
Luca Cozzuto
@lucacozzuto
Aug 25 2017 13:46
I had a similar issue; how did you make your Singularity container?
Evan Floden
@evanfloden
Aug 25 2017 13:47
Yeah... Also maybe there's a difference between ENV and RUN export PATH=...
It is made via auto-pulling from Docker Hub. Dockerfile is here
Luca Cozzuto
@lucacozzuto
Aug 25 2017 13:52
Paolo Di Tommaso
@pditommaso
Aug 25 2017 13:52
Move mafft, etc. into the /usr/local/bin path et voilà
Evan Floden
@evanfloden
Aug 25 2017 13:53
Yeah will do. :+1:
Mike Smoot
@mes5k
Aug 25 2017 17:54
@fstrozzi and @pditommaso I'll try and write up something about the performance challenges we've encountered with nextflow and what we've done about them, but it'll take me a day or two (because I'm busy, not because there are so many problems! :))
Francesco Strozzi
@fstrozzi
Aug 25 2017 17:56
@mes5k no problem! Just curious to know other people's experiences