These are chat archives for nextflow-io/nextflow

29th
Sep 2017
Francesco Strozzi
@fstrozzi
Sep 29 2017 08:27 UTC
Hello, what is the best way to use a file declared as input from a channel in a Python script inside an NF process? So if I am declaring a file(sequences) as input, how can I then pass the sequences variable to the inline Python script? The examples show this using stdin instead of explicitly declaring a file variable… is it the only option?
Paolo Di Tommaso
@pditommaso
Sep 29 2017 08:29 UTC
of course, you can declare a stdin input type
@aunderwo you may want to try to fix the permission problem with this setting docker.runOptions = "-u \$(id -u):\$(id -g) "
(in the nextflow.config file)
Francesco Strozzi
@fstrozzi
Sep 29 2017 08:40 UTC
thanks @pditommaso
Paolo Di Tommaso
@pditommaso
Sep 29 2017 08:40 UTC
:+1:
Luca Cozzuto
@lucacozzuto
Sep 29 2017 11:36 UTC
Hi guys. I'm wondering if we can make splitFasta more effective (I know that NF has this tool only for historical reasons)
I'm thinking of a splitter that can produce files of more or less homogeneous size
(and btw this could be good also for a possible splitbed function)
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:37 UTC
not in terms of number of reads but bytes ?
Luca Cozzuto
@lucacozzuto
Sep 29 2017 11:38 UTC
Imagine that you have 10 sequences: 1 very large and 9 small. Splitting them one per file does not help much, while splitting them into two groups (the largest in one and the small ones in the other) balances the parallelization
A sort of clustering per size
Where you specify the number of clusters (k-?)
This without breaking the sequences
Evan Floden
@evanfloden
Sep 29 2017 11:39 UTC
We wrote, or at least talked about doing, exactly this
say chr 1 is huge, then you have heaps of small contigs, you want the split to be equal in the number of nucleotides
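A minimal sketch of the size-balanced split described above, assuming a greedy "longest sequence first" heuristic. This is not an existing Nextflow operator; the function name `balance_by_length` and the sample lengths are hypothetical. Sequences are only grouped, never broken.

```python
import heapq


def balance_by_length(lengths, k):
    """Group sequence ids into k bins with roughly equal total length.

    lengths: dict mapping sequence id -> length in nucleotides.
    Greedy heuristic: place the longest remaining sequence into the
    currently lightest bin. Sequences are never split.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Heap of (total_length, bin_index); the lightest bin is always on top.
    heap = [(0, i) for i in range(k)]
    heapq.heapify(heap)
    bins = [[] for _ in range(k)]
    # Longest first; ties broken by id so the result is deterministic.
    ordered = sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq_id, length in ordered:
        total, idx = heapq.heappop(heap)
        bins[idx].append(seq_id)
        heapq.heappush(heap, (total + length, idx))
    return bins


# Hypothetical example: one huge chromosome plus nine small contigs.
lengths = {"chr1": 9000}
lengths.update({f"contig{i}": 1000 for i in range(1, 10)})
groups = balance_by_length(lengths, 2)
```

With these made-up lengths the large chromosome ends up alone in one group and the nine contigs share the other, so both groups carry the same number of nucleotides. The heuristic is an approximation and does not guarantee an optimal balance in general.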
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:40 UTC
a fasta, not fastq
Luca Cozzuto
@lucacozzuto
Sep 29 2017 11:40 UTC
I know that there is a perl script that does this day
No, fastq are often equal in size
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:40 UTC
it makes sense
Luca Cozzuto
@lucacozzuto
Sep 29 2017 11:41 UTC
(except with pacbio and nanopore)
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:41 UTC
please open an issue for that
Evan Floden
@evanfloden
Sep 29 2017 11:41 UTC
@pditommaso, we did this already and I sent it to you one Friday afternoon last year.
I'll try to find it now
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:42 UTC
really? I don't remember ..
one Friday afternoon last year
how do you remember that ? :D
Evan Floden
@evanfloden
Sep 29 2017 11:45 UTC
Because I remember being very pumped about it, and I had a glass of wine in my hand as I was writing it :wine_glass: Need a better search function for gitter!
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:46 UTC
but I think you got too much :wine_glass: :joy:
I've never seen such a feature request
Evan Floden
@evanfloden
Sep 29 2017 11:48 UTC
Found it. I called it the splitGenome operator :wink:
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:49 UTC
that's *perl* !
Evan Floden
@evanfloden
Sep 29 2017 11:50 UTC
maybe that was the wrong link
Francesco Strozzi
@fstrozzi
Sep 29 2017 11:50 UTC
mother of god
Paolo Di Tommaso
@pditommaso
Sep 29 2017 11:50 UTC
lol
ok, @lucacozzuto please open an issue for that
I would need a keyboard with a "please open an issue" key :)
Francesco Strozzi
@fstrozzi
Sep 29 2017 11:53 UTC
:)
Evan Floden
@evanfloden
Sep 29 2017 13:40 UTC
Blog post round up of the hackathon projects from #nfhack17 here
Paolo Di Tommaso
@pditommaso
Sep 29 2017 13:42 UTC
:clap: :clap: :clap:
Félix C. Morency
@fmorency
Sep 29 2017 13:44 UTC
\o/
Venkat Malladi
@vsmalladi
Sep 29 2017 14:40 UTC
@skptic thanks
when is the next hackathon?
Paolo Di Tommaso
@pditommaso
Sep 29 2017 14:42 UTC
not before next year .. :)
Francesco Strozzi
@fstrozzi
Sep 29 2017 14:53 UTC
so, January ? :)
Félix C. Morency
@fmorency
Sep 29 2017 14:55 UTC
:D
Paolo Di Tommaso
@pditommaso
Sep 29 2017 14:57 UTC
;)
Francesco Strozzi
@fstrozzi
Sep 29 2017 15:04 UTC
anyway it was a nice and productive Friday, developed and tested a small pipeline locally on my Mac using NF and Docker. Then when happy I just switched the executor and ran the whole thing on AWS via Batch. :+1:
SO MUCH LESS PAIN
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:05 UTC
nice
Félix C. Morency
@fmorency
Sep 29 2017 15:05 UTC
ikr
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:05 UTC
how are the tests going ?
Francesco Strozzi
@fstrozzi
Sep 29 2017 15:06 UTC
good, passed NF over to a colleague here who developed the pipeline locally on our cluster and then ran everything on Batch. Multiple samples, fairly complex pipeline with 5-7 hrs of running time per sample. It simply worked
We also tested what happens when, for example, a spot instance is killed because the price is higher than the maximum bid price set. In that case the “error” is directly captured and managed by Batch, which simply re-runs the job if you set the retry option in your job definition.
For NF nothing happens and it just keeps waiting for the job to complete…so nice
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:08 UTC
achievement unlocked :trophy:
Anthony Underwood
@aunderwo
Sep 29 2017 15:08 UTC
@aunderwo you may want to try to fix the permission problem with this setting docker.runOptions = "-u \$(id -u):\$(id -g) "
Thanks
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:09 UTC
did it work ?
Anthony Underwood
@aunderwo
Sep 29 2017 15:09 UTC
@pditommaso sorry can't test till Monday :(
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:09 UTC
sure, no hurry
Venkat Malladi
@vsmalladi
Sep 29 2017 15:10 UTC
Ya trying to write my pipelines with unittests and integration tests
locally using docker and deploy on our cluster then publish with DOI
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:11 UTC
sounds cool
you mean the container image DOI ?
Venkat Malladi
@vsmalladi
Sep 29 2017 15:11 UTC
Ya container image DOI and Pipeline DOI
Anthony Underwood
@aunderwo
Sep 29 2017 15:12 UTC

:thumbsup: That would be awesome?

Ya trying to write my pipelines with unittests and integration tests

Venkat Malladi
@vsmalladi
Sep 29 2017 15:12 UTC
ya hard to write the unit tests per process
might be easier to run the entire pipeline
and then test output for each process
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:13 UTC
I prefer the second approach along with a CI server
Venkat Malladi
@vsmalladi
Sep 29 2017 15:13 UTC
Okay, that was my initial plan. To run the pipeline on a couple of datasets and then have tests per process to check output
using a CI server
and unit test any code in my script that is about error handling
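A minimal sketch of the "run the whole pipeline, then check each process's output" approach, assuming the pipeline publishes its results into a single directory. The helper name `find_bad_outputs` and the file names are hypothetical; a CI job could call something like this after the pipeline run finishes.

```python
import os
import tempfile


def find_bad_outputs(results_dir, expected_files):
    """Return the expected files that are missing or empty in results_dir."""
    bad = []
    for name in expected_files:
        path = os.path.join(results_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(name)
    return bad


# Hypothetical demo: a temporary directory stands in for the results folder.
with tempfile.TemporaryDirectory() as results_dir:
    # One non-empty output, one empty output, one output never produced.
    with open(os.path.join(results_dir, "aligned.bam"), "w") as fh:
        fh.write("placeholder")
    open(os.path.join(results_dir, "report.txt"), "w").close()
    bad = find_bad_outputs(
        results_dir, ["aligned.bam", "report.txt", "calls.vcf"]
    )
```

Checking only for presence and non-zero size is a deliberately weak test; per-process checks on the content would go on top of it.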
Félix C. Morency
@fmorency
Sep 29 2017 15:27 UTC
We're testing the entire pipeline and test outputs here using gitlab CI
We also have a test suite for our python frontend
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:28 UTC
well done, using the overall dataset or a sample ?
Venkat Malladi
@vsmalladi
Sep 29 2017 15:28 UTC
@fmorency do you have a link for an example?
Félix C. Morency
@fmorency
Sep 29 2017 15:29 UTC
We're using toy data generated on-the-fly and some real data stored on another gitlab repo
@vsmalladi Unfortunately no since we're a business but I'm willing to share the knowledge :)
Venkat Malladi
@vsmalladi
Sep 29 2017 15:30 UTC
@fmorency okay, ya any help would be appreciated
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:30 UTC
@fmorency you owe a blog post to the community :)
Venkat Malladi
@vsmalladi
Sep 29 2017 15:30 UTC
I am writing all the process-steps in python
Félix C. Morency
@fmorency
Sep 29 2017 15:30 UTC
@pditommaso yeah I owe you a couple of those
Anthony Underwood
@aunderwo
Sep 29 2017 15:32 UTC
I'll be writing up my AWS experiences at some stage :)
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:32 UTC
eager to learn
Venkat Malladi
@vsmalladi
Sep 29 2017 15:35 UTC
ya me too
once I get mine up and running I can write a blog post
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:35 UTC
:+1:
Félix C. Morency
@fmorency
Sep 29 2017 15:36 UTC
if I write some posts can you host them somewhere @pditommaso ?
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:36 UTC
it will be a pleasure
Félix C. Morency
@fmorency
Sep 29 2017 15:36 UTC
awesome
Venkat Malladi
@vsmalladi
Sep 29 2017 15:42 UTC
ya where would be the place to post them
Mike Smoot
@mes5k
Sep 29 2017 15:43 UTC
:+1: for splitting fasta by size. Also an option for using fasta record id as filename would be nice.
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:44 UTC
nice => open an issue on GH :)
Mike Smoot
@mes5k
Sep 29 2017 15:45 UTC
Apparently, I need a button for that on my keyboard too! :)
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:45 UTC
@vsmalladi I would be happy to publish guest posts on the NF blog
@mes5k you need a Pull request one :joy:
Mike Smoot
@mes5k
Sep 29 2017 15:46 UTC
It could go with my stack overflow keyboard: https://i.imgur.com/phVdsuQ.jpg
Paolo Di Tommaso
@pditommaso
Sep 29 2017 15:46 UTC
lol
Félix C. Morency
@fmorency
Sep 29 2017 15:49 UTC
pff not even a mk
Mike Smoot
@mes5k
Sep 29 2017 17:03 UTC
Hi @pditommaso can you share the NXF_GITHUB_ACCESS_TOKEN required to run all the tests in AssetManagerTest?
Paolo Di Tommaso
@pditommaso
Sep 29 2017 17:04 UTC
well, not sure it's a good idea .. ;)
can't you use yours ?
Mike Smoot
@mes5k
Sep 29 2017 17:05 UTC
Not sure where to find it...
Paolo Di Tommaso
@pditommaso
Sep 29 2017 17:05 UTC
wait
settings > personal access tokens
Mike Smoot
@mes5k
Sep 29 2017 17:07 UTC
Any idea what scope to choose?
Paolo Di Tommaso
@pditommaso
Sep 29 2017 17:08 UTC
I'm using Full control of private repositories
Mike Smoot
@mes5k
Sep 29 2017 17:08 UTC
Alright, I'll give that a try
Paolo Di Tommaso
@pditommaso
Sep 29 2017 17:09 UTC
:+1:
Mike Smoot
@mes5k
Sep 29 2017 17:11 UTC
Perfect, I have a test for nextflow-io/nextflow#389 that fails! I'll try and get a patch together.
Paolo Di Tommaso
@pditommaso
Sep 29 2017 17:32 UTC
nice, thanks a lot