These are chat archives for nextflow-io/nextflow

6th
Jun 2016
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:06
@pditommaso I opened an issue instead, where I explain it better. I doubt it's a bug, more like a "feature"
nextflow-io/nextflow#172
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:08
let me try
actually it stops (quite) immediately on MY computer .. :)
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:14
is this happening in your slurm cluster ?
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:19
using the local executor actually.
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:19
um, strange
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:20
the pipeline does stop, but ps -aux | grep sleep reveals that those are still running
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:21
If so please attach the .nextflow.log file and the output of ps fx in the issue
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:21
ok
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:24
Oops, you are right
58048 pts/36   S      0:00 /bin/bash -ue .command.run
58105 pts/36   S      0:00  \_ tee .command.out
58109 pts/36   S      0:00  \_ tee .command.err
58112 pts/36   S      0:00  \_ /bin/bash -ue /users/cn/ptommaso/scratch/e2/5b4ce8d644041f0b72c8ca2923b82d/.command.sh
58123 pts/36   S      0:00      \_ sleep 1h
58046 pts/36   S      0:00 /bin/bash -ue .command.run
58107 pts/36   S      0:00  \_ tee .command.out
58111 pts/36   S      0:00  \_ tee .command.err
58114 pts/36   S      0:00  \_ /bin/bash -ue /users/cn/ptommaso/scratch/04/889826b6c48ce95d0ac734b6c4a948/.command.sh
58121 pts/36   S      0:00      \_ sleep 1h
58044 pts/36   S      0:00 /bin/bash -ue .command.run
58097 pts/36   S      0:00  \_ tee .command.out
58099 pts/36   S      0:00  \_ tee .command.err
58101 pts/36   S      0:00  \_ /bin/bash -ue /users/cn/ptommaso/scratch/7b/70cc551497057d1f37d8555ad20424/.command.sh
58122 pts/36   S      0:00      \_ sleep 1h
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:26
Uploaded also :)
I'm doing a lot of ps -eo pid | xargs kill. Very safe thing to do :D
(ok with some filtering, but still)
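For the record, a filtered variant of that cleanup is much safer than piping raw PIDs into kill; a minimal sketch (the 'sleep 9876' marker pattern is just this example's stand-in workload, not from the chat):

```shell
# Spawn two dummy jobs that stand in for the orphaned pipeline tasks
# ('sleep 9876' is just this example's marker pattern, not from the chat)
sleep 9876 & sleep 9876 &

# pkill -f matches against the full command line, so only the intended
# processes receive the signal -- far safer than piping every PID into kill
pkill -f 'sleep 9876' || true

sleep 1   # give the signalled processes a moment to exit

pgrep -f 'sleep 9876' >/dev/null && echo "still running" || echo "all gone"
```

With the matching processes terminated, the final check prints "all gone".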
gtg, enjoy
Paolo Di Tommaso
@pditommaso
Jun 06 2016 16:29
ok, it should be possible to handle it somehow
thanks for reporting it
Hugues Fontenelle
@huguesfontenelle
Jun 06 2016 16:29
np
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:32

recommended workaround for using a channel more than once?

ERROR ~ Channel `reference` has been used twice as an input by process `indexReference` and process `decompressReference`

indexReference is doing a bwa mem, decompressReference is using bgzip to convert fna.gz to fna (another tool downstream couldn't use a gzipped reference)

Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:33
you can write
process foo {
  output: 
  file '..' into ref1, ref2

'''
your command
''' 
}
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:36
ah
Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:38
alternatively you can split any channel using the into operator
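For reference, the `into` operator duplicates an existing channel into multiple targets; a minimal sketch in the closure-form DSL1 syntax (the exact form has varied across Nextflow releases, and the file name and target channel names here are illustrative, not from the chat):

```groovy
// Hypothetical channel holding the reference genome file
reference = Channel.fromPath('data/genome.fna.gz')

// Duplicate it into two independent channels, one per consumer process
reference.into { ref_for_index; ref_for_decompress }

// ref_for_index and ref_for_decompress each receive their own copy of
// every item, so indexReference and decompressReference can both read
// without competing for the same queue.
```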
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:39

that works, thanks. what's the reasoning behind not allowing forking of channels? is it because items are taken out of the queue?

oh I'll try into too

Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:41
because by definition a read operation consumes the head of a dataflow queue
thus with more than one reader you would get nondeterministic behaviour
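Nextflow channels are built on GPars dataflow queues, which makes the single-reader rule concrete; a minimal sketch in plain GPars (outside any pipeline):

```groovy
import groovyx.gpars.dataflow.DataflowQueue

// Reads are destructive: each value leaves the queue exactly once
def queue = new DataflowQueue()
queue << 'a'
queue << 'b'

assert queue.val == 'a'  // this read removes the head...
assert queue.val == 'b'  // ...so a second reader would never see 'a'

// With two processes reading the same queue, which process receives
// which item would depend on scheduling -- hence the "used twice" error.
```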
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:43
I see. coming from make/snakemake, where different rules can take the same input target file, I think I was trying to force nextflow (dataflow paradigm) into that. that being said, I really like how clean nextflow is w.r.t. folders for each task, no fear of file name conflicts, etc
Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:47
yes, nextflow uses a different paradigm when compared with other tools
the functional-reactive approach can be confusing at the beginning, but once you get it, it's much more fun ;)
w.r.t folders for each task, no fear of file name conflicts, etc
yes this hugely simplifies many things
you don't have to take care of creating a unique file name for your results
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:50
yep I like the F.R.P approach :D been meaning to try RxJS for web apps, so seeing F.R.P in other domains is nice
Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:51
indeed, the channel operators are modelled over the first version of RxJava
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:56

PS, when first introducing closures, I think it would help to show

[1, 2, 3].each { val -> println(val) }

first, and then be like "you can also do this in a shortform syntax using it":

[1, 2, 3].each { println(it) }

closures are a little confusing at first, but I am imagining them as JS callbacks

Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:57
makes sense, thanks for suggesting that
Julian Mazzitelli
@thejmazz
Jun 06 2016 18:58
as well, maybe only start omitting brackets after a few examples. just to make things as clear as possible in the beginning
Paolo Di Tommaso
@pditommaso
Jun 06 2016 18:59
documentation is a HUGE amount of work, you know
it would require a team just for that
Julian Mazzitelli
@thejmazz
Jun 06 2016 19:00
yeah I read it all the other day :p
good work on it. I can PR what I've just suggested
Paolo Di Tommaso
@pditommaso
Jun 06 2016 19:01
yes please!
:)
any improvement is welcome
Julian Mazzitelli
@thejmazz
Jun 06 2016 20:09

can you elaborate a bit on how I might use the into operator to clean this up?

put ref1, ref2, ref3 into a target channel perhaps? then target has duplicates, and the same file can be pulled off the queue by different processes? Though that doesn't sound right..

Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:00
@thejmazz not sure that the into op would clean up much more
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:00
hm
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:01
not convinced ?
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:01
nope just thinking
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:03
why not merge these two?
oops .. no, I got confused
interesting
make: 270 lines
snakemake: 104
nextflow: 107
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:08
make was really long because of the template lines in each rule to log out time spent
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:10
I guessed that. still, I do not understand why people continue to use Make for pipelines
I mean, it makes sense when you have very simple logic with a few steps
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:10
my guess is just b/c they are comfortable with it and scared of learning curves lol
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:11
it works for simple things, but for real use cases it quickly becomes very complex
have you tried to render the pipeline DAG with nextflow?
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:12
not yet but that is on my todos
atm I'm debugging why my variant.vcfs are 2.8K when I leave both species enabled
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:13
I was just curious to see it ;)
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:13
but works fine with only one at a time
for sure I'll commit it soon :p
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:14
good
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:37
for some reason my variant.vcfs got created correctly after playing with the code (running one species at a time; also maybe the ls before mpileup did something.., maybe ls -lh was just acting up). maybe it has something to do with caches, maybe I did something wrong, idk. I'll see if I can reproduce, but in the meantime I'll post some screenshots in a gist
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:49
don't mean to raise problems if it was just an error on my side, I'm going to rm work now and see if it works, but here is a report:
https://gist.github.com/thejmazz/ae3e41764c229d37d2a1a7ccf8ef4ee7
the "total 24" comes from the ls -lh I added
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:52
I'm not getting this, what is exactly the problem?
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:53
when I toggled from one species to 2 (via toggling the species map at top), the final step didn't work, producing way too small a variant.vcf file
it's probably a samtools problem, I'm guessing
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:56
in this case you can debug that step by moving into the task work dir
and running bash .command.run
everything you need is there
Julian Mazzitelli
@thejmazz
Jun 06 2016 21:57
true I forgot about that
I already rm -rf'd work :/
if I can reproduce the problem I'll give that a try
Paolo Di Tommaso
@pditommaso
Jun 06 2016 21:57
:)
Julian Mazzitelli
@thejmazz
Jun 06 2016 22:06
it worked as expected with a clean dir :D
my next step is to dockerize, then tomorrow I will start writing my blog post comparing make/snakemake/nextflow/etc.
thus far, nextflow is my favourite :)
Paolo Di Tommaso
@pditommaso
Jun 06 2016 22:09
Great, I'm really happy about that.
As soon as you manage to create a container for that I will take a look at your code to understand if there's something wrong in the resume logic
Julian Mazzitelli
@thejmazz
Jun 06 2016 22:12

cool thanks

my biggest concern so far is how to handle the "forking of channel" issue I raised earlier. I'd like to be able to have resource A produced in one, then defined as an input for two, three, without having to do

process one {
  output: val A into A1, A2
  ...
}

Perhaps there is a separate design pattern for this use case that I am not catching onto.

Paolo Di Tommaso
@pditommaso
Jun 06 2016 22:14
what specifically is worrying about that?
Julian Mazzitelli
@thejmazz
Jun 06 2016 22:15
the only real issue is that it is tedious to add A1, A2, ... An and keep track of which one is used where in other rules. functionally, it's fine
Paolo Di Tommaso
@pditommaso
Jun 06 2016 22:15
ok
I know it is a bit tedious that at this time there's no other solution
there are some ideas for removing this limitation in a future release
but it would require a major refactoring that we can't deal with at this time
Julian Mazzitelli
@thejmazz
Jun 06 2016 22:19
I feel like it is a consequence of the dataflow paradigm?
Paolo Di Tommaso
@pditommaso
Jun 06 2016 22:19
yes