These are chat archives for nextflow-io/nextflow

9th
Jun 2015
ekageyama
@ekageyama
Jun 09 2015 09:00
Hello, sorry to bother you, I have a question that I haven't been able to find an answer to
I have 2 programs: one runs on a list of files (alignment files), and once all the files have been analyzed, I need to run a second program (to create a VCF file)
So, the first part works perfectly, but I'm not sure how to wait until all files have finished being analyzed
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:26
@ekageyama Hi there, does the second program need to run a single job for all the alignments?
ekageyama
@ekageyama
Jun 09 2015 09:27
yes, once I have all the BAM files ready from all the previous processes
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:28
You should be able to collect all the BAM files by using the toList operator
ekageyama
@ekageyama
Jun 09 2015 09:30
maybe it's just a silly question, but then, if I declare a channel after a process, will it wait until all the processes are finished before continuing?
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:31
few seconds sorry
ekageyama
@ekageyama
Jun 09 2015 09:34
few seconds?
does it wait a few seconds to start? or a few seconds after all previous processes are finished?
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:36
:)
no sorry, I just meant please wait a few seconds, because I was interrupted
so regarding your question
the process synchronization is managed implicitly by nextflow, it doesn't matter in which order you declare the processes/operators
when you write
process foo {
  """
  something..
  """
}
the script execution does not stop there to wait for the job to be executed
ekageyama
@ekageyama
Jun 09 2015 09:41
aha, yeah, that's what I imagined
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:41
yes, processes and operators are asynchronous
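A minimal sketch of that asynchronous behaviour, written in the same older `from`/`into` style used in this conversation (newer Nextflow releases use a different process/workflow syntax). The process name `foo` comes from the snippet above; the channel name `result` and the `sleep`/`echo` commands are placeholders chosen for illustration.

```groovy
process foo {
    output:
    stdout result          // 'result' is a hypothetical channel name

    """
    sleep 2
    echo finished
    """
}

// Reached right away: the block above only schedules the task,
// it does not wait for it to complete.
println 'script continues without waiting for foo'

// Invoked later, once foo has emitted its output.
result.subscribe { println "foo says: ${it.trim()}" }
```

The `println` line should show up before the message coming from `foo`, because the only thing that waits for the task is the code reading from the `result` channel.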
ekageyama
@ekageyama
Jun 09 2015 09:43
'''
process bla {
    output:
    "out/aln.bam"

    """
    something
    """
}
'''
ahh
yeah, something like that, I need to wait until all instances of process bla are finished, that is my question
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:44
yes, for this I'm suggesting to use the toList operator on the channel that returns the BAM files
it will create a new channel that produces a single list item with all the files you need.
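A minimal sketch of what `toList` does on its own, outside of any process. The file names are made-up strings used only for illustration.

```groovy
// A channel that emits three items, one at a time.
Channel
    .from('a.bam', 'b.bam', 'c.bam')
    .toList()                      // waits for the channel to complete, then emits one list
    .subscribe { println it }      // prints [a.bam, b.bam, c.bam]
```

Because the list can only be emitted after the source channel has completed, anything consuming it implicitly waits for every upstream item.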
ekageyama
@ekageyama
Jun 09 2015 09:45
mmm...ok, I'll have a look, thnx for the support! :)
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:45
does that make sense?
let me see if I can find an example
Paolo Di Tommaso
@pditommaso
Jun 09 2015 09:51
ok
something like this
process align {
    input: 
    each x from 1..5 

    output:
    file 'file.bam' into aligns 

    """
    echo 'bam data $x' > file.bam
    """    
}

process vcf { 
  echo true
  input: 
  file '*.bam'  from aligns.toList()

  """
  cat  *.bam
  """
}
ekageyama
@ekageyama
Jun 09 2015 09:55
thank you a lot!
Paolo Di Tommaso
@pditommaso
Jun 09 2015 10:01
welcome !
let me know if you manage to run it
ekageyama
@ekageyama
Jun 09 2015 12:42
Hey, it's me again. Everything seems to be working fine; now I have a different question. I am working with an SGE cluster, and because I'm doing alignments, I don't want to copy the reference genome whenever a cluster process is called
so I'm using q-src to do this, which returns a path in a tmp folder available to every cluster node
I'm not entirely sure how to copy the output of q-src to an output channel
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:45
what's q-src ?
ekageyama
@ekageyama
Jun 09 2015 12:46
process qsrc {
    input:
    file path

    """
    ref_dir=\$(q-src ${path} || exit 1)
    """
}
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:46
I don't know that tool
what is it meant for?
ekageyama
@ekageyama
Jun 09 2015 12:47
sorry, q-src just copies a folder into a tmp directory available to the cluster nodes, so it doesn't need to read everything again from the file system, and it just gives you back a path
what I need is the value of ref_dir as an output
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:48
this is managed automatically by nextflow
By default, it creates a symlink for your input files in the process working dir
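A minimal sketch of that staging behaviour, in the same older syntax used in this conversation. The path `/path/to/genome.fa` and the names `genome`, `ref` and `showStaging` are hypothetical.

```groovy
// Hypothetical reference file on a file system shared by the cluster nodes.
genome = file('/path/to/genome.fa')

process showStaging {
    echo true

    input:
    file ref from genome

    // With the default staging, 'ref' should appear in the task work
    // directory as a symlink pointing to the original file.
    """
    ls -l ${ref}
    """
}
```

The task only sees a link, so the reference itself is not copied each time a job is submitted.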
ekageyama
@ekageyama
Jun 09 2015 12:49
ah yeah, but I don't know where my input files are until I execute q-src
so in essence, is there a way to get as an output the value of something being executed?
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:52
I'm not getting what you want to do
ekageyama
@ekageyama
Jun 09 2015 12:53
The return value of a command is a path
I want to pass this path to the subsequent processes
in this case the value of ref_dir
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:53
ok, if so it should be
process qsrc {
    input:
    file path

    output:
    file ref_dir

    """
    ref_dir=\$(q-src ${path} || exit 1)
    """
}
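One caveat on the snippet above: as far as I understand the older syntax, `file ref_dir` declares an output file named `ref_dir` in the task work directory, whereas here `ref_dir` is only a shell variable. A minimal alternative sketch, assuming q-src prints the path on standard output as described, captures that output with the `stdout` qualifier instead. The parameter `params.ref`, the variable `src` and the process `useRef` are hypothetical, and the source folder is passed as a plain value rather than as a staged file.

```groovy
// Hypothetical location of the reference folder.
params.ref = '/path/to/reference'

process qsrc {
    input:
    val src from params.ref

    output:
    stdout ref_dir          // whatever q-src prints becomes the emitted value

    """
    q-src ${src} || exit 1
    """
}

process useRef {
    echo true

    input:
    val dir from ref_dir.map { it.trim() }   // strip the trailing newline

    """
    ls ${dir}
    """
}
```

This only passes the path string along; whether a task running on another node can actually read that directory depends on how the cluster is set up.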
ekageyama
@ekageyama
Jun 09 2015 12:55
ah...didn't know I could do that...thnx! Hopefully I'll stop bugging you now
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:55
no problem
however, is ref_dir a directory created in the node's local storage?
ekageyama
@ekageyama
Jun 09 2015 12:56
yes
it's created dynamically and, as such, it can be different from time to time
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:57
well, but if you are doing this, a process running on another node won't be able to read that path .. (?)
I don't get the point
ekageyama
@ekageyama
Jun 09 2015 12:58
it only changes when I run q-src, and sometimes it gets deleted by our sys-admin
it's kinda weird :P
Paolo Di Tommaso
@pditommaso
Jun 09 2015 12:58
it seems :)
ok, you know your system
happy hacking