These are chat archives for nextflow-io/nextflow

28th
Jan 2015
Sascha Steinbiss
@satta
Jan 28 2015 13:42
hey hey
question: is there an easy way to conditionally enable/disable processes? and adjust the inputs of other processes accordingly?
I would like to optionally include a step in my pipeline, configured by a boolean in my params
Paolo Di Tommaso
@pditommaso
Jan 28 2015 16:47
@andrewcstewart You can use a process whose script contains a wget/curl command
or you can use groovy/java api
For example using something like this http://groovy-almanac.org/save-url-to-file/
@satta I usually handle this by folding the optional step into a process that may depend on it
I mean for example:
`process foo {
input: ...
Paolo Di Tommaso
@pditommaso
Jan 28 2015 16:54
@satta Sorry, I messed that code up a bit; have a look at this for an example
@satta otherwise you can even include a process in an if branch. Like this https://github.com/cbcrg/mta-nf/blob/master/mta.nf#L92
However I'm not fully satisfied with this solution; I think there should be a better way to handle process conditions.
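A minimal sketch of the if-branch approach, written in the same process syntax used elsewhere in this chat. The flag `params.trim`, the channel names `raw` and `prepared`, and the `tr` command are hypothetical placeholders and are not taken from mta.nf:

```groovy
// Hypothetical boolean switch; could be overridden with --trim true
params.trim = false

// Placeholder input channel
raw = Channel.from('sample_a', 'sample_b')

if( params.trim ) {
    // Optional step: the process is only defined when the flag is set
    process trim {
        input:
        val x from raw

        output:
        val x into prepared

        """
        echo $x | tr a-z A-Z
        """
    }
}
else {
    // Step disabled: downstream reads the raw channel directly
    prepared = raw
}

process downstream {
    input:
    val x from prepared

    """
    echo $x
    """
}
```

Either way the downstream process only refers to `prepared`, so its input declaration does not change with the flag.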
Andrew Stewart
@andrewcstewart
Jan 28 2015 17:26
Is it best to handle such conditions at the process or channel level?
(referring to @satta's question, not mine)
Thomas Dyar
@tom-dyar
Jan 28 2015 17:32
Hi, I don't suppose you can use docker images with the dnanexus framework?
Andrew Stewart
@andrewcstewart
Jan 28 2015 17:36
@tom-dyar I don't think so. I was investigating exactly that awhile back independent of NextFlow and I learned from one of their devs that DX actually uses docker containers itself
which is cool, but precludes running containers from within a container for the time being, as that is a slightly more complicated task
Thomas Dyar
@tom-dyar
Jan 28 2015 17:43
I expected they would have provided that functionality by now, so it is disappointing
any plans to make a persistent server for monitoring/scheduling workflows?
i am thinking about using luigi for that type of functionality, but not sure if the 2 projects really fit together very well
Andrew Stewart
@andrewcstewart
Jan 28 2015 17:51
That's pretty complicated functionality, I'm not sure I would have such an expectation
It basically amounts to a complete redesign of their current architecture
fortunately something like NextFlow really negates the need for vendor managed services like DNANexus
imho
Thomas Dyar
@tom-dyar
Jan 28 2015 17:57
yes, i would just use dnanexus for large-scale re-analysis -- just a convenient way to get to AWS
Andrew Stewart
@andrewcstewart
Jan 28 2015 18:20
Yea Im a big fan of their CLI
I wish that could be abstracted and open sourced
@pditommaso w.r.t. my last question, how would one handle the output of that process then? If it's just a single file that I want to use within multiple processes, should it be sent to a Channel? Or can it somehow be made into a 'global'? In either case I'm not really clear how this should look within the process's output section
Paolo Di Tommaso
@pditommaso
Jan 28 2015 18:48
@tom-dyar Yes, currently Docker cannot be used in DNAnexus because they don't support it yet. I asked about it 8 months ago and they replied that they were working on it, but there's still no news.
Paolo Di Tommaso
@pditommaso
Jan 28 2015 18:56
@andrewcstewart For example something like this
process download_ref_file {

  output:
  file my_ref_file

  """
  wget http:... -O my_ref_file
  """

}

ref1 = Channel.create()
ref2 = Channel.create()
..

my_ref_file.into(ref1, ref2, ..)


process foo1 {

  input: 
  file ref1

  """
  <your code here>
  """
}

You can use the storeDir directive to skip the process if the file has already been downloaded
(in the first proc obviously)
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:03
So you def need to create multiple channels though?
That feels a bit cumbersome
Couldn't I just skip the channel and have the input section of foo1 directly reference "file my_ref_file" ?
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:05
Yes, I know. Currently it's not possible, but I'm working on making something similar to what you are proposing possible
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:09

I think Im slightly confused because I know I can do something like the following
```
reference = file("/path/to/reference")

process something {
  input:
  file(reference)

  """
  myscript $reference
  """
}
```

hmm
that didn't work
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:10
Yes, you can
You need a channel only when dealing with an output (file or whatever) of a process
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:11
I wonder if I could somehow spoof it
something like the following...
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:11
because the process is executed asynchronously
If you download that file with the java/groovy api, you can reference it as you just showed above
does it make sense ?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:13
reference = file("/path/to/dummy_reference")

process something {
  input: 
    file(reference)

  """
  wget http... -O $reference
  """
}
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:13
ahh, no
this won't work because this way you don't know when the download has completed
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:15
ah ok
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:15
don't forget that all processes are asynchronous and are launched at the same time
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:15
instead of creating a bunch of separate channels, could I instead just 'put it back' with the process?
(right, keep needing to remind myself of exactly how that works)
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:16
(that is what makes nextflow different)
what do you mean by put it back?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:18
process download_ref_file {

  output:
  file my_ref_file into reference

  """
  wget http:... -O my_ref_file
  """
}

process foo1 {

  input: 
  file reference

  output:
  file reference

  """
  <your code here>
  """
}

process foo2 {

  input: 
  file reference

  output:
  file reference

  """
  <your code here>
  """
}
and lets say I don't care what order foo1 and foo2 run
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:19
ah, no, it can't work this way
the last channel would override the others with the same name
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:20
right
hmm..
Ok, so I think for the time being I just need to do some pre-nextflow provisioning
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:20
However I think the simplest way is not to use a process to download the file
You can use the fragment of groovy code that I linked, and use it to download that file at the beginning of the script
then use that file as an input
is that clear?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:23
Yeah, so if I just do that through groovy outside of a process, that will be guaranteed to happen before any process kicks off?
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:24
yes! because that code is sequential
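A minimal sketch of that idea in plain Groovy, not necessarily the snippet linked earlier. The helper name `fetch`, the example URL and the local file name are assumptions for illustration only:

```groovy
// Copy whatever is behind a URL into a local file and return that file.
// The URL stream is opened first, so nothing is written if the URL
// cannot be opened at all.
File fetch(String url, File target) {
    new URL(url).withInputStream { inp ->
        target.withOutputStream { out -> out << inp }
    }
    return target
}

// Hypothetical usage at the top of a pipeline script, before any process block:
// def local = fetch('http://example.com/reference.fa', new File('reference.fa'))
// reference = file(local.absolutePath)
```

Because this runs as ordinary top-level script code, the call returns only after the copy has finished, so the file should already be in place when it is later declared as a process input.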
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:24
ahhh
im understanding the order of execution a bit better now
So Im actually grabbing this reference data from a private S3 repo. I imagine I could just use groovy bindings for the S3 API to do the same thing?
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:25
whereas the process block is only the definition of a process, whose execution is deferred until data is available
yes
Is there an S3 binding for Groovy?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:26
I have no idea, Im not a java/groovy guy
but I think I'm going to have to learn some groovy :)
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:26
:)
Do you know java?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:27
I was hoping you'd know of one! I'll go look it up.
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:27
however I learned groovy with this in a week
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:27
Yeah, I've done plenty of Java in school, it's just not my main practitioner language
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:28
it's very easy
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:28
For sure. It's more of figuring out the equivalent libraries, etc.
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:28
I know
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:28
Although maybe a system call to the local AWS CLI would be simpler
and not require dealing with an external library
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:30
But public files in S3 can be accessed via http
you don't need an extra layer ..
unless you want to upload there
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:31
these aren't public
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:31
ah
well, I guess it is possible to access via http specifying the user and password
however I'm integrating an S3 file system in nextflow
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:33
zomg
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:33
so it will be possible to access (read/write) S3 buckets transparently like any other file
zomg ?
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:34
sorry, that means I'm excited
:D
Yeah, that would be awesome
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:34
LOL
actually it's already implemented but only included in nextflow "gridgain" version
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:42
lets see if I have this right...
// make sys call to "aws s3 cp s3://mybucket/ /local/path/file"
def.proc = ['aws','s3','cp','s3://mybucket/','/local/path/file'].execute()
proc.waitFor()
file('/local/path/file')
does that look right?
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:43
def proc
without the dot
yes it should work
also waitFor returns the exit status, which you may want to check
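A sketch of what checking the exit status could look like in plain Groovy. The helper name `runOrFail` is hypothetical, and it uses `waitForProcessOutput` rather than a bare `waitFor` so that the command's output is drained while waiting:

```groovy
// Run an external command, wait for it to finish, and fail loudly
// when it exits with a non-zero status. Returns the captured stdout.
String runOrFail(List<String> cmd) {
    def out = new StringBuilder()
    def err = new StringBuilder()
    def proc = cmd.execute()
    // Collect stdout/stderr while waiting so a chatty command cannot block
    proc.waitForProcessOutput(out, err)
    def status = proc.exitValue()
    if( status != 0 )
        throw new RuntimeException("Command ${cmd.join(' ')} exited with status ${status}: ${err}")
    return out.toString()
}

// Hypothetical usage with the command from the chat:
// runOrFail(['aws', 's3', 'cp', 's3://mybucket/', '/local/path/file'])
// reference = file('/local/path/file')
```

Throwing on failure stops the script before any process is handed a missing or partial file.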
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:45
cool
should be fine for the time being
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:45
nice
enjoy it
Andrew Stewart
@andrewcstewart
Jan 28 2015 19:45
Thanks!
Paolo Di Tommaso
@pditommaso
Jan 28 2015 19:45
welcome!
Andrew Stewart
@andrewcstewart
Jan 28 2015 21:01
If I have a process with an embedded python script that I want to produce a list for output... is stdout the best way to capture that output into a channel?
process foo {

  output:
    stdout myChan

"""
#!/usr/bin/env python

print [1,2,3]
"""
}
Andrew Stewart
@andrewcstewart
Jan 28 2015 22:42
(and also, is it possible to convert from stdout to val?)