These are chat archives for nextflow-io/nextflow

10th
Oct 2017
Félix C. Morency
@fmorency
Oct 10 2017 14:14
Gratz
Francesco Strozzi
@fstrozzi
Oct 10 2017 14:17
:+1:
Paolo Di Tommaso
@pditommaso
Oct 10 2017 14:19
thanks francesco for your support!
Francesco Strozzi
@fstrozzi
Oct 10 2017 14:23
my pleasure
but we still have work to do ;)
Paolo Di Tommaso
@pditommaso
Oct 10 2017 14:24
sure thing, endless ..
Venkat Malladi
@vsmalladi
Oct 10 2017 14:43
@fstrozzi looking great
Paolo Di Tommaso
@pditommaso
Oct 10 2017 14:43
it is ;)
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:33
any suggestion for a better name for ungroupTuple operator ?
nextflow-io/nextflow#440
Venkat Malladi
@vsmalladi
Oct 10 2017 15:36
what about splitTuple
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:40
not sure, a bit confusing
because there are already splitFasta, splitFastq that do a very different thing
Venkat Malladi
@vsmalladi
Oct 10 2017 15:40
flattenTuple
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:41
sounds better
flatTuple ?
Venkat Malladi
@vsmalladi
Oct 10 2017 15:45
ya i like it
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:45
nice, let's see if anybody proposes something else
Venkat Malladi
@vsmalladi
Oct 10 2017 15:45
goes well with fromFilePairs, which has a flat option
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:46
there's also flatMap
tho, there's a little clash with it
because the map in flatMap stands for an association
Félix C. Morency
@fmorency
Oct 10 2017 15:47
is there a way to add a variable to the cache?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:47
instead in flatTuple, tuple is a data structure
Félix C. Morency
@fmorency
Oct 10 2017 15:47
I have a map that gets filled and would like -resume to re-use this map
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:48
"is there a way to add a variable to the cache?"
Venkat Malladi
@vsmalladi
Oct 10 2017 15:48
There is flatten which is more akin to the function
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:48
?
"There is flatten which is more akin to the function"
hence you are suggesting flattenTuple instead of flatTuple ?
Venkat Malladi
@vsmalladi
Oct 10 2017 15:48
Ya
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:49
good point
Félix C. Morency
@fmorency
Oct 10 2017 15:50
@pditommaso I declared a map foo bla = [:] and I fill it with some (random) data. When I -resume my pipeline, it would seem the map is not kept between runs, leading to re-filling the map with other (random) data. I would like to be able to say "keep this structure in the cache" if that makes sense
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:51
it depends how that map is declared
Félix C. Morency
@fmorency
Oct 10 2017 15:52
colors = [:] at the top of the .nf file
Paolo Di Tommaso
@pditommaso
Oct 10 2017 15:52
if you put it in the global scope it will be re-computed
Félix C. Morency
@fmorency
Oct 10 2017 15:53
I see
Mike Smoot
@mes5k
Oct 10 2017 16:15
What's the behavior of ungroupTuple? Given [a, [x, y]] do I get 1. [a, x, y] or 2. [[a, x], [a, y]]? 1. implies flatten, while 2. suggests ungroup.
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:16
2.
Simone Baffelli
@baffelli
Oct 10 2017 16:17
Untuple?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:17
LOL
Venkat Malladi
@vsmalladi
Oct 10 2017 16:17
lol
deTuple?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:17
jesus
Venkat Malladi
@vsmalladi
Oct 10 2017 16:18
sorry that was more joking
Simone Baffelli
@baffelli
Oct 10 2017 16:18
@fmorency you could use a fixed seed for the rng, and declare that in parameters
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:18
flattenTuple makes sense IMO
because the semantics are the same as flatten but applied to a single tuple
Simone Baffelli
@baffelli
Oct 10 2017 16:20
But looking at @mes5k example it looks like it is combining all entries in a new list of tuples
Is that the function of this operator?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:20
let me show you
Simone Baffelli
@baffelli
Oct 10 2017 16:21
It looks like Python's itertools.zip_longest
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:21
!
given as input a channel emitting an item [a, [x, y]]
ungroupTuple would return
[a, x]
and
[a, y]
because it flattens the first tuple it finds in the item
Simone Baffelli
@baffelli
Oct 10 2017 16:26
Ok, then it is similar to transpose
I use map with transpose a lot
To do smth similar
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:27
this was the missing word .. :)
Simone Baffelli
@baffelli
Oct 10 2017 16:27
Well I call it "zip"
Python..:snake:
Venkat Malladi
@vsmalladi
Oct 10 2017 16:27
zipTuple?
Simone Baffelli
@baffelli
Oct 10 2017 16:28
Transposetuple
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:28
:)
just transpose ?
Simone Baffelli
@baffelli
Oct 10 2017 16:28
Yes
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:29
now let's make it difficult
Simone Baffelli
@baffelli
Oct 10 2017 16:29
Makes sense
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:29
what's the transpose of [a, [x, y], [1,2]] ?
Simone Baffelli
@baffelli
Oct 10 2017 16:30
Depends, does it repeat the shortest element or truncate it?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:30
currently it is doing
Venkat Malladi
@vsmalladi
Oct 10 2017 16:30
[a, [x,y]] and [a, [1,2]]
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:30
that's not a transpose, that's a split ..
now it's doing
[a, x, [1,2]]
[a, y, [1,2]]
Simone Baffelli
@baffelli
Oct 10 2017 16:31
So all combinations with the shortest entry
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:31
unless you specify that it should be applied to the 3rd element
[a, [x, y], 1]
[a, [x, y], 2]
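
The behaviour described above can be sketched in plain Groovy. This is only an illustration of the examples given in the chat, not the Nextflow implementation; the helper name expandAt and its optional index argument are hypothetical.

```groovy
// Hypothetical helper (not part of Nextflow): expand one nested list inside an item.
// With no index, the first nested list found is expanded, as described in the chat.
List<List> expandAt(List item, Integer index = null) {
    int idx = (index != null) ? index : item.findIndexOf { it instanceof List }
    // Nothing to expand: return the item unchanged, wrapped in a list
    if (idx < 0 || !(item[idx] instanceof List)) {
        return [item]
    }
    // Emit one copy of the item per value of the nested list
    return item[idx].collect { value ->
        def copy = new ArrayList(item)
        copy[idx] = value
        copy
    }
}

assert expandAt(['a', ['x', 'y']]) == [['a', 'x'], ['a', 'y']]
assert expandAt(['a', ['x', 'y'], [1, 2]]) == [['a', 'x', [1, 2]], ['a', 'y', [1, 2]]]
// Index 2 is the 3rd element (zero-based indexing)
assert expandAt(['a', ['x', 'y'], [1, 2]], 2) == [['a', ['x', 'y'], 1], ['a', ['x', 'y'], 2]]
```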
Simone Baffelli
@baffelli
Oct 10 2017 16:32
That function was sorely missing
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:32
?
Simone Baffelli
@baffelli
Oct 10 2017 16:32
I can replace lots of map
With that
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:33
I guess so
Simone Baffelli
@baffelli
Oct 10 2017 16:33
But then, why not allow any collections method as operators?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:33
?
ahh
well, I don't think all of them make sense
Simone Baffelli
@baffelli
Oct 10 2017 16:34
Ofc no
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:34
and above all there's not an automatic mapping
Simone Baffelli
@baffelli
Oct 10 2017 16:34
I see
But I highly approve of this new operator!!
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:38
:)
do you have an example of list transpose ?
forget .. found
Simone Baffelli
@baffelli
Oct 10 2017 16:40
Well I use it a lot after collecting
If I collect tuples of images and timestamps that I want to average
Excuse my bad typing...I'm using my phone
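
For reference, a minimal sketch of the list transpose mentioned here, using Groovy's built-in transpose() on a collected list of (image, timestamp) pairs. The file names and timestamp values are made up for illustration.

```groovy
// Made-up data standing in for collected (image, timestamp) tuples
def collected = [['img1.png', 10], ['img2.png', 20], ['img3.png', 30]]

// transpose() turns a list of pairs into a pair of lists
def (images, timestamps) = collected.transpose()
assert images == ['img1.png', 'img2.png', 'img3.png']
assert timestamps == [10, 20, 30]

// The timestamps can now be averaged in one step
def meanTimestamp = timestamps.sum() / timestamps.size()
assert meanTimestamp == 20

// Groovy's transpose() truncates to the shortest list rather than repeating elements
assert [['a', 'b', 'c'], [1, 2]].transpose() == [['a', 1], ['b', 2]]
```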
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:42
still not sure if it should be applied to all tuples in the item or just one ..
maybe all by default and give a parameter to choose a subset
Simone Baffelli
@baffelli
Oct 10 2017 16:46
That would make the most sense
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:46
I have some work for this night :)
Simone Baffelli
@baffelli
Oct 10 2017 16:47
do you ever rest?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:47
coding is not resting?
Simone Baffelli
@baffelli
Oct 10 2017 16:48
i like it too, but sometimes i need to eat and sleep😀
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:48
:joy:
Simone Baffelli
@baffelli
Oct 10 2017 16:49
I like to exercise too, from time to time
Perhaps i should get a treadmill desk
Anthony Underwood
@aunderwo
Oct 10 2017 16:50
@mes5k What's a groovy grape?
:grapes:
Paolo Di Tommaso
@pditommaso
Oct 10 2017 16:51
:)
a way to import java/groovy libraries via @Annotation
Mike Smoot
@mes5k
Oct 10 2017 16:51
yep
Anthony Underwood
@aunderwo
Oct 10 2017 16:51
using @Grab?
@mes5k do you have an example of how you have imported a local function?
Mike Smoot
@mes5k
Oct 10 2017 16:54

Here's what the top of each of my pipelines looks like:

@GrabResolver(name='sgi_releases',
              root='http://internal_url/nexus/content/repositories/releases/')

@Grab(group='com.syntheticgenomics.compbio',
      module='groovy_nextflow_utilities',
      version='0.3.14')

// These already exist in the nextflow execution context,
// so importing them again causes problems.
@GrabExclude('org.codehaus.groovy:groovy-all')
@GrabExclude('io.nextflow:nextflow')

import com.syntheticgenomics.compbio.nextflow.Utils

I've got a small number of utility functions in com.syntheticgenomics.compbio.nextflow.Utils. That project is just a normal java/groovy project that gets jarred up.

Anthony Underwood
@aunderwo
Oct 10 2017 16:55
Cool. These are available over http? http://internal_url/nexus/content/repositories/releases/
Mike Smoot
@mes5k
Oct 10 2017 17:00
That's an internal Nexus repository that we've got running. Very handy for managing package binaries (java, groovy, python, and probably more).
Paolo Di Tommaso
@pditommaso
Oct 10 2017 17:01
worth saying that you can also put a jar or some classes/scripts in the lib folder and it does the same
Mike Smoot
@mes5k
Oct 10 2017 17:03
right
Probably best to start that way
Anthony Underwood
@aunderwo
Oct 10 2017 17:12
@pditommaso would the lib folder be $HOME/.groovy/lib by default?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 17:12
nope, project ROOT/lib
Anthony Underwood
@aunderwo
Oct 10 2017 17:16
groovy ROOT?
Paolo Di Tommaso
@pditommaso
Oct 10 2017 17:16
no sorry your project root
ie the folder where main script is located
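
A minimal sketch of the lib folder approach, assuming a class in the default package. The class name PipelineUtils, its method, and the file name are hypothetical; both parts are shown in one block only for brevity.

```groovy
// Hypothetical contents of <project root>/lib/PipelineUtils.groovy
class PipelineUtils {
    // Drop the last extension from a file name, e.g. reads.fastq -> reads
    static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.')
        return dot > 0 ? fileName.substring(0, dot) : fileName
    }
}

// In the main script the class should then be usable without @Grab
assert PipelineUtils.stripExtension('reads.fastq') == 'reads'
assert PipelineUtils.stripExtension('README') == 'README'
```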
Anthony Underwood
@aunderwo
Oct 10 2017 17:17
ahh that's nice and easy :)
Paolo Di Tommaso
@pditommaso
Oct 10 2017 17:17
:+1:
Venkat Malladi
@vsmalladi
Oct 10 2017 17:32
treadmill desks are fun and dangerous
Simone Baffelli
@baffelli
Oct 10 2017 18:10
Never tried one
I run after work at the moment
Venkat Malladi
@vsmalladi
Oct 10 2017 18:10
Ya so do i
only way i can clear my head
Simone Baffelli
@baffelli
Oct 10 2017 18:11
Absolutely
But I generally dislike the treadmill
It does not feel at all like regular running
Venkat Malladi
@vsmalladi
Oct 10 2017 18:11
ya me for the most part
but sometimes its the only way i can get a run in
Simone Baffelli
@baffelli
Oct 10 2017 18:12
I think down to -15 c is still fine to run
With the right gear
Venkat Malladi
@vsmalladi
Oct 10 2017 18:12
ya i am looking at 40 C
I like the cold, hard to do it in the heat
Simone Baffelli
@baffelli
Oct 10 2017 18:13
My best runs always were in the cold
It is easier to push oneself harder
Venkat Malladi
@vsmalladi
Oct 10 2017 18:14
yep for sure
Félix C. Morency
@fmorency
Oct 10 2017 18:27
@baffelli Thanks for the suggestion. I am already using a fixed seed. However, for a specific set of inputs, I want the process to always generate the same random parameter
I was keeping the result in a map, but the map is always recomputed on -resume which is not what I want
Simone Baffelli
@baffelli
Oct 10 2017 18:52
Could you store it in a file?
Eg use a process to store the map as a json file?
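
A rough sketch of the JSON idea in plain Groovy, using groovy.json. The helper name loadOrCreate and the temp-file location are hypothetical; where the file should live so that it survives -resume (for example as the output of a process) is left open here.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper: reuse the stored map if the file exists,
// otherwise build the map once and save it as JSON.
Map loadOrCreate(File store, Closure<Map> create) {
    if (store.exists()) {
        return new JsonSlurper().parseText(store.getText('UTF-8')) as Map
    }
    Map fresh = create()
    store.setText(JsonOutput.toJson(fresh), 'UTF-8')
    return fresh
}

// Illustration only: a throwaway file in the temp dir
def store = new File(System.getProperty('java.io.tmpdir'), "colors-${System.nanoTime()}.json")
def first = loadOrCreate(store) { [sampleA: new Random().nextInt(256)] }
def second = loadOrCreate(store) { [sampleA: new Random().nextInt(256)] }

// The second call reads the file instead of drawing a new random value
assert first.sampleA == second.sampleA
store.delete()
```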
Venkat Malladi
@vsmalladi
Oct 10 2017 18:53
that would be my suggestion
Félix C. Morency
@fmorency
Oct 10 2017 18:54
I was trying to avoid that but I guess it would be a solution yes
Simone Baffelli
@baffelli
Oct 10 2017 18:54
That should not be too bad, right?
Venkat Malladi
@vsmalladi
Oct 10 2017 18:54
@fmorency is this for testing purposes?
Félix C. Morency
@fmorency
Oct 10 2017 18:54
@vsmalladi no, prod
Simone Baffelli
@baffelli
Oct 10 2017 18:54
Except that it would add an extra step in the pipeline
Venkat Malladi
@vsmalladi
Oct 10 2017 18:54
can you put them as options in the config file
like an array in config?
Simone Baffelli
@baffelli
Oct 10 2017 18:55
I do it a lot
For example to store an array of R formulas i need to compare
Félix C. Morency
@fmorency
Oct 10 2017 18:55
I already do that, but I want to generate pseudo-random data for keys that are not in the array in the config file
Simone Baffelli
@baffelli
Oct 10 2017 18:56
How?
You want the data to be pseudo random but unique?
Could you hash it and use the hashes as seeds?
Félix C. Morency
@fmorency
Oct 10 2017 18:57
What I might do is generate the data from the number of inputs
or just the input itself
mmm yeah I could do that using stdin/stdout instead of files
Simone Baffelli
@baffelli
Oct 10 2017 18:58
You mean to use the data as the seed itself
Félix C. Morency
@fmorency
Oct 10 2017 18:58
something like that
Simone Baffelli
@baffelli
Oct 10 2017 18:59
Hashing the file or the path could do that
The probability of collision is really low
Venkat Malladi
@vsmalladi
Oct 10 2017 19:01
i would do it that way
prob the easiest way
Simone Baffelli
@baffelli
Oct 10 2017 19:03
Indeed
I still think a generic caching operator would be very useful
Mike Smoot
@mes5k
Oct 10 2017 19:06
I'm slowly working on just such an operator. Unfortunately I've been really busy with other things.
Simone Baffelli
@baffelli
Oct 10 2017 19:07
Well thanks for your time! Will it use the same DB that processes use?
Mike Smoot
@mes5k
Oct 10 2017 19:10
Yes, the idea is to cache the files found in a channel to the work dir OR a specified cache location (e.g. storeDir). There are a few other details to work out as well. Here's the ticket: nextflow-io/nextflow#397
Simone Baffelli
@baffelli
Oct 10 2017 19:15
That is an excellent idea!
Félix C. Morency
@fmorency
Oct 10 2017 20:02

@pditommaso is it safe to

import com.google.common.hash.Hashing
import com.google.common.base.Charsets

at the top of my .nf file? Is there any better/other ways?

Paolo Di Tommaso
@pditommaso
Oct 10 2017 20:09
it's ok
Félix C. Morency
@fmorency
Oct 10 2017 20:15
thanks
Paolo Di Tommaso
@pditommaso
Oct 10 2017 20:15
:+1:
Paolo Di Tommaso
@pditommaso
Oct 10 2017 20:41
Just uploaded
  Version: 0.26.0-beta3 build 4591
  Modified: 10-10-2017 20:23 UTC (22:23 CEST)
that implements transpose operator
feedback is welcome here nextflow-io/nextflow#440
Simone Baffelli
@baffelli
Oct 10 2017 20:54
Will try tomorrow, I'm always happy to make my ever changing pipeline easier
(Btw: dont
Venkat Malladi
@vsmalladi
Oct 10 2017 21:14
i have an array of parameters exiting out as a channel such as: set replicateId, file(bam), file(index) from aligned_bam_ch
is there a good example of how I can use the aligned_bam_ch.collect() in another process?
set replicateId, file(bam), file(index) from aligned_bam_ch
Félix C. Morency
@fmorency
Oct 10 2017 21:16
Thanks @baffelli and @vsmalladi for the suggestions. I'm hashing the file name and using this as the random seed. Works perfectly and it's reproducible.
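
A minimal sketch of the hash-as-seed approach described here. The chat imports Guava's Hashing and Charsets; this version uses the JDK's MessageDigest instead so it runs without extra dependencies. The helper name seedFor and the file names are hypothetical.

```groovy
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper: derive a stable 64-bit seed from a file name
long seedFor(String name) {
    byte[] digest = MessageDigest.getInstance('SHA-256')
                                 .digest(name.getBytes(StandardCharsets.UTF_8))
    // Use the first 8 bytes of the digest as the seed
    return ByteBuffer.wrap(digest).getLong()
}

// The same input name always yields the same pseudo-random parameter
def firstRun  = new Random(seedFor('sample_A.bam')).nextInt(1000)
def resumeRun = new Random(seedFor('sample_A.bam')).nextInt(1000)
assert firstRun == resumeRun
```

Because the seed depends only on the input name, nothing has to be kept between runs.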
Venkat Malladi
@vsmalladi
Oct 10 2017 21:29
+1
Venkat Malladi
@vsmalladi
Oct 10 2017 21:42
ah got it might use collectFile to output a file