These are chat archives for nextflow-io/nextflow

2nd
Feb 2017
amacbride
@amacbride
Feb 02 2017 01:09
@pditommaso An update: I was still occasionally seeing the "missing output files" error, so I tried consolidating two separate output channels from a process into a single channel shared by all the downstream processes -- that seems to be working much more reliably.

So instead of:

output:
set A, B, C into C1, C2, C3
    set D, E into C4

I'm using:

output:
    set A, B, C, D, E into C1, C2, C3, C4
It means I have some redundant info going to processes that don't need the input, but that's a small price to pay for increased reliability. (So far, hope I'm not jinxing it!)
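In context that looks roughly like this sketch (the process, variable and channel names here are hypothetical placeholders); a downstream process simply ignores the fields it doesn't need:

process downstream {
    input:
    set val(A), file(B), file(C), file(D), file(E) from C1

    // this step only needs B; D and E ride along unused
    """
    your_command_line $B
    """
}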
Paolo Di Tommaso
@pditommaso
Feb 02 2017 15:05
GitHub has released a very nice feature that allows tagging project repositories with topics
I would propose that whoever has published a NF pipeline on GitHub tag it with the nextflow topic
Félix C. Morency
@fmorency
Feb 02 2017 15:08
@pditommaso would you recommend using the built-in ignite support in production?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 15:09
yes (with a shiver down my spine..)
Félix C. Morency
@fmorency
Feb 02 2017 15:10
Haha ok. I'll let you know if I run into any trouble ;)
Paolo Di Tommaso
@pditommaso
Feb 02 2017 15:11
kidding apart, it's stable in our tests, but in any case I would suggest stress testing it to make sure it satisfies your requirements
you know, with open source there's no warranty, but we can work to make it better
Félix C. Morency
@fmorency
Feb 02 2017 15:13
of course. but between this and having to learn/deploy some other tech... ;)
I've been using the built-in Apache Ignite support on two of my servers without any trouble
Paolo Di Tommaso
@pditommaso
Feb 02 2017 15:18
Ignite is a very stable and sophisticated piece of software
I think there are at least 10 years of dev behind it
Félix C. Morency
@fmorency
Feb 02 2017 15:19
Nice to know
Paolo Di Tommaso
@pditommaso
Feb 02 2017 15:20
before, it was known as GridGain
then when Spark took off, they decided to donate it to the Apache Software Foundation to attract more visibility
Félix C. Morency
@fmorency
Feb 02 2017 15:25
Did you ever play with the Hadoop/Spark/... techs?
Rickard Hammarén
@Hammarn
Feb 02 2017 15:48
Hi! @pditommaso I'm wondering if there's a nextflow environment variable that I can set to point to a user-specific config file?
Rickard Hammarén
@Hammarn
Feb 02 2017 16:00

Longer explanation of my issue:
Basically I want to have

params.project = 'name_of_our_production_slurm_project'
 in ~/.nextflow/conf/config

The problem is that our production functional account's home folder isn't really accessible for automatic deployment. So I'm wondering if there's a nextflow environment variable that I can set to point to the config file?
The docs state that Nextflow looks in the CWD, the script base directory, and $HOME/.nextflow/config.

Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:13
@Hammarn Yes, otherwise you can still specify it as a command line parameter
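e.g., with the standard -c option or a double-dash parameter override (the path and project name below are just hypothetical placeholders):

# pass an extra config file explicitly
nextflow run main.nf -c /path/to/user.config

# or set params.project directly on the command line
nextflow run main.nf --project name_of_our_production_slurm_project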
@fmorency Yes, actually NF started because I was tired of getting frustrated with Hadoop stuff
Félix C. Morency
@fmorency
Feb 02 2017 16:14
Interesting. I would be curious to pick your brain on that
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:14
With Spark I have some basic experience, but surely it's more interesting than Hadoop
actually there's an idea to run NF on top of Spark
Rickard Hammarén
@Hammarn
Feb 02 2017 16:15
I'm trying to avoid using /home/ and the command line parameter, and I don't want a pipeline that multiple people and groups use to have our production project as a default
Félix C. Morency
@fmorency
Feb 02 2017 16:18
(Not related) I have a process where I pass a list of files as input. I need to loop over the files and compare each file name to a set of user-defined regexes to see if there's a match, and if there is, get some values to pass on the command line. If there's no match, just use the default values. What would be the easiest way of achieving this behavior in NF/Groovy?
Rickard Hammarén
@Hammarn
Feb 02 2017 16:18
But the answer is that those three options are the only options? No $NXF_CONF or anything?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:23
Yes, those are the only options
Rickard Hammarén
@Hammarn
Feb 02 2017 16:24
ok, thanks
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:25
I'm not getting what your problem is
ah, OK, you want to control the config file with a var defined at user level, right?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:36
@fmorency You can have an NF/Groovy one-liner in the script section to determine that, e.g.
script:
def condition = fileNames.find { /* rule here */ }
"""
your_command_line ${condition ? 'this' : 'that'}
"""
See also find and any
Félix C. Morency
@fmorency
Feb 02 2017 16:40
Ok. That works for one rule. I guess I can just loop over my set of rules
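Something like this sketch, I guess (the rule map and option strings here are hypothetical placeholders, reusing the fileNames input from above):

script:
def rules = [ (~/.*tumor.*/)  : '--mode somatic',
              (~/.*normal.*/) : '--mode germline' ]
// first rule whose regex matches any input file name wins
def opts = rules.findResult { regex, value ->
    fileNames.any { it.name =~ regex } ? value : null
} ?: '--mode default'
"""
your_command_line ${opts}
"""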
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:41
you can write any piece of NF/Groovy code there
or write a custom method and invoke it there
Félix C. Morency
@fmorency
Feb 02 2017 16:44
Thanks
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:44
:+1:
Félix C. Morency
@fmorency
Feb 02 2017 16:57
Is there a way to do something like set id, each(files)?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 16:58
nope, you have to compose that outside the process by using one or more operators
Félix C. Morency
@fmorency
Feb 02 2017 17:01
you mean create something like [[id, file1], [id, file2]] and use each on that?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 17:02
in the above example, is the id supposed to be the same?
Félix C. Morency
@fmorency
Feb 02 2017 17:03
correct
Paolo Di Tommaso
@pditommaso
Feb 02 2017 17:03
thus you don't need each; that will trigger two executions just by using the set val(id), file(name) from ch form
Félix C. Morency
@fmorency
Feb 02 2017 17:03
ah yes, right
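i.e. something along these lines (the channel contents here are hypothetical):

// turn [id, [file1, file2]] into [id, file1], [id, file2]
Channel.from( ['sample1', [file('a.bam'), file('b.bam')]],
              ['sample2', [file('c.bam'), file('d.bam')]] )
       .flatMap { id, files -> files.collect { f -> [id, f] } }
       .set { pairs_ch }

process foo {
    input:
    set val(id), file(name) from pairs_ch

    """
    echo $id $name
    """
}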
Félix C. Morency
@fmorency
Feb 02 2017 19:01
say I trigger multiple executions of process X using what we discussed. If I use .groupTuple() afterward to collect all the processed files for a given id, will I have to wait for all executions of process X to have finished on all ids, or will process Y start as soon as it can?
Paolo Di Tommaso
@pditommaso
Feb 02 2017 19:59
No, provided you specify the cardinality of the group with the size parameter
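e.g. (the group size here is a hypothetical value):

processed_ch
    .groupTuple(size: 3)   // emits each group as soon as 3 items with the same key have arrived
    .set { grouped_ch }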
Félix C. Morency
@fmorency
Feb 02 2017 20:03
Ok, thanks
Paolo Di Tommaso
@pditommaso
Feb 02 2017 20:07
np
Mike Smoot
@mes5k
Feb 02 2017 20:48
@pditommaso to follow up on @fmorency 's question, if you do NOT specify size for groupTuple then it will have to wait, correct?
Félix C. Morency
@fmorency
Feb 02 2017 20:49
@mes5k that's what I understood, yes
Paolo Di Tommaso
@pditommaso
Feb 02 2017 20:49
yes, exactly
Mike Smoot
@mes5k
Feb 02 2017 20:50
great, just checking!
Trevor Tanner
@tantrev
Feb 02 2017 22:04
Today's cry for help: I have a channel where each item is a set. I used collate() to get them into chunks of 24. Is there any way to undo the collate() to get back the original channel of individual sets?
I was thinking of using buffer, but am concerned it may not preserve order.
Mike Smoot
@mes5k
Feb 02 2017 22:08
@tantrev I believe flatMap can do what you want.
Félix C. Morency
@fmorency
Feb 02 2017 22:08
+1
Trevor Tanner
@tantrev
Feb 02 2017 22:11
I apologize, I just don't really understand how? Using a .flatten() operation seems to collapse the "list of lists" too far.
Mike Smoot
@mes5k
Feb 02 2017 22:11
one moment
Trevor Tanner
@tantrev
Feb 02 2017 22:11
I guess I could hard code the return variables, I was just hoping to be able to dynamically specify the chunk size.
of course, thank you
Mike Smoot
@mes5k
Feb 02 2017 22:16
Try running this to clarify:
Channel.from([1,1],[2,2],[3,3],[4,4],[5,5],[6,6],[7,7],[8,8])
    .collate(2)     // chunks of two: [[1,1],[2,2]], [[3,3],[4,4]], ...
    .view()
    .flatMap()      // undoes the collate: back to [1,1], [2,2], ...
    .view()
    .collate(2)
    .flatten()      // flattens every nested list: 1, 1, 2, 2, ...
    .view()
as you've seen, flatten flattens every list it finds, whereas flatMap only flattens the top level. flatMap can also take a closure and do much more interesting things.
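For instance:

Channel.from([1,2],[3,4])
    .flatMap { pair -> pair.collect { it * 10 } }
    .view()    // emits 10, 20, 30, 40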
Trevor Tanner
@tantrev
Feb 02 2017 22:17
ohhh I'm an idiot. I didn't realize .flatMap() could be executed without a closure, thank you.
Mike Smoot
@mes5k
Feb 02 2017 22:18
glad that my procrastination is useful! :)
Trevor Tanner
@tantrev
Feb 02 2017 22:20
haha this solves a big headache, I really appreciate it :smile: