These are chat archives for nextflow-io/nextflow

2nd
Jun 2016
Mike Smoot
@mes5k
Jun 02 2016 16:50
@pditommaso Hi Paolo, I'm wondering if you've given any thought to unit testing nextflow pipelines? My pipeline is getting complicated enough that I'd really like some way of testing the channels and how everything connects and gets transformed.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 16:54
The best way to handle this is to include a test dataset in your project, small enough to run the complete pipeline in a few seconds
Mike Smoot
@mes5k
Jun 02 2016 16:55
Oh, how I wish I had a dataset like that. :)
Paolo Di Tommaso
@pditommaso
Jun 02 2016 16:55
I know that can be difficult in some cases, but it helps a lot, especially when used along with a CI service
Mike Smoot
@mes5k
Jun 02 2016 17:01
What I've been imagining is separating the process execution from how processes and channels are connected. I'm not so worried about process execution, as those should be relatively easy to test in isolation. For connections, I've been thinking about creating mock processes that can validate the number of times they're called, the data types they're called with, etc., and then emit dummy data into the output channel. A test would then consist of some dummy input data and a bunch of assertions associated with the mock processes. Does that make sense?
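A minimal sketch of the mock-process idea in plain Groovy. `MockProcess`, `align`, `quant` and the `.bam`/`.counts` suffixes are hypothetical names made up for illustration, not Nextflow APIs, and values are handed from one mock to the next sequentially rather than through real dataflow channels:

```groovy
// Hypothetical mock "process": it does no real work, it only records how it was
// called and returns a placeholder value for the next step. Not a Nextflow API.
class MockProcess {
    Closure dummyOutput          // builds the placeholder output from the input
    int calls = 0                // number of times this mock was invoked
    List<Class> inputTypes = []  // runtime type of every input received

    def call(Object input) {
        calls++
        inputTypes << input.getClass()
        return dummyOutput.call(input)
    }
}

// Two mocks standing in for two connected processes (names are made up)
def align = new MockProcess(dummyOutput: { String sample -> sample + '.bam' })
def quant = new MockProcess(dummyOutput: { String bam -> bam + '.counts' })

// Dummy input data, wired in the same order the real pipeline would connect the steps
def results = ['s1', 's2', 's3'].collect { quant.call(align.call(it)) }

// The "test" is a set of assertions on the mocks
assert align.calls == 3
assert quant.calls == 3
assert quant.inputTypes.every { it == String }
assert results == ['s1.bam.counts', 's2.bam.counts', 's3.bam.counts']
```

Because everything runs sequentially here, this checks call counts and value types but not the concurrent, asynchronous behaviour of a real dataflow network.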
Paolo Di Tommaso
@pditommaso
Jun 02 2016 17:05
(meeting, I will reply later)
Mike Smoot
@mes5k
Jun 02 2016 17:06
cool, thanks
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:25
In principle it could be done, but it looks much more complex than assembling a test dataset to quickly replicate the overall pipeline execution. For example, how would you create dummy output data?
Doing it automatically isn't possible because the output declarations don't carry enough type/semantic information. Doing it manually would be a lot of work.
Mike Smoot
@mes5k
Jun 02 2016 18:29
Why not? It's either a string or a file, no? I'd imagine that some of those could be automatically generated with random data. Even if it is manual, you'd really just need to specify a given string or file name.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:30
take, for example, a declaration like:
output:
    file ('*.txt') into channel
Mike Smoot
@mes5k
Jun 02 2016 18:30
Also, the point isn't to generate valid input data since the goal isn't to execute the processes, but really just to check that channels connect correctly.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:30
you can use that to capture one or many files
Mike Smoot
@mes5k
Jun 02 2016 18:32
Seems like you'd just create one or more empty files, right? As long as it's a valid file on the filesystem, then it should be able to connect to the next process.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:32
That will change the behaviour of your network. Also, what if you have a split operator chained on that channel that depends on the file content?
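To make this objection concrete, here is a small plain-Groovy sketch under stated assumptions: `fakeTxtOutputs` is a hypothetical mock generator that creates empty files matching `*.txt`, and `splitLines` is a made-up stand-in for a content-dependent operator in the spirit of Nextflow's `splitText`. The number of dummy files is an arbitrary choice that changes how many items flow downstream, and empty files make the split emit nothing:

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical mock generator: create `n` empty files that would match the glob '*.txt'
List<Path> fakeTxtOutputs(int n) {
    Path dir = Files.createTempDirectory('mock-out')
    dir.toFile().deleteOnExit()   // registered first, so it is deleted last
    (0..<n).collect { int i ->
        Path f = Files.createFile(dir.resolve('out' + i + '.txt'))
        f.toFile().deleteOnExit()
        f
    }
}

// Made-up stand-in for a content-dependent operator: one downstream item per line of text
List<String> splitLines(List<Path> files) {
    files.collectMany { Path f -> Files.readAllLines(f) }
}

def one = fakeTxtOutputs(1)
def three = fakeTxtOutputs(3)

// The glob does not say how many files to fake, so this choice alters the network
println "downstream items: ${one.size()} vs ${three.size()}"

// Empty placeholder files mean the split produces no records at all
println "records after split: ${splitLines(three).size()}"
```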
Mike Smoot
@mes5k
Jun 02 2016 18:33
True.
But again, the goal isn't to be exhaustive, it's to gain some assurance that things are behaving as expected.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:50
Yes, I see you are proposing a unit-testing approach, but I'm not sure that fits well with the nextflow model
mainly because it's heavily based on the dataflow paradigm, which works like a network of reactive nodes.
to test it you need to test the overall network; otherwise it doesn't make much sense, in my opinion
anyway, proposals and implementations proving that I'm wrong are extremely welcome :)
Mike Smoot
@mes5k
Jun 02 2016 18:54
I understand that there are complications, but it seems like a tractable problem. I'm definitely going to keep thinking about it.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:55
that's the strength of open source ..
Mike Smoot
@mes5k
Jun 02 2016 18:57
Here's another question. One of my big concerns is the long chains of channel operators that I have. Is it possible to abstract that chain out into a wrapper channel or function? Then I should be able to test that wrapper or function in isolation.
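One way to get part of that isolation, sketched in plain Groovy and assuming the per-item logic of the chain can be written as closures: the closures (the hypothetical `sampleKey` and `isFastq` below) could be passed to the channel operators in the pipeline, e.g. `.map(sampleKey).filter(isFastq)`, and exercised on an ordinary list in a test. Groovy's `collect`/`findAll` only mimic `map`/`filter` here, so this checks the transformation logic, not the channel wiring:

```groovy
// Hypothetical per-item steps of an operator chain, written as reusable closures.
// Turn a file path into a [sampleId, path] pair, e.g. '/data/s1_R1.fastq' -> ['s1', path]
def sampleKey = { String path ->
    String base = path.tokenize('/').last()
    [base.tokenize('_')[0], path]
}

// Keep only FASTQ entries
def isFastq = { List pair -> pair[1].endsWith('.fastq') }

// The whole chain as one closure that can be called on plain data in a test
def pairFastqs = { List<String> paths -> paths.collect(sampleKey).findAll(isFastq) }

def out = pairFastqs(['/data/s1_R1.fastq', '/data/s2_R1.fastq', '/data/notes.txt'])
assert out == [['s1', '/data/s1_R1.fastq'], ['s2', '/data/s2_R1.fastq']]
```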
Paolo Di Tommaso
@pditommaso
Jun 02 2016 18:59
you would need to intercept all those method invocations and replace them with your wrapper
it should be possible with either groovy meta-programming or bytecode manipulation
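As a rough illustration of the meta-programming route, here is a Groovy sketch that intercepts operator calls through `GroovyInterceptable`. `FakeChannel` is a hypothetical list-backed stand-in, not Nextflow's channel class; hooking Nextflow's real channel classes would need a similar mechanism applied to them, which is not shown here and would have to be checked against its internals:

```groovy
// Hypothetical list-backed stand-in for a channel. Because it implements
// GroovyInterceptable, every method call on it is routed through invokeMethod.
class FakeChannel implements GroovyInterceptable {
    List items = []
    List<String> calls = []   // shared log of operator names, in call order

    FakeChannel map(Closure c) { new FakeChannel(items: items.collect(c), calls: calls) }
    FakeChannel filter(Closure c) { new FakeChannel(items: items.findAll(c), calls: calls) }

    def invokeMethod(String name, Object args) {
        calls << name   // record the call: this is where a wrapper could be swapped in
        // Look up and run the real method; avoid println here, it would be intercepted too
        def method = FakeChannel.metaClass.getMetaMethod(name, args as Object[])
        return method.invoke(this, args as Object[])
    }
}

def source = new FakeChannel(items: [1, 2, 3, 4])
def result = source.map { it * 10 }.filter { it > 15 }

assert result.items == [20, 30, 40]
assert source.calls == ['map', 'filter']
```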
Mike Smoot
@mes5k
Jun 02 2016 19:04
Sounds like I need to learn more groovy! :)
Paolo Di Tommaso
@pditommaso
Jun 02 2016 19:05
groovy is an incredibly undervalued programming lang
at least in the context of the jvm
Mike Smoot
@mes5k
Jun 02 2016 19:07
Apart from 'map' being named 'collect' and 'reduce' being named 'inject', it hasn't irritated me too much. :)
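A quick plain-Groovy illustration of that naming: on Groovy collections, `collect` plays the role of map, `findAll` of filter, and `inject` of reduce (a fold with an initial value). Nextflow's channel operators themselves are named `map`, `filter` and `reduce`:

```groovy
def xs = [1, 2, 3, 4]

def squares = xs.collect { it * it }               // "map"
def evens   = xs.findAll { it % 2 == 0 }           // "filter"
def total   = xs.inject(0) { acc, x -> acc + x }   // "reduce"/fold, seeded with 0

assert squares == [1, 4, 9, 16]
assert evens == [2, 4]
assert total == 10
```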
Paolo Di Tommaso
@pditommaso
Jun 02 2016 19:07
LOL so true
Mike Smoot
@mes5k
Jun 02 2016 19:08
Thanks for your time, I'm going to keep scratching this itch as my time permits.
Paolo Di Tommaso
@pditommaso
Jun 02 2016 19:08
ok, no pb
happy hacking