These are chat archives for nextflow-io/nextflow

20th
Mar 2019
Stephen Ficklin
@spficklin
Mar 20 00:08
Hi @lebernstein thanks for helping. I did look at validExitStatus, but the problem is that if the primary tool fails in one way I want to handle it gracefully by branching the workflow, but in all other cases I do want the workflow to fail. Unfortunately, the tool always returns the same exit code regardless of the type of error.
Laurence E. Bernstein
@lebernstein
Mar 20 00:51

@spficklin doesn't the validExitStatus example show how to do just what you are asking for?

process returnOk {
     validExitStatus 0,1
     script:
     """
     echo Hello
     exit 1
     """
}

Or is this what is not working? Or is your exit command inside your bash script? Like so:

process returnNotOk {
     shell:
     """
     echo Hello
     exit 1
     """
}

You might have to show your code (or a piece of it to explain a little better)

Laurence E. Bernstein
@lebernstein
Mar 20 02:16

I would like to ignore all data from a file until some keyword is found. Like this file:

garbage,garbage,garbage
garbage,garbage,garbage
garbage,garbage,garbage
garbage,garbage,garbage
[here is the good stuff],,
goodstuff,goodstuff,goodstuff,
goodstuff,goodstuff,goodstuff,

So that only the "goodstuff" comes out
How can I accomplish this?

sirisha sunkara
@sirishasun_twitter
Mar 20 03:54
Hi Laurence: grep -P 'whatever pattern/keyword is uniquely represented in that good stuff line' (to interpret as Perl regex), or any other grep variant that can handle extended regular expressions. :-)
Laurence E. Bernstein
@lebernstein
Mar 20 05:13
@sirishasun_twitter It's not that easy. I wish it was. I want a method that will work regardless of what is in the other lines. Plus I am creating a channel, I don't want to write a new file. I suspect Nextflow can do this natively, I'm just not sure how, but I'm guessing @pditommaso knows. :)
sirisha sunkara
@sirishasun_twitter
Mar 20 05:24
Ok. I think I am not too clear on your question then... In general that is what pattern matching is for, and the output is written to stdout.
Rad Suchecki
@rsuchecki
Mar 20 09:59
@lebernstein would the lines be individual emissions from a channel? In such case filter operator or map with some custom closure might do the trick
Chelsea Sawyer
@csawye01
Mar 20 10:44
Is there a way to use optional true for an input channel? I have a channel that may or may not be created by a previous process, but the next process doesn't run if I declare the channel in its inputs and it doesn't exist.
Paolo Di Tommaso
@pditommaso
Mar 20 10:45
for example
Chelsea Sawyer
@csawye01
Mar 20 10:49
@pditommaso I'm not sure how to use that example with a channel that may not be created at all. Should I create the channel as empty before the first process and check if its empty in the next process?
Paolo Di Tommaso
@pditommaso
Mar 20 10:51
in that case you can do something like
process foo {
   input:
   val opt from your_channel_that_may_be_empty.ifEmpty { true }
   ... etc 
}
Chelsea Sawyer
@csawye01
Mar 20 10:55
@pditommaso thank you!
Paolo Di Tommaso
@pditommaso
Mar 20 10:55
welcome
Jonathan Manning
@pinin4fjords
Mar 20 11:15
Morning! Is there something I can use as a 'difference' operator? Looking to filter one channel such that the tuple keys are not in the tuple keys of another channel.
micans
@micans
Mar 20 11:16
That sounds tricky, no? It means both channels must be fully consumed before that decision is made, if I understand you correctly.
Jonathan Manning
@pinin4fjords
Mar 20 11:22
Hmm, maybe I'm not thinking the right way. I was hoping I could use the keys of one channel as a blacklist for another.
micans
@micans
Mar 20 11:35
But you'd have to consume the entire blacklist before knowing what to do.
(so only the blacklist, not both channels would have to be consumed fully)
Jonathan Manning
@pinin4fjords
Mar 20 11:44
Is consuming a channel for this purpose a problem? I'm thinking I can do a map() and toList() to get a single list item from the blacklist, which I could then use in a .filter() on the channel I want to filter. Just seems a bit messy and maybe there's a better way.
micans
@micans
Mar 20 13:08
I'd try something like this, basically to somehow populate the entire blacklist in an associative dictionary (called map in Groovy, NF map() is broadly same as Groovy collect()), and then use filter. Have you tried a toy example?
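A hypothetical sketch of that idea (DSL1, with made-up channel names): consume the blacklist channel fully into a plain Groovy map, then filter the other channel against it. Here `blacklist_ch` and `data_ch` are assumed channels of `[ key, value ]` tuples, and `toList().val` is used to block until the blacklist is complete — not a definitive implementation.

```groovy
// Sketch only -- blacklist_ch and data_ch are hypothetical channels of
// [ key, value ] tuples. toList().val blocks until blacklist_ch completes,
// so the whole blacklist is known before filtering starts.
def blacklist = [:]
blacklist_ch.map { it[0] }.toList().val.each { blacklist[it] = true }

filtered_ch = data_ch.filter { !blacklist.containsKey( it[0] ) }
```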
Karl Nordström
@karl616
Mar 20 13:28
hi, noob question here... Can I set NXF_HOME through the nextflow.config file? or only through the environment variable?
Paolo Di Tommaso
@pditommaso
Mar 20 13:29
nope, none of NXF_xxx vars can be set there
Karl Nordström
@karl616
Mar 20 13:31
OK, thanks
micans
@micans
Mar 20 13:32
s/noob/novice/. One day I'll convince the world.
Paolo Di Tommaso
@pditommaso
Mar 20 13:33
LOL
Karl Nordström
@karl616
Mar 20 13:33
:P
you forgot g
Paolo Di Tommaso
@pditommaso
Mar 20 13:34
lot of humor in this channel .. :joy:
micans
@micans
Mar 20 13:34
hehe, g, true.
that's not a novice
Karl Nordström
@karl616
Mar 20 13:35
nextflow-novice
micans
@micans
Mar 20 13:36
count me in
Laurence E. Bernstein
@lebernstein
Mar 20 14:54
@rsuchecki The lines are input from a CSV file and I am reading them in with splitCsv(). I've been trying to do what you are suggesting but not sure how to get it to work so that it only outputs values after I see the one that is the "marker".
micans
@micans
Mar 20 15:00
@lebernstein I don't see anything directly applicable. What you want is complementary to the until operator. I wonder if you can hack this by setting a global variable in the filter operator, something like global_boolean || (pass_test(it) && global_boolean = true), but no clue if NF allows this sort of hackery.
Tobias "Tobi" Schraink
@tobsecret
Mar 20 15:04
@lebernstein this reminds me of Python's itertools.dropwhile. One way of hacking it is to first read in the CSV file with something like Python, work out how many lines n to skip, emit that as a value, and then use splitCsv(skip: n).
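As a toy illustration of the dropwhile idea in plain Python (outside Nextflow) — the `[Data]` marker and the sample lines here are assumptions standing in for the real file:

```python
from itertools import dropwhile

# Example lines standing in for the CSV content; '[Data]' marks where
# the good rows begin.
lines = [
    "garbage,garbage,garbage",
    "garbage,garbage,garbage",
    "[Data],,",
    "goodstuff,goodstuff,goodstuff",
]

# Drop every line until the marker is seen, then drop the marker itself.
kept = list(dropwhile(lambda l: not l.startswith("[Data]"), lines))[1:]
print(kept)
```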
Tim Dudgeon
@tdudgeon
Mar 20 15:48
I'm wanting to provide a mechanism for processes in a NF workflow to report status and progress information to a file that can be used to monitor the execution of a workflow. A bit like what NF does in the trace file (the -with-trace option), but where I have control over what gets written to the file. The most likely scenario is that this file is written to as each individual process completes. Is there any mechanism for handling this?
Tim Dudgeon
@tdudgeon
Mar 20 15:53
Yes, I saw that, but it's not really what I'm looking for as I want to write to a file (like the trace.txt file) and I want to be able to control what is written.
Paolo Di Tommaso
@pditommaso
Mar 20 16:01
You can implement your own trace file observer
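Separately, if it is only the contents of the built-in trace file you want to control, the documented trace scope in nextflow.config lets you choose the output file and which fields it contains (the file name and field selection below are just an example):

```groovy
// Example trace scope in nextflow.config -- the trace file is updated as
// each task completes; 'fields' picks which standard trace fields appear.
trace {
    enabled = true
    file    = 'pipeline_status.txt'
    fields  = 'task_id,name,status,exit,realtime'
}
```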
Tim Dudgeon
@tdudgeon
Mar 20 16:15
are there examples?
Laurence E. Bernstein
@lebernstein
Mar 20 16:39
@tobsecret Yes. I was thinking about something just like that, was hoping there was a simpler way. :) If I am going to read the file and write out another file though, I might as well just parse it right the first time with Python or something else. :(
Tobias "Tobi" Schraink
@tobsecret
Mar 20 18:58
Seems like that would require a list, so you would have to .collect() your splitCsv output
Laurence E. Bernstein
@lebernstein
Mar 20 19:14
@tobsecret Hmm.. that's interesting. Might take me a bit to figure it out. I just wrote a quick process using Python to rewrite to a new file and it's easy, so I have to ask myself why am I trying so hard to do it another way? :)
Tobias "Tobi" Schraink
@tobsecret
Mar 20 19:17
@lebernstein :sweat_smile: I often ask myself that when hacking together groovy code
Laurence E. Bernstein
@lebernstein
Mar 20 20:27
TADA!
samplesData = Channel.create()
foundData = false
file_ch = Channel.fromPath(inputFile)
    .splitCsv()
    .filter { it[0] != "" }
    .subscribe {
        if (foundData) { samplesData.bind(it) }
        if ("${it[0]}" == '[Data]') { foundData = true }
    }
Tobias "Tobi" Schraink
@tobsecret
Mar 20 20:38
sick!
Laurence E. Bernstein
@lebernstein
Mar 20 20:43
It turned out to be pretty easy once I understood that I could put whatever groovy code I wanted into the subscribe(). This little exercise was really enlightening.
Tobias "Tobi" Schraink
@tobsecret
Mar 20 20:44
How would you generalize that to processing multiple csv files that way?
Laurence E. Bernstein
@lebernstein
Mar 20 20:46
Well my goal was take in a "Sample Sheet" generated by Illumina machines and pull out the list of samples from that Sample Sheet. If you wanted to pull in multiple files I think you could put this into an "exec" section in a process and then stack up all samples into the single output channel.
Tobias "Tobi" Schraink
@tobsecret
Mar 20 20:48
Fair enough :ok:
Rad Suchecki
@rsuchecki
Mar 20 22:41
Nice one @lebernstein, you can even do away with the additional channel:
foundData = false
Channel.fromPath(inputFile)
    .splitCsv()
    .filter {
      foundData || (foundData = it[0] == '[Data]' ? true : foundData) && false
    }
    .view()
Laurence E. Bernstein
@lebernstein
Mar 20 23:09
That's even slicker. Gotta try that.