These are chat archives for nextflow-io/nextflow

17th
Jan 2018
LukeGoodsell
@LukeGoodsell
Jan 17 2018 11:13
Hey Paolo, are there any conferences you’d recommend attending this year that touch on workflow architecture and (maybe) Nextflow?
Paolo Di Tommaso
@pditommaso
Jan 17 2018 12:30
Hi, nothing in particular on my radar at this moment
you may be interested in the BOSC conference https://gccbosc2018.sched.com/
Fredrik Boulund
@boulund
Jan 17 2018 14:36
Hi!
What's the best way to access the exit status (exit code) for a process, and only send output to the output channel for certain values?
e.g. I have a contamination detection step, and only if a sample passes the contamination detection (i.e. it is without contamination) should it send the read files further into the workflow.
I've toyed a bit with task.exitStatus but I'm not sure if that works at all. I also considered reading the exit code from .exitcode in the process's workdir, but that doesn't feel very clean
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:38
can we assume they are non-zero, and therefore error codes?
Fredrik Boulund
@boulund
Jan 17 2018 14:38
Yep, I simplified a bit here just to get going, but in reality there's a set of different exit codes
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:39
if so, you can simply use errorStrategy 'ignore'
Fredrik Boulund
@boulund
Jan 17 2018 14:39
I've already set validExitStatus to the relevant values in the config for this particular process
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:40
and the process won't output any result
Fredrik Boulund
@boulund
Jan 17 2018 14:40
Yes, a file with the contamination detection results
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:41
you mean that you have an output in any case?
Fredrik Boulund
@boulund
Jan 17 2018 14:41
yep
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:42
mm
if so, you can declare the output as optional true
but there's no specific way to associate that with the actual exit code
unless... let me check
Fredrik Boulund
@boulund
Jan 17 2018 14:44
I've two ideas for how I would like to solve it, but maybe they aren't very well thought through... :)
  1. Check the exit code of the process, and only send the same input read files into a channel for downstream processing (contamination output report files get sent to publishDir as usual)
  2. Split the input read stream into two channels, process the first channel in the contamination detection process, and output the detection results into a new channel (let's call it contamination_out for now). Then I'd like to do a conditional join of the unused read channel and the contamination_out channel based on the contamination detection result, using their shared pair_id keys to create a new channel that only contains the (pair_id, read_1, read_2) for the files that actually pass the contamination detection step.
Maybe it's a bit confusing in text like this; let me know if something is terribly unclear
I would guess that the first method would be the simplest to implement, as it only requires a simple conditional statement on the exit code in the output declarations for that process, right?
For idea 1, I'm thinking something like:
process detect_contamination {
    input:
    set pair_id, file(reads) from input_reads
    output:
    if (task.exitStatus == 3) {
        set pair_id, file(reads) into output_reads
    }
    """
    bash script with different exit codes depending on outcome...
    """
}
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:47
I would check the exit status in your task command and output a fake/special file
then detect that condition in a downstream process/operator
you can't write the above
Fredrik Boulund
@boulund
Jan 17 2018 14:48
Hmm, sure that could work as well.
hehe, no. I realize that won't work :D
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:48
declare
Fredrik Boulund
@boulund
Jan 17 2018 14:49
So it's not possible to access the exitcode of the bash process before sending output down the output channel of a process?
Can you send the .exitcode file for a process into a channel? Or is that only created after the process has completed?
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:50
  output: 
  set pair_id, file(reads) optional true into output_reads
you can't declare a conditional output structure
but you can say that an output is optional
Fredrik Boulund
@boulund
Jan 17 2018 14:52
Ah, that's nice. But I'm not sure how best to leverage that in this situation?
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:52
you can get a command's exit status in your script with $?
Fredrik Boulund
@boulund
Jan 17 2018 14:53
just plain bash, sure.
Can I communicate that exit status from the bash script back to nextflow somehow, without going via a file?
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:54
no, my suggestion is to use the exit status only in the bash script, to conditionally create some optional file
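A minimal sketch of that suggestion, assuming exit code 0 from the check means "clean" (the helper name `run_and_flag` and the flag-file name are hypothetical): the script records the check's exit status and creates a flag file only on a pass, so an output declared `optional true` for that file is simply absent for failing samples. It uses `cmd || status=$?` because Nextflow runs task scripts with `bash -ue` by default, where a bare failing command would abort the script.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS: exit code 0 from the check means "clean"; the helper and
# flag-file names are illustrative, not part of Nextflow.

# run_and_flag FLAG_FILE CMD [ARGS...]
# Runs CMD and creates FLAG_FILE only if CMD exited with status 0.
# Always returns 0, so a "contaminated" result does not fail the task.
run_and_flag() {
    local flag_file=$1
    shift
    local status=0
    # `||` records the exit code without tripping `set -e`
    # (Nextflow task scripts run under `bash -ue` by default).
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        touch "$flag_file"
    fi
    return 0
}
```

In a task script this might be called as `run_and_flag sample.passed your_check_command --args`, where both the file name and the command are placeholders.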
Fredrik Boulund
@boulund
Jan 17 2018 14:54
I understand
Is there a way to communicate any information from the bash script back to nextflow somehow, not just exit codes?
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:56
nope
Fredrik Boulund
@boulund
Jan 17 2018 14:56
Ok! :D
That's a pity; then we're back to sentinel files to communicate this information. Then maybe I should just make my bash script output something very simple (that can easily be parsed in a downstream step) and send that file along in the tuple in the channel
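A minimal sketch of that sentinel-file workaround, assuming hypothetical exit codes 0 = clean and 3 = contaminated (the function names and labels are also illustrative): `write_status` writes a one-word label to a report file that can travel in the tuple, and `is_clean` is the kind of check a downstream task could run on it. Unexpected exit codes are passed through, so genuine errors still fail the task.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS: exit codes 0 (clean) and 3 (contaminated) and the labels
# below are hypothetical; adapt them to the real detection tool.

# write_status REPORT CMD [ARGS...]
# Runs CMD and writes a one-word label derived from its exit code to REPORT.
write_status() {
    local report=$1
    shift
    local status=0
    "$@" || status=$?   # capture the code without aborting under `bash -e`
    case "$status" in
        0) echo "CLEAN" > "$report" ;;
        3) echo "CONTAMINATED" > "$report" ;;
        *) return "$status" ;;   # unexpected code: let the task fail
    esac
}

# is_clean REPORT: succeeds only if REPORT holds exactly the CLEAN label.
is_clean() {
    [ "$(cat "$1")" = "CLEAN" ]
}
```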
Paolo Di Tommaso
@pditommaso
Jan 17 2018 14:58
this sounds like a good workaround
Edgar
@edgano
Jan 17 2018 14:59
hey @boulund ,
I have the same issue. I need to get a value in NF and it comes from a bash script.
my idea is to save the exit status to a file and get the file in NF as a channel
a=$?
echo ${a} >>size.txt
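One caveat with this pattern, sketched below with `false` standing in for the real command: Nextflow runs task scripts with `bash -ue` by default, so the script exits at the failing command and `a=$?` never runs. Recording the status with `||` avoids that (the sketch writes with `>` since only one value is needed).

```shell
#!/usr/bin/env bash
# `false` stands in for the real command whose exit status is wanted.

# The pattern above under `bash -ue`: the shell exits at `false`,
# so size.txt is never written.
naive() {
    bash -ue -c 'false; a=$?; echo "$a" > size.txt'
}

# `cmd || a=$?` exempts the command from errexit and records its code.
safe() {
    bash -ue -c 'a=0; false || a=$?; echo "$a" > size.txt'
}
```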
Fredrik Boulund
@boulund
Jan 17 2018 14:59
Thanks, then I'll end with a quick question. Is it better to put that logic inside the bash script of a downstream step, or can I put the logic in nextflow before executing the bash script in a downstream step, e.g. using the script: feature?
@edgano cool, I'm not the only one :D
I hoped that the bash exit status would be available like task.exitStatus or something, and that I would be able to send that value in the output of a process. I like the concept of making a SQL-like joining of two channels based on a conditional, but maybe that's just confusing and overcomplicating things.
Sentinel files are at least good at making it easy to understand what's going on
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:01
I don't see how putting that logic before executing the bash script could help you, since you haven't executed the task yet
Fredrik Boulund
@boulund
Jan 17 2018 15:02
@pditommaso, sorry, I'm confusing you. I meant that the logic that acts upon the output from the first process actually happens in a downstream process, based on the value in the output from the first process
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:04
that's an option
Fredrik Boulund
@boulund
Jan 17 2018 15:04
I've seen it in someone else's pipeline and thought it looked fairly clean to have the logic in nextflow/groovy before launching the bash portion of the process
Fredrik Boulund
@boulund
Jan 17 2018 15:16
Thanks for the discussion and help. Always something new to learn about nextflow. Here's the concept of what we're going to do now then:
process detect_contamination {
    input: 
    set pair_id, file(reads) from input_channel
    output: 
    set pair_id, file(reads), file("${pair_id}.contamination_report.txt") into downstream_channel
    """
    detect_contamination.py -r1 ${reads[0]} -r2 ${reads[1]} --output ${pair_id}.contamination_report.txt
    """
}

process downstream {
    input: 
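    set pair_id, file(reads), file(contamination_report) from downstream_channel
    output: 
    // optional: some_output_file only exists if the analysis below ran
    // (the channel name analysis_results is illustrative)
    set pair_id, file("some_output_file") optional true into analysis_results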
    """
    # run the analysis only for samples reported free of contamination
    if grep --quiet "NO CONTAMINATION" ${contamination_report}; then
        run_some_other_analysis.py ....
    fi
    """
}
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:17
exactly
Fredrik Boulund
@boulund
Jan 17 2018 15:18
Is it technically impossible to implement some way of communicating information back from a bash execution into nextflow? It feels like it would fit the nextflow mentality of sending information via channels rather than via files
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:19
there's an attempt, but it's not so easy
nextflow-io/nextflow#69
anyhow, exposing just the exit status should be possible, I will look into it
Fredrik Boulund
@boulund
Jan 17 2018 15:20
Thanks for linking me that! I didn't know you could access stdout like that. That's super convenient in my case!
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:21
:+1:
Fredrik Boulund
@boulund
Jan 17 2018 15:21
I just thought of another question that I've been struggling with recently
How do I activate conda environments for a specific process?
Previously, I used process.$processname.beforeScript with source activate environment_name, but that has changed since conda 4.4.
Since conda 4.4 it is now conda activate environment_name, but that functionality relies on first sourcing another script (anaconda3/etc/profile.d/conda.sh), rather than just adding the conda bin directory to PATH
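A minimal sketch of a conda >= 4.4 activation helper, assuming an install under `$HOME/anaconda3` (a placeholder default; pass your own prefix). It sources `etc/profile.d/conda.sh`, which defines the `conda` shell function, and then runs `conda activate`. It also switches off `set -u` while conda's scripts run, since task scripts run under `bash -ue`; treat that as a precaution against activation scripts that may reference unset variables, not a documented requirement.

```shell
#!/usr/bin/env bash
# ASSUMPTION: $HOME/anaconda3 is a placeholder install prefix.

# activate_conda_env ENV_NAME [CONDA_ROOT]
activate_conda_env() {
    local env_name=$1
    local conda_root=${2:-$HOME/anaconda3}
    local hook="$conda_root/etc/profile.d/conda.sh"
    if [ ! -f "$hook" ]; then
        echo "conda hook not found: $hook" >&2
        return 1
    fi
    # Remember whether `set -u` is on, and disable it while conda runs.
    local had_nounset=0
    case $- in *u*) had_nounset=1; set +u ;; esac
    local rc=0
    # conda.sh defines the `conda` shell function that provides `conda activate`.
    { source "$hook" && conda activate "$env_name"; } || rc=$?
    if [ "$had_nounset" -eq 1 ]; then set -u; fi
    return "$rc"
}
```

As a beforeScript one-liner, the same idea would look like `source /path/to/anaconda3/etc/profile.d/conda.sh && conda activate environment_name`, with the path adjusted to the local installation.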
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:25
not sure how to help here
we plan to have better support soon, but for now you need to hack it
nextflow-io/nextflow#493
Fredrik Boulund
@boulund
Jan 17 2018 15:27
Yes, I saw that thread previously. Not sure what I think about that, to be honest. It probably won't work very well for me like that, if I understood what you're planning to do
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:29
the idea is to have NF load/activate the conda modules for you
Fredrik Boulund
@boulund
Jan 17 2018 15:29
That is indeed nice, but I'm not sure I follow the whole biocontainer issue
I have some local custom environments that I want to run in my workflows, which won't be available as recipes or containers
Paolo Di Tommaso
@pditommaso
Jan 17 2018 15:30
I see
what about a custom container ?
Fredrik Boulund
@boulund
Jan 17 2018 15:31
I guess I could make one, I haven't yet looked into how that works
For my uses, it would be perfectly alright if nextflow just loaded an environment that I can specify for each process
like a process directive: loadCondaEnv my_environment, that just fails if the environment doesn't exist or doesn't load properly
Fredrik Boulund
@boulund
Jan 17 2018 15:43
but as I said, I now rely on beforeScript 'source activate my_env' and that seems to work fine for now, if I make sure to include the right conda paths in PATH
Vladimir Kiselev
@wikiselev
Jan 17 2018 16:54
Hi Paolo, does the new -N (send an email) option work on a Mac? I’ve tried it on my laptop but don’t receive any emails...
Paolo Di Tommaso
@pditommaso
Jan 17 2018 17:53
quite surely you need to configure the smtp server
If no mail configuration is provided, it tries to send the notification message by using the external mail command eventually provided by the underlying system (eg. sendmail or mail).
I think I need to highlight that in the docs
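A quick way to check the fallback described above, as a sketch: it reports the first `sendmail` or `mail` command found on PATH. Finding one does not guarantee delivery, since a local mail system may be installed but not configured to relay mail; in that case configuring the SMTP server is still needed.

```shell
#!/usr/bin/env bash
# find_mail_command: print the path of the first sendmail/mail command on PATH.
# Returns 1 (with a hint on stderr) if neither is available.
find_mail_command() {
    local cmd
    for cmd in sendmail mail; do
        if command -v "$cmd" >/dev/null 2>&1; then
            command -v "$cmd"
            return 0
        fi
    done
    echo "no sendmail/mail command on PATH; configure an SMTP server instead" >&2
    return 1
}
```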
Félix C. Morency
@fmorency
Jan 17 2018 17:59

Is the following permitted?
input:
file foo

when:
foo.text =~ /false/
Paolo Di Tommaso
@pditommaso
Jan 17 2018 18:02
no
file content can be accessed only during task execution
Félix C. Morency
@fmorency
Jan 17 2018 18:05
Thanks
Paolo Di Tommaso
@pditommaso
Jan 17 2018 18:06
:+1:
but you can just do
input:
file foo from foo.filter { it.text =~ /false/ }
well maybe it should be negated
Félix C. Morency
@fmorency
Jan 17 2018 18:08
Interesting, let me try
Paolo Di Tommaso
@pditommaso
Jan 17 2018 18:08
input:
file foo from foo.filter { it.text =~ /true/ }
Félix C. Morency
@fmorency
Jan 17 2018 18:08
Nah, I want to run when it's false ;)
Paolo Di Tommaso
@pditommaso
Jan 17 2018 18:08
ah, right
Vladimir Kiselev
@wikiselev
Jan 17 2018 20:42

quite surely you need to configure the smtp server

Thanks, Paolo!

Paolo Di Tommaso
@pditommaso
Jan 17 2018 20:42
welcome