These are chat archives for nextflow-io/nextflow

16th May 2019
Rad Suchecki
@rsuchecki
May 16 00:20
you can retry a few times to account for glitches, then ignore, e.g.:
errorStrategy { task.attempt < 3 ? 'retry' : 'ignore' }
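For context, a minimal sketch of that strategy inside a process (the process name and command are hypothetical):

process flaky_step {
  // retry the first two failures, then skip the task on the third
  errorStrategy { task.attempt < 3 ? 'retry' : 'ignore' }
  maxRetries 3

  input:
  val x from Channel.from(1, 2, 3)

  """
  your_command.sh $x   # hypothetical command that may fail transiently
  """
}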
Laurence E. Bernstein
@lebernstein
May 16 01:47

I have the following processes:

process '14A_signal_completion' {
  output:
    file "${sampleName}.completed" into completion_file_ch
  script:
  """
    echo "Status            : Successful" >> ${sampleName}.completed
  """
}

process '15A_collect_results' {
  input:
    file "*.completed" from completion_file_ch.collect()
  output:
    file "${analysisId}.completed" into collection_ch
  script:
  """
    python doStuffWithFiles.py
  """
}

But it seems that the input specification does not stage the files, so I am unable to access all my completion files. What am I doing wrong?

evanbiederstedt
@evanbiederstedt
May 16 02:14

@rsuchecki

you can retry a few times to account for glitches, then ignore, e.g.:

This is a good idea! Thanks! However, it still makes debugging a bit tricky, as ignore means that the (failed) outputs of this step will move downstream. If that one sample "stopped" (but the other successful samples continued), it would help the NF user to know exactly where the error occurred and debug precisely there. Does this make sense?

Otherwise, I'll give errorStrategy { task.attempt < 3 ? 'retry' : 'ignore' } a shot, thanks!

Laurence E. Bernstein
@lebernstein
May 16 02:23
@evanbiederstedt What you are describing is EXACTLY what I am tackling with my solution. I use 'ignore' and then an afterScript to run a script which logs information, including the current step, into a status file. If you then look at that status file later you can see the last step that completed (successfully or not), and any error that occurred if you cat the .command files into your output.
Rad Suchecki
@rsuchecki
May 16 02:57
I think you should also have the info on failed tasks in the HTML report as well as in the trace file https://www.nextflow.io/docs/latest/tracing.html @evanbiederstedt
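For example, both can be enabled in the config (a minimal sketch; the file names are arbitrary, and -with-trace / -with-report on the command line work too):

// nextflow.config
trace {
  enabled = true
  file    = 'pipeline_trace.txt'
}
report {
  enabled = true
  file    = 'report.html'
}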

@lebernstein perhaps I don't get the problem; a simplified version of your code seems to work:

process '14A_signal_completion' {
  input: 
      val sampleName from Channel.from(['A','B','C'])

  output:
    file "${sampleName}.completed" into completion_file_ch
  script:
  """
    echo "Status            : Successful" >> ${sampleName}.completed
  """
}

process '15A_collect_results' {
  echo true
  input:
    file "*.completed" from completion_file_ch.collect()

  script:
  """
    ls -l
  """
}

output:

[warm up] executor > local
[83/54ff61] Submitted process > 14A_signal_completion (1)
[99/436850] Submitted process > 14A_signal_completion (2)
[12/43204e] Submitted process > 14A_signal_completion (3)
[7b/823815] Submitted process > 15A_collect_results
total 12
lrwxrwxrwx 1 rad rad 72 May 16 12:31 1.completed -> /home/rad/repos/test3/work/83/54ff61e4d4025409aecb35af0fb786/A.completed
lrwxrwxrwx 1 rad rad 72 May 16 12:31 2.completed -> /home/rad/repos/test3/work/99/436850afaf8d04773b3165c205f1ab/B.completed
lrwxrwxrwx 1 rad rad 72 May 16 12:31 3.completed -> /home/rad/repos/test3/work/12/43204eeee0d4b606a2ec38734f6346/C.completed
Laurence E. Bernstein
@lebernstein
May 16 03:08
Mine does not stage any of those files in the work directory.
Rad Suchecki
@rsuchecki
May 16 03:09
strange
Laurence E. Bernstein
@lebernstein
May 16 03:11
Well.. I should say it does not stage MY files. It might work in your case, I'd have to test.
Rad Suchecki
@rsuchecki
May 16 03:11
As you can see, because of the glob the files get renamed, e.g. from A.completed to 1.completed etc. - are you perhaps looking for the original filenames?
To avoid renaming, use
input:
    file '*' from completion_file_ch.collect()
Laurence E. Bernstein
@lebernstein
May 16 03:14
In my real workflow none of the collected files get staged with any name, although another input does get staged properly. When I run the test like you did it works fine.. hmm.. mysterious.. I'll have to see if I can make a hybrid test that does something like my real workflow.
Woah.. wait a second.. in the .command.run it says it is staging the one file (I ran with only one sample) with the name [.completed] !!!
Let me try it with "*" because I need them to retain their names.
Damn invisible files!
Rad Suchecki
@rsuchecki
May 16 03:20
ah, hidden, right!
Laurence E. Bernstein
@lebernstein
May 16 03:45
Thanks much.. it's working now. The renaming was the issue.
Rad Suchecki
@rsuchecki
May 16 04:01
:v:
Francesco Strozzi
@fstrozzi
May 16 08:13
hi guys, has anyone had experience running AWS Batch jobs using the r5d instances? I have a job which is supposed to run on a big machine of this type (i.e. with 768GB of RAM and 96 CPUs) but AWS Batch does not even open a SPOT request. Of course I am able to run a SPOT instance of this type “by hand”, so it is not a problem of price or availability. The amount of CPU and RAM of the NF job are correct and match exactly that type of machine, but it seems AWS Batch is not able to open a spot request which includes that type of instance… if someone has had a similar experience I will be happy to discuss it (before opening a ticket with AWS)
Eugene Bragin
@eugene.bragin_gitlab
May 16 08:38
Hi all. When running on AWS Batch, do temporary files get cleared off the instances automatically upon task completion, so long-running instances wouldn't fill up eventually? I'm sure the answer is yes, unless I'm missing some default setting somewhere, just wanted to double check.
micans
@micans
May 16 09:33
@lebernstein @evanbiederstedt @rsuchecki This issue is very relevant: nextflow-io/nextflow#903
@lebernstein what does your afterScript solution look like? I have a hand-coded lostcause channel-type approach (see issue). In our case however, there is an issue with transient errors (i.e. data not available). If I can document this in my lostcause channel/process, I can only do this by having the problematic process succeed. This means that -resume does not work as I need it to. If I let the problematic process fail, my lostcause solution no longer works.
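A rough sketch of that kind of lostcause pattern (fetch.sh and all names here are hypothetical); note the process exits 0 whether or not the data was available, which is exactly the -resume problem mentioned above:

Channel.from('s1', 's2', 's3').set { samples_ch }

process fetch_data {
  input:
  val sampleId from samples_ch

  output:
  file 'data.txt' optional true into results_ch
  file 'lostcause.txt' optional true into lostcause_ch

  """
  if fetch.sh ${sampleId} > data.txt; then
    :                                                # success path: data.txt is emitted
  else
    rm -f data.txt
    echo "${sampleId}: data not available" > lostcause.txt
  fi
  """
}

// gather all failure notes into a single report file
lostcause_ch
  .collectFile(name: 'lostcauses.txt', storeDir: 'results')
  .subscribe { println "lost causes collected in: $it" }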
Evan Floden
@evanfloden
May 16 12:55
@fstrozzi I have had this problem when specifying instances that are not supported by Batch. This was when creating compute environments via the API; it was not obvious or easy to query which instances were supported by Batch.
Francesco Strozzi
@fstrozzi
May 16 12:56
thanks for the answer, yes I got that too in the past. What is weird here is that I can create a compute environment with that type of instance and it is accepted and valid. I’ve opened a ticket with AWS, let’s see what they’ll say about it
KochTobi
@KochTobi
May 16 14:36
@fstrozzi The AWS Batch executor worked for me with r5d instances in virginia. I had the maximum spot price set to 100% though and created the compute environment using the web console.
Francesco Strozzi
@fstrozzi
May 16 14:36
@KochTobi interesting, thanks for the feedback
Eugene Bragin
@eugene.bragin_gitlab
May 16 15:12
Did anyone ever run into a problem of samples getting mixed up when running two parallel processes and then combining their output in a third process? https://groups.google.com/forum/#!topic/nextflow/8IsQejP5rsE
Eugene Bragin
@eugene.bragin_gitlab
May 16 16:15
Channel.from(1, 2, 3).into{nums1; nums2}

process a {
  input:
  val x from nums1

  output:
  val x into a_ch

  """
  """
}


process b {
  input:
  val x from nums2

  output:
  val x into b_ch

  """
  """
}


process combine {
  echo true

  input:
  val(a) from a_ch
  val(b) from b_ch

  """
  echo $a $b
  """  
}
Expected output
1 1
2 2
3 3
But I get random combinations
N E X T F L O W  ~  version 19.01.0
Launching `test.nf` [agitated_shaw] - revision: 3ac9dc439b
[warm up] executor > local
[ac/c674b2] Submitted process > a (3)
[c3/ddbd09] Submitted process > b (1)
[f4/adb73b] Submitted process > b (2)
[c4/06a540] Submitted process > a (2)
[37/542db7] Submitted process > b (3)
[b4/fad97a] Submitted process > a (1)
[23/c2ed9b] Submitted process > combine (1)
[26/6724ba] Submitted process > combine (2)
[38/9aea69] Submitted process > combine (3)
3 1
2 2
1 3
Paolo Di Tommaso
@pditommaso
May 16 16:25
parallel execution is not deterministic!
Eugene Bragin
@eugene.bragin_gitlab
May 16 16:34
Thanks @pditommaso, will try. I have spotted many cases in nextflow/awesome where people use more than one input (outputs of other processes) without joining by key. Unless I'm missing something, they're probably running into a similar issue without knowing it
Paolo Di Tommaso
@pditommaso
May 16 16:35
that's scary..
Eugene Bragin
@eugene.bragin_gitlab
May 16 16:35
doesn't use join, does that mean files might get mixed up?
Paolo Di Tommaso
@pditommaso
May 16 16:36
OH! that's me :D
not sure where it was linked
Eugene Bragin
@eugene.bragin_gitlab
May 16 16:36
Sorry, didn't mean to cause trouble :)
Paolo Di Tommaso
@pditommaso
May 16 16:36
maybe I was showing what NOT to do ;)
Paolo Di Tommaso
@pditommaso
May 16 16:37
well, nobody is perfect .. :D
Eugene Bragin
@eugene.bragin_gitlab
May 16 16:38
sure, I'll try and get the above example working with join (I'm new to nextflow syntax)
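A keyed version of the example above might look roughly like this (an untested sketch; the a.txt/b.txt result files are just stand-ins):

Channel.from(1, 2, 3).into { nums1; nums2 }

process a {
  input:
  val x from nums1

  output:
  set val(x), file('a.txt') into a_ch   // carry the key alongside the result

  """
  echo a$x > a.txt
  """
}

process b {
  input:
  val x from nums2

  output:
  set val(x), file('b.txt') into b_ch

  """
  echo b$x > b.txt
  """
}

process combine {
  echo true

  input:
  set val(key), file(a), file(b) from a_ch.join(b_ch)   // join pairs items by the first element

  """
  echo $key: \$(cat $a) \$(cat $b)
  """
}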
otherwise nextflow is awesome, got hundreds of assemblies done in minutes on Batch. only spotted a problem when I saw my S. aureus samples aligned against the E. coli ref genome :))
Paolo Di Tommaso
@pditommaso
May 16 16:40
nice to hear that
najitaleb
@najitaleb
May 16 16:55
hi @pditommaso. Do you know a good step to take next to troubleshoot this problem?
Stephen Kelly
@stevekm
May 16 17:08

What we would like to implement is a pipeline which will run all steps to completion. However, there are sometimes problems with the sample quality, i.e. let's assume there are 5 bad samples, some will fail at alignment, others at variant calling.

@evanbiederstedt do not use retries for this; you should use filtering in your Channels to try and figure out which data might be bad and prevent them from making it to processes where they will fail. I use this strategy a lot in my exome pipeline.

@lebernstein
Stephen Kelly
@stevekm
May 16 17:16
if the method required to determine whether your sample data is "bad" is too complex to figure out in pure Groovy inside a Channel .filter, then yeah, passing it through a process with a dedicated script that writes a PASS/FAIL type value that you can then parse easily from within the Nextflow channel is a good idea
basically you want to do everything you can to prevent bad data from getting to your Nextflow process in the first place, that's the easiest method I think, rather than letting it break
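As a rough sketch of that PASS/FAIL pattern (run_qc.sh and all names are hypothetical):

Channel.from('s1', 's2', 's3').set { samples_ch }

process qc_check {
  input:
  val sampleId from samples_ch

  output:
  set val(sampleId), file('qc_status.txt') into qc_ch

  """
  run_qc.sh ${sampleId} > qc_status.txt   # hypothetical script that prints PASS or FAIL
  """
}

// only samples whose status file says PASS continue downstream
qc_ch
  .filter { sampleId, status -> status.text.trim() == 'PASS' }
  .map { sampleId, status -> sampleId }
  .set { good_samples_ch }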
Laurence E. Bernstein
@lebernstein
May 16 17:35

@micans Currently I don't really know what might cause my workflow to fail (because I am new to the team), so I put a solution in place that will allow me to debug issues with the pipeline when a failure occurs. My solution is like this:

At the beginning of the pipeline I log parameters to a log file:

logFileName = "${workingDir}/loginfo.txt"
File logFile = new File(logFileName)
logFile << "Project Name       : ${params.projectName}\n"
logFile << "Analysis ID        : ${params.analysisId}\n"
logFile << "Launch Time        : ${workflow.start.format('dd-MMM-yyyy HH:mm:ss')}\n"

This allows me to cat all those parameters (including workflow params) into my status log in the afterScript shell script.

In the process I use errorStrategy 'ignore' and an afterScript that runs a shell script, passing it a few key values: the sample name (which is passed in as an input value channel), the workingDir (which is really my output location), the log file, and the current process name.

  afterScript "source ${scriptsPath}/statusUpdate.sh ${sampleName} ${workingDir} ${logFileName} 1A_bwa_align"

It is worth noting that I unfortunately must put this in every process (instead of in the config file) because the sampleName is an input channel.

The script then looks like this (I have simplified it):

# $1 = sample name, $2 = output dir, $3 = log file, $4 = current process name
currentDir=$PWD
echo 'Status             : '$4 > $2/$1.status
cat $3 >> $2/$1.status
echo 'Nextflow Work Dir  : '$currentDir >> $2/$1.status
date +"Completion Time    : %d-%b-%Y %T" >> $2/$1.status
echo "========= COMMAND.ERR ===============" >> $2/$1.status
cat $currentDir/.command.err >> $2/$1.status
Stephen Kelly
@stevekm
May 16 18:08

I have spotted many cases in nextflow/awesome where people use more than one input (outputs of other processes) without joining by key. Unless I'm missing something, they're probably running into a similar issue without knowing it

@eugene.bragin_gitlab generally if you have multiple input channels I think it's safest to have it set up in a way that each input channel only emits a single item, so that the process only executes a single time with the expected inputs. If you want the process to execute multiple times with different inputs then you need to combine all the inputs into a single channel

what does it mean when a process produces the necessary output file, but still gives this error: Missing output file(s) Jie.enriched.rds expected by process randomNum

@najitaleb not sure exactly, but I would start by looking inside the work dir for the process and seeing if the file is/isn't present, then enter that dir and try to bash .command.run and see what happens. Also take a close look at the process itself inside the Nextflow script, since you might have an output: set using a variable that is no longer used in the actual process, or you could even end up with that same variable name used in the global scope of the script (outside the process), so that you are actually getting the global version of the variable instead of the one in the process itself... yeah, these are all things I usually look through, and some things I have seen with such errors
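i.e. something along these lines (the work-dir hash is hypothetical; copy the real one from the error message):

cd work/ab/123456            # the failing task's work dir, from the run log
ls -la                       # is the expected output file really there?
bash .command.run            # re-run the task by hand and watch what happens
cat .command.err             # inspect stderr from the original run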

najitaleb
@najitaleb
May 16 18:20
Ok thank you. I'll take a look at these things
najitaleb
@najitaleb
May 16 19:31
It seems I resolved it, @pditommaso. Your tip about the working directory was right, and I also had docker's -d (detached) flag as part of my runOptions, which would print the container ID and cause an error immediately. Thanks for all the help