These are chat archives for nextflow-io/nextflow

29th
Jul 2017
Paolo Di Tommaso
@pditommaso
Jul 29 2017 06:21
Likely a firewall problem; if you upload the complete log to pastebin.com I can take a look.
Sergey Venev
@sergpolly
Jul 29 2017 13:48
@pditommaso I commented on issue #412 with the log file and closed it; it's good now, I think.
If you have a chance to look at the logs (https://www.dropbox.com/s/x3b5g97h53pv9fc/fixed_nnn.txt?dl=0),
there is something interesting happening with job 4739086: LSF reports status UNKWN for the job several times in a row, and nextflow eventually decides to check .exitcode, which isn't there.
Then nextflow decides that the job is complete, again for an unknown reason.
Sergey Venev
@sergpolly
Jul 29 2017 13:53
It does not kill it, but re-submits it for the 3rd time, even though I have maxRetries=2.
Sergey Venev
@sergpolly
Jul 29 2017 14:09

Initial submission (1st time):

[5e/199aee] Submitted process > map_runs (library:HeLa1 run:lane5 chunk:10)

Timeout and resubmission (2nd time); this is job 4739086:

WARN: Process `map_runs (library:HeLa1 run:lane5 chunk:10)` terminated with an error exit status (140) -- Execution is retried (1) [81/656d46]
Re-submitted process > map_runs (library:HeLa1 run:lane5 chunk:10)

Problem: LSF reports an unknown job status and the job is resubmitted for the 3rd time (even though maxRetries=2). Job 4739086 is not killed, though; it finishes by itself 2 minutes later.

[23/303f4f] Submitted process > parse_runs (library:HeLa1 run:lane5 chunk:10 parsing)
WARN: Process `parse_runs (library:HeLa1 run:lane5 chunk:10 parsing)` terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (1)
[55/d35d1b] Re-submitted process > parse_runs (library:HeLa1 run:lane5 chunk:10 parsing)

From that point on, pipeline execution proceeds without issues.

Sergey Venev
@sergpolly
Jul 29 2017 14:44
Would increasing exitReadTimeout from the default 90 sec make nextflow wait longer for job 4739086?
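For reference, a minimal nextflow.config sketch of the settings discussed above, assuming the LSF executor and a retry error strategy (maxRetries only takes effect when errorStrategy is 'retry'). The '5 min' timeout is an arbitrary illustrative value, not a recommendation; whether raising it would have covered job 4739086 is exactly the open question here.

```groovy
// nextflow.config: illustrative sketch, values are assumptions
process {
    executor      = 'lsf'
    errorStrategy = 'retry'   // required for maxRetries to apply
    maxRetries    = 2         // re-submissions allowed after the first attempt
}

executor {
    // How long to wait for the .exitcode file once the job is no longer
    // reported by the scheduler, before treating it as failed
    exitReadTimeout = '5 min'
}
```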