These are chat archives for nextflow-io/nextflow

27th
Apr 2015
Michael L Heuer
@heuermh
Apr 27 2015 18:56
any idea why a process running under nextflow would hang when .command.out reaches 64K or thereabouts?
Paolo Di Tommaso
@pditommaso
Apr 27 2015 18:57
um, is there any error message in the log file?
Michael L Heuer
@heuermh
Apr 27 2015 19:00
No, one of the programs being called spits out a lot of status-bar like output to stdout, and it appears to stop when .command.out hits about 64k
     0 -rw-rw-r-- 1          0 Apr 27 12:04 .command.begin
     0 -rw-rw-r-- 1          0 Apr 27 12:04 .command.err
    64 -rw-rw-r-- 1      65505 Apr 27 12:04 .command.out
     4 -rw-rw-r-- 1       1483 Apr 27 12:04 .command.run
     4 -rw-rw-r-- 1        378 Apr 27 12:04 .command.sh
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:01
what executor are you using ?
Michael L Heuer
@heuermh
Apr 27 2015 19:02
just the native one at the moment
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:03
what if you print the process tree, for example with ps fx
what's the status of the posix process? running? suspended ?
also what version of NXF are you using ?
Michael L Heuer
@heuermh
Apr 27 2015 19:07
nextflow v0.13.3, will have to run again to look at the process tree
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:08
since there's no .exitcode file it suggests that the process does not terminate
Michael L Heuer
@heuermh
Apr 27 2015 19:09
yeah the process is still going
$ ps -ef | grep SSAKE
    64998  64990  2 14:09 pts/8    00:00:01 /usr/bin/perl /usr/local/bin/SSAKE -f downsampled.ssake.reformatted.fa.gz.tmp -b downsampled.ssake.d/downsampled -w 1 -h 1 -p 1 -m 50 -o 30 -c 1 -e 0.90 -k 4 -a 0.1 -x 20
$ ps fx 64998
   PID TTY      STAT   TIME COMMAND
 62063 ?        S      0:06 sshd: user@pts/8  
 62064 pts/8    Ss     0:00  \_ -bash
 64750 pts/8    Sl+    0:02      \_ /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xverify:none -XX:+TieredCompila
 64986 pts/8    S+     0:00          \_ /bin/bash -ue .command.run
 64990 pts/8    S+     0:00              \_ /bin/bash -ue /home/user/flow/work/0f/d3e827b4c5183ba20575a92808c
 64991 pts/8    S+     0:00                  \_ /bin/bash -ue .command.run
 64993 pts/8    S+     0:00                  |   \_ tee .command.out
 64992 pts/8    S+     0:00                  \_ /bin/bash -ue .command.run
 64994 pts/8    S+     0:00                  |   \_ tee .command.err
 64998 pts/8    S+     0:01                  \_ /usr/bin/perl /usr/local/bin/SSAKE -f downsampled.ssake.reforma
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:14
weird, it looks suspended
well, more than looks it is suspended
Michael L Heuer
@heuermh
Apr 27 2015 19:15
ok what does that mean?
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:16
that is process is not running, generally because it is waiting for I/O
Michael L Heuer
@heuermh
Apr 27 2015 19:17
is there some arbitrary limit to bash redirect or tee? maybe length of a single line?
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:18
I don't know, but I'm starting to think that it may be related to the tee that is used to split and save the stdout and stderr streams
Michael L Heuer
@heuermh
Apr 27 2015 19:20
"One Linux pipe(7) man page says that pipe capacity is 65536 bytes since Linux 2.6.11"... http://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:22
um, that's bad
I will investigate that
In the meanwhile you can try to capture that the SSAKE output in your process and redirect to a file
or downgrade to nextflow 0.12.5
Michael L Heuer
@heuermh
Apr 27 2015 19:31
yeah redirecting SSAKE stdout sounds like a good workaround
Paolo Di Tommaso
@pditommaso
Apr 27 2015 19:51
let me know if it works
Michael L Heuer
@heuermh
Apr 27 2015 20:08
it does, and we're not really losing anything as it writes most things to a log file in addition to stdout
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:10
ok
Michael L Heuer
@heuermh
Apr 27 2015 20:35

sorry, possibly something else, what does this mean?

Apr-27 15:22:33.078 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]

and it appears that the entire workflow has come to a halt

Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:37
it means that there are 31 tasks running
for which it is waiting the termination
Michael L Heuer
@heuermh
Apr 27 2015 20:39
task id 23 is waiting on 31 tasks? it shouldn't have that many dependencies
actuallly, it doesn't have any process dependencies, it's just reading from a channel, see https://github.com/nmdp-bioinformatics/flow/blob/master/main.nf#L41
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:42
how many fastqToSsake have been submitted ?
Michael L Heuer
@heuermh
Apr 27 2015 20:45
46 of 96
$ cat .nextflow.log | grep Submitted | grep fastqToSsake | wc -l 46
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:47
so that log line says that 31 tasks are pending in the running queue
It's not directly related to tasks dependency
Michael L Heuer
@heuermh
Apr 27 2015 20:49
task id 23 doesn't have a .command.begin file, so it hasn't started yet
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:50
but I supposed that has been "scheduled"
Michael L Heuer
@heuermh
Apr 27 2015 20:50
ps doesn't show any processes running by that use except for the nextflow jvm process
yeah there is a dir in work for task id 23
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:51
Can you share a tail of the log file
Michael L Heuer
@heuermh
Apr 27 2015 20:52
.nextflow.log? sure
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:52
yep
Michael L Heuer
@heuermh
Apr 27 2015 20:53
Apr-27 15:19:48.849 [Actor Thread 13] INFO  nextflow.processor.TaskDispatcher - [03/7bf93e] Submitted process > fastqToSsake (41)
Apr-27 15:20:11.312 [Actor Thread 2] INFO  nextflow.processor.TaskDispatcher - [f7/f11264] Submitted process > interleave (13)
Apr-27 15:20:39.658 [Actor Thread 21] INFO  nextflow.processor.TaskDispatcher - [6a/2e227d] Submitted process > interleave (11)
Apr-27 15:20:57.151 [Actor Thread 15] INFO  nextflow.processor.TaskDispatcher - [95/30918e] Submitted process > fastqToSsake (46)
Apr-27 15:22:33.078 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
Apr-27 15:27:33.137 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
Apr-27 15:32:33.196 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
Apr-27 15:37:33.261 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
Apr-27 15:42:33.327 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
Apr-27 15:47:33.401 [Thread-4] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 31 -- first: TaskHandler[id: 23; name: fastqToSsake (23); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/flow/work/eb/b3e23cda58b3bc9aff875228c8451c]
we have a docker image nmdpbioinformatics/docker-flow:1.0-snapshot if you want to play along, it has some path dependencies on the host though, see https://github.com/nmdp-bioinformatics/docker-flow/blob/master/Dockerfile
Paolo Di Tommaso
@pditommaso
Apr 27 2015 20:59
ok I will try to build the docker image
Michael L Heuer
@heuermh
Apr 27 2015 21:09
I tried running again a separate nextflow with only the FASTQ files for task id 23 as input and things went ok, so it's probably not anything with those particular files
Paolo Di Tommaso
@pditommaso
Apr 27 2015 21:20
I can't manage to test it now (it's late night here)
I will do on tomorrow
anyway I would suggest to try to run it again with a previous version of nextflow to be sure that is not the problem we discussed
It's enough that you set the env var
export NXF_VER=0.12.5
let me know if it changes something
Michael L Heuer
@heuermh
Apr 27 2015 21:27
no problem . . . we're of course trying to figure it out on our end, thanks for your help so far
Paolo Di Tommaso
@pditommaso
Apr 27 2015 21:29
you are welcome
Michael L Heuer
@heuermh
Apr 27 2015 21:30
I'm hoping to make it to BOSC in July, I'll buy you a beer or several if you'd like :)
Paolo Di Tommaso
@pditommaso
Apr 27 2015 21:31
LOL
hope to see you there
Michael L Heuer
@heuermh
Apr 27 2015 22:33
So we tried running again with one sample and with twelve samples and everything was ok. Then we tried 48 samples and got to the stall/hang again. After ^Cing nextflow and running again with -resume, it does the same thing. Oddly enough, it is 31 pending processes again
...
[60/b950ac] Submitted process > interleave (34)
^CWARN: Killing pending processes (31)
or perhaps running with -resume didn't work
$ tail .nextflow.log
Apr-27 17:29:13.885 [Thread-1] DEBUG n.processor.TaskPollingMonitor - Killing process (30): TaskHandler[id: 87; name: interleave (34); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/work/60/b950ace4c4b07a3597e21a0471b6f2]
Apr-27 17:29:13.886 [Thread-1] DEBUG n.processor.TaskPollingMonitor - Failed killing pending process: TaskHandler[id: 87; name: interleave (34); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/work/60/b950ace4c4b07a3597e21a0471b6f2] -- cause: null
Apr-27 17:29:13.886 [Thread-1] DEBUG n.processor.TaskPollingMonitor - Killing process (31): TaskHandler[id: 63; name: interleave (21); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/work/7d/6e13b1c2e17aeced5c244729ce981e]
Apr-27 17:29:13.886 [Thread-1] DEBUG n.processor.TaskPollingMonitor - Failed killing pending process: TaskHandler[id: 63; name: interleave (21); status: SUBMITTED; exit: -; workDir: /mnt/nextflow/work/7d/6e13b1c2e17aeced5c244729ce981e] -- cause: null