These are chat archives for nextflow-io/nextflow

3rd
Apr 2019
Jonathan Manning
@pinin4fjords
Apr 03 07:49

@rsuchecki thanks, but that will prevent the workflow from ever dying, won't it? I do want it to die, just not until it's done all the work it can.

e.g. if I have a process A that populates a channel with multiple items, which are then consumed together by process B, I don't want failures in A to be ignored and allow B to happen with a subset of outputs. But I do want all of the As that are possible to happen so that when I re-run it's just the erroring As that have to be re-done before B.

Jonathan Manning
@pinin4fjords
Apr 03 08:26

Another Q. I have a process whose memory requirements I would like to depend on the number of lines in an input file (part of a set input). I thought I could do this by the following:

memory { (10 + sdrfFile.numLines()) * 200.KB  }

(where sdrfFile is my input).

This doesn't work, the jobs fail to submit with errors like:

Apr-03 09:04:49.182 [Task submitter] INFO  nextflow.processor.TaskProcessor - [24/b493fd] NOTE: Error submitting process 'quantify (1)' for execution -- Execution is retried (1)

What am I doing wrong?
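To make the arithmetic behind that directive concrete, here is a minimal Python sketch (not Nextflow code) of the intended calculation: a fixed 10-line allowance plus the file's line count, multiplied by 200 KB. The function name `memory_for_lines` and the convention 1 KB = 1024 bytes are assumptions made for illustration, and the sketch does not diagnose the submission error itself.

```python
KB = 1024  # assumption: 1 KB = 1024 bytes


def memory_for_lines(num_lines, base_lines=10, per_line_kb=200):
    """Return the requested memory in bytes.

    Mirrors the expression (10 + numLines) * 200 KB from the message above.
    """
    return (base_lines + num_lines) * per_line_kb * KB


# Example: a 1,000-line input file.
requested_bytes = memory_for_lines(1000)
requested_mb = requested_bytes / (1024 * KB)
print(f"{requested_bytes} bytes, roughly {requested_mb:.0f} MB")
```

Under these assumptions a 1,000-line file requests (10 + 1000) * 200 KB = 202,000 KB, a little under 200 MB.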

micans
@micans
Apr 03 09:56
@pinin4fjords that's what it will do, doing all the work it can. Anything that is ignored becomes a dead end, its output will not spawn further processes / progress down the DAG, but all healthy processes will be taken as far as they can, and if there are still inputs in the queue these will also be processed. For our rnaseq pipeline I use ignore by default and retry for specific processes. I use ignore by default as with thousands of samples there is often some exceptional/freak failure and it is a pain if the pipeline dies.
teoKusa!
@teoKusa_twitter
Apr 03 11:05
Hi, a question for the channel. I have a process that creates a number of output files, and each of these files then needs to be used in a second process in combination with another SPECIFIC file
so for instance, my first process generates file data_A, and the second process needs to consume data_A specifically with data_B, which comes from another source
what's the best way to handle this with nextflow?
Sorry if the question is a bit basic
(note that data_A can be used together with data_B, data_C, data_D...)
Maxime Garcia
@MaxUlysse
Apr 03 11:14
@teoKusa_twitter Did you look into the combining operators? https://www.nextflow.io/docs/latest/operator.html#combining-operators
It depends on whether or not you need to sync the channels
I'd go for combine or merge if no sync is required, or join if you need to sync channels
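The difference between pairing everything with everything and pairing by a shared key can be sketched in Python. This is only an illustration of the pairing semantics, not Nextflow code: the channels are modelled as small lists, the function names simply echo the operator names, the sample keys `s1` and `s2` are hypothetical, and keys are assumed to be unique within each list.

```python
from itertools import product


def combine(left, right):
    """Pair every item of `left` with every item of `right` (Cartesian product)."""
    return list(product(left, right))


def join(left, right):
    """Pair (key, value) items from both lists that share the same key.

    Items whose key has no match on the other side are dropped.
    Assumes each key appears at most once per list.
    """
    right_by_key = dict(right)
    return [
        (key, value, right_by_key[key])
        for key, value in left
        if key in right_by_key
    ]


# No sync needed: data_A is used with each of data_B, data_C, data_D.
print(combine(["data_A"], ["data_B", "data_C", "data_D"]))

# Sync needed: each file is matched to its specific partner via a key.
print(join([("s1", "data_A"), ("s2", "data_X")], [("s1", "data_B")]))
```

In the second call only `s1` has a partner on both sides, so that is the only tuple produced.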
teoKusa!
@teoKusa_twitter
Apr 03 11:23
looking into that now, thanks @MaxUlysse
Maxime Garcia
@MaxUlysse
Apr 03 11:33
you're welcome
You can also have a look at https://nextflow-io.github.io/patterns/index.html there could be some examples
Jonathan Manning
@pinin4fjords
Apr 03 11:59
@micans Thanks for that. So just to be clear, a process taking as input a collect() from another channel (e.g. an RNA-seq aggregation step) won't run if there's an error in the upstream process, even if that error is 'ignored'?
micans
@micans
Apr 03 12:00
@pinin4fjords oooohhh, umm. It will collect the finished outputs. In my case, I get a count matrix on those samples that have finished.
Jonathan Manning
@pinin4fjords
Apr 03 12:01
@micans Ahh, that's what I was afraid of. So I'd have to manually make the aggregation fail if the number of inputs wasn't right.
micans
@micans
Apr 03 12:03
Does it matter a lot? Is a partial result not useful, or is it a waste of computing? My aggregate step is fairly simple, and the partial result is still useful.
Jonathan Manning
@pinin4fjords
Apr 03 12:11
@micans so this is a production system, we're doing analyses and publishing the results. I can't be losing assays due to random failures, so I do need the failure to happen (eventually). Some sort of dummy process is in order, to check the size of the channel at the end.
micans
@micans
Apr 03 12:16
@pinin4fjords indeed that can be controlled in many ways. You could just compare the sample count in the aggregate shell script with the input sample count, or keep track with a counter in the nextflow code .. there are many ways to do that. I favour explicit control like this over a nextflow termination. I have opened an issue for matters relating to this: nextflow-io/nextflow#903 - It is currently pending the modules development I believe.
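The explicit check described here (compare the samples that reached the aggregation step with the samples that went in) could look something like the following Python sketch. It is a hypothetical helper meant to run inside an aggregation script, not part of Nextflow; the function name, the error message and the sample identifiers are all assumptions for illustration.

```python
def check_sample_count(expected_samples, finished_samples):
    """Fail loudly if any expected sample is missing from the finished set.

    Returns the number of finished samples when nothing is missing.
    """
    missing = sorted(set(expected_samples) - set(finished_samples))
    if missing:
        raise RuntimeError(
            f"Aggregation aborted: {len(missing)} sample(s) missing: "
            + ", ".join(missing)
        )
    return len(finished_samples)


# All samples present: the aggregation can proceed.
print(check_sample_count(["s1", "s2", "s3"], ["s3", "s1", "s2"]))

# One upstream task was ignored after failing: the check raises.
try:
    check_sample_count(["s1", "s2", "s3"], ["s1", "s3"])
except RuntimeError as error:
    print(error)
```

Raising an error makes the aggregation task itself fail, so the run ends with a failure only after every upstream task that could complete has done so.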
Jonathan Manning
@pinin4fjords
Apr 03 12:23
@micans okay thanks
Anthony Underwood
@aunderwo
Apr 03 18:35

Hi all. Does anybody have any experience with Nextflow and Singularity?

I'm getting a weird error about files not existing. Debugging as follows:

bash-4.2$ pwd
/lustre/scratch118/infgen/team212/au3/singularity/work
bash-4.2$ singularity exec /lustre/scratch118/infgen/team212/au3/singularity/ghru-assembly.sif /bin/bash -c "zcat /lustre/scratch118/infgen/team212/au3/singularity/assembly_test/fastqs/ERR230474_1.fastq.gz | head"
gzip: /lustre/scratch118/infgen/team212/au3/singularity/assembly_test/fastqs/ERR230474_1.fastq.gz: No such file or directory

Change up a directory:

bash-4.2$ pwd
/lustre/scratch118/infgen/team212/au3/singularity
bash-4.2$ singularity exec /lustre/scratch118/infgen/team212/au3/singularity/ghru-assembly.sif /bin/bash -c "zcat /lustre/scratch118/infgen/team212/au3/singularity/assembly_test/fastqs/ERR230474_1.fastq.gz | head"
@ERR230474.1 1/1
TGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATCTTTAATCG
+
?????BBBDDDDDDDEGGGGGGIIIIIIIIIIIICGHHEFHIIIHHIIIHIIIIIH9AAFHHIIIIIIIIHIIIIIIHIIIIIHIHHHHHIHIIIIIIIIACGHIHIIIIHHHGDGHHGHHHHHHHHHHHFGAFFGFGGFGGGGGGGGGE
@ERR230474.2 2/1
TAGCTCATTGATTATCTAGTCATAATTCAAGCAACTACTACAATATAACAAAATCCTTTTTATAACGCAAGTTCATTTTATGCTACTGCTCAATTTTTTTACTTTTATCGATTAAAGATAGAAATACACGATGCGAGCAATCAAATTTCA
+
AAAAABBBDDDDDDDEGGGGGGIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHHHHHGHIHIHIIIIIIIIIIIIIIIIIIIIIHHIHHHHHHHHHHHHDHHHHHHHFHHHHHGGGGHGGGGGGGGGGGGGGGHHE
@ERR230474.3 3/1
TTAGTGAGTGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATC
Same permissions on both directories, but that shouldn't matter since it's using absolute paths
drwxr-sr-x 13 au3 team212       4096 Apr  3 19:12 work

drwxr-sr-x 8 au3 team212 4096 Apr  3 19:26 singularity
Anthony Underwood
@aunderwo
Apr 03 18:56
It's something to do with bind paths.
If I bind the parent directory
pwd
/lustre/scratch118/infgen/team212/au3/singularity/work
bash-4.2$ singularity exec --bind /lustre/scratch118/infgen/team212/au3/singularity /lustre/scratch118/infgen/team212/au3/singularity/ghru-assembly.sif /bin/bash -c "zcat /lustre/scratch118/infgen/team212/au3/singularity/assembly_test/fastqs/ERR230474_1.fastq.gz | head"
@ERR230474.1 1/1
TGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATCTTTAATCG
+
?????BBBDDDDDDDEGGGGGGIIIIIIIIIIIICGHHEFHIIIHHIIIHIIIIIH9AAFHHIIIIIIIIHIIIIIIHIIIIIHIHHHHHIHIIIIIIIIACGHIHIIIIHHHGDGHHGHHHHHHHHHHHFGAFFGFGGFGGGGGGGGGE
@ERR230474.2 2/1
TAGCTCATTGATTATCTAGTCATAATTCAAGCAACTACTACAATATAACAAAATCCTTTTTATAACGCAAGTTCATTTTATGCTACTGCTCAATTTTTTTACTTTTATCGATTAAAGATAGAAATACACGATGCGAGCAATCAAATTTCA
+
AAAAABBBDDDDDDDEGGGGGGIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHHHHHGHIHIHIIIIIIIIIIIIIIIIIIIIIHHIHHHHHHHHHHHHDHHHHHHHFHHHHHGGGGHGGGGGGGGGGGGGGGHHE
@ERR230474.3 3/1
TTAGTGAGTGTATCAAAACAGCTTGGGAAATAATTTATAAAGTATGTATAAGAACTGTATAAGGTATTCAAACATTGTAAACACTCATGCTTCGGACCAAACTCATGGTGATGTTATGAAATTTGATTGCTCGCATCGTGTATTTCTATC
Michael Webb
@michaelwebb_twitter
Apr 03 20:32

Hi everyone. I'm using nextflow on my local machine (MacBook Pro) with the simplest possible .nf script (one process that calls one shell file that writes a one-line text file). Whenever I run nextflow run, it takes a full 10 seconds for anything to happen. I see the line

Launching `new.nf` [astonishing_chandrasekhar] - revision: 6d7dadae5b

after ~2 seconds, but then don't see

[warm up] executor > local

until another 8 seconds have elapsed. Is this normal? Is there any way to speed it up? Having to wait 10 seconds for every run to start makes rapid prototyping... less rapid ;) Thanks!

Kevin Sayers
@KevinSayers
Apr 03 21:00
@michaelwebb_twitter it should not be that slow. Do you see the same behavior if you just have a process which does something simple like echo a string?
Michael Webb
@michaelwebb_twitter
Apr 03 21:33
@KevinSayers that's reassuring! I do see the same behavior with a very simple process.
E.g., file simple.nf with contents
process simpleProcess {
    """
    echo "Hi there!"
    """
}
and then in terminal I run nextflow run simple.nf
That still takes 10 seconds
Terminal looks like
michaelwebb$ nextflow run simple.nf
N E X T F L O W  ~  version 19.01.0
Launching `simple.nf` [spontaneous_swirles] - revision: 77252fceec
[warm up] executor > local
[9b/f8899b] Submitted process > simpleProcess
Michael Webb
@michaelwebb_twitter
Apr 03 21:40
I just tried a fresh install of nextflow (using curl -s https://get.nextflow.io | bash), same problem
my java -version is
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)