These are chat archives for nextflow-io/nextflow

11th Dec 2018
Rahul Pisupati
@rbpisupati
Dec 11 2018 08:46
Hi, does anyone have any suggestions for getting all the pairwise combinations? Something like this: https://stackoverflow.com/questions/40092474/what-is-the-fastest-way-to-get-all-pairwise-combinations-in-an-array-in-python
Paolo Di Tommaso
@pditommaso
Dec 11 2018 08:49
Rahul Pisupati
@rbpisupati
Dec 11 2018 08:55
;)
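(For the record: one way to get pairwise combinations in Nextflow is to combine a channel with itself and keep each unordered pair once. A sketch with made-up values; the source chat does not show Paolo's actual answer:)

```groovy
// Sketch: pairwise combinations of channel items (hypothetical values).
// combine() yields the Cartesian product of the two channels; the
// filter drops self-pairs and keeps each unordered pair only once.
pairs = Channel.from('A', 'B', 'C')
    .combine( Channel.from('A', 'B', 'C') )
    .filter { a, b -> a < b }

pairs.view()   // emits each unordered pair once, e.g. [A, B], [A, C], [B, C]
```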
Alexander Peltzer
@apeltzer
Dec 11 2018 14:40
Can I check the size of an array of files in NXF?
input:
    file input_files from featureCounts_to_merge.collect()

    output:
    file 'merged_gene_counts.txt'

    script:
    //if we only have 1 file, just use cat and pipe output to csvtk. Else join all files first, and then remove unwanted column names.
    def list = "${input_files}".size()
    def merge = ("${input_files.toList().size()}" == 1) ? 'cat' : 'csvtk join -t -f "Geneid,Start,Length,End,Chr,Strand,gene_name"'
the size() just returns e.g. 48 even if there is just one sample inside, so that's probably the length of the filename here...
I'd rather want to know how long the array/list of files in input_files is
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:41
alex the basics! input_files.size() not "${input_files}".size() :joy:
Alexander Peltzer
@apeltzer
Dec 11 2018 14:42
Aaaaah
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:42
otherwise you get the string length !
Alexander Peltzer
@apeltzer
Dec 11 2018 14:42
Ok
I always get stuck with these issues -.-
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:43
:smile:
Alexander Peltzer
@apeltzer
Dec 11 2018 14:44
Hm
size 5140 now
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:44
well you have many ..
too many?
Alexander Peltzer
@apeltzer
Dec 11 2018 14:45
4 would be expected :-(
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:45
oops
Alexander Peltzer
@apeltzer
Dec 11 2018 14:45
I'll dump the channel and see what that contains
[DUMP] /Users/alexanderpeltzer/IDEA/nf-core/RNAseq/work/b7/af90188083fc4b0510f947bdaae82f/SRR4238359_subsamp.sorted_gene.featureCounts.txt
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:47
just a single file? therefore that's not a list
safest way x instanceof Path ? 1 : x.size()
this thing must go away in a future version
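(Applied to the script block above, Paolo's guard might look like this. A sketch: it assumes `input_files` is the staged input from the earlier snippet, and that `Path` resolves to `java.nio.file.Path`, which Nextflow imports by default:)

```groovy
script:
// A single staged file arrives as a Path, not a list, so calling
// size() on it would not give a file count; guard with instanceof.
def n = input_files instanceof Path ? 1 : input_files.size()
def merge = n == 1 ? 'cat' : 'csvtk join -t -f "Geneid,Start,Length,End,Chr,Strand,gene_name"'
```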
Alexander Peltzer
@apeltzer
Dec 11 2018 14:49
Aah okay I see
That seems to work :-)
Didn't know it's a different thing :-)
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:53
that's a quirky behaviour I want to fix at some point
Alexander Peltzer
@apeltzer
Dec 11 2018 14:55
Should I open an issue for you?
Then you don't have to do it...
Paolo Di Tommaso
@pditommaso
Dec 11 2018 14:55
welcome :smiley_cat:
Stephen Kelly
@stevekm
Dec 11 2018 15:32

I see... not sure, but it looks like, e.g. for file copying, it allocates as many threads as available CPU cores
https://github.com/nextflow-io/nextflow/blob/ac2be51dd255151e6ab5056f35811b44b8610af2/src/main/groovy/nextflow/file/FilePorter.groovy#L159-L171

How can I change this? This is a serious problem on our system because I am often running several instances of Nextflow, resulting in the system being tied up with hundreds of file-copying threads. I would rather have something like 2 file-copying threads per Nextflow instance.

Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:33
filePorter.maxThreads = n in the config file
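(In nextflow.config that would read, for instance, with the value 2 matching the "2 threads per instance" wish above:)

```groovy
// nextflow.config — cap the file-staging (FilePorter) thread pool
filePorter.maxThreads = 2
```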
Stephen Kelly
@stevekm
Dec 11 2018 15:34

By the way, this may not apply to your use case, but I think the JVM is usually best left alone (that is not an NF-specific comment).

I don't have any Java experience but I agree with the sentiment; if it cannot be set directly then I was hoping I could find some way to restrict the system resources available to Nextflow when running. Our system has cgroups, but it does not appear to be configurable by end users: it's tied to the SLURM cluster submission, and I prefer to run Nextflow from the head node instead

Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:36
modern JVMs honour cgroups limits
Stephen Kelly
@stevekm
Dec 11 2018 15:37
Ok that would work if I submit Nextflow as a SLURM job
Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:38
I guess so, make some tests
Stephen Kelly
@stevekm
Dec 11 2018 15:38
I have done it but there are issues with it
Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:38
it's a hard life
Stephen Kelly
@stevekm
Dec 11 2018 15:38
it becomes impossible to cleanly halt a running pipeline. Using 'scancel' or any other cluster job kill command does not allow Nextflow to clean up all its running cluster jobs
so if I scancel, I get left with all of Nextflow's jobs still running on the cluster
same for SGE
only way to avoid this that I've found is to ssh directly into the node running Nextflow and send a 'kill' to the running process
Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:40
um, interesting point, you may open an issue for that
Stephen Kelly
@stevekm
Dec 11 2018 15:42
I've set it up before in a Makefile wrapper but it's super complicated and not fun to deal with
I saw mention somewhere that Nextflow can write out its own process id to a file, is there a way to enable that from the command line args?
that would help some
Paolo Di Tommaso
@pditommaso
Dec 11 2018 15:43
when you use the -bg option it creates a .nextflow.pid file
Stephen Kelly
@stevekm
Dec 11 2018 15:54
ok thanks
submitted here nextflow-io/nextflow#968
Paolo Di Tommaso
@pditommaso
Dec 11 2018 16:06
:ok_hand:
Tobias Neumann
@t-neumann
Dec 11 2018 19:03

anybody seen stuff like this when starting nextflow on AWS batch?

Dec-11 18:49:32.751 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] centrifugePaVE
  status=ACTIVE
  port 0: (queue) closed; channel: -
  port 1: (cntrl) -     ; channel: $

[process] centrifugeRefSeq
  status=ACTIVE
  port 0: (queue) closed; channel: -
  port 1: (cntrl) -     ; channel: $

[process] centrifugeENA
  status=ACTIVE
  port 0: (queue) closed; channel: -
  port 1: (cntrl) -     ; channel: $

Pipeline is basically stuck at this stage:

[warm up] executor > awsbatch
PhilPalmer
@PhilPalmer
Dec 11 2018 19:21
@t-neumann could it be copying files across?
Tobias Neumann
@t-neumann
Dec 11 2018 19:24
well it's just starting up - for that it would need to copy some indices from s3 to the EBS
but it did not start up any instances, so I don't see where copying should happen
Anthony Underwood
@aunderwo
Dec 11 2018 23:21

@t-neumann Yes, I see exactly this occasionally. It hangs at the warm-up stage. I cancel and try again, it still hangs, and then at some undetermined time later I run the command again, having changed nothing, and it runs just fine. Very strange behaviour that I haven't been able to debug.

It's not even that it's waiting for the spot price instances to become available since the jobs don't appear in the batch queue when it hangs

Have others seen this?

I am running on eu-west-2 with spot instances specified at 30% list price

Rad Suchecki
@rsuchecki
Dec 11 2018 23:47

@stevekm Not sure if you noticed Paolo's earlier answer that you can in fact set it directly using

filePorter.maxThreads = n in the config file