These are chat archives for nextflow-io/nextflow

13th Nov 2018
Tobias Neumann
@t-neumann
Nov 13 2018 09:04

Hi,

I wanted to set up dynamic queue assignment based on task.attempt, but something is off because Nextflow can no longer parse the awsbatch section properly:

awsbatch {
            aws.region = 'eu-central-1'
            aws.client.storageEncryption = 'AES256'
            process.queue { task.attempt > 1 ? 'awsconvertexcess' : 'awsconvertworkload' }
            executor.name = 'awsbatch'
            executor.awscli = '/home/ec2-user/miniconda/bin/aws'
            singularity.enabled = false
            docker.enabled = true
            process.publishDir = [
                [path: 's3://obenauflab/results', mode: 'copy', overwrite: 'true', pattern: "*fq.gz"],
              ]
        }

This is the resulting error:

 No signature of method: groovy.util.ConfigObject.queue() is applicable for argument types: (_nf_config_f554f07e$_run_closure1$_closure10$_closure13) values: [_nf_config_f554f07e$_run_closure1$_closure10$_closure13@69f63d95]
  Possible solutions: use([Ljava.lang.Object;), clear(), clone(), size(), values(), dump()

groovy.lang.MissingMethodException: No signature of method: groovy.util.ConfigObject.queue() is applicable for argument types: (_nf_config_f554f07e$_run_closure1$_closure10$_closure13) values: [_nf_config_f554f07e$_run_closure1$_closure10$_closure13@69f63d95]
Possible solutions: use([Ljava.lang.Object;), clear(), clone(), size(), values(), dump()
....

Can anybody spot what's wrong with the process.queue directive? I used what is suggested here https://www.nextflow.io/docs/latest/process.html#dynamic-directives

Paolo Di Tommaso
@pditommaso
Nov 13 2018 09:34
config attributes always use =, therefore it should be process.queue = { .. }
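[Editor's note: for reference, the fix in context — a trimmed sketch of the profile block above with the missing = added; not verbatim from the chat.]

```groovy
// nextflow.config profile sketch: every config attribute is an assignment
awsbatch {
    aws.region = 'eu-central-1'
    // dynamic directives are closures, but still assigned with '='
    process.queue = { task.attempt > 1 ? 'awsconvertexcess' : 'awsconvertworkload' }
    executor.name = 'awsbatch'
}
```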
Tobias Neumann
@t-neumann
Nov 13 2018 09:53
that's one of those moments where I feel like a total novice again
saved my day again @pditommaso, couldn't have put my finger on it if my life depended on it
Paolo Di Tommaso
@pditommaso
Nov 13 2018 10:08
lol
Tobias Neumann
@t-neumann
Nov 13 2018 10:10

Now a curious problem:

I try to launch a bigger job on 50 samples from s3

nextflow run obenauflab/virus-detection-nf -r TCGAconversion --inputDir s3://obenauflab/test -profile awsbatch -with-report -w s3://obenauflab/work

and it starts submitting but is always killed after 31 jobs:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
N E X T F L O W  ~  version 0.30.2
Launching `obenauflab/virus-detection-nf` [serene_goodall] - revision: 4f38f97609 [TCGAconversion]

 parameters
 ======================
 input directory          : s3://obenauflab/test
 ======================

[warm up] executor > awsbatch
....
[d7/97d7a7] Submitted process > bamToFastq (5f5df391-e2e4-4cde-8ac6-8c1f4cd7e2c0_gdc_realn_rehead)
[dc/90c665] Submitted process > bamToFastq (570d9e4b-eced-42e2-bb69-3d3f73d90649_gdc_realn_rehead)
[9a/8f8cca] Submitted process > bamToFastq (aff16bf9-3f00-48ca-9414-40d689369f0d_gdc_realn_rehead)
[6a/6fcf57] Submitted process > bamToFastq (8d773838-2b0f-488c-b395-1f703d92034e_gdc_realn_rehead)
Killed

No error in the log, it just breaks, and the jobs hang around in the Batch queue as RUNNABLE

Paolo Di Tommaso
@pditommaso
Nov 13 2018 10:12
check the log file more carefully, something should be reported
Tobias Neumann
@t-neumann
Nov 13 2018 10:13
those are the last lines of .nextflow.log:
Nov-13 11:08:27.149 [Task submitter] INFO  nextflow.Session - [dc/90c665] Submitted process > bamToFastq (570d9e4b-eced-42e2-bb69-3d3f73d90649_gdc_realn_rehead)
Nov-13 11:08:28.020 [Task submitter] DEBUG n.executor.AwsBatchTaskHandler - [AWS BATCH] submitted > job=094ddc5b-c45f-4d9d-aa09-58e098c804ca; work-dir=s3://obenauflab/work/9a/8f8cca1741e945be6006cb35ae5377
Nov-13 11:08:28.020 [Task submitter] INFO  nextflow.Session - [9a/8f8cca] Submitted process > bamToFastq (aff16bf9-3f00-48ca-9414-40d689369f0d_gdc_realn_rehead)
Nov-13 11:08:28.653 [Task submitter] DEBUG n.executor.AwsBatchTaskHandler - [AWS BATCH] submitted > job=a520da63-eabb-40d5-9f7b-8fe798911417; work-dir=s3://obenauflab/work/6a/6fcf57e9ec1d6acefec4281a129db8
Nov-13 11:08:28.653 [Task submitter] INFO  nextflow.Session - [6a/6fcf57] Submitted process > bamToFastq (8d773838-2b0f-488c-b395-1f703d92034e_gdc_realn_rehead)
Paolo Di Tommaso
@pditommaso
Nov 13 2018 10:15
Killed
figure out what is killing the NF process
Tobias Neumann
@t-neumann
Nov 13 2018 10:26
I will have a look. But it can't really be CPUs/memory, or can it?
Riccardo Giannico
@giannicorik_twitter
Nov 13 2018 11:46

Hi guys,
How can a process output a 'set' channel using regexp?

Details:
I know that outside of a process I can use this:

Channel.fromFilePairs("${params.dir}/*.fastq.gz", size: 1) { file -> file.name.split(/_S\d+_L/)[0] }

to obtain something like this:

[[sample1, sample1_S1_L001.fastq.gz],
[sample2, sample2_S1_L001.fastq.gz]]

but how can I do the same inside a process "output"?

Riccardo Giannico
@giannicorik_twitter
Nov 13 2018 11:53
If it is not possible I can do it in bash, but then my question will be:
How can a process "output" a value from a variable created inside the bash script?
output:
   val(myvar) into ch_myvar  // obviously this is not working....
"""
myvar="myvalue"
"""
Rad Suchecki
@rsuchecki
Nov 13 2018 12:57
As for your second question @giannicorik_twitter: you can't assign a value to a NF var from inside a script; that's one-way only. Related to #903. It would be a nice feature, but under the hood this would have to go via files....
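[Editor's note: a sketch of the file-based workaround Rad alludes to, in the DSL1 syntax of this chat's era; the process and channel names are made up for illustration.]

```groovy
// hypothetical process: write the value to a file so Nextflow can capture it
process computeValue {
    output:
    file 'myvar.txt' into ch_myvar_file

    """
    echo "myvalue" > myvar.txt
    """
}

// turn the file contents back into a plain value channel downstream
ch_myvar = ch_myvar_file.map { it.text.trim() }
```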
Alexander Peltzer
@apeltzer
Nov 13 2018 14:39
I know that I can parse CSV files directly and use a closure to access their content in NXF, but can I compare two independent CSV files using the same approach?
Paolo Di Tommaso
@pditommaso
Nov 13 2018 16:11
what do you mean compare ?
micans
@micans
Nov 13 2018 16:16
@riccardo this does not answer your question, but I prefer to have a list of sample IDs in a file (one per line) and an input directory as a params argument, and then have the process/script simply read dir/sampleid+somestuff.tar.gz. I find that to be the logical flow of things: starting from the sample ID rather than inferring a sample ID from a file name. In our pipeline we have a val(sampleid) in almost all processes; they are initially read from a sample file. I find that a clean way of doing things. My 2 cts, applies to my situation ... and does not answer your question.
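[Editor's note: the sample-file approach described above might be sketched like this in DSL1; params.samplefile, params.dir, and the process name are hypothetical.]

```groovy
// read one sample ID per line from a sample file into a value channel
Channel
    .fromPath(params.samplefile)
    .splitText()
    .map { it.trim() }
    .set { ch_sampleids }

process analyse {
    input:
    val sampleid from ch_sampleids

    script:
    """
    tar xzf ${params.dir}/${sampleid}_somestuff.tar.gz
    """
}
```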
Jose Espinosa-Carrasco
@JoseEspinosa
Nov 13 2018 16:24
@giannicorik_twitter you can try this:
output: 
   stdout into ch_myvar 
"""
myvar="myvalue"
echo \$myvar
"""
Oriol Guitart
@oguitart
Nov 13 2018 16:31
Hi,
I would like to know how you can synchronize several processes without the need for files. For instance, I have a process that fills data into a DB, and I want another process to wait for the DB process to finish. I guess I need to use input and output, but I don't see how.
Thanks,
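[Editor's note: the answer does not appear in the archived chat; the usual pattern is sketched below with made-up process and script names — the first process emits a dummy value, and the second takes it as input, so the dependency flows through channels rather than files.]

```groovy
// hypothetical DB-loading process: emits a literal flag when done
process loadDb {
    output:
    val 'done' into ch_db_done

    """
    load_data_into_db.sh
    """
}

// waits for the flag before running, i.e. after loadDb finishes
process queryDb {
    input:
    val ready from ch_db_done

    """
    query_db.sh
    """
}
```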
micans
@micans
Nov 13 2018 16:33
Oh, I directed it at the wrong person too, sorry @riccardo my message was not for you. Please excuse my feeble contribution.
Nice one, @JoseEspinosa
Oriol Guitart
@oguitart
Nov 13 2018 16:38
Perfect, thanks!
Paolo Di Tommaso
@pditommaso
Nov 13 2018 16:38
:+1:
Riccardo Giannico
@giannicorik_twitter
Nov 13 2018 16:52
@micans don't worry :) thanks for the suggestions anyway :D
@JoseEspinosa that's brilliant! Thank you! :D
Jose Espinosa-Carrasco
@JoseEspinosa
Nov 13 2018 17:01
You are welcome :thumbsup: