These are chat archives for nextflow-io/nextflow

16th
Jul 2018
Len Trigg
@Lenbok
Jul 16 2018 06:17
How do we make nextflow delete work directories once the pipeline has completed OK? I found a reference in #649, but is this supposed to be automatic, or do we need to invoke it in our pipeline's onComplete function?
Tobias Neumann
@t-neumann
Jul 16 2018 08:38
hi @pditommaso. so over the weekend suddenly all my nextflow processes stopped working (tried two independent workflows). nothing was updated or changed from my side, but I discovered that at the moment none of the files from the process inputs are linked into the working directories any more. any ideas where that is coming from?
tbugfinder
@tbugfinder
Jul 16 2018 09:06
@t-neumann Probably you should add some logs here - otherwise it is a broad guess.
Tobias Neumann
@t-neumann
Jul 16 2018 09:26
well the logs are looking fine - that's why my guess is so broad unfortunately
.nextflow.log
Jul-16 11:25:30.352 [main] DEBUG nextflow.cli.Launcher - $> /users/tobias.neumann/.local/bin/nextflow run obenauflab/virus-detection-nf --inputDir test -profile slurm -resume
Jul-16 11:25:30.424 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.30.2
Jul-16 11:25:30.786 [main] DEBUG nextflow.scm.AssetManager - Git config: /groups/zuber/zubarchive/USERS/tobias/.nextflow/assets/obenauflab/virus-detection-nf/.git/config; branch: master; remote: origin; url: https://github.com/ObenaufLab/virus-detection-nf.git
Jul-16 11:25:30.844 [main] DEBUG nextflow.scm.AssetManager - Git config: /groups/zuber/zubarchive/USERS/tobias/.nextflow/assets/obenauflab/virus-detection-nf/.git/config; branch: master; remote: origin; url: https://github.com/ObenaufLab/virus-detection-nf.git
Jul-16 11:25:31.054 [main] DEBUG nextflow.scm.AssetManager - Git config: /groups/zuber/zubarchive/USERS/tobias/.nextflow/assets/obenauflab/virus-detection-nf/.git/config; branch: master; remote: origin; url: https://github.com/ObenaufLab/virus-detection-nf.git
Jul-16 11:25:31.054 [main] INFO  nextflow.cli.CmdRun - Launching `obenauflab/virus-detection-nf` [compassionate_koch] - revision: 359ecfbd87 [master]
Jul-16 11:25:31.773 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /groups/zuber/zubarchive/USERS/tobias/.nextflow/assets/obenauflab/virus-detection-nf/nextflow.config
Jul-16 11:25:31.774 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /groups/zuber/zubarchive/USERS/tobias/.nextflow/assets/obenauflab/virus-detection-nf/nextflow.config
Jul-16 11:25:31.780 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `slurm`
Jul-16 11:25:31.843 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [standard, sge, slurm]
Jul-16 11:25:31.849 [main] WARN  nextflow.config.ConfigBuilder - It seems you never run this project before -- Option `-resume` is ignored
Jul-16 11:25:31.866 [main] DEBUG nextflow.Session - Session uuid: 02173979-c0f6-4ec5-9f2d-38212de9be52
Jul-16 11:25:31.867 [main] DEBUG nextflow.Session - Run name: compassionate_koch
Jul-16 11:25:31.868 [main] DEBUG nextflow.Session - Executor pool size: 56
Jul-16 11:25:31.878 [main] DEBUG nextflow.cli.CmdRun -
  Version: 0.30.2 build 4867
  Modified: 16-06-2018 17:49 UTC (19:49 CEST)
  System: Linux 3.10.0-693.17.1.el7.x86_64
  Runtime: Groovy 2.4.15 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13
  Encoding: UTF-8 (UTF-8)
  Process: 23439@login-02 [172.16.61.3]
  CPUs: 56 - Mem: 125.9 GB (1.1 GB) - Swap: 0 (0)
Jul-16 11:25:31.899 [main] DEBUG nextflow.Session - Work-dir: /scratch-ii2/users/tobias.neumann [fhgfs]
Jul-16 11:25:32.017 [main] DEBUG nextflow.Session - Session start invoked
Jul-16 11:25:32.020 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Jul-16 11:25:32.020 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Jul-16 11:25:32.153 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jul-16 11:25:32.155 [main] INFO  nextflow.Nextflow -
Jul-16 11:25:32.155 [main] INFO  nextflow.Nextflow -  parameters
Jul-16 11:25:32.155 [main] INFO  nextflow.Nextflow -  ======================
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -  input directory          : test
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -  Centrifuge index         : /groups/obenauf/Software/indices/virus/Centrifuge/centrifuge/p+h+v
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -  Salmon index             : /groups/obenauf/Software/indices/hg38/salmon/gencode.v28.salmon
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -  bwa index                : /groups/obenauf/Software/indices/virus/Centrifuge/bwa/p+h+v_unique.fa
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -  ======================
Jul-16 11:25:32.156 [main] INFO  nextflow.Nextflow -
Jul-16 11:25:32.165 [main] DEBUG nextflow.Channel - files for syntax: glob; folder: test/; pattern: *.bam; options: null
Jul-16 11:25:32.268 [main] DEBUG nextflow.util.CacheHelper - Config settings
it's a bit overcrowded, but that's the last lines. all looking normal
Jul-16 11:25:32.403 [Actor Thread 6] DEBUG nextflow.container.SingularityCache - Singularity found local store for image=docker://obenauflab/virusintegration:latest; path=/groups/zuber/zubarchive/USERS/tobias/.singularity/obenauflab-virusintegration-latest.img
Jul-16 11:25:32.541 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process bamToFastq (CCKTHANXX_4#66882_ACAGAT) > jobId: 502805; workDir: /scratch-ii2/users/tobias.neumann/84/2d3e877822fb7e414c3be7bdab835d
Jul-16 11:25:32.544 [Task submitter] INFO  nextflow.Session - [84/2d3e87] Submitted process > bamToFastq (CCKTHANXX_4#66882_ACAGAT)
Jul-16 11:25:32.579 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process bamToFastq (CAFW2ANXX_6#RNAseq-MCC-25-1-GCCACA) > jobId: 502806; workDir: /scratch-ii2/users/tobias.neumann/c5/cb9f3419acfb37c1c1389d68ed36b7
Jul-16 11:25:32.579 [Task submitter] INFO  nextflow.Session - [c5/cb9f34] Submitted process > bamToFastq (CAFW2ANXX_6#RNAseq-MCC-25-1-GCCACA)
Jul-16 11:25:32.606 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process bamToFastq (BSF_0449_HTJ32BBXX_4) > jobId: 502807; workDir: /scratch-ii2/users/tobias.neumann/f7/997b43d39b49a4b86687640653ce40
Jul-16 11:25:32.606 [Task submitter] INFO  nextflow.Session - [f7/997b43] Submitted process > bamToFastq (BSF_0449_HTJ32BBXX_4)
Paolo Di Tommaso
@pditommaso
Jul 16 2018 10:07
@Lenbok NF does not delete work directories; the simplest solution is to delete them with a small bash wrapper
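A minimal sketch of such a wrapper, assuming the work directory is known up front. The function name, the `./work` path and `main.nf` in the usage comment are hypothetical, not something Nextflow provides:

```shell
# run_and_clean WORKDIR CMD [ARGS...]
# Runs CMD and removes WORKDIR only if CMD exits with status 0.
run_and_clean() {
    local workdir=$1
    shift
    if "$@"; then
        rm -rf -- "$workdir"
    else
        local status=$?
        echo "command failed with status $status; keeping $workdir" >&2
        return "$status"
    fi
}

# Example (hypothetical pipeline script and work directory):
# run_and_clean ./work nextflow run main.nf -w ./work
```

Keeping the directory on failure means a later run with `-resume` can still reuse the cached tasks.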
@t-neumann not sure I understand, what do you mean by they stopped working?
sendivogius
@sendivogius
Jul 16 2018 10:34
Hi, I'm trying to run my pipeline on k8s and I need to specify ImagePullSecrets to access my private registry. How can I do this?
Paolo Di Tommaso
@pditommaso
Jul 16 2018 10:40
Not supported at this time, please open an issue for that
sendivogius
@sendivogius
Jul 16 2018 10:53
Thanks, I will create one.
Tobias Neumann
@t-neumann
Jul 16 2018 11:16
@pditommaso so they would run as normal, but not link/copy the source files from the process input to the working directory and then run forever, not failing
basically that's what all my work dirs look like
tobias.neumann@login-02 [BIO] .../virusIntegration/nf $ ls -lah /scratch-ii2/users/tobias.neumann/34/2f960358fe8d0cf96b0e3ac989f08a
total 8.0K
drwxr-xr-x 2 tobias.neumann pavri.grp    3 Jul 16 13:18 .
drwxr-xr-x 7 tobias.neumann pavri.grp    5 Jul 16 13:18 ..
-rw-r--r-- 1 tobias.neumann pavri.grp 2.8K Jul 16 13:18 .command.run
-rw-r--r-- 1 tobias.neumann pavri.grp  330 Jul 16 13:18 .command.sh
-rw-r--r-- 1 tobias.neumann pavri.grp 3.5K Jul 16 13:18 .command.stub
Paolo Di Tommaso
@pditommaso
Jul 16 2018 12:38
are these directories accessible from the computing nodes in your cluster ?
Luca Cozzuto
@lucacozzuto
Jul 16 2018 12:40
dear all, I would like to parse a file that is an output of a script and to send some values to some nextflow variables (no channels)...
Luca Cozzuto
@lucacozzuto
Jul 16 2018 12:45
I am wondering how to do it...
Paolo Di Tommaso
@pditommaso
Jul 16 2018 12:48
x = yourParseFunction(file('foo'))
or for multiple values
(x,y,z) = yourParseFunction(file('foo'))
Luca Cozzuto
@lucacozzuto
Jul 16 2018 12:50
is this inside the channel?
Paolo Di Tommaso
@pditommaso
Jul 16 2018 12:51
forget channel
Luca Cozzuto
@lucacozzuto
Jul 16 2018 12:52
because of you my pipelines have more channels than Venice... and you tell me "forget channel"?
my file is in a channel
Paolo Di Tommaso
@pditommaso
Jul 16 2018 12:59
and the problem is ?
Luca Cozzuto
@lucacozzuto
Jul 16 2018 13:00
a = effective_stat.flatten()

a.eachLine {  str ->
        println "line ${count++}: $str"
    }
ERROR ~ No signature of method: groovyx.gpars.dataflow.DataflowQueue.eachLine() is applicable for argument types: (_nf_script_b357fe68$_run_closure13) values: [_nf_script_b357fe68$_run_closure13@d816dde]
Possible solutions: chain(groovy.lang.Closure), each(groovy.lang.Closure)
tbugfinder
@tbugfinder
Jul 16 2018 13:02
I would like to zcat several input files however the number of objects in the input channel is very very large (>50000). This means there are too many objects for zcat to process (or maybe it's the length of all arguments). Could zcat run through the list of input files incrementally maybe after every 1000 objects?
zcat -c $myinputfiles >> output.gz
Luca Cozzuto
@lucacozzuto
Jul 16 2018 13:03
@tbugfinder I suggest you use a for loop
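One way to batch the decompression, sketched under the assumption that the input paths are available one per line in a list file rather than on the command line (`files.txt` and the function name are hypothetical):

```shell
# concat_gz LIST OUT [BATCH]
# Decompresses the gzip files named in LIST (one path per line) in batches of
# BATCH files per gzip invocation and recompresses the combined stream to OUT.
concat_gz() {
    local list=$1 out=$2 batch=${3:-1000}
    # NUL-separate the paths so names containing spaces survive xargs.
    tr '\n' '\0' < "$list" | xargs -0 -n "$batch" gzip -dc | gzip -c > "$out"
}

# Example: concat_gz files.txt output.gz 1000
```

xargs runs the batches one after another, so the output keeps the order of the list. Since concatenated gzip files already form a valid gzip stream, replacing the `gzip -dc | gzip -c` pair with `cat` should also work when recompression is not needed.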
micans
@micans
Jul 16 2018 13:27
@pditommaso I have added more information to the feature request, but will also attach a diagram.
Tobias Neumann
@t-neumann
Jul 16 2018 13:34
@pditommaso yes they are accessible from the cluster
Paolo Di Tommaso
@pditommaso
Jul 16 2018 13:37
I have no idea, you may want to provide the .nextflow.log file
Tobias Neumann
@t-neumann
Jul 16 2018 13:47
Paolo Di Tommaso
@pditommaso
Jul 16 2018 13:53
does a simple nextflow run hello -process.executor slurm work?
Tobias Neumann
@t-neumann
Jul 16 2018 13:55
Looking good
tobias.neumann@login-02 [BIO] .../virusIntegration/nf $ nextflow run hello -process.executor slurm
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
N E X T F L O W  ~  version 0.30.2
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `nextflow-io/hello` [sleepy_liskov] - revision: c9b0ec7286 [master]
[warm up] executor > slurm
[11/9e7f9d] Submitted process > sayHello (1)
[60/f4103e] Submitted process > sayHello (2)
[ee/329852] Submitted process > sayHello (4)
[2a/42193e] Submitted process > sayHello (3)
Bonjour world!
Ciao world!
Hola world!
Hello world!
Paolo Di Tommaso
@pditommaso
Jul 16 2018 13:56
hence there's something not working with those jobs
try changing into one task work dir and executing it with sbatch .command.run
Tobias Neumann
@t-neumann
Jul 16 2018 14:03
hm..... now indeed it works
so basically somehow nextflow does not call sbatch in those task dirs
Paolo Di Tommaso
@pditommaso
Jul 16 2018 14:06
not sure that's the problem, NF just launches that script
Jul-16 13:18:41.102 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process bamToFastq (CCKTHANXX_4#66882_ACAGAT) > jobId: 503016; workDir: /scratch-ii2/users/tobias.neumann/62/e73d0d5692d83dc29adbe71e06c5d3
do you have a Slurm accounting command to see what happened to job 503016?
I suspect some config has changed in your cluster, you may need to talk to your sysadmins
Tobias Neumann
@t-neumann
Jul 16 2018 14:10
I will check. I hope that's the issue, then there's someone to blame :)
Paolo Di Tommaso
@pditommaso
Jul 16 2018 14:10
it looks to me that jobs are failing silently
Tobias Neumann
@t-neumann
Jul 16 2018 14:12
thanks - I will talk to IT and will let you know
mimi
@mimi28750642_twitter
Jul 16 2018 14:52
can you call another nextflow script within a nextflow script?

params.workflow = "test1"
params.result_path = '/home/test_data'

input_fastqs = params.result_path + '/' + '*ngsservice.R{1,2}.fastq.gz'
Channel
.fromFilePairs( input_fastqs )
.ifEmpty { error "Cannot find any reads matching: ${input_fastqs}" }
.set { read_pairs }

process call_another_nextflow_script {

input:
set pair_id, file(reads) from read_pairs
val(workflow) from params.workflow

script:
if( workflow == 'test1' )
"""
nextflow run test1.nf
"""

else if( workflow == 'test2' )
"""
nextflow run test2.nf
"""

else
error "Invalid workflow: ${workflow}"

}

Tobias Neumann
@t-neumann
Jul 16 2018 15:37
@pditommaso ok I finally found the issue. I ran the nextflow command in a screen that was already quite old. probably something changed in between - and this is why it would still work in a plain shell, but not in the screen. stupid
btw is there a way to selectively say which files go to the publishDir and which ones not? Because right now the output of any process goes to the globally set publishDir, but actually I would like only the files from a couple of processes there and have bigger intermediate files not moved there in the first place
Anthony Underwood
@aunderwo
Jul 16 2018 16:03
Hi all. Can anybody recommend the most idiomatic way to parse the output of a channel and set a variable that will be used as a parameter for a subsequent channel?
Paolo Di Tommaso
@pditommaso
Jul 16 2018 17:00
do you mean parsing the output file?
Anthony Underwood
@aunderwo
Jul 16 2018 20:45
@pditommaso - yes a command I run outputs something to a file or stdout/stderr. I need to parse that output and pull out a value that is fed into a downstream channel. An example is the output of a program 'mash sketch' that can be used to estimate bacterial genome size from raw reads
Anthony Underwood
@aunderwo
Jul 16 2018 22:17

The way I am currently doing it is
within channel1:
output:
file('cmd.out') into read_correction_output

script:
"""
some_command 2> cmd.out # capture stderr
"""

channel2
input:
file('lighter.out') from read_correction_output

script:
"""
average_coverage=\$(sed -En 's/.+Average coverage is ([0-9]+\\.[0-9]+)\\s.+/\\1/p' lighter.out)
echo \${average_coverage}
"""

wondering if there is a more idiomatic way in nextflow/groovy
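For checking the extraction step on its own, here is a plain-shell sketch of the same sed idea (outside a Nextflow script block, so `$` needs no escaping). The function name and the sample log line in the usage comment are hypothetical; only the phrase "Average coverage is" comes from the snippet above. `[[:space:]]` is used in place of `\s` so the expression does not depend on GNU sed.

```shell
# extract_coverage FILE
# Prints the first decimal number that follows "Average coverage is" in FILE.
# The number must be followed by whitespace, as in the original expression.
extract_coverage() {
    sed -En 's/.*Average coverage is ([0-9]+\.[0-9]+)[[:space:]].*/\1/p' "$1" | head -n 1
}

# Example with a made-up log line:
# printf 'stats: Average coverage is 45.32 and more\n' > lighter.out
# extract_coverage lighter.out
```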