These are chat archives for nextflow-io/nextflow

28th
Sep 2018
Evan Floden
@evanfloden
Sep 28 2018 07:26
@tobsecret I assume you need some identifier to know which fastq is from which ENA ID
In this case you could construct the channels as follows:
Channel
    .from(['ENA_ID1', ['id1_1.fq', 'id1_2.fq']],
          ['ENA_ID2', ['id2_1.fq', 'id2_2.fq', 'id3_1.fq', 'id3_2.fq']])
    .set { files_produced_by_download_process }

// Split them
files_produced_by_download_process
    .map { it -> [it[0], it[1].collate(2)] }
    .transpose()
    .set { paired_files }

// Your process here

// Bring them together again
paired_files
    .groupTuple()
    .view()

The first operation returns:

[ENA_ID1, [id1_1.fq, id1_2.fq]]
[ENA_ID2, [id2_1.fq, id2_2.fq]]
[ENA_ID2, [id3_1.fq, id3_2.fq]]

and the second:

[ENA_ID1, [[id1_1.fq, id1_2.fq]]]
[ENA_ID2, [[id2_1.fq, id2_2.fq], [id3_1.fq, id3_2.fq]]]
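The same collate/transpose/groupTuple round trip can be sketched in plain Python (the helper names are mine, just to mirror what the Nextflow operators do to the lists):

```python
def collate(items, size):
    # Groovy's collate(2): split a flat list into chunks of `size`
    return [items[i:i + size] for i in range(0, len(items), size)]

def transpose(channel):
    # Nextflow's transpose(): emit one (id, chunk) tuple per chunk
    return [(sid, chunk) for sid, chunks in channel for chunk in chunks]

def group_tuple(channel):
    # Nextflow's groupTuple(): regroup the chunks by their id
    grouped = {}
    for sid, chunk in channel:
        grouped.setdefault(sid, []).append(chunk)
    return list(grouped.items())

ch = [('ENA_ID1', ['id1_1.fq', 'id1_2.fq']),
      ('ENA_ID2', ['id2_1.fq', 'id2_2.fq', 'id3_1.fq', 'id3_2.fq'])]

# collate(2) pairs the fastqs, transpose() flattens to one pair per tuple
paired = transpose([(sid, collate(files, 2)) for sid, files in ch])
# groupTuple() brings all pairs sharing an ID back together
regrouped = group_tuple(paired)
```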
Luca Cozzuto
@lucacozzuto
Sep 28 2018 08:15
@pditommaso ouch... I just hit the groupTuple problem.. :)
Luca Cozzuto
@lucacozzuto
Sep 28 2018 08:48
morning all
are there any examples of onError related to a single process?
Luca Cozzuto
@lucacozzuto
Sep 28 2018 08:56
and another quick question... is there a way to clean the folder structure when I have a running background execution of Nextflow (i.e. with a watchPath)?
to avoid the creation of millions of folders...
Paolo Di Tommaso
@pditommaso
Sep 28 2018 09:42
yes, rm -rf work when it's finished
Luca Cozzuto
@lucacozzuto
Sep 28 2018 09:42
nope
it won't finish
the idea is to have a running execution waiting for the arrival of files
but with time we would like to remove the folders that are not needed...
how to do it?
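There is no built-in directive for this in a long-running watchPath execution; one manual workaround (my own sketch, not a Nextflow feature, and used at your own risk) is to periodically prune task directories that have not been touched in a while:

```shell
# Prune Nextflow task directories (work/<xx>/<hash>) not modified in 7+ days.
# WARNING: this assumes no pending or resumable task still needs those outputs.
WORK_DIR="${WORK_DIR:-work}"
if [ -d "$WORK_DIR" ]; then
  find "$WORK_DIR" -mindepth 2 -maxdepth 2 -type d -mtime +7 -exec rm -rf {} +
fi
```

Note that deleting task directories breaks `-resume` for those tasks, which is acceptable here only because the watched files have already been processed.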
Paolo Di Tommaso
@pditommaso
Sep 28 2018 09:49
Luca Cozzuto
@lucacozzuto
Sep 28 2018 09:53
mmmm
well, it would be a wonderful feature :)
knowing the DAG, once the last node of a chain finishes, if it is marked as "cache no" a deletion back to the origin could be implemented...
Luca Cozzuto
@lucacozzuto
Sep 28 2018 13:15
I was also thinking of using the tagging for removing the folders. In the end it could trigger a final cleaning process that removes everything that is tagged (a sort of index)
Paolo Di Tommaso
@pditommaso
Sep 28 2018 13:16
what tagging ?
Luca Cozzuto
@lucacozzuto
Sep 28 2018 13:16
process sendToDB {
    tag { sample_id }...
}
Paolo Di Tommaso
@pditommaso
Sep 28 2018 13:17
what guarantees that a downstream task does not need that data?
Luca Cozzuto
@lucacozzuto
Sep 28 2018 13:18
how does nextflow know when to trigger a given process? this cleaning process should be triggered after every process (with that tag) has run
of course if you make a mistake in the code and "clean" something that is then needed by another process (with another tag) then you get an error
it should be the "responsibility" of the user where to place the cleaning process...
Bioninbo
@Bioninbo
Sep 28 2018 13:33
Hello everybody,
In the .command.sh file, the '\t' is replaced by a literal tab character (i.e. " ").
However, this crashes when I use it to read files in R, so I need to change it manually. Is it possible to prevent this replacement and keep '\t'?
Paolo Di Tommaso
@pditommaso
Sep 28 2018 13:34
use \\t in your command or, even better, save the R command into a file
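For example (a minimal sketch; the process name, input, and awk call are placeholders, just to show the escaping):

```nextflow
process parseTable {
    input:
    file table  // hypothetical input, for illustration

    script:
    """
    # In the Groovy-interpolated script block, \t becomes a literal tab;
    # writing \\t keeps the two characters '\' and 't' in .command.sh
    awk -F '\\t' '{ print \$1 }' ${table}
    """
}
```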
Bioninbo
@Bioninbo
Sep 28 2018 14:15
I see thanks @pditommaso!
Tobias "Tobi" Schraink
@tobsecret
Sep 28 2018 14:36
@evanfloden Thanks a bunch, that makes lots of sense and I would have taken ages to get that! I'll try and implement it like that.
Tobias "Tobi" Schraink
@tobsecret
Sep 28 2018 22:53
I have a question regarding error handling. I see the beforeScript and afterScript directives, but is there also something like an uponFailure directive? I want to print to the stdout of the console where nextflow is run when a particular process fails. In my case this is a process that downloads from a site that does not provide permanent links, so when someone executes my workflow in the future they might find that it doesn't work.
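There is no uponFailure directive as such; one pattern (a sketch only — the process name downloadFromENA is hypothetical) is to let the failing process terminate the run and report from a workflow-level onError handler:

```nextflow
process downloadFromENA {
    // stop the whole run when this process fails
    errorStrategy 'terminate'
    ...
}

workflow.onError {
    // invoked once if the run fails; prints to the console stdout
    println "Pipeline stopped: ${workflow.errorReport}"
}
```

The handler is per-workflow rather than per-process, but `workflow.errorReport` identifies which task failed, so the download process can be singled out in the message.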
Ashley S Doane
@DoaneAS
Sep 28 2018 23:48
Hi, when using slurm as executor, is it possible to request memory per cpu?
my process executor settings are like this:
process bt2 {
    tag "$Sample"
    publishDir "$results_path/$Sample/$Sample", mode: 'copy'
    //conda 'bowtie2 samtools'

    cpus 4
    executor 'slurm'
    memory '4 GB'
    time '18h'
    scratch true

However, nextflow submission script for this process is:

#!/bin/bash
#SBATCH -D /athena/elementolab/scratch/asd2007/projectshg38/test/work/70/66ad2068aa4523708db2c2dc6aae3a
#SBATCH -J nf-bt2_(Sample_oct2_2_s2test)
#SBATCH -o /athena/elementolab/scratch/asd2007/projectshg38/test/work/70/66ad2068aa4523708db2c2dc6aae3a/.command.log
#SBATCH --no-requeue
#SBATCH -c 4
#SBATCH -t 18:00:00
#SBATCH --mem 4096

For slurm, --mem sets memory per node, and --mem-per-cpu is needed for what I want.
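One way to pass scheduler flags that Nextflow does not model directly is the clusterOptions directive — a sketch (values not verified against this cluster), dropping the memory directive so Nextflow does not also emit --mem:

```nextflow
process bt2 {
    executor 'slurm'
    cpus 4
    time '18h'
    // ask SLURM for memory per CPU instead of per node;
    // --mem and --mem-per-cpu are mutually exclusive in sbatch
    clusterOptions '--mem-per-cpu=1G'
    ...
}
```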