These are chat archives for nextflow-io/nextflow

1st
Nov 2018
Maxime HEBRARD
@mhebrard
Nov 01 2018 02:20
hello
Maxime HEBRARD
@mhebrard
Nov 01 2018 02:30
How can I control my flow and threads to achieve something like this: from multiple files, do pre-processing in parallel, do mapping + sorting sequentially, then do post-processing in parallel? I have limited space and CPUs. I found I can run mapping sequentially by allocating CPUs to the process with the "cpus" directive, but how can I force sorting (and maybe other processes) to run after mapping, still sequentially, before the mapping of my other files starts?
fileA --> preProc --> Map --> Sort -----------------------> postProc-->
fileB --> preProc -----------------------> Map --> Sort --> postProc -->
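One way the layout above might be achieved (not confirmed by anyone in this chat) is to fuse mapping and sorting into a single process and cap it with the maxForks directive, so that the sort of one file always finishes before the mapping of the next one starts. This is a minimal sketch: params.reads, the process names, the channel names and the cp placeholder commands are all hypothetical stand-ins for the real tools.

```groovy
// Hypothetical input location; adjust to the real data.
params.reads = 'data/*.fastq'

Channel
    .fromPath(params.reads)
    .set { chReads }

process preProc {
    // No maxForks here: one instance per input file, run in parallel
    input:
    file reads from chReads

    output:
    file "${reads.baseName}.clean.fastq" into chClean

    """
    cp ${reads} ${reads.baseName}.clean.fastq   # placeholder for the real pre-processing
    """
}

process mapAndSort {
    maxForks 1   // at most one instance of this process runs at any time

    input:
    file clean from chClean

    output:
    file "${clean.baseName}.sorted.bam" into chSorted

    """
    cp ${clean} ${clean.baseName}.sorted.bam   # placeholder: mapping, then sorting
    """
}
```

A postProc process reading from chSorted without maxForks would again run in parallel. The trade-off of fusing the two steps is that a failed sort also re-runs the mapping on resume.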
Maxime HEBRARD
@mhebrard
Nov 01 2018 04:00
Can I use an input variable to generate the publishDir path?
process proc {
    publishDir path: "${params.outdir}proc/${mySet[0].name}/", mode: 'copy'
    input: val mySet from chPrev
    output: stdout
    """
    echo ${mySet.name}
    """
}
// ERROR: no such variable mySet
Rad Suchecki
@rsuchecki
Nov 01 2018 04:55
Re second question: Probably not - see #894 @mhebrard
Maxime HEBRARD
@mhebrard
Nov 01 2018 06:15
@rsuchecki actually, using saveAs solved my problem
process proc {
    publishDir path: "${params.outdir}proc/", saveAs: { filename -> "${mySet[0].name}/$filename" }
    input: val mySet from chPrev
    output: stdout
    """
    echo ${mySet.name}
    """
}
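For comparison, here is a hedged sketch of the same saveAs idea with a file output, since publishDir acts on the files declared in the output block. The default params.outdir, the data/*.txt input, the (sample, file) pairing and the wc command are assumptions made for illustration, not part of the original question.

```groovy
params.outdir = 'results/'   // hypothetical default

// Hypothetical input: one (sample name, file) pair per item
Channel
    .fromPath('data/*.txt')
    .map { f -> [ f.baseName, f ] }
    .set { chPrev }

process proc {
    // saveAs is a closure called for each published file,
    // so it can refer to the `sample` input value of the task
    publishDir path: "${params.outdir}proc/", mode: 'copy',
               saveAs: { filename -> "${sample}/${filename}" }

    input:
    set val(sample), file(infile) from chPrev

    output:
    file "${sample}.out" into chOut

    """
    wc -l < ${infile} > ${sample}.out
    """
}
```

With an input file data/a.txt this would be expected to publish under results/proc/a/, one sub-folder per sample.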
Maxime HEBRARD
@mhebrard
Nov 01 2018 07:14
Hmm, trying to understand the scratch process directive... where do my files go? Is the temp folder deleted automatically, or does it remain somewhere?
Maxime HEBRARD
@mhebrard
Nov 01 2018 07:28

Damn! I have a problem with substitution:

when:
      file("${params.directory}/${mySet[0].name}/*.txt")

// ERROR Cannot get property 'directory' on null object

How come $params is null !!

Winni Kretzschmar
@winni2k
Nov 01 2018 09:37

@winni2k My problem is that I have a project where the start files are already close to 700M

@karinlag I'm not sure I follow exactly, but to get back to your original question about the evilness of hard links: I think what most people find evil (i.e. dangerous) about hard links is that deleting or changing a hard-linked file will also change other files that the user does not know about. However, hard links are also put to great use in for example sequential backups a la Time Machine. I think if you restrict yourself to creating hard links to only files in the work directory, then you are likely to minimize confusion.

Maxime HEBRARD
@mhebrard
Nov 01 2018 09:43
I figured out how to use the scratch directive... quite useful for saving space!
So at run time, a folder is created in the tmp directory and all the files are written there... then when the process finishes, the files captured by the output directive are copied to the work directory and the temp folder is automatically deleted :)
Maxime Garcia
@MaxUlysse
Nov 01 2018 09:46
Very useful indeed
Maxime HEBRARD
@mhebrard
Nov 01 2018 09:48
The info I was missing from the docs was confirmation that "the temp folder created by NF is automatically deleted at the end of the process".
I didn't want to leave a huge folder in some random place on my machine ^^"
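The behaviour described above can be sketched roughly as follows. The process name, the data/*.txt input and the file names are hypothetical; only the scratch directive itself comes from the discussion.

```groovy
// Hypothetical input files
Channel
    .fromPath('data/*.txt')
    .set { chInput }

process bigIntermediate {
    scratch true   // run the task in a temporary directory instead of the work directory

    input:
    file infile from chInput

    output:
    // Only the files declared here are copied back to the work directory
    file 'result.txt' into chResult

    """
    sort ${infile} > huge_intermediate.tmp        # never declared as output, stays in scratch
    wc -l < huge_intermediate.tmp > result.txt
    """
}
```

Because huge_intermediate.tmp is not declared as an output, it should never reach the work directory and goes away together with the temporary folder.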
Maxime Garcia
@MaxUlysse
Nov 01 2018 10:40
@aunderwo Hi, I noticed some typos on your blog post about AWS, so I'm making PRs ;-)
(just so you know that it's me)
micans
@micans
Nov 01 2018 13:30
I've noticed this syntax: file (fastqc:'fastqc/*') from ch_multiqc_fastqc.collect().ifEmpty([]) in an nf-core pipeline; what does the fastqc: part do there? @pditommaso
Maxime Garcia
@MaxUlysse
Nov 01 2018 15:14
Hi @pditommaso, we're trying AWS Batch with @alneberg and we have some issues when pointing to an S3 bucket for the results
Does it have to be an empty bucket?
We were using the same one for the work and for the results, something like s3://sarek/work and s3://sarek/results, and each new run fails until we specify a different location for the results, like s3://sarek/results2
I'll try specifying s3://sarek only for the results and creating another bucket for the work
Maxime Garcia
@MaxUlysse
Nov 01 2018 15:19
Thank you all for being the most fantastic rubber duck channel ever
Maxime Garcia
@MaxUlysse
Nov 01 2018 15:38
So definitely, when I specify a non-empty S3 bucket as the results directory, I get the following error:
ERROR ~ Unexpected error [UnsupportedOperationException]
Maxime Garcia
@MaxUlysse
Nov 01 2018 15:44
Has anyone testing AWS had the same issue?
Paolo Di Tommaso
@pditommaso
Nov 01 2018 16:28
it should work, open an issue with an example and error stack trace
@micans it's an undocumented syntax that combines the semantics for file (fastqc) AND file('fastqc/*')
micans
@micans
Nov 01 2018 16:29
Thanks! :+1:
Paolo Di Tommaso
@pditommaso
Nov 01 2018 16:30
:+1:
micans
@micans
Nov 01 2018 18:03
Is it possible to output a value defined in the env scope (or more generally any config scope)?
micans
@micans
Nov 01 2018 18:12
I'd like to log.info it