These are chat archives for nextflow-io/nextflow

7th Dec 2018
Karin Lagesen
@karinlag
Dec 07 2018 09:39
I have a question here: on our cluster, we have two disk areas, /projects and /work
/work consists of faster disks and has a better connection to the compute nodes (Slurm setup btw)
thus for jobs where the data stays on the drives, they prefer us to keep that data there
however, we also have an option to put things onto a $SCRATCH disk, which is local storage on the nodes.
if we could use that somehow with nextflow, that would allow us to have the data on /projects instead to start with
however, not sure how all of this is put together under the hood, hence my question here
Raoul J.P. Bonnal
@helios
Dec 07 2018 10:18

I am experiencing this problem with PBS (not Pro)

$ pbs-config --version
4.2.5
N E X T F L O W  ~  version 18.10.1
Launching `main.nf` [distracted_coulomb] - revision: 1d8a541966
[warm up] executor > pbs
[6d/71054f] Submitted process > velocyto (1)
WARN: [PBS] queue (myq) status cannot be fetched > exit status: 2

dic-06 08:52:40.699 [Task monitor] WARN  n.executor.AbstractGridExecutor - [PBS] queue (myq) status cannot be fetched > exit status: 2

I saw a similar issue about LSF.
Possible solution from @pditommaso:
you may want to give a try to the latest edge release

NXF_VER=18.11.0-edge nextflow run .. etc

Fixed the problem

Rad Suchecki
@rsuchecki
Dec 07 2018 10:59
You should probably consider speed, space and whether a given drive is backed up, @karinlag. Assuming /work is not backed up and /projects is, I'd run NF from /projects with -work-dir /work/some-sub-dir and, for specific processes where it can make a difference, go for the scratch '$SCRATCH' directive
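(A minimal sketch of that setup, purely for illustration; the process name, channels and paths below are made up:)

// launch with intermediates on the fast shared disk, e.g.
//   nextflow run main.nf -work-dir /work/myproject/nf-work
process heavy_io_step {
    // run in node-local scratch; outputs are copied back to the work dir when the task completes
    scratch '$SCRATCH'

    input:
    file reads from reads_ch

    output:
    file 'out.txt' into results_ch

    """
    some_tool $reads > out.txt
    """
}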
Pierre Lindenbaum
@lindenb
Dec 07 2018 11:14
@pditommaso FYI. ( Old thread: https://gitter.im/nextflow-io/nextflow?at=5bdc57066ab3f85bddf26bb2 ) I'm back to this: writing a new Executor and using the fat nextflow-18.12.0-SNAPSHOT-all.jar. I fixed my problem (no .nextflow/history written; fixed by just setting the Launcher.cliString string to non-null; I wrote a short PR for this: nextflow-io/nextflow#965). For now my fork seems to run fine with our custom job scheduler. Cool.
Paolo Di Tommaso
@pditommaso
Dec 07 2018 11:31
@lindenb cool, if you want to contribute your executor, happy to review your PR
Pierre Lindenbaum
@lindenb
Dec 07 2018 12:00
@pditommaso I'm afraid it's too localized (?) they use https://github.com/cea-hpc/bridge
@pditommaso I'll submit a PR later, but if you feel it's too localized, feel free to reject/close it.
If it's not just a cluster in your workstation it's welcome :smile:
Karin Lagesen
@karinlag
Dec 07 2018 12:21
@pditommaso where can I find more info on -work-dir? Could not find it as an option to nextflow (command line), nor in the documentation?
Pierre Lindenbaum
@lindenb
Dec 07 2018 12:46
@karinlag nextflow run -h :
    -w, -work-dir
       Directory where intermediate result files are stored
Karin Lagesen
@karinlag
Dec 07 2018 12:46
doh
sorry :)
Alexander Peltzer
@apeltzer
Dec 07 2018 13:20
There are also new CLI docs coming
Thomas Zichner
@zichner
Dec 07 2018 13:29
Hi Paolo and others.
Using the error strategy retry, jobs get re-submitted in case they crash.
I made two observations regarding this: It seems that these re-tried jobs do not "listen" to the maxForks directive. Furthermore, it seems that only 8 re-submitted jobs can run in parallel.
Is this the case?
Ideally, there should be no difference between "normal" and "re-submitted" jobs with regard to their parallelization (in both cases up to maxForks processes should run in parallel).
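(For context, a minimal sketch of the configuration being described, with a hypothetical process name and values:)

process align {
    errorStrategy 'retry'
    maxRetries 3
    maxForks 20   // expectation: at most 20 tasks in parallel, for first attempts and retries alike

    input:
    file sample from samples_ch

    """
    run_alignment $sample
    """
}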
Karin Lagesen
@karinlag
Dec 07 2018 13:43
ok, so I am using this one to help figure out how things work re disks
I am noticing it is pointing to the directory above work, and not the actual task directory itself (i.e. work/xx/hash)
any way to give the actual process directory?
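(If it helps: each task is executed inside its own unique directory under the work dir, so from within the script the task directory is just the current working directory; a minimal, hypothetical sketch:)

process show_task_dir {
    """
    # the shell starts inside work/xx/hash..., so \$PWD is the task directory
    echo "task dir: \$PWD"
    """
}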
Thomas Zichner
@zichner
Dec 07 2018 15:09
FYI: I submitted an issue for the problem I described; see #966
PhilPalmer
@PhilPalmer
Dec 07 2018 17:28
Hey, can someone please help me understand how to repeat a process for each file? I thought this was the default behaviour of Nextflow if you specify input files using a glob pattern, however this does not seem to be working for me
I have this, but even though there are two channels/fastqs for the read files, bwa only executes for the first one. I'm sure this is pretty basic. I think it's caused by the other inputs but am not sure how to solve it. Thanks
params.reads = "testdata/*.fq"
Channel.fromFilePairs( params.reads, size: 1 )

process BWA {
    input:
    file fasta from fasta_bwa
    file bwa_index_amb from bwa_index_amb
    set val(name), file(reads) from reads_bwa

    output:
    set val(name), file("${name}.sam") into sam

    """
    bwa mem -M -R '@RG\\tID:${name}\\tSM:${name}\\tPL:Illumina' $fasta $reads > ${name}.sam
    """
}
Karin Lagesen
@karinlag
Dec 07 2018 17:53
Others should correct me if I'm wrong, but from what I can see, you haven't actually named your input channel
I have
Channel
    .fromFilePairs( params.reads, size:params.setsize )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .into{fastqc_reads; read_pairs}
PhilPalmer
@PhilPalmer
Dec 07 2018 17:55
Oh sorry, you're quite right. When I copied it I removed some stuff for brevity including .into { reads_samplename; reads_bwa }
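(For the record, a minimal sketch of that missing piece, assigning the channel and splitting it into the names used above:)

Channel
    .fromFilePairs( params.reads, size: 1 )
    .into { reads_samplename; reads_bwa }   // reads_bwa feeds the BWA process above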
Karin Lagesen
@karinlag
Dec 07 2018 17:56
It's easy to become blind to your own code
Stephen Kelly
@stevekm
Dec 07 2018 18:16
I am trying to figure out how to export details about each process and task executed in Nextflow
Paolo Di Tommaso
@pditommaso
Dec 07 2018 18:20
old-school tab-separated file produced by -with-trace
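(A minimal sketch of that; the field list shown is just an example, the defaults work without any config:)

// on the command line:
//   nextflow run main.nf -with-trace

// or in nextflow.config, optionally with a custom set of columns:
trace {
    enabled = true
    fields = 'task_id,hash,name,status,exit,workdir'
}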
Stephen Kelly
@stevekm
Dec 07 2018 18:22

Like if I have this process:

process some_task {
    tag "${x}"
    publishDir "output", mode: 'move', overwrite: true

    input:
    val(x) from Channel.from([1,2])

    output:
    file("${output_dir}/*")

    script:
    output_dir = "${x}"
    """
    mkdir "${output_dir}"
    touch "${output_dir}/foo.txt"
    """
}

I want to be able to access all the fully evaluated aspects of the task execution object, so that I might be able to store a record of some of it, somewhere, for usage later after the workflow has ended. Like, I would want to record that process 'some_task' executed for '1', and output file/dir '1/foo.txt' and it was published in 'output/'

I want to be able to access all of this stuff, much more than what is in the 'trace' file
like export the entire 'task' or 'process' object as a JSON or something
or just store all these details in a database or something
related to my question posted here: https://groups.google.com/forum/#!topic/nextflow/ZC-FIpj-SkY ; if I could get access to these kinds of details, then maybe I could build a wrapper that my other programs can use to look at the Nextflow output and do something like "find foo data from some_task for sample 1 in Nextflow output" and then this could be used to look up the relevant task and map back to the output file in question
Paolo Di Tommaso
@pditommaso
Dec 07 2018 18:30
there's an on going discussion here nextflow-io/nextflow#903
Stephen Kelly
@stevekm
Dec 07 2018 18:32
oh that looks like exactly it thanks
I could probably hack my own if I was able to access these attributes from within an executing task:
Paolo Di Tommaso
@pditommaso
Dec 07 2018 18:33
at your risk
Stephen Kelly
@stevekm
Dec 07 2018 18:35

like

process some_task {
    tag "${x}"
    publishDir "output", mode: 'move', overwrite: true

    input:
    val(x) from Channel.from([1,2])

    output:
    file("${output_dir}/*")

    script:
    output_dir = "${x}"
    """
    mkdir "${output_dir}"
    touch "${output_dir}/foo.txt"

    <send_to_SQL_db> publishDir="this.publishDir" outputFiles="this.outputFiles" taskName="this.name"
    """
}

is that like a thing you could do? not advisable?

Stephen Kelly
@stevekm
Dec 07 2018 19:00
yeah looks like you cannot access the task's own attributes with that syntax:
params.outputDir = "output"
params.foo = 'bar'
process some_task {
    tag "${x}"
    echo true
    publishDir "${params.outputDir}", mode: 'move', overwrite: true

    input:
    val(x) from Channel.from(1)

    output:
    file("${output_dir}/*")

    log.info "${this}"
    log.info "${this.params}"
    log.info "${this.params.foo}"

    script:
    output_dir = "${x}"
    """
    mkdir -p "${output_dir}"
    touch "${output_dir}/foo.txt"

    echo "this: ${this}"
    echo "this.params: ${this.params}"
    echo "this.params.foo: ${this.params.foo}"
    """
}
$ ./nextflow run main.nf
N E X T F L O W  ~  version 18.10.1
Launching `main.nf` [crazy_poitras] - revision: f3253ed9d7
_nf_script_1a0e9a36@5f4d427e
[outputDir:output, output-dir:output, inputDir:input, input-dir:input, foo:bar]
bar
[warm up] executor > local
[65/b4975e] Submitted process > some_task (1)
this: _nf_script_1a0e9a36@5f4d427e
this.params: [outputDir:output, output-dir:output, inputDir:input, input-dir:input, foo:bar]
this.params.foo: bar
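(For what it's worth, a subset of runtime information is exposed through the implicit task variable rather than this; it covers directive values and the attempt number, but not publishDir or the published output files. A minimal, hypothetical sketch:)

process some_task {
    tag "${x}"
    echo true
    cpus 1

    input:
    val(x) from Channel.from(1)

    script:
    """
    # directive values and the retry attempt are available via 'task'
    echo "attempt: ${task.attempt}"
    echo "cpus:    ${task.cpus}"
    """
}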