These are chat archives for nextflow-io/nextflow

4th
Aug 2016
Mokok
@Mokok
Aug 04 2016 11:54
i gave a new look to my script, it works fine and i get everything i want thanks to your suggestions ;)
#!/usr/bin/env nextflow

process exitAStatus {
    echo true

    output:
    stdout into status_chan
    file 'outF.txt' into out_chan

    script:
    """
    set +e
    (set -e; echo "toto"; exit 42; echo "this won't be executed due to exit") > outF.txt
    exitVal=\$?
    echo "exit{\$exitVal}"
    """
}
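The shell pattern used in the script above can also be tried outside Nextflow. This is a minimal sketch, assuming a plain bash shell; inside a Nextflow script block the `$` has to be escaped as `\$` so that bash, rather than Nextflow, expands the variable. The temporary directory is an addition that keeps the example self-contained.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the example leaves nothing in the cwd.
tmpdir=$(mktemp -d)

# Do not abort the outer script when a command fails.
set +e

# The subshell enables errexit only for itself; 'exit 42' ends the subshell,
# so the last echo never runs. The subshell's stdout goes to outF.txt.
(set -e; echo "toto"; exit 42; echo "never executed") > "$tmpdir/outF.txt"

# $? holds the exit status of the subshell.
exitVal=$?
echo "exit{$exitVal}"
```

Run this way, the status is printed to stdout while `toto` ends up in `outF.txt`, which mirrors the two outputs (`stdout` and the file) declared by the process.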
Mokok
@Mokok
Aug 04 2016 13:40
Is there a way to launch Nextflow only once (like a daemon), so that jobs can be run/submitted on demand without starting Nextflow for each one?
If not, what are the minimum memory requirements for a single Nextflow instance? (without considering any running jobs, only nextflow)
Paolo Di Tommaso
@pditommaso
Aug 04 2016 13:44
well, if u launch it once, it will run only one time :)
what do u mean exactly ?
Mokok
@Mokok
Aug 04 2016 13:45
^^
i mean every time i want to run a script i do:
./nextflow run ./path/to/the/script.nf [some options]
but i guess it launches one Nextflow instance to handle each script run, right?
Paolo Di Tommaso
@pditommaso
Aug 04 2016 13:48
of course, each run is a separate JVM instance
Mokok
@Mokok
Aug 04 2016 13:48
is there a way to run Nextflow once like a daemon
Paolo Di Tommaso
@pditommaso
Aug 04 2016 13:49
nope
do you have memory constraints ?
Mokok
@Mokok
Aug 04 2016 13:50
i may
Paolo Di Tommaso
@pditommaso
Aug 04 2016 13:51
how many instances are you planning to launch .. ?
Mokok
@Mokok
Aug 04 2016 13:54
i'm not sure for now (will ask someone who knows), that's why i wanted to know about a daemon feature, or the Nextflow minimum memory requirement
Paolo Di Tommaso
@pditommaso
Aug 04 2016 13:55
let me check
[Screenshot: Screen Shot 2016-08-04 at 15.56.17.png]
a basic script needs 45 MB
but obviously it greatly depends on your script
Paolo Di Tommaso
@pditommaso
Aug 04 2016 14:00
also the JVM tends to allocate as much as it can, so it is important to tune the JVM min/max heap size correctly if you need to keep the memory usage under control
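A minimal sketch of capping the heap for a Nextflow run, assuming the `NXF_OPTS` environment variable is used to pass JVM options to the launcher. The 64 MB and 256 MB values are placeholders chosen for illustration, not recommendations; the script path in the comment is the one from the earlier message.

```shell
#!/usr/bin/env bash
# Placeholder heap bounds (assumptions, to be tuned after profiling):
#   -Xms = initial heap size, -Xmx = maximum heap size
export NXF_OPTS="-Xms64m -Xmx256m"

# Show what the launcher would pick up from the environment.
echo "NXF_OPTS=$NXF_OPTS"

# Then start the run as usual, e.g.:
#   ./nextflow run ./path/to/the/script.nf
```

Setting the maximum too low will make the run fail with an out-of-memory error, so the bounds are best chosen from a measured run of the actual pipeline.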
Mokok
@Mokok
Aug 04 2016 14:01
ofc, but i wanted to know about NF; the tasks to be run are supposed to be known, and their requirements too. NF was the only unknown part of the calculation.
thks for info and advice ;)
Paolo Di Tommaso
@pditommaso
Aug 04 2016 14:02
um, your approach seems a bit optimistic :)
I would suggest profiling an execution of your pipeline to get a better estimate
Mokok
@Mokok
Aug 04 2016 14:04

i tend to be optimistic... even if disappointments often come at a cost, it's a better way to live ^^

But you're completely right, profiling is needed :+1:

Paolo Di Tommaso
@pditommaso
Aug 04 2016 14:05
this is an excellent tool if you need one: http://yourkit.com
Mokok
@Mokok
Aug 04 2016 14:08
thanks for sharing, gonna have a look at it
Chris Fields
@cjfields
Aug 04 2016 15:41
@pditommaso is there a recommended way of moving/copying the work dir .command.* files over for jobs, but renaming them per sample ID or file ID? This is infrequent but I find in cases where I have R/perl/python code in the script section I want to retain these for downstream work.
I could use the stdout directive and subscribe to it, but this doesn't work for stderr
Paolo Di Tommaso
@pditommaso
Aug 04 2016 15:44
not sure I understand your question: do you want to rename the nextflow .command.* script files by sample ID?
Chris Fields
@cjfields
Aug 04 2016 15:45
essentially yes, and then have them moved/copied over via publishDir. Could also create named directories and move the files into them.
Paolo Di Tommaso
@pditommaso
Aug 04 2016 15:46
to keep track of the provenance in your pipeline I guess ..
right?
Chris Fields
@cjfields
Aug 04 2016 15:47
yup
Paolo Di Tommaso
@pditommaso
Aug 04 2016 15:47
ok
currently you need to manage this manually: prepend a bash cp/mv command to your script, give the copies a name of your choice, and handle them like any other file output
however I'm working on a log command that will make it simpler to track task execution and provenance
have a look at this thread
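A minimal sketch of the manual approach described above, assuming the task's work directory already contains `.command.sh` (and possibly other `.command.*` files) when the copy runs. The function name `keep_command_files`, the sample ID `sampleA`, and the simulated work directory are all made up for illustration; in a real process the copy would sit in the script block and the renamed files would be declared as ordinary outputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every .command.* file in the current directory
# to "<sample_id>.command.<suffix>" so it can be declared as a normal output.
keep_command_files() {
    local sample_id=$1
    local f
    for f in .command.*; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        cp "$f" "${sample_id}${f}"
    done
}

# Simulate a task work directory (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'echo hello\n' > .command.sh

keep_command_files "sampleA"
ls
```

Files that are still being written when the copy runs (for example captured output) would only be partially copied, so where the copy sits in the script matters. Inside a Nextflow script block, the bash variables also need the `\$` escaping.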
Chris Fields
@cjfields
Aug 04 2016 16:22
ok. That works for me, I can always create the Rscript and call from bash for the time being. I mainly wanted to do this within the process, like so:
#!/usr/bin/env nextflow

process processMEDIP {
    executor 'pbs'
    cpus 1
    queue 'default'
    memory "4 GB"
    module RMod
    publishDir "${params.results}/test-capture", mode: "copy"

    input:
    set pairId, file(alns) from MEDIPPairs

    output:
    file '*.{RDS,out}' into hMeDIP_RData

    script:
    """
    #!/usr/bin/env Rscript
    library(MEDIPS)
    library('${params.bsGenome}')
    BSgenome = '${params.bsGenome}'
    uniq = 1e-3
    extend = 230
    shift = 0
    ws = 100

    hMeDIP = MEDIPS.createSet(
        file = "${alns[0]}",
        BSgenome = BSgenome,
        extend = extend,
        shift = shift,
        uniq = uniq,
        window_size = ws
    )

    saveRDS(hMeDIP, "${alns[0].getName()}.MEDIP.RDS")

    input = MEDIPS.createSet(
        file = "${alns[1]}",
        BSgenome = BSgenome,
        extend = extend,
        shift = shift,
        uniq = uniq,
        window_size = ws
    )

    saveRDS(input, "${alns[1].getName()}.Input.RDS")
    """
}
Paolo Di Tommaso
@pditommaso
Aug 04 2016 16:24
it looks great
Chris Fields
@cjfields
Aug 04 2016 16:24
Well, that's a weird gitter issue
Paolo Di Tommaso
@pditommaso
Aug 04 2016 16:24
which one? formatting ?
Chris Fields
@cjfields
Aug 04 2016 16:27
Markdown formatting on gitter.im; it's not working within the backticks completely. Gave up and added it to bash
Paolo Di Tommaso
@pditommaso
Aug 04 2016 16:28
:+1: