These are chat archives for nextflow-io/nextflow

28th
Mar 2017
Karin Lagesen
@karinlag
Mar 28 2017 10:54
good afternoon
I am looking at what they've done here:
where their commandline becomes:
$ nextflow alignment.nf -profile alignment --threads 2 --output my_alignment_output
I was convinced we had to use run?
or is it implied if you don't say anything else?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 10:57
Not understanding. What is implied?
Karin Lagesen
@karinlag
Mar 28 2017 10:57
run
I thought that to run a nextflow script, you had to do:
nextflow run scriptname
but here, they don't have run
Paolo Di Tommaso
@pditommaso
Mar 28 2017 10:59
Ahh
Well, we maintain that CLI syntax for backward compatibility
The preferred way is to use run
Karin Lagesen
@karinlag
Mar 28 2017 11:00
ok, thanks!
was starting to feel a bit confused there...
Paolo Di Tommaso
@pditommaso
Mar 28 2017 11:01
:)
Karin Lagesen
@karinlag
Mar 28 2017 11:28
another question
I am seeing cluster options with the params and not with the process watchamacallit
I would have thought that that was a process thing?
(assuming slurm here)
Maxime Garcia
@MaxUlysse
Mar 28 2017 11:45
Karin Lagesen
@karinlag
Mar 28 2017 11:46
exactly
L17
ohhh
didn't notice there were two
so, how do they work?
The idea is to give each process this specific option -A $params.project, which in our case is the project number needed to be able to launch slurm jobs on our Swedish clusters
Karin Lagesen
@karinlag
Mar 28 2017 11:52
ah, now I see it :)
Maxime Garcia
@MaxUlysse
Mar 28 2017 11:52
this params.clusterOptions is global, but here it's set to false, so nothing is appended to this option
Karin Lagesen
@karinlag
Mar 28 2017 11:53
so you give that on the command line when you run it...?
Maxime Garcia
@MaxUlysse
Mar 28 2017 11:54
when we run it we use something like:
nextflow run ... --project B121212154 ... so -A B121212154 is sent via the slurm job
Karin Lagesen
@karinlag
Mar 28 2017 11:55
awesome, this is really useful :)
thanks for the explanation :)
Maxime Garcia
@MaxUlysse
Mar 28 2017 11:55
Or you can have this params.project defined in a config file, and not specify it on the command line
Karin Lagesen
@karinlag
Mar 28 2017 11:56
:)
Maxime Garcia
@MaxUlysse
Mar 28 2017 11:56
But you still have the possibility to change it via the command line
You're welcome
I'm guessing you're using slurm as well
Karin Lagesen
@karinlag
Mar 28 2017 11:57
that I am
Phil Ewels
@ewels
Mar 28 2017 12:32
Exactly :+1: The bonus clusterOptions allows any other arbitrary string to be added on at the end too, also by user config or command line
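For reference, the pattern discussed above could look roughly like this in a config file. This is only a sketch, not the SciLifeLab file itself: the project ID is the example value from the chat, and placing the two defaults in a params block is an assumption.

```groovy
// nextflow.config (sketch)
params {
  project = 'B121212154'    // default; can be overridden with --project on the command line
  clusterOptions = false    // optional extra scheduler flags, off by default
}

process {
  executor = 'slurm'
  // Every job gets "-A <project>", plus any extra string given in params.clusterOptions
  clusterOptions = { "-A $params.project " + (params.clusterOptions ?: '') }
}
```

With this in place, passing --project or --clusterOptions on the command line would replace the defaults for that run.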
Karin Lagesen
@karinlag
Mar 28 2017 14:22
ok, so... I am trying to create two config files that can be included with the pipeline: one that is computer specific (executors, queues, cpus per program, memory ++) and another that contains input/output directories, parameters to programs, and so on
  1. does this make sense? and 2. still have a bit of a confusion re process vs params, and the syntax
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:27
if you are undecided, start with one and then split it when needed
the fewer the better ..
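If the split does become necessary later, one way to do it is Nextflow's includeConfig statement in the main config file. A minimal sketch, where the two file names under conf/ are hypothetical:

```groovy
// nextflow.config (sketch; the conf/*.config file names are hypothetical)
includeConfig 'conf/cluster.config'   // machine specific: executor, queue, cpus, memory
includeConfig 'conf/params.config'    // run specific: input/output directories, tool parameters
```

Each included file uses the same syntax as the main config, so settings can be moved between files without rewriting them.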
Karin Lagesen
@karinlag
Mar 28 2017 14:27
I know :)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:27
confusion re process vs params, and the syntax
we can help you, what's the problem :)
they have params and then $something
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:28
the guys at SciLifeLab are pros!
Karin Lagesen
@karinlag
Mar 28 2017 14:29
I know, trying to learn from The Best :)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:29
makes sense!
ok, maybe it's the $ in the config file, but it's just a way to escape the process name
first of all params != process
params allows you to specify script parameters in the config file instead of typing on the command line
tho you can still override them in the CLI if needed
so far so good ?
Karin Lagesen
@karinlag
Mar 28 2017 14:31
yes :)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:31
great
Karin Lagesen
@karinlag
Mar 28 2017 14:31
(I so owe you dinner btw)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:31
ahahah
I hope so ;)
then process in the config allows you to define process directives such as cpus, mem, time, etc
ok ?
Karin Lagesen
@karinlag
Mar 28 2017 14:32
yes
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:33
let's go into the syntax
default configuration applied to all processes in your script
process.cpus = 4 
process.memory = 8.GB
.. etc
Karin Lagesen
@karinlag
Mar 28 2017 14:34
and I can do this as process {
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:34
or an alternative syntax with the same semantics
Karin Lagesen
@karinlag
Mar 28 2017 14:34
cpus = x
}
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:34
process {
  cpus = 4
  memory = 8.GB
}
Karin Lagesen
@karinlag
Mar 28 2017 14:34
so far so good :)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:34
these are two different notations to define the same resources
don't confuse the latter with the process in the pipeline script
that would define a task .. forget that for now
Karin Lagesen
@karinlag
Mar 28 2017 14:35
I'm onboard with that
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:35
great
now the dollar thing
imagine in your script you have a process named foo that requires 16 cpus instead of 4
you can specify that by writing
process.$foo.cpus = 16
or
process {
  $foo{ 
    cpus = 16 
  }
}
(I definitely prefer the first)
does it make sense ?
Karin Lagesen
@karinlag
Mar 28 2017 14:38
clear as clean water :)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:38
fantastic, now you are a pro :D
I need to leave temporarily now
Karin Lagesen
@karinlag
Mar 28 2017 14:38
thanks!
Paolo Di Tommaso
@pditommaso
Mar 28 2017 14:38
:+1:
Karin Lagesen
@karinlag
Mar 28 2017 14:39
and thank you for being so available here!
Maxime Garcia
@MaxUlysse
Mar 28 2017 14:50
@pditommaso Thanks a lot, but we do all come here when we have difficult questions ;-)
Phil Ewels
@ewels
Mar 28 2017 14:57
@pditommaso - can't work out if we have a typo here: clusterOptions = { "-A $params.project " + (params.clusterOptions ?: '') }
Any difference between $params and just params? I guess the second one should have a dollar prefix?
Karin Lagesen
@karinlag
Mar 28 2017 15:18
@ewels he said something above about the dollar sign escaping the process filename
not sure if that helps?
not sure how to use it in "real life" inside the code
Phil Ewels
@ewels
Mar 28 2017 15:20
Yeah, I didn't really understand what he meant by that :P
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:20
Any difference between $params and just params? I guess the second one should have a dollar prefix?
nope
it's fine in that way
Karin Lagesen
@karinlag
Mar 28 2017 15:20
I'm using println to figure out params at the moment
Phil Ewels
@ewels
Mar 28 2017 15:20
aha - it's because it's inside the double quotes the first time?
Karin Lagesen
@karinlag
Mar 28 2017 15:21
double quotes allow for expansion! Just remembered that now
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:21
yes, you need $ to interpolate variables in strings, just like bash
@karinlag yes!
Phil Ewels
@ewels
Mar 28 2017 15:21
sorry, didn't see that when I pasted it above
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:21
you could refactor like this
Phil Ewels
@ewels
Mar 28 2017 15:21
ok good, so I can breathe easy :sweat_smile:
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:22
clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }
or
clusterOptions = { "-A ${params.project} ${params.clusterOptions ?: ''}" }
Phil Ewels
@ewels
Mar 28 2017 15:22
any advantage to the second one?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:23
well, this is basically the same, just for symmetry it uses ${ } in both cases
Phil Ewels
@ewels
Mar 28 2017 15:23
:+1:
Karin Lagesen
@karinlag
Mar 28 2017 15:23
but... re process.$foo.cpus
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:23
it turns out that a great thing is that you can also program the config file
but... re process.$foo.cpus
forget that
Karin Lagesen
@karinlag
Mar 28 2017 15:24
do I actually inside nf script itself type in the dollar, or leave it out?
(as in, how do I use it in a sentence :))
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:25
$ is a valid identifier character, so you can have any name starting with it
$var1 = 1
that's different when you use it inside a double quoted string i.e.
println "$var1"
then it replaces the variable var1 with its value in the string
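The interpolation rule can be checked in plain Groovy. In this sketch the params map is a stand-in for Nextflow's params object and uses the example project ID from earlier in the chat; it compares the original clusterOptions string with the refactored form.

```groovy
// Plain Groovy sketch; this map stands in for Nextflow's params object
def params = [project: 'B121212154', clusterOptions: false]

// $ interpolates only inside a double-quoted string;
// outside the quotes, params is referenced as an ordinary variable
def concatenated = "-A $params.project " + (params.clusterOptions ?: '')

// Same result with everything interpolated inside one string
def interpolated = "-A ${params.project} ${params.clusterOptions ?: ''}"

// Single quotes keep the text literal, no interpolation happens
def literal = '-A $params.project'

assert concatenated.toString() == interpolated.toString()
assert literal.contains('$params')
```

Both double-quoted forms build the same string, which is why the choice between them is a matter of style.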
Karin Lagesen
@karinlag
Mar 28 2017 15:29
I got that (I think), but what about when I use it in params
(maybe I'm having a brainfart here)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:29
:D
but what about when I use it in params
for example ?
Karin Lagesen
@karinlag
Mar 28 2017 15:30
but from what I see @ewels doing, they seem to do params.$something.paramname = something
where the $something seems to refer to a process
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:30
can you point out a code snippet ?
Karin Lagesen
@karinlag
Mar 28 2017 15:31
yep
process {
  executor = 'slurm'
  cpus = { 1 * task.attempt }
  memory = { 8.GB * task.attempt }
  time = { 2.h * task.attempt }
  clusterOptions = { "-A $params.project " + (params.clusterOptions ?: '') }
  errorStrategy = { task.exitStatus == 143 ? 'retry' : 'finish' }
  maxRetries = 3
  maxErrors = '-1'

  // Environment modules and resource requirements
  $makeSTARindex {
    module = ['bioinfo-tools', 'star/2.5.1b']
    cpus = { 10 * task.attempt }
    memory = { 80.GB * task.attempt }
    time = { 5.h * task.attempt }
  }
}
which basically becomes process.$makeSTARindex.time, for instance
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:33
yes, that's not a params !
Karin Lagesen
@karinlag
Mar 28 2017 15:34
ah doh!!!!
facepalm
so how does that smart stuff work?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:34
we have an emoticon for that 🤦‍♂️
oops does not work here :D
so how does that smart stuff work?
what do you mean ?
Karin Lagesen
@karinlag
Mar 28 2017 15:35
so, one of these becomes process.$makeSTARindex.time (an example)
Phil Ewels
@ewels
Mar 28 2017 15:36
@karinlag I think it's just a special case for process names, that they're always prepended with a dollar
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:36
yes
Phil Ewels
@ewels
Mar 28 2017 15:36
I think it's separate from the general variable expansion stuff
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:36
that's just an alternative syntax for
process.$makeSTARindex.time
Karin Lagesen
@karinlag
Mar 28 2017 15:36
so this is what connects process options from config files to the process they belong to?
Phil Ewels
@ewels
Mar 28 2017 15:36
yes :+1:
Karin Lagesen
@karinlag
Mar 28 2017 15:37
AMAZING! I finally got it!
Phil Ewels
@ewels
Mar 28 2017 15:37
process.time sets the default time for all processes. process.$makeSTARindex.time sets the time for just that process (makeSTARindex)
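Written in dot notation, the default and the per-process override would look like this. The sketch reuses the values from the config snippet pasted earlier in this chat.

```groovy
// Config sketch using the values from the snippet earlier in this chat
process.time = { 2.h * task.attempt }                  // default for every process
process.$makeSTARindex.time = { 5.h * task.attempt }   // only for the makeSTARindex process
```

This is the syntax as of the Nextflow version discussed here. Later releases moved per-process settings to withName: selectors, so it is worth checking the documentation for the version you run.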
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:37
thanks @ewels
Phil Ewels
@ewels
Mar 28 2017 15:37
Team effort! :tada:
Karin Lagesen
@karinlag
Mar 28 2017 15:37
(sorry for the exuberance, I just knew I smelled something cool that I hadn't understood yet :))
Phil Ewels
@ewels
Mar 28 2017 15:38
haha, no worries. The code you're referring to has had a long history, slowly evolving as we got to understand more and more of the NF features.
If you scroll up this gitter history far enough, you'll probably find me asking the same questions
Karin Lagesen
@karinlag
Mar 28 2017 15:38
seems like I'm cribbing from the best then :)
Phil Ewels
@ewels
Mar 28 2017 15:39
hah, not sure about that. The best gitter-procrastinators perhaps :sunglasses:
(and on that note, I have to leave!)
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:40
:wave:
Karin Lagesen
@karinlag
Mar 28 2017 15:40
...any way of printing process... not variables, but options?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:41
not sure I understand
Karin Lagesen
@karinlag
Mar 28 2017 15:41
ok
I'm ensuring that I get all the right input/options etc
so I just wrote a oneliner saying
println params
which then shows me all the params stuff that I have from various config files
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:42
yes, you can do that
there's also the nextflow config command for that
Karin Lagesen
@karinlag
Mar 28 2017 15:43
that's another thing... how is a project defined in nf?
is it the nf file, the directory it's in, or what?
thanks for the config btw, works like a charm!
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:44
there's a very simple requirement if you want to run your script directly from a GitHub repo
to call the pipeline script main.nf and have it in the project root
then there are three special directories:
  1. bin/ that is automatically added to the $PATH
  2. templates/ that you can use for script templates
  3. lib/ is added to the java classpath
that's all
Karin Lagesen
@karinlag
Mar 28 2017 15:48
but... when you talk about a project, that is then the directory with subdirs and things that the nf script(s) is in?
that is the entity?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 15:49
by project I mean any directory structure you can/want to upload as a GitHub repository
makes sense ?
Phil Ewels
@ewels
Mar 28 2017 15:52

is it the nf file, the directory it's in, or what?

It's the directory that the script is in, usually (if you follow the convention described by @pditommaso above)

Karin Lagesen
@karinlag
Mar 28 2017 15:55
makes mucho sense, thanks
Mike Smoot
@mes5k
Mar 28 2017 16:08
Hi @pditommaso, we're running a pipeline that worked with older versions of NF, now on version 0.24.1, and we're getting warnings in the logs:
Mar-28 16:03:00.581 [Thread-1] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 0 -- first: null
and the pipeline isn't proceeding. We know which processes haven't been reached yet, but we haven't been able to pinpoint a problem beyond that. Any ideas?
Paolo Di Tommaso
@pditommaso
Mar 28 2017 16:09
um, basically it hangs ..
Mike Smoot
@mes5k
Mar 28 2017 16:10
Yup
Paolo Di Tommaso
@pditommaso
Mar 28 2017 16:10
usually this depends on some channel that is not closed properly . .
is it deterministic ?
Mike Smoot
@mes5k
Mar 28 2017 16:11
So far, yes.
Paolo Di Tommaso
@pditommaso
Mar 28 2017 16:12
if you don't have any idea what could be the problematic channel
I would bisect the code until you find the problematic point
Mike Smoot
@mes5k
Mar 28 2017 16:13
Ok. I'll do some digging and see what I can find.
Paolo Di Tommaso
@pditommaso
Mar 28 2017 16:13
it would be nice to have a way to display open channels, tho
it could be a nice feature request :)
Mike Smoot
@mes5k
Mar 28 2017 16:15
That would definitely help!
Paolo Di Tommaso
@pditommaso
Mar 28 2017 16:15
yes, I'm realising that
Karin Lagesen
@karinlag
Mar 28 2017 17:36
ok
I have an input set with 3x2 files, like this:
(ariba) karinlag@eris[work] ls /home/karinlag/PycharmProjects/testdata/short/*R{1,2}_001.short.fastq.gz    [ 7:36]
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-78-2013-01-4718_S29_L001_R1_001.short.fastq.gz
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-78-2013-01-4718_S29_L001_R2_001.short.fastq.gz
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-79-2013-01-4835_S30_L001_R1_001.short.fastq.gz
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-79-2013-01-4835_S30_L001_R2_001.short.fastq.gz
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-92-2013-01-5057_S44_L001_R1_001.short.fastq.gz
/home/karinlag/PycharmProjects/testdata/short/Angen-bacDNA2-92-2013-01-5057_S44_L001_R2_001.short.fastq.gz
(ariba) karinlag@eris[work]                                                                                [ 7:36]
and this is my nf script:
Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process run_ariba_mlst_prep {
    publishDir params.out_dir + "/" + params.mlst_results

    output:
    file "mlst_db" into mlst_db

    """
    ariba pubmlstget "$params.mlst_scheme" mlst_db
    """
}

process run_ariba_mlst_pred {
    publishDir params.out_dir + "/" + params.mlst_results

    input:
    set pair_id, file(reads) from read_pairs
    file "mlst_db" from mlst_db

    output:
    file "${pair_id}" into pair_id

    """
    mkdir ${pair_id}
    ariba run mlst_db/ref_db ${reads} pair_id

    """
}
and I'm screwing something up, because I only get results from the last set of files, the Angen-bacDNA2-92-2013-01-5057_S44_L001 files
params.reads = "/home/karinlag/PycharmProjects/testdata/short/*R{1,2}_001.short.fastq.gz"
so that does give me the paired files (I think)
and I only have two dirs in my work dir, so I am only running one of each of these, so I am screwing up my input somehow.