These are chat archives for nextflow-io/nextflow

20th
Sep 2017
Phil Ewels
@ewels
Sep 20 2017 04:16
I don't think that will work, no - you can't have the env scope inside the params scope
I think it'd probably be easiest to keep everything as params and then just use command line flags for your script instead..
Or if you really want to use environment variables, just set them from params variables inside your script block..
Phil Ewels
@ewels
Sep 20 2017 04:35
or maybe actually, if you set the paths as params but then use this to set the envs at run time..
params {
  genome = 'hg38'
  genomes {
    'hg38'  {
      bwa = 'hg38.bwa.fasta'
      dnase_bed =  "dnase_all_p10_ucsc.bed.gz"
    }
  }
}
env {
  DNASE_BED = { params.[params.genome].dnase_bed }
}
..that may work?
Ashley S Doane
@DoaneAS
Sep 20 2017 06:27
aah thanks that's nice
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:08
brilliant !
there's just a glitch in the syntax, the following works
env {
  DNASE_BED = params.genomes[params.genome].dnase_bed
}
Ashley S Doane
@DoaneAS
Sep 20 2017 07:09
ok cool thanks!
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:10
the only problem is that if the user enters an invalid parameter it will return an empty env var
you may want to make it more robust doing the following
env {
  DNASE_BED = params.genome in params.genomes ? params.genomes[params.genome].dnase_bed : null
}
then in the script validate the params.genome with the following statement
assert params.genome in params.genomes, "Unknown genome entry: `$params.genome`"
or
Ashley S Doane
@DoaneAS
Sep 20 2017 07:12
in each process, my scripts are often using many of the same arguments (annotation files). so I was just looking for a way to set these without making so many channels.
maybe using ENV was not the best...
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:12
if( !(params.genome in params.genomes)) error "Unknown genome entry: `$params.genome`"
Ashley S Doane
@DoaneAS
Sep 20 2017 07:12
was just how I was doing it before using bash
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:13
not a good idea :)
input files need to be declared properly to stage them correctly in the work dir
Ashley S Doane
@DoaneAS
Sep 20 2017 07:14
right, makes sense.
is there a way to define persistent variables that can be reused in many channels?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:20
give me an example of what you would like to do
Ashley S Doane
@DoaneAS
Sep 20 2017 07:25
so for example, I use a bed file of black listed regions in several processes. The scripts I call from nextflow take this bed file as a command line arg. So I would like to be able to set a variable that can be used in several processes, like:
    """
    processAlignment.nf.sh ${bamfile} ${BLACKBED}
    """
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:28
you can use a generic global variable or the config file trick to do that
Ashley S Doane
@DoaneAS
Sep 20 2017 07:30
cool thanks. that's great I will try. :)
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:30
the problem is that you will break the requirement to define the input files in the NF process, hence your pipeline won't be portable and won't work using containers
Ashley S Doane
@DoaneAS
Sep 20 2017 07:31
right... I could include the scripts to build the genome references. Mostly lots of files from ENCODE
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:31
we are looking how to improve this, but for now there isn't an easy solution other than declare all inputs
which are stored in shared file system, I guess
Ashley S Doane
@DoaneAS
Sep 20 2017 07:32
could almost imagine some kind of file container with annotation builds. hg38 ENCODE best practice, etc
actually one could put files into a singularity container.... anyway thanks for your help, really appreciate it!
Paolo Di Tommaso
@pditommaso
Sep 20 2017 07:36
:ok_hand:
Phil Ewels
@ewels
Sep 20 2017 08:15
Personally I prefer keeping reference files out of containers, as it makes them massive and slow if you want to update anything. Also it makes it harder to run the same pipeline for lots of different genomes..
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:15
I agree
Phil, what is the status of the report?
Phil Ewels
@ewels
Sep 20 2017 08:16
It's not difficult to split up channels using .into{} and then use them for multiple processes, so I think that's the nicest way to go in my opinion
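(A minimal sketch of that approach, in the same `from`-style syntax used elsewhere in this chat. The parameter names `params.bams` and `params.blacklist_bed`, the channel names, and the second script `alignmentStats.sh` are hypothetical placeholders. The BAM channel is split with `.into{}`, while the blacklist file is declared as a regular input in each process so that it gets staged in the work dir.)

```groovy
// Hypothetical parameters: adjust to your own pipeline
blacklist = file(params.blacklist_bed)

// One source channel, split into two copies, one per consuming process
Channel
    .fromPath(params.bams)
    .into { bams_for_filtering; bams_for_stats }

process filterAlignment {
    input:
    file bamfile from bams_for_filtering
    file blackbed from blacklist   // single file value, reused for every task

    script:
    """
    processAlignment.nf.sh ${bamfile} ${blackbed}
    """
}

process alignmentStats {
    input:
    file bamfile from bams_for_stats
    file blackbed from blacklist

    script:
    """
    alignmentStats.sh ${bamfile} ${blackbed}
    """
}
```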
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:17
not putting pressure, just to understand if you want me to review/merge it or you need to work on it/need help
Phil Ewels
@ewels
Sep 20 2017 08:17
@pditommaso - slowly! I'm slowly slowly getting closer to being able to get what I want out of the groovy code. Once I have it dumped into the template then things will rapidly speed up as I know how to write javascript :P
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:17
ok, cool! no hurry at all
Phil Ewels
@ewels
Sep 20 2017 08:18
at the moment I'm spending lots of time messing around trying to figure out how class types work and how to iterate over them etc.
eg. I have class nextflow.script.WorkflowMetadata but if I do each {} on it, I get a single csv list emitted
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:19
umm, not sure I understand
Phil Ewels
@ewels
Sep 20 2017 08:19
equally if I try to reference sub-variables directly (workflow : getWorkflowMetadata(), workflow.scriptFile) then that doesn't work
so basically, this doesn't seem to return a normal key:value map I think:
private WorkflowMetadata getWorkflowMetadata() {
        nfsession.binding.getVariable('workflow') as WorkflowMetadata
}
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:20
well, no. It's a plain class not a Map ..
Phil Ewels
@ewels
Sep 20 2017 08:20
eg. this doesn't do what I expect:
getWorkflowMetadata().each { k,v ->
    workflow[it.k] = it.v
}
yeah, that was the conclusion I was coming to :)
So atm I'm trying to figure out how to make it into a Map
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:21
ok, but it would be possible to hack to make it work as Map
Phil Ewels
@ewels
Sep 20 2017 08:21
either that, or just knowing how to access the class variables directly
eg. I tried this:
def wfmd = getWorkflowMetadata()
println wfmd.scriptId
but I don't think that works..
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:22
yes, in this way of course
that's the standard notation to access method fields
Phil Ewels
@ewels
Sep 20 2017 08:23
ah ok, maybe I should try that again
I was aiming at this:
def wfmd = getWorkflowMetadata()
def workflow = [
        "scriptId" : wfmd.scriptId,
        "scriptFile" : wfmd.scriptFile,
        "scriptName" : wfmd.scriptName,
        "repository" : wfmd.repository,
        "commitId" : wfmd.commitId,
        "revision" : wfmd.revision,
        "projectDir" : wfmd.projectDir,
        "start" : wfmd.start,
        "container" : wfmd.container,
        "commandLine" : wfmd.commandLine,
        "nextflow" : wfmd.nextflow,
        "workDir" : wfmd.workDir,
        "launchDir" : wfmd.launchDir,
        "profile" : wfmd.profile,
        "sessionId" : wfmd.sessionId,
        "resume" : wfmd.resume,
        "runName" : wfmd.runName
]
as a hack to make it into a Map ;)
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:25
let me check for something less verbose
Phil Ewels
@ewels
Sep 20 2017 08:25
if that works then I'm happy for now - you can refactor all of my crappy code in the PR review ;)
aha! yes I just managed to get it into the HTML. ok great :D
I think I was messing up the templating syntax before, multitasking with too many languages - groovy templating is not the same syntax as python jinja2 templating!
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:27
:)
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:30
Hi all. I am loading a module inside a process that requires an environment variable to be exported BEFORE the module is loaded. I have added env.WORKFLOW_NAME = 'staphylococcus-aureus-typing' in my nextflow.config file, but .command.env shows it loading the module first and then doing the export. I need the export to happen before the module is loaded. I also tried beforeScript 'export WORKFLOW_NAME="staphylococcus-aureus-typing"' in my process, but that didn't work either. Any ideas on how I can set the environment variable before nextflow loads my module?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:30
@ewels have a look at this
import org.codehaus.groovy.runtime.InvokerHelper

class Foo {

  String foo = 'Hello'

  String bar = 'World'

  def get(String name) { 
    InvokerHelper.getProperty(this,name)
  }
}

def x = new Foo()
println x.foo
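(A different route to the same goal, as a plain Groovy sketch rather than the `get` override above: Groovy's built-in `properties` accessor returns an object's bean properties as a Map. Whether this exposes everything needed on `WorkflowMetadata` is an assumption that would have to be checked; the `Foo` class here is only a stand-in.)

```groovy
class Foo {
    String foo = 'Hello'
    String bar = 'World'
}

def x = new Foo()

// `properties` also contains synthetic entries such as 'class', so filter them out
def asMap = x.properties.findAll { key, value -> !(key in ['class', 'metaClass']) }

println asMap.foo   // prints: Hello
```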
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:37
@alshahib_twitter setting the var as env.WORKFLOW_NAME=xx should work
how are you loading the module, using directly module load xx in the command script ?
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:40
This is my nextflow.config file:
export WORKFLOW_NAME="staphylococcus-aureus-typing"
And this is my process:
process mlst_typing {

    beforeScript 'export WORKFLOW_NAME="staphylococcus-aureus-typing"'

    module 'phe/mlst_typing'

    input:
    file fastq_files from processed_fastqs_for_mlst
    val(workflow) from workflow

    output:
    file('mlst_typing')


    script:

    """
    mlst_typing.py -w $workflow -i .
    """ 
}
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:40
that won't work
have you tried the following in the config
env.WORKFLOW_NAME="staphylococcus-aureus-typing"
w/o beforeScript
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:42
Sorry yes I pasted the wrong thing.
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:42
ok
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:42
I have this in my config:
env.WORKFLOW_NAME="staphylococcus-aureus-typing"
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:42
ok
umm, try putting module load 'phe/mlst_typing' directly at the top of your script instead of using the module directive
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:43
.command.env:
nxf_module_load phe mlst_typing
export WORKFLOW_NAME="staphylococcus-aureus-typing"
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:44
oops
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:44
I have other processes that I use module 'modulename' and it works
it's loading the module fine, it's just not setting env before loading module
Paolo Di Tommaso
@pditommaso
Sep 20 2017 08:49
put the module load command directly in your script
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 08:51
@pditommaso Yes did that and it worked. thanks
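(For reference, a sketch of what the working process presumably looks like, reconstructed from the snippets above rather than copied from the actual pipeline: env.WORKFLOW_NAME stays in nextflow.config, the module directive is removed, and the module is loaded at the top of the script so the export has already happened by then.)

```groovy
// nextflow.config is assumed to contain:
//   env.WORKFLOW_NAME = "staphylococcus-aureus-typing"

process mlst_typing {

    input:
    file fastq_files from processed_fastqs_for_mlst
    val(workflow) from workflow

    output:
    file('mlst_typing')

    script:
    """
    # The exported environment is already in place when the script body runs
    module load phe/mlst_typing
    mlst_typing.py -w $workflow -i .
    """
}
```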
Phil Ewels
@ewels
Sep 20 2017 09:13
Hi @pditommaso - it looks like the report onFlowComplete() handler is being fired before the final WorkflowMetadata variables are being set (success, complete, duration, exitStatus, errorMessage, errorReport) - any ideas how I can get at them?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 09:23
shit
I can try to fix it, doesn't make much sense
Phil Ewels
@ewels
Sep 20 2017 09:26
:clap: thanks!
Paolo Di Tommaso
@pditommaso
Sep 20 2017 09:35
anybody willing to review the new join operator nextflow-io/nextflow#460 ?
Evan Floden
@evanfloden
Sep 20 2017 09:42
The issue write up looks good to me.
Phil Ewels
@ewels
Sep 20 2017 10:13
@pditommaso - the last chunk of data that I can't currently find is the resource requests for each task (I'd like to be able to normalise the usage against what was requested for each task). Is this possible?
Venkat Malladi
@vsmalladi
Sep 20 2017 12:10
Is there a way to write out a temp file using collectFile and then reading that back in?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:15
sure, from a process or another operator ?
Venkat Malladi
@vsmalladi
Sep 20 2017 12:16
Currently I was using Channel
Channel
.fromPath( params.reads )
.flatten()
.map { file -> [ file.toString(), file.getFileName().toString() ].join(",") }
.collectFile(name: 'fileList.csv', newLine: true)
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:17
sure, the result of collectFile is a channel
you could do
Channel
  .fromPath( params.reads )
  .flatten()
  .map { file -> [ file.toString(), file.getFileName().toString() ].join(",") }
  .collectFile(name: 'fileList.csv', newLine: true)
  .set { file_list_ch }
Venkat Malladi
@vsmalladi
Sep 20 2017 12:17
ah okay
and that will give the name of the fileList.csv path
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:18
yes, that channel emits the path of the resulting file
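(A minimal sketch of reading the file back, assuming the `file_list_ch` channel defined above. It simply prints the path and the content; a process could equally declare `file_list_ch` as a file input.)

```groovy
file_list_ch.subscribe { csv ->
    // `csv` is the path of the collected fileList.csv
    println "File list written to: $csv"
    println csv.text
}
```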
Venkat Malladi
@vsmalladi
Sep 20 2017 12:18
cool thanks
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:19
:+1:
Venkat Malladi
@vsmalladi
Sep 20 2017 12:19
trying to play around with verifying the list of files provided is in the metadata file also provided
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 12:39
How do I specify the current directory (work/xx/xxxx) for a particular process? I need to use it inside my script. A dot works with other processes, but with one particular process it's not working, as the python script running in the process requires the full path of the working directory. Adding $PWD does not work as it gives the path that I'm running the nextflow script from. Thanks
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:40
use \$PWD
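(To illustrate the difference: inside a double-quoted script block, an unescaped `$PWD` is substituted by Nextflow when the script is built, which is why it points at the launch directory, while `\$PWD` is left for Bash to expand inside the task work dir. The script name `my_script.py` and its `--workdir` flag below are hypothetical.)

```groovy
process useTaskDir {

    script:
    """
    # Escaped: expanded by Bash at run time, i.e. the task's work/xx/xxxx directory
    my_script.py --workdir "\$PWD"
    """
}
```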
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:45
@ewels > the last chunk of data that I can't currently find is the resource requests for each task
do you mean using dynamic resources allocation ?
Anthony Underwood
@aunderwo
Sep 20 2017 12:53
@pditommaso How easy would it be to make the AWS nextflow ami available in other regions?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:53
you should be able to copy it
Anthony Underwood
@aunderwo
Sep 20 2017 12:54
using AWS commands?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:54
yes, I guess both the console and the cli tools
frankly I don't remember how but it's possible, check in the AWS docs, let me know if you have any problem
Anthony Underwood
@aunderwo
Sep 20 2017 12:55
OK - I'll take a look just wanted to know if it was poss
Is it possible to specify a docker image not on dockerhub?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:56
from NF ?
Anthony Underwood
@aunderwo
Sep 20 2017 12:56
yes we want to run in AWS specifying a private git repo for the workflow and a private docker image (Whilst in dev)
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:57
you can use any image that can be pulled by the docker pull command
however, NF does not authenticate the docker user for a private registry
Anthony Underwood
@aunderwo
Sep 20 2017 12:59
so you'd have to run docker login first?
Paolo Di Tommaso
@pditommaso
Sep 20 2017 12:59
hence you need to manage the authentication on your own to access private repo
not so trivial when taking into consideration a distributed execution, see #45
we may improve it in future, but now you will need to manage it by yourself
Anthony Underwood
@aunderwo
Sep 20 2017 13:00
OK.
Phil Ewels
@ewels
Sep 20 2017 13:30
@pditommaso > using dynamic resources allocation? --> yes, basically whatever actually ended up in the cluster job (or similar) for that task
Paolo Di Tommaso
@pditommaso
Sep 20 2017 13:30
I got it :+1:
I can add them
Phil Ewels
@ewels
Sep 20 2017 13:31
ok great! Thanks :)
Just saw the comment on the issue, I'll reply there too for consistency
Paolo Di Tommaso
@pditommaso
Sep 20 2017 13:31
great
Ali Al-Shahib
@alshahib_twitter
Sep 20 2017 15:41
@pditommaso many thanks that works
I want to use 'module use' and not 'module load'. Is there a way to do that in nextflow or do I have to write it inside my script?
Anthony Underwood
@aunderwo
Sep 20 2017 16:05
@alshahib_twitter you may mean module use followed by module load
mahdi-b
@mahdi-b
Sep 20 2017 19:03
Hi @pditommaso, FYI I just spoke with someone at Google Education about NextFlow. They mentioned that they will get in touch regarding a potential GoogleCloudDriver
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:04
interesting! but what's GoogleCloudDriver ?
a storage service similar to dropbox ?
mahdi-b
@mahdi-b
Sep 20 2017 19:06
Sorry, I meant a driver to start running NextFlow on Google Cloud Platform
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:06
ohh, that would be cool
may I ask in what organisation do you work ?
mahdi-b
@mahdi-b
Sep 20 2017 19:08
University of Hawaii
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:08
ah yes! :)
you are coming to RECOMB in barcelona, next month right?
mahdi-b
@mahdi-b
Sep 20 2017 19:09
Our University has some agreement with Google and it would be great to be able to run jobs on GCP when the cluster is oversubscribed
Yes!
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:10
fantastic
would be nice to have a chat when you come to Barcelona
just drop an email when you are around
Félix C. Morency
@fmorency
Sep 20 2017 19:38
that's cool
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:40
wait and see ;)
Félix C. Morency
@fmorency
Sep 20 2017 19:44
:O
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:44
what?
Félix C. Morency
@fmorency
Sep 20 2017 19:45
I'm looking forward to it.
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:45
good! have you put NF stickers on your car ? :joy:
Félix C. Morency
@fmorency
Sep 20 2017 19:46
Not mine but my colleague's. I still owe you a picture
Paolo Di Tommaso
@pditommaso
Sep 20 2017 19:46
LOL
Félix C. Morency
@fmorency
Sep 20 2017 20:06
IMG_20170920_155937.jpg
The NF mobile.
Paolo Di Tommaso
@pditommaso
Sep 20 2017 20:07
:joy: :joy: :joy: :joy: