These are chat archives for nextflow-io/nextflow

27th
Feb 2019
Anthony Underwood
@aunderwo
Feb 27 03:13
Solved. Lack of understanding of classpath. Needed to specify the jar file
 export NXF_CLASSPATH=/home/ubuntu/aws-java-sdk/lib/aws-java-sdk-1.11.506.jar
David Cotter
@davidcotter
Feb 27 10:28
Hello. I have an input CSV to a Nextflow pipeline that defines which tasks should be run. The first step is to get the sample names from a column of the CSV and extract those samples from a multi-sample file; the next step then processes each one. I don't know how to combine those two channels where the line `combined =` appears below:
samples = Channel
    .fromPath(params.input_file)
    .splitCsv(header:true, sep:'\t')
    .map{ row->tuple(row.sample_id, row.details) }


process split_sample_file {
    input:
    file multisample_file from multisample_file
    file sample_list_csv from sample_list_csv

    output:
    file('*.out') into split_samples

    script:
    """
    split.py $sample_list_csv $multisample_file
    """
}


combined = samples.combine(split_samples)

process process_ {
    input:
    set sample_id, details, file(th_file) from combined
    script:
    """
    echo $sample_id
    """
}
Maybe I am going about it the wrong way - thanks.
Jonathan Manning
@pinin4fjords
Feb 27 11:33
@davidcotter I don't quite follow the logic of what you're doing- others may have more input. But I do an analogous thing (I think) by having sample names and other info in the same csv/tsv file. e.g. I have multiple rows for the same sample ID, corresponding to different FASTQ files. If you then create tuples from each row keyed by sample ID (as you're doing- though don't forget to set() the result to a channel), you can use groupTuple() to organise them and process them together.
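A minimal sketch of that groupTuple() pattern, assuming a TSV with hypothetical sample_id and fastq columns:

```groovy
// One row per FASTQ file, keyed by sample ID; groupTuple() then collects
// all files for the same sample into a single emission.
// Column names (sample_id, fastq) are assumptions, not from the thread.
Channel
    .fromPath(params.input_file)
    .splitCsv(header: true, sep: '\t')
    .map { row -> tuple(row.sample_id, file(row.fastq)) }
    .groupTuple()                 // emits [sample_id, [fastq_1, fastq_2, ...]]
    .set { grouped_samples }
```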

A question of my own: how do I refer to config variables of arbitrary depth from other config variables? The following does not work:

env {
    WORKFLOW_RESULTS_DIR = "${workflow.projectDir}/results"
 }

So I'm obviously missing something important.

David Cotter
@davidcotter
Feb 27 12:16
@pinin4fjords Thanks, I will try a few things here and see if I can get it working. For yours, I think ${workflow.projectDir} will be treated as a Bash env variable since it is in an env block, same as $PATH in this example: https://www.nextflow.io/docs/latest/config.html#scope-env - you need a Groovy variable in a Bash-type block, which hopefully is !{workflow.projectDir} if it behaves like shell: https://www.nextflow.io/docs/latest/process.html#shell
Paolo Di Tommaso
@pditommaso
Feb 27 12:42
the proper variables for paths are baseDir and workDir
Jonathan Manning
@pinin4fjords
Feb 27 12:59
Thanks @pditommaso , but how would I refer to e.g. workflow.baseDir from the 'env' scope in order to construct an environment variable?
Paolo Di Tommaso
@pditommaso
Feb 27 13:01
baseDir is a config variable, not a workflow one
env {
    WORKFLOW_RESULTS_DIR = "${baseDir}/results"
 }
Jonathan Manning
@pinin4fjords
Feb 27 13:32
I could have sworn I tried that, but thanks, it does work. I actually need to point to the execution dir rather than the workflow dir though- what's the proper variable for that?
Paolo Di Tommaso
@pditommaso
Feb 27 13:32
$PWD ?
Jonathan Manning
@pinin4fjords
Feb 27 13:34
Ahh- simple as that. Thanks.
Paolo Di Tommaso
@pditommaso
Feb 27 13:35
:+1:
Jonathan Manning
@pinin4fjords
Feb 27 15:18

I'm trying to run workflows within workflows using repos on GitHub, and I'm getting:

 Unknown project `ebi-gene-expression-group/scxa-smartseq-workflow` -- NOTE: automatic download from remote repositories is disabled

Is that disabling intentional? Can I 'enable' it?

Paolo Di Tommaso
@pditommaso
Feb 27 15:20
grep is the best friend of developers
Jonathan Manning
@pinin4fjords
Feb 27 15:21
Thanks- was just in the process of hunting that down :-)
Paolo Di Tommaso
@pditommaso
Feb 27 15:21
there could be NXF_OFFLINE in your env ?
Jonathan Manning
@pinin4fjords
Feb 27 15:24
Yep, it was there from earlier dev, mystery solved - thanks :-)
Paolo Di Tommaso
@pditommaso
Feb 27 15:24
welcome
Stephen Kelly
@stevekm
Feb 27 16:38

@jguhlin

conda isn't found

not sure if it will solve your actual problem but when I use conda, I end up putting something like source /shared/miniconda3/bin/activate in my process.beforeScript, since I never keep conda in my PATH. If you have a scripted login, you might be able to do the same and embed it here, such that it runs before your task gets fully executed and hopefully puts conda in your PATH
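A config sketch of that pattern (the miniconda path is just an example):

```groovy
// nextflow.config: source the conda activation script before each task,
// so conda ends up on PATH even when it isn't in the login environment.
process {
    beforeScript = 'source /shared/miniconda3/bin/activate'
}
```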

Stephen Kelly
@stevekm
Feb 27 16:45
@davidcotter did you get it to work? I am also a little confused about what you are trying to do, but I agree it seems like .groupTuple might help. Also, in cases like these, I often end up using .combine and .filter, in order to first get all combinations of IDs from the two channels, then filter down to only the entries where the IDs from both channels match; this way, the shorter list (the one you read in from the .csv initially) should limit the final output.
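A sketch of that combine-then-filter pattern, assuming split_samples has been reshaped to emit [sample_id, file] tuples (e.g. by deriving the ID from the file name):

```groovy
// Cartesian product of the two channels, then keep only the pairs
// whose sample IDs match, and drop the duplicate ID field.
combined = samples
    .combine(split_samples)
    .filter { sample_id, details, split_id, split_file ->
        sample_id == split_id
    }
    .map { sample_id, details, split_id, split_file ->
        tuple(sample_id, details, split_file)
    }
```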
@davidcotter in general though I think you might benefit from trying to do the channel sample resolution before running any processes, so you don't need extraneous processes that try to influence execution logic. It's better to keep the logic about which samples get processed inside the channels and the overall workflow, and keep the processes more bare.
just my opinion
lastwon1216
@lastwon1216
Feb 27 18:33
hello nextflowers! I have a script that contains 6 different processes that have to execute in order, but the 3rd process (unloading the reference genome) finishes much faster than when I run it separately with the same input files: it usually takes about ~1 min, but here it takes ~1 sec and passes to the next process. Could there be a possible reason, if you have ever experienced this kind of issue?
Dan Fornika
@dfornika
Feb 27 19:00
@lastwon1216 what do you mean by 'unloading' reference genome? Does that process complete correctly?
lastwon1216
@lastwon1216
Feb 27 19:02
so I am working on an RNA-seq pipeline using STAR to load and unload the reference genome, and the 3rd process is to unload the reference genome after the alignment
sometimes it gives an error saying that there is nothing to unload, even though I am always testing with the same inputs
Alexander Peltzer
@apeltzer
Feb 27 19:09
Any specific requirements for such a pipeline that cannot be fulfilled easily? Otherwise you could use existing ones such as https://github.com/nf-core/rnaseq ;-)
lastwon1216
@lastwon1216
Feb 27 19:11
no, no such specific requirement
it works perfectly fine when I test it separately; the issue only occurs when I run it within the script.
I already looked at that one, but I have to load and unload the reference genome every time I run. that's why..
Alexander Peltzer
@apeltzer
Feb 27 20:07
Not sure I understand that load/unload thing though
Dan Fornika
@dfornika
Feb 27 20:50
@lastwon1216 which executor are you using? (https://www.nextflow.io/docs/latest/executor.html)
lastwon1216
@lastwon1216
Feb 27 20:54
sge
Dan Fornika
@dfornika
Feb 27 21:02
So processes are executed as SGE jobs, probably on different machines. If you're unloading the genome in a separate process, there's no guarantee that it's running on a machine that has the genome loaded in memory, is there?
I'm not very familiar with STAR, but it sounds like the genome loading/unloading system depends on having access to a consistent shared memory between analyses. Do you have that?
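For reference, STAR's shared-memory handling is controlled by --genomeLoad, roughly like this (a sketch with placeholder paths; all three commands must run on the same machine for the shared-memory copy to be visible):

```sh
STAR --genomeLoad LoadAndExit --genomeDir /path/to/index   # load index into shared memory
STAR --genomeLoad LoadAndKeep --genomeDir /path/to/index \
     --readFilesIn reads_1.fq reads_2.fq                   # align, keeping the index loaded
STAR --genomeLoad Remove --genomeDir /path/to/index        # unload from shared memory
```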
lastwon1216
@lastwon1216
Feb 27 21:39
yes, running on different machines is probably the reason..
lastwon1216
@lastwon1216
Feb 27 21:48
I wonder if there is any way to work around that for STAR runs. @dfornika thank you though
Joseph Guhlin
@jguhlin
Feb 27 22:22
@stevekm Thanks, I've switched the nextflow script to use bash -l instead and that seems to be working for now. Your solution is probably more elegant so I'll take a look once I've got things running
Rad Suchecki
@rsuchecki
Feb 27 23:31
@lastwon1216 loading/unloading genome references is optional in STAR, and @dfornika is correct that it only makes sense in the context of all processes accessing shared memory. Consider using the nf-core pipeline mentioned by @apeltzer or, if for some reason you can't, at least have a look at how they use STAR in NF