Nicole
@NLELEZ

I am just running one set of paired reads to see how this works, and I can't figure out how to run the pipeline without downloading the reads first.

Do I need to download the reads to my computer first in order to get a 'path to fastq file', in order to create the csv sheet for implementing the pipeline?

The paired sample reads are here from ENA
https://www.ebi.ac.uk/ena/browser/view/SRS8234006

I can't seem to figure out if there is a 'path to file' directly into ENA without downloading the fastq files themselves...if this is even possible?

grateful for any advice. thank you!!
maxulysse
@maxulysse:matrix.org
[m]
@NLELEZ: I'd recommend going directly to the nf-core/rnaseq slack channel: https://nfcore.slack.com/channels/rnaseq (use this link to get an invite if needed: https://nf-co.re/join/slack )
Nicole
@NLELEZ
thank you!
Florian Wuennemann
@FloWuenne
Hi all, I am fairly new to writing Nextflow pipelines and DSL2. I have a process that emits a list of tuples, each with 3 values, and I want to connect this output channel to another process that runs for each tuple in the list. Currently, it only runs the second process for the first tuple in the list. What function do I have to call on the output channel of process 1 to make process 2 run as many times as there are tuples in the list?
1 reply
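A minimal sketch of one common fix, flattening the emitted list so that each tuple becomes its own channel element; the process names and values here are placeholders standing in for the real pipeline:

nextflow.enable.dsl = 2

// Stand-in for the downstream process; it just echoes the three tuple values.
process PER_TUPLE {
    input:
    tuple val(a), val(b), val(c)

    output:
    stdout

    script:
    "echo $a $b $c"
}

workflow {
    // Stand-in for the upstream output: a single emission holding a list of 3-value tuples.
    list_ch = Channel.value( [ ['s1', 'x', 1], ['s2', 'y', 2], ['s3', 'z', 3] ] )

    // flatMap unpacks the list so each inner tuple is emitted separately,
    // which makes PER_TUPLE run once per tuple instead of once in total.
    PER_TUPLE( list_ch.flatMap { it } )
    PER_TUPLE.out.view()
}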
Juan MONROY-NIETO
@jmonroynieto
Hello, is there a boolean variable for the recent stub mode? For example, I can use workflow.resume to figure out what mode I am running in during execution. I'd be looking for something similar.
3 replies
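A hedged sketch, assuming the workflow metadata exposes a boolean for stub mode analogous to workflow.resume (worth double-checking the exact property name against the docs for the Nextflow version in use):

workflow {
    // workflow.resume is documented; workflow.stubRun is assumed here to be
    // the stub-mode counterpart introduced alongside -stub-run.
    println "resume:   ${workflow.resume}"
    println "stub run: ${workflow.stubRun}"
}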
Hugues Fontenelle
@huguesfontenelle

Hi!
I'd like to have a profile in which I can specify some processes to be skipped:

process.withLabel: 'research' {
  when = false
}

But that doesn't work, since when is a declaration, not a directive.
I can see a workaround, such as declaring params.research = true and then, in each process, writing when: (!params.research), but that logic pollutes my processes a bit.
Any suggestion? Thanks!
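A small sketch of that params-based workaround, with the switch lifted into a profile so only a one-line when: guard stays in each process; the parameter and profile names are placeholders:

// nextflow.config
params.run_research = true            // default: research processes run

profiles {
    no_research {
        params.run_research = false   // profile that skips them
    }
}

// in the pipeline script
process research_only_step {
    label 'research'

    when:
    params.run_research

    script:
    "echo running research step"
}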

namiller2015
@namiller2015

Hi All,
Has anyone used the VEP Docker image in Nextflow with additional reference files? The VEP Docker image requires the reference files to be mounted to the image from a local directory. I can do this easily outside of Nextflow but I am not sure how to tell Nextflow to mount a directory when launching the docker image.

I also want to eventually do this in an AWS Batch environment. Right now I'm just doing testing on a single EC2 instance. Thanks!

maxulysse
@maxulysse:matrix.org
[m]
@namiller2015: the way I do it currently (which is not optimal) is to have a VEP image for the genome I want to use, with the cache already downloaded
1 reply
I managed to tell Nextflow to use the cache but haven't tried that with AWS
I'm guessing if you have the VEP cache on a specific S3 bucket it should be possible to use it
maxulysse
@maxulysse:matrix.org
[m]
So yes, in sarek we're running VEP via Docker and Singularity with supplied cache or cache from the containers
1 reply
or with Conda with supplied cache
I tried to make it as open as possible so that it could be used with a cache if available
I'm currently in the process of DSL2-ising snpEff and VEP
maxulysse
@maxulysse:matrix.org
[m]
the way the container with the cache is set up, the cache is in the .vep folder
it's a ternary if: if I have a cache (both parameters are needed), then the cache is in the current directory (it's the cache folder given to Nextflow, which is linked into the work directory)
otherwise it's the cache in the container (in the .vep folder)
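A rough sketch of the ternary logic described above (not sarek's actual code; the parameter names and paths are placeholders): if a cache was supplied, point VEP at the copy staged into the work directory, otherwise fall back to the cache shipped inside the container:

process vep_annotate {
    input:
    path(vcf)
    path(vep_cache)   // cache folder given to Nextflow, linked into the work dir
                      // (a dummy file can be passed when no cache is supplied)

    script:
    def dir_cache = (params.vep_cache && params.vep_cache_version) ? "\${PWD}/${vep_cache}" : "/.vep"
    """
    vep -i ${vcf} --offline --dir_cache ${dir_cache} -o ${vcf.baseName}_VEP.ann.vcf
    """
}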
t-f-freeman
@t-f-freeman

I'm looking for recommendations on how to combine tuple channels.

I have a process that outputs multiple bam files and their associated sample names as a tuple channel, e.g.: tuple val(names), path(bams) into bams_ch. Viewing the bams channel gives output like:

[sample1, sample1_bwM_sort.bam]
[sample2, sample2_bwM_sort.bam]

My next process compares those two bam files and outputs the comparison as a single output file. Essentially, the command is something like compareBams --input sample1_bwM_sort.bam sample2_bwM_sort.bam --output sample1_sample2_bwM_sort_compare.txt.

What I can't figure out is how to structure the input channel for this bam comparison process so that 1) it brings in both bam files at once so it can compare them; and 2) I can combine the two sample names, separated by '_', to use them as a prefix for the output file.

3 replies
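One possible approach, sketched under the assumption that the channel holds exactly the two samples shown above; the channel and file names are illustrative:

bams_ch
    .toSortedList { a, b -> a[0] <=> b[0] }             // [[sample1, bam1], [sample2, bam2]]
    .map { pairs ->
        def prefix = pairs.collect { it[0] }.join('_')   // "sample1_sample2"
        def bams   = pairs.collect { it[1] }             // both bam files
        tuple(prefix, bams)
    }
    .set { compare_input_ch }

process compare_bams {
    input:
    tuple val(prefix), path(bams) from compare_input_ch

    output:
    file("${prefix}_bwM_sort_compare.txt")

    script:
    """
    compareBams --input ${bams.join(' ')} --output ${prefix}_bwM_sort_compare.txt
    """
}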
arnaudbore
@arnaudbore
Is there a way to create a channel if another channel is empty, outside of a process? :) Thank you!!
2 replies
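A minimal sketch: the ifEmpty operator emits a default item (a value, or whatever a closure returns) when the source channel turns out to be empty, entirely outside any process; params.fallback in the comment is a placeholder:

Channel.empty()
    .ifEmpty( 'default-value' )     // or the closure form: .ifEmpty { file(params.fallback) }
    .view()                         // prints: default-value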
Alan Hoyle
@alanhoyle
We want to be able to share config files between multiple similar workflows that are in separate git repos. It looks like git allows this kind of thing with git submodule. Would that work with Nextflow? I see @pditommaso mentioned something reminiscent of this in 2019, but I'm wondering if this would work.
Paolo Di Tommaso
@pditommaso
sure it would work
have a look also to
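A sketch of how shared settings might be wired in once the common repo is added as a git submodule (the paths are placeholders); includeConfig pulls external config files into each workflow's nextflow.config:

// nextflow.config
includeConfig 'conf/shared/base.config'          // conf/shared = the submodule checkout

profiles {
    cluster { includeConfig 'conf/shared/cluster.config' }
}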
Sofia Stamouli
@sofiastam_gitlab

Hello,

How can I add native code in a process? I tried the exec process block, but nothing is printed in the output. For example:

process test {

tag "$name"

input:
set val(name), file (json) from test_ch

exec:
Gson gson = new Gson();
System.out.println("test");

}

I cannot figure out what might go wrong here. Any help?
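As a point of comparison (not a diagnosis of the snippet above), a minimal native task: exec blocks run inside the Nextflow runtime rather than as a shell job, so there is typically no .command.sh/.command.out in the work directory, and println output goes to the console and .nextflow.log. The JSON parsing here uses Groovy's built-in JsonSlurper, so no extra library is required:

process parse_json {
    input:
    set val(name), file(json) from test_ch

    exec:
    def parsed = new groovy.json.JsonSlurper().parseText( json.text )
    println "${name}: parsed ${parsed instanceof List ? parsed.size() : 1} top-level element(s)"
}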

Paolo Di Tommaso
@pditommaso
without an error message it's impossible to say
Sofia Stamouli
@sofiastam_gitlab
@pditommaso there is no error message. The work directory looks empty; there are no files like .command.out, .command.log, .command.err, etc.
9 replies
Asaf Peer
@asafpr
Hi everyone. I'm having trouble with disk space using singularity, any idea how I change the directory in which singularity is trying to write? The default tmp is probably full. Thanks
5 replies
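Two settings that often matter here, sketched with placeholder paths: Nextflow's Singularity image cache can be relocated in the config (or via the NXF_SINGULARITY_CACHEDIR environment variable), while Singularity's own scratch space is controlled by its SINGULARITY_TMPDIR environment variable, exported in the shell before the run is launched:

// nextflow.config
singularity {
    enabled  = true
    cacheDir = '/data/singularity_cache'   // where pulled/converted images are kept
}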
Mayeul Marcadella
@Marcadella_gitlab
Hi, Is there a way to specify two k8s executors? What I am trying to achieve is to define two queue sizes for my k8s executor.
Laurence E. Bernstein
@lebernstein
@Marcadella_gitlab Sure. Just use 2 different profiles.
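A sketch of the two-profile approach (names and queue sizes are placeholders); each profile sets its own executor.queueSize and the run is launched with whichever one is needed:

// nextflow.config
profiles {
    k8s_small {
        process.executor   = 'k8s'
        executor.queueSize = 10
    }
    k8s_large {
        process.executor   = 'k8s'
        executor.queueSize = 100
    }
}

// launch with e.g.: nextflow run main.nf -profile k8s_large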
Matthieu Pichaud
@MatPich_twitter
Hi there,
How do you perform your processing, save the results in a folder, clean the crowded work directory every now and then, and still resume the analysis and avoid reprocessing the samples that were already processed?
AFAIK when the work directory is deleted, "-resume" cannot work, and storeDir is a hazardous option.
4 replies
Bill Welch
@billw1955
From the nextflow-21.04.1-all console, I'm getting "Failed to find Nextflow Console extension". Do I have to build from source?
2 replies
Laurence E. Bernstein
@lebernstein
I have a process that writes a list of sample names and file sizes to stdout, and I want to create a channel with this information as tuples. I am trying to use splitCsv, but Nextflow tells me:
"Pipeline failed: Object of class 'groovyx.gpars.dataflow.DataflowVariable' does not support 'splitter' methods"
process get_sample_sizes {
input:
    val samplePath
output:
    stdout emit: sampleSizes
shell:
  '''
    du !{samplePath} | sort -rh | awk -F / '{print $NF","$1}' | sed -e 's/[[:space:]]*$//'
  '''
}

workflow my_workflow {
  main:
    get_sample_sizes( samplePath)

   Channel.from( get_sample_sizes.out.sampleSizes )
                     .splitCsv(header: ['sampleName', 'sampleSize'] )
                     .map{ row-> tuple(row.sampleName, row.sampleSize) }
                     .set { sample_size_ch }
}
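One thing worth trying, sketched against the workflow above: apply splitCsv directly to the process output instead of re-wrapping it with Channel.from, so the operators act on the output channel itself rather than on a bare dataflow value:

workflow my_workflow {
  main:
    get_sample_sizes( samplePath )

    get_sample_sizes.out.sampleSizes
        .splitCsv( header: ['sampleName', 'sampleSize'] )
        .map { row -> tuple(row.sampleName, row.sampleSize) }
        .set { sample_size_ch }
}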
@MatPich_twitter I think the answer is.. you don't. If you remove any "work" directories, the resume cannot function because you no longer have the temporary files. This is a tradeoff you have to make: hold on to the work directories for as long as you might want to resume, and then clear them out. The other option is adding your own "resume" functionality to the workflow using publishDir and grabbing previous intermediate results yourself.
Matthieu Pichaud
@MatPich_twitter
Thanks a lot @lebernstein
Julianus Pfeuffer
@jpfeuffer
Hi! Does anyone know how one can "emit" a multi-channel into two new channels?
ch1.mix(ch2)
  .join(ch3.mix(ch4))
  .multiMap{ it ->
      branch1: it[1]
      branch2: it[2]
  }
  .into{ch_new1; ch_new2}
I was trying the above but it says: "Multi-channel output cannot be applied to operator into for which argument is already provided"
Unfortunately, set only allows one channel to be set, if I saw that correctly.
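A sketch of one way around it: multiMap already produces one channel per label, so capturing the result with set and reading the labelled branches avoids into altogether:

ch1.mix(ch2)
  .join(ch3.mix(ch4))
  .multiMap { it ->
      branch1: it[1]
      branch2: it[2]
  }
  .set { split_ch }

// each labelled branch is then its own channel:
split_ch.branch1.view()
split_ch.branch2.view()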
Laurence E. Bernstein
@lebernstein
Can a channel be sorted and then output as another channel? Is this done using toSortedList() and then fromList()? Or should I use collect() with groupTuple() and just group them with the same size as before? What is the proper syntax for such a thing?
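A sketch of the toSortedList route (the sort key here, the first tuple element, is a placeholder): collect all items into a single sorted list, then emit them one by one again:

unsorted_ch
    .toSortedList { a, b -> a[0] <=> b[0] }   // one emission: the full, sorted list
    .flatMap { it }                           // re-emit the sorted items individually
    .set { sorted_ch }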
KyleStiers
@KyleStiers

Does anyone have a clean solution for adding directories to a channel based on whether those directories contain flag files? I only want to feed a directory into the pipeline if it does not contain a completion flag (for data-retention purposes, the completed directories must stay where they are).

Something like this:

Channel.fromPath("params.target_dir/*", type:'dir')
       .filter(!it.contains("complete.flag")
       .tap{jobs_to_run_ch}
2 replies
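A sketch along the same lines, with the parameter interpolated and the completeness check done by testing for the flag file on disk (the directory layout is assumed from the description above):

Channel
    .fromPath( "${params.target_dir}/*", type: 'dir' )
    .filter { dir -> !file("${dir}/complete.flag").exists() }   // keep only unfinished dirs
    .tap { jobs_to_run_ch }
    .view { "will process: ${it}" }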
kaitlinchaung
@kaitlinchaung
Hi! I was wondering if there is any way to access all the params passed in from a config file?
If I run the pipeline with this command:
nextflow run main.nf -c test.conf
I would like to access all the parameters in test.conf params{} for the creation of a README file, ideally without typing out each parameter.
Thank you!
Laurence E. Bernstein
@lebernstein
@kaitlinchaung All the parameters end up in the nextflow.config file located in the launch directory. You can read that
3 replies
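A sketch of the other common approach: inside the script, params behaves like a map of the fully resolved values (config defaults merged with -c files and --overrides), so it can be iterated directly; the output path is a placeholder:

// write every resolved parameter into a simple README
def readme = file("${params.outdir ?: '.'}/README.txt")
readme.text = params.collect { k, v -> "${k}: ${v}" }.join('\n')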
Simon Pearce
@SPPearce
Hi,
I'd like to filter a channel of bam files for those having at least a certain number of reads, to stop downstream processes from giving me an error (not enough reads to call copy number changes, or use ngscheckmate for sample matching).
What is the easiest way to do that in nextflow?
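One way to sketch it (the threshold, names, and the upstream bams_ch channel are placeholders): count the reads in a lightweight process, carry the count alongside the bam, and filter before the heavier steps:

process count_reads {
    input:
    tuple val(name), path(bam)

    output:
    tuple val(name), path(bam), stdout

    script:
    "samtools view -c ${bam}"
}

workflow {
    count_reads( bams_ch )   // bams_ch: tuples of (name, bam), assumed to exist upstream

    count_reads.out
        .filter { name, bam, count -> count.trim().toInteger() >= 1000 }
        .set { enough_reads_ch }
}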
namiller2015
@namiller2015

Hi All,
I'm trying to bind mount a directory when launching a docker image in Nextflow. Based on reading https://www.nextflow.io/docs/latest/config.html
this seems doable using the docker scope's "temp" setting. But I'm having trouble getting it to work. I only want to do this for a specific process.

In my process I'm trying

input:
          path(ref_genome)

 docker.temp = "${ref_genome}"

But this returns an error saying ref_genome is not a variable. Ref_genome is simply a string pointing to a directory in an S3 bucket. Is this not allowed?

Thanks!
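A sketch of a per-process alternative to docker.temp (the host path, image, and cache location are placeholders): the containerOptions directive adds extra flags, such as a bind mount, when that one process's container is launched; note also that a path(...) input staged from S3 ends up in the task work directory, which Nextflow already mounts into the container, so an extra mount may not be needed for files declared as inputs:

process annotate_with_vep {
    container 'ensemblorg/ensembl-vep'
    containerOptions '-v /data/vep_cache:/opt/vep/.vep'   // bind mount for this process only

    input:
    path(vcf)

    script:
    "vep -i ${vcf} --offline --dir_cache /opt/vep/.vep -o ${vcf.baseName}.ann.vcf"
}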

Sofia Stamouli
@sofiastam_gitlab

Hello,

I posted before regarding how to use a file from an exec process block. I got a bit further, but not much.

process test {

tag "$name"

input:
set val(name), file (json_file) from test_ch.view()

exec:
Gson gson = new Gson();   
Reader reader = new FileReader(json_file);
Analysis result = gson.fromJson(reader, Analysis.class);

But I get the error:

Caused by:
  java.lang.IllegalStateException: Expected BEGIN_OBJECT but was BEGIN_ARRAY at line 1 column 2 path $

Source block:
  Gson gson = new Gson();
  Reader reader = new FileReader(json_file);
  Analysresultat analysresultat = gson.fromJson(reader, Analysresultat.class);

This code works fine in a Java environment, and I cannot figure out what I might be doing wrong here. I also tried reading the JSON file as a string, but that did not work either. Any help?
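Going only by the error message above, BEGIN_ARRAY means the file's top-level element is a JSON array, so Gson has to be asked for an array (or list) of Analysis objects rather than a single one. A hedged sketch, assuming the Gson jar is on the classpath (e.g. in the pipeline's lib/ directory) and that Analysis matches the array's element structure:

process test {
    input:
    set val(name), file(json_file) from test_ch

    exec:
    def gson   = new com.google.gson.Gson()
    def reader = new java.io.FileReader( json_file.toString() )
    Analysis[] results = gson.fromJson( reader, Analysis[].class )   // expect a top-level array
    println "parsed ${results.length} records from ${json_file}"
}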

Florian Wuennemann
@FloWuenne

I have a quick question regarding running my Nextflow pipeline. I have some test data inside the pipeline directory. Now if I want to run tests with this data and I specify a relative path, I get the following error:

nextflow run main.nf -resume \
-with-report "../nextflow_reports/test_report.html" \
-with-timeline "../nextflow_reports/test_timeline.html" \
--grm_plink_input "./test_data/input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly.{bed,bim,fam}" \
--phenoFile "./test_data/input/pheno*.txt" \
--phenoCol "y_binary" \
--covarColList "x1,x2" \
--bgen_filebase "genotype_100markers" \
--bgen_path "./test_data/input" \
--sampleFile "./test_data/input/samplefile_test_input.txt" \
--outdir "../saige_test_out"

 Not a valid path value: ./test_data/input/samplefile_test_input.txt

What's the correct way to define input files with relative file paths?

3 replies
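One commonly suggested fix, sketched with the parameter names from the command line above: anchor the test paths to the pipeline directory with baseDir (projectDir in newer releases), so they resolve the same way no matter where the run is launched from:

// nextflow.config
params {
    grm_plink_input = "$baseDir/test_data/input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly.{bed,bim,fam}"
    phenoFile       = "$baseDir/test_data/input/pheno*.txt"
    bgen_path       = "$baseDir/test_data/input"
    sampleFile      = "$baseDir/test_data/input/samplefile_test_input.txt"
}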
William L. Close
@wclose

I feel like I must be missing something basic about DSL2 syntax. I'm in the process of converting a workflow from DSL1 to DSL2 and have run into an issue where processes only use the first item in a channel. I've figured out that it only happens when I have more than one input to the process, and that removing the additional inputs (e.g., a val) allows the process to use all items in the channel. I'm using v21.04.0 (installed through conda), and here's a super simple minimal working example that reproduces the issue on my machine:

test.nf

process TEST {
    input:
    tuple val(id), val(other)
    val test // <- commenting this out and the corresponding line in main.nf executes the process twice

    output:
    path "test.txt", emit: test

    script:
    """
    touch test.txt
    """
}

main.nf

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2


include { TEST } from './test.nf'

ch_test = Channel.from( ["abc", "123"], ["def", "456"])

workflow {
    TEST ( 
        ch_test,
        Channel.from( "iii" )
    )
    TEST.out.view()
}

Running it

nextflow run main.nf

Thanks in advance for any help!

1 reply
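The usual explanation for this, sketched against the example above: Channel.from("iii") is a queue channel holding a single item, so pairing it with the two-item ch_test lets the process run only once before an input runs dry; passing the constant as a value channel (or as a plain value) lets it be reused for every item of ch_test:

workflow {
    TEST (
        ch_test,
        Channel.value( "iii" )   // or simply: TEST( ch_test, "iii" )
    )
    TEST.out.view()
}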
Thomas A. Christensen II
@MillironX
Anyone here have advice on how best to use Julia packages in a pipeline? Right now I'm using the julia:alpine docker container and inlining using Pkg; Pkg.add("..."), but I feel like there must be a better way.
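One frequently used alternative, sketched with placeholder names: bake the Julia packages into a custom image once (so the environment is reproducible and Pkg.add is not re-run per task) and point the process at that image with the container directive:

process run_julia_step {
    container 'ghcr.io/example/julia-deps:1.6'   // hypothetical image with packages pre-installed

    input:
    path(data)

    script:
    """
    julia --project=/opt/julia-env -e 'println("processing ${data}")'
    """
}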