Karin Lagesen
@karinlag
I suppose this is a common thing to do, hence my wondering if there is a pattern.
Luca Cozzuto
@lucacozzuto
Hi @karinlag you can just add the value of this channel to the one you have, for linking it in every folder, no?
Karin Lagesen
@karinlag
@lucacozzuto not sure I'm following you here?
Luca Cozzuto
@lucacozzuto
maybe I did not get it right :) Can you make an example?
I understood you have several read sets in a channel and you want to use a folder repeatedly when analyzing the read sets in a process?
Karin Lagesen
@karinlag
this is analogous to the mapping situation
I have a database that I want several read sets to be mapped against
so, I am getting the db filename in as a param.
I then create a value channel (I think I should anyhow)
in this case
vir_db = Channel.value(params.db_filename)
I then think I should do the following inside the process:
file virulence_db from vir_db
so I can do the following inside my script session
command $virulence_db $read_set
I am just wondering if this is the way to go about it
7 replies
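For reference, a minimal sketch of the pattern being described, assuming params.db_filename points at a single database file and reads_ch is a hypothetical channel of read sets; wrapping the path in file() makes the value channel carry a file object rather than a bare string, so Nextflow stages it:

// value channel holding a file object (path wrapped in file())
vir_db = Channel.value(file(params.db_filename))

process map_reads {    // hypothetical process name

    input:
    file virulence_db from vir_db
    tuple val(sample_id), file(read_set) from reads_ch    // hypothetical channel

    """
    command $virulence_db $read_set
    """

}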
Karin Lagesen
@karinlag
OK, so there is conceptually something I am not getting here
consider this example:
amr_db = '/cluster/projects/nn9305k/db_flatfiles/specific_genes_bifrost/amr/card_db'
reads = '../testreads/*R{1,2}.fastq.gz'

amr_val_ch = Channel.value(amr_db)
                    .view()
reads_ch = Channel.fromFilePairs(reads)
                  .view()

process test {

    input:
    file amr_db_dir from amr_val_ch
    tuple val(sample_id), file(reads) from reads_ch

    """
    echo testcommd ${amr_db_dir} ${reads}
    """

}
this is my output:
(base) [karinlag@login-1.SAGA ~/tmp/nf_test]$ nextflow run vartest.nf -process.echo
N E X T F L O W  ~  version 20.07.1
Launching `vartest.nf` [condescending_lamarr] - revision: 2c2cbb804d
/cluster/projects/nn9305k/db_flatfiles/specific_genes_bifrost/amr/card_db
[mutdup, [/cluster/home/karinlag/tmp/testreads/mutdup_R1.fastq.gz, /cluster/home/karinlag/tmp/testreads/mutdup_R2.fastq.gz]]
[mutant, [/cluster/home/karinlag/tmp/testreads/mutant_R1.fastq.gz, /cluster/home/karinlag/tmp/testreads/mutant_R2.fastq.gz]]
executor >  local (2)
[53/d2037c] process > test (1) [100%] 2 of 2 ✔
testcommd input.1 mutant_R1.fastq.gz mutant_R2.fastq.gz

testcommd input.1 mutdup_R1.fastq.gz mutdup_R2.fastq.gz
also, I get a tmp directory inside my work directory, and in there I have a work directory structure (ab/hexnumber) with a file named input.1
4 replies
Karin Lagesen
@karinlag
that one again contains the text string that I have for my amr_db
2 replies
(aka, I am confused)
Abhinav Sharma
@abhi18av

Hello everyone,

I am trying to download the genomes from NCBI using the accession ID - ERR776668 and I've run into the following error

Illegal character in authority at index 6: ftp://err776668      ftp.sra.ebi.ac.uk

Does anyone have a clue or pointer?

Konrad Rokicki
@krokicki
Hi everyone, I just wanted to follow up on my issue nextflow-io/nextflow#1695. Since support for executable containers was removed, it appears that there's no way to execute a container's entrypoint without digging up its Dockerfile and copy/pasting the entrypoint command into your nextflow pipeline. We have a lot of containers, essentially one per pipeline step, and this workflow (plus maintaining the entrypoint in two places) is less than ideal. Does anyone have a workaround or better approach to this? We'd like to treat the containers as black boxes as much as possible, and I'm wondering if that's something anyone has grappled with in the context of Nextflow.
3 replies
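One possible workaround, offered only as an assumption: the entrypoint can be read from the image metadata with docker inspect --format '{{json .Config.Entrypoint}}' IMAGE (no Dockerfile needed) and then invoked explicitly in the script block. A minimal sketch, with a hypothetical image, channel, and entrypoint path:

process run_step {
    container 'my-org/my-tool:1.0'    // hypothetical image

    input:
    file sample from samples_ch       // hypothetical upstream channel

    """
    # entrypoint path recovered once via:
    #   docker inspect --format '{{json .Config.Entrypoint}}' my-org/my-tool:1.0
    /opt/app/entrypoint.sh $sample
    """
}

This still duplicates the entrypoint in two places, but it avoids keeping the Dockerfile around.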
Jonathan Oribello
@J-81
@karinlag
amr_db = 'tmp/db.fake'
reads = 'tmp/testreads/*R{1,2}.fastq.gz'

amr_val_ch = Channel.fromPath(amr_db)
                    .view()
reads_ch = Channel.fromFilePairs(reads)
                  .view()

process test {
    echo true

    input:
    file amr_db_dir from amr_val_ch
    tuple val(sample_id), file(reads) from reads_ch

    """
    echo testcommd ${amr_db_dir} ${reads}
    """

}
Changed Channel.value to Channel.fromPath
output is
N E X T F L O W  ~  version 20.07.1
Launching `test.nf` [happy_woese] - revision: 3b0e4e4880
/home/joribello/Documents/OneDrive/Research/Spring_2020_Research/dataset_prep_nf/tmp/db.fake
[fakeRead, [/home/joribello/Documents/OneDrive/Research/Spring_2020_Research/dataset_prep_nf/tmp/testreads/fakeRead_R1.fastq.gz, /home/joribello/Documents/OneDrive/Research/Spring_2020_Research/dataset_prep_nf/tmp/testreads/fakeRead_R2.fastq.gz]]
executor >  local (1)
[f2/0b2a81] process > test (1) [100%] 1 of 1 ✔
testcommd db.fake fakeRead_R1.fastq.gz fakeRead_R2.fastq.gz
9 replies
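For context: when a file input receives a plain string instead of a file object, Nextflow writes the value out to a staged file named input.1, which is why the earlier run saw input.1 containing the database path. Note also that Channel.fromPath yields a queue channel; with a single matching file it is exhausted after one task, so with several read pairs only the first would run. To reuse the database across every read pair, a value channel holding a file object works (a sketch, reusing the names above):

// value channel of a file object: can be read any number of times,
// so every incoming read pair gets the same staged database
amr_val_ch = Channel.value(file(amr_db))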
Richard Corbett
@RichardCorbett
Hi folks. Can anyone point me to an example where a single Singularity container is used and contains multiple tools that are each used in their own processes? Also, is it possible to access data files that are packaged in a container as part of a process?
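A minimal sketch of the first part, assuming a hypothetical multi-tool image: the container can be declared once in nextflow.config (or per process with the container directive), and each process then calls its own tool from it. For the second part, files baked into the image are visible at their in-container paths from any script block:

// nextflow.config
process.container = '/path/to/multi-tool.sif'    // hypothetical image
singularity.enabled = true

// main.nf
process tool_a {
    input:
    file x from ch_a    // hypothetical channel
    """
    tool_a --in $x --db /opt/data/reference.db   # hypothetical data file packaged in the image
    """
}

process tool_b {
    input:
    file x from ch_b    // hypothetical channel
    """
    tool_b --in $x
    """
}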
Felix Thalén
@fethalen_gitlab
What should I use for Bash evaluation (backticks or $( … )) in Bash scripts when using Nextflow?
2 replies
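For context, the usual wrinkle is Groovy string interpolation in the script block: "$(" is not valid inside a Groovy double-quoted string, so $( … ) must be escaped as \$( … ), while backticks pass through untouched (bash variables need escaping either way). A minimal sketch, with a hypothetical channel:

process count_lines {
    input:
    file x from input_ch    // hypothetical channel

    """
    n=\$(wc -l < $x)    # \$( … ) escaped so Groovy does not try to interpolate it
    m=`wc -l < $x`      # backticks need no escaping
    echo "lines: \$n \$m"
    """
}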
Daniel E Cook
@danielecook
I put together some slides for my lab on Nextflow - would be great to get feedback, or if anyone has done something similar it would be good to collect these somewhere
1 reply
nextflow.pdf
This is the first half... still need to do the second
Paolo Di Tommaso
@pditommaso
Very nice tutorial, I really appreciated it. A few tips:
  • Modify Channels => I prefer saying transform channels; also make it clear that an operator takes a channel and its result is a new one (the result of the transformation)
  • slide 30 (and following), file conversion in the operator is not needed any more, provided you are using path instead of file in the process and the file path is absolute
  • slide 33, instead of view consider the dump operator
  • slide 45, clarify that it's (object) dataflow streaming, not a byte-level stream; however, the logical parallelization is the same (!)
Rad Suchecki
@rsuchecki

Really clear & comprehensive @danielecook! Some really minor points:

  • slide 35 - I'd add an arrow from path(dna) in input to $dna in the script block to make the distinction between NF and bash variables even clearer.
  • slide 36 - possibly arrow again to !{dna}?
  • slides 36/37 - consistency !{dna} vs !dna?

I'd also consider introducing nextflow.config perhaps by defining a couple of execution profiles - good opportunity to demonstrate an easy switch to singularity/docker.
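For reference, a minimal nextflow.config along those lines (profile names are just examples):

// nextflow.config
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}

// then switch at launch time:
//   nextflow run main.nf -profile docker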

Sean Johnson
@srjohnson_gitlab
I'm very confused by the lack of publishDir in DSL2. How are we supposed to get data out of the work directory?
I've tried using "copyTo", but I get errors like "Unknown method invocation copyTo on UnixPath type"
or: Unknown process directive: _out_file
2 replies
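For what it's worth, publishDir is still available as a process directive in DSL2; a minimal sketch (params.outdir is a hypothetical param):

nextflow.enable.dsl=2

process foo {
    publishDir "${params.outdir}/foo", mode: 'copy'

    output:
    path 'result.txt'

    """
    echo done > result.txt
    """
}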
Soumitra Pal
@soumitrakp_gitlab
Feature request: extend Nextflow by supporting indexed channels and inner joins of channels in processes, instead of the current implementation using Cartesian products.
1 reply
m93
@mmeier93

I have a question about using Nextflow channels vs the Groovy x = file() syntax when working with files in Nextflow. From what I understand, if we have params.input = "$baseDir/test-data/sample.txt" (i.e. a single file):

Option 1) Using Groovy: data = file(params.input) produces an object of class sun.nio.fs.UnixPath

If you later want to use the file in a process, you can do:

process testA {

    input:
    file x from data

    output:
    file '*' into result_ch

    script:
    """
    program -options  blah blah $x 
    """
}

Option 2) Making a channel such as below also produces an object of class sun.nio.fs.UnixPath

Channel
    .fromPath(params.input)
    .set { data_ch }

If you later want to use the file in a process, you can do:

process testA {

    input:
    file x from data_ch

    output:
    file '*' into result_ch

    script:
    """
   program -options  blah blah $x 
    """
}

So my question is: why choose one option over the other, if both options end up with Nextflow "understanding" you have a file object you want to use for a process? Should you not be making a Nextflow channel if you have a single file?

I feel I am missing something here. I know that using option 1) can be useful if you want to manipulate a file, a file name (or its path) later on in a process, prior to the script: string command:

process testB {

    input:
    tuple val(chromosome), val(sample), file("sample.bed") from channel_ch
    file(data)

    output:
    tuple val(chromosome), val(sample), file("sample.log")

    script:
    ref1 = file(data + ref_path.name +  "/" + sprintf(params.refHapFilesPattern, chromosome))

    """
    program -options sample.bed $ref1
    """
}

I guess what I'm asking, more broadly is:

  • Is there a general situation (s) where should you be using Groovy syntax rather than a Nextflow channel
  • Is there a general situation where one should use Groovy variables prior to the """ part of the process "script:".

Thanks

Paolo Di Tommaso
@pditommaso
the advantage of Channel.fromPath is that it creates a channel that can trigger one or more process executions, depending on the matching files
the file approach does not allow you to control the execution cardinality of the process
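In other words, a minimal sketch (glob and file names hypothetical): fromPath spawns one task per matching file, while file() always yields a single object:

// queue channel: one task per file matching the glob
Channel
    .fromPath('data/*.txt')
    .set { files_ch }

process per_file {
    input:
    file x from files_ch

    """
    echo processing $x
    """
}

// plain path object: no channel semantics, so no control over task cardinality
single = file('data/sample.txt')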
m93
@mmeier93

Ok thanks, I think I understand. You mean that the channel will be able to accept one or more files, depending on a matching pattern. As opposed to file(), which matches just one file.

I guess more broadly speaking, I was unsure about file() because I have seen that syntax and/or just Groovy variables being used/manipulated prior to the command string in a process "script:".

I.e. I noticed various processes in nf-core/rnaseq using Groovy variables inside the "script" part like so (see below). This syntax surprised me at first, but I guess the idea is that you are using a Groovy variable in the scope of a specific process so you can dynamically change it between different process instances?

/*
 * PREPROCESSING - Build STAR index
 */
if (!params.skipAlignment) {
  if (params.aligner == 'star' && !params.star_index && params.fasta) {
      process makeSTARindex {
          label 'high_memory'
          tag "$fasta"
          publishDir path: { params.saveReference ? "${params.outdir}/reference_genome" : params.outdir },
                     saveAs: { params.saveReference ? it : null }, mode: 'copy'

          input:
          file fasta from ch_fasta_for_star_index
          file gtf from gtf_makeSTARindex

          output:
          file "star" into star_index

          script:
          def avail_mem = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : ''
          """
          mkdir star
          STAR \\
              --runMode genomeGenerate \\
              --runThreadN ${task.cpus} \\
              --sjdbGTFfile $gtf \\
              --genomeDir star/ \\
              --genomeFastaFiles $fasta \\
              $avail_mem
          """
      }
  }
Paolo Di Tommaso
@pditommaso
yes, after script: you can use any valid Groovy code; the important thing is that it finally evaluates to a string, and that's the command to be executed
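A minimal illustration of that point (flag, tool, and channel names hypothetical):

process trim {
    input:
    file reads from reads_ch    // hypothetical channel

    script:
    // ordinary Groovy before the command string
    def adapter_arg = params.adapter ? "--adapter ${params.adapter}" : ''
    """
    trimmer $adapter_arg $reads
    """
}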
m93
@mmeier93

Ok great thanks very much!

I guess the only thing to be careful about then, is that if I use the Groovy file() rather than a channel to obtain a file, I still need to declare that file in the input: of my process before using it in the script:. Like so:

// Obtain file
data = file(params.input) 

// Use file as one of the inputs of a process and manipulate it in the  "script:" part
process testB {

    input:
    tuple val(chromosome), val(sample), file("sample.bed") from channel_ch
    file(data) // This is necessary if I want to use "data" after "script:"

    output:
    tuple val(chromosome), val(sample), file("sample.log")

    script:
    // Here I can then manipulate "data"
    ref1 = file(data + ref_path.name +  "/" + sprintf(params.Pattern, chromosome))

    """
    program -options sample.bed $ref1
    """
}
Paolo Di Tommaso
@pditommaso
file manipulation in the process should be avoided, since the task takes care of staging/mounting the input files; the ref1 in your example may not be accessible
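A sketch of the alternative being suggested, reusing the example above (params.refDir is hypothetical): resolve the reference path outside the process and pass the resulting file through the channel, so Nextflow stages it alongside the other inputs:

// resolve the per-chromosome reference outside the process
staged_ch = channel_ch.map { chromosome, sample, bed ->
    def ref = file("${params.refDir}/" + sprintf(params.refHapFilesPattern, chromosome))
    tuple(chromosome, sample, bed, ref)
}

process testB {
    input:
    tuple val(chromosome), val(sample), file('sample.bed'), file(ref1) from staged_ch

    """
    program -options sample.bed $ref1
    """
}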
m93
@mmeier93
Ah yes I see, I understand. Thanks very much!
paulderaadt
@paulderaadt
Is it possible to alter .join(failOnMismatch: true)'s error message?