These are chat archives for nextflow-io/nextflow

21st
May 2019
Paolo Di Tommaso
@pditommaso
May 21 05:59
I guess some mess in the container env, make sure it uses bash (not sh) and has ps installed
Alaa Badredine
@AlaaBadredine_twitter
May 21 13:45
Is there a way to get the directory path of the Nextflow script and the script name? For example, with nextflow run /media/Nextflow/scripts/AwesomeScript.nf, I would like to get /media/Nextflow/scripts/ and AwesomeScript.nf.
Is there a function that allows doing this?
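For reference, Nextflow exposes this through the implicit workflow metadata object; a minimal sketch using its documented attributes:

```nextflow
// Prints the directory containing the script and the script file name
println "Script dir:  ${workflow.projectDir}"   // directory where the main script is located
println "Script name: ${workflow.scriptName}"   // e.g. AwesomeScript.nf
println "Full path:   ${workflow.scriptFile}"   // full path of the main script file
```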
Anthony Ferrari
@af8
May 21 14:11
@pditommaso the container was indeed missing bash functionality. Thanks
micans
@micans
May 21 15:41

It seems I can do this in a config scope:

v_samtools = '1.9'
v_hisat2   = '2.1.0'

process {
  withName: crams_to_fastq  { conda = "bioconda::samtools=$v_samtools"      }
  withName: star            { conda = "bioconda::star=2.5.4a bioconda::samtools=$v_samtools" }
  withName: hisat2_align    { conda = "bioconda::hisat2=$v_hisat2"              }
  withName: hisat2_sort     { conda = "bioconda::hisat2=$v_hisat2 bioconda::samtools=$v_samtools" }
}

which is nice. I wonder -- where do the variables v_samtools and v_hisat2 live? Would I be able to access them in main.nf? This question is purely hypothetical; there is no use I can think of right now.

Paolo Di Tommaso
@pditommaso
May 21 17:19
only in the config file
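A sketch of the distinction: plain variables defined in the config are scoped to the config file only, while anything placed under params is visible in both the config and the script (variable names taken from the snippet above):

```nextflow
// nextflow.config -- a params value, unlike a bare variable, is shared with the script
params.v_samtools = '1.9'

process {
  withName: crams_to_fastq { conda = "bioconda::samtools=${params.v_samtools}" }
}
```

In main.nf, params.v_samtools then resolves to the same value, whereas a bare v_samtools defined in the config would not be accessible there.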
micans
@micans
May 21 17:25
Cool. This was designed as a feature I suppose!
Paolo Di Tommaso
@pditommaso
May 21 17:34
of course :)
micans
@micans
May 21 17:34
:bow:
Olga Botvinnik
@olgabot
May 21 17:46
Hi, sorry if this has been asked before, but I'm unable to find it. How can one add the option to trim or not trim the reads? I allow several different possible fastq/fasta inputs and concatenate them, but then I want to add the option of whether or not to trim the reads. Right now, this is what it looks like:
// Concatenate all possible input channels
 sra_ch.concat(samples_ch, read_pairs_ch, fastas_ch)
  .into{ read_files_fastqc; read_files_trimming }


  process fastp {
      tag "$name"
      publishDir "${params.outdir}/fastp", mode: 'copy',
          saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

      input:
      set val(name), file(reads) from read_files_trimming

      output:
      file "*_fastp.{zip,html}" into fastp_results
      val name into sample_ids
      file "${name}_R1_fastp_trimmed.fastq.gz" into read1_trimmed
      file "${name}_R2_fastp_trimmed.fastq.gz" into read2_trimmed

      script:
      read1 = reads[0]
      read2 = reads[1]
      """
      fastp --in1 $read1 --in2 $read2 \
        --length_required ${params.minlength} \
        --thread ${task.cpus} \
        --overrepresentation_analysis \
        --out1 ${name}_R1_fastp_trimmed.fastq.gz \
        --out2 ${name}_R2_fastp_trimmed.fastq.gz \
        -h ${name}_fastp.html \
        -j ${name}_fastp.json
      """
  }
}

process sourmash_compute_sketch {
    tag "${sample_id}_${sketch_id}"
    publishDir "${params.outdir}/sketches", mode: 'copy'

    // If job fails, try again with more memory
    // memory { 8.GB * task.attempt }
    errorStrategy 'retry'
    maxRetries 3

    input:
    each ksize from ksizes
    each molecule from molecules
    set sample_id from sample_ids.collect()
    set read1 from read1_trimmed.collect()
    set read2 from read2_trimmed.collect()
// truncated
It seems that I may want to use mix or similar to combine read_files_trimming and some channel like read_files_trimmed, but I can't figure out how to output sample_id together with tuple(read1, read2) from fastp to do this
Olga Botvinnik
@olgabot
May 21 21:19
I thought something like this could work:
if (!params.no_trimming) {
  /*
   * STEP 2 - trim reads - Fastp
   */
  process fastp {
      tag "$name"
      publishDir "${params.outdir}/fastp", mode: 'copy',
          saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

      input:
      set val(name), file(reads) from read_files_trimming

      output:
      file "*_fastp.{zip,html}" into fastp_results
      val name into sample_ids
      file "${name}_R1_fastp_trimmed.fastq.gz" into read1_trimmed
      file "${name}_R2_fastp_trimmed.fastq.gz" into read2_trimmed

      script:
      read1 = reads[0]
      read2 = reads[1]
      """
      fastp --in1 $read1 --in2 $read2 \
        --length_required ${params.minlength} \
        --thread ${task.cpus} \
        --overrepresentation_analysis \
        --out1 ${name}_R1_fastp_trimmed.fastq.gz \
        --out2 ${name}_R2_fastp_trimmed.fastq.gz \
        -h ${name}_fastp.html \
        -j ${name}_fastp.json
      """
  }
} else {
  name, reads = read_files_trimming
  read1_trimmed = reads[0]
  read2_trimmed = reads[1]
  sample_id = name
}
micans
@micans
May 21 21:48
Laurence E. Bernstein
@lebernstein
May 21 21:57
Suddenly I realized my workflow is not behaving properly at all and I know why.
The issue has been discussed here but I wanted to know what is the simple/best/proper solution.
I have a normal workflow with processes A --> B, then A + B --> C (C uses the output of both A and B). When I run multiple samples in parallel, there is no guarantee that C will receive matching inputs from A and B, even though B requires A, because channels are filled in the order tasks happen to complete. Therefore I get incorrect and non-deterministic results from C!
What is the proper way to handle this? I know they are all supposed to have the same sample name.
Olga Botvinnik
@olgabot
May 21 22:22
@micans thank you! that looks helpful. I'm having trouble understanding what is happening on lines 6-8. Is foo_ch set to empty if params.skip and to input_ch otherwise, and bar_ch set to input_ch if params.skip and to empty otherwise?
micans
@micans
May 21 22:24
@olgabot yes that's it!
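The pattern under discussion, spelled out as a sketch (channel names foo_ch, bar_ch, and input_ch as in the conversation):

```nextflow
// If params.skip is set, foo_ch is empty and bar_ch carries the input;
// otherwise foo_ch carries the input and bar_ch is empty.
(foo_ch, bar_ch) = params.skip
    ? [ Channel.empty(), input_ch ]
    : [ input_ch, Channel.empty() ]

// foo_ch feeds the optional process; its output is later merged
// with the bypass channel, e.g. processed_ch.mix(bar_ch)
```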
Olga Botvinnik
@olgabot
May 21 22:24
thanks @micans !
can you help me understand how to output a tuple of the form tuple(name, tuple(read1, read2))?
micans
@micans
May 21 22:27
@olgabot I use something like this: set val(samplename), file("*[12]_fastp_trimmed.fastq.gz") into some_channel
Where you accept it, you have set val(samplename), file(fqs) ... In the script you can either use $fqs (this expands to the two file names) or ${fqs[0]} ${fqs[1]}. Some care/checking may be needed to make sure the files are in the same order, but if the naming is consistent the ordering should be lexicographic and it should be fine.
Olga Botvinnik
@olgabot
May 21 22:35

@olgabot I use something like this: set val(samplename), file("*[12]_fastp_trimmed.fastq.gz") into some_channel

!!!! what??!? I didn't know you could do that!

micans
@micans
May 21 22:42
It's very handy :-)
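A sketch of how the glob output and its consumer fit together (process and channel names here are illustrative, not from the chat):

```nextflow
process trim {
  input:
  set val(samplename), file(reads) from untrimmed_ch

  output:
  // the glob collects both trimmed files into a single file list
  set val(samplename), file("*[12]_fastp_trimmed.fastq.gz") into trimmed_ch

  script:
  """
  fastp --in1 ${reads[0]} --in2 ${reads[1]} \
    --out1 ${samplename}_1_fastp_trimmed.fastq.gz \
    --out2 ${samplename}_2_fastp_trimmed.fastq.gz
  """
}

process use_pair {
  input:
  set val(samplename), file(fqs) from trimmed_ch

  script:
  // $fqs expands to both file names; ${fqs[0]} / ${fqs[1]} address them individually
  "echo $samplename $fqs"
}
```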
micans
@micans
May 21 23:13
@lebernstein you need https://www.nextflow.io/docs/latest/operator.html#join.
I've made a small script, as I wanted a pattern to describe this. It's a bit contrived in how it sets up the example data, but hopefully it illustrates the point (and I got it right):
#!/usr/bin/env nextflow
// This script mimics merging results by sample ID in the following scenario:
// A  --ch1---> B  ----ch3---,-----ch4--- C
//  `---------ch2------------'

process processA {        // Create a bunch of files, each with its own sample ID.
  output: set val('dummy'), file('*.txt') into ch_dummy
  script: 'for i in {1..7}; do echo "sample_$i" > f$i.txt; done'
}
// above and below use the transpose trick to serialise the files into two channels,
// just so that we have some example data.
ch_dummy.transpose().map { dummy, f -> [f.text.trim(), f] }.view().into { ch1; ch2 }

process processB {
  input:  set val(sampleid), file(thefile) from ch2
  output: set val(sampleid), file('out.txt') into ch3
  script: "(cat $thefile; md5sum $thefile) > out.txt"
}

ch1.join(ch3).set{ ch4 }

process processC {
  input: set val(sampleid), file(a), file(b) from ch4.view()
  script: "echo $sampleid $a $b"
}