These are chat archives for nextflow-io/nextflow

17th
Aug 2018
spaceturtle
@spaceturtle
Aug 17 2018 02:29
Hi, I wonder what would be the best way to debug a Nextflow script?
Maxime Garcia
@MaxUlysse
Aug 17 2018 07:40
@spaceturtle It depends on how buggy your script is
tbugfinder
@tbugfinder
Aug 17 2018 07:49
@spaceturtle you could start with tracing or .nextflow.log and, depending on the script inside your Nextflow process, use that scripting language's own debugging features (bash, python, etc.).
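A minimal sketch of the tracing suggestion, assuming a `nextflow.config` file next to the pipeline script. The output file name and the selection of fields are illustrative choices, not something given in the chat.

```nextflow
// nextflow.config (sketch): write one row per task to a trace file.
trace {
    enabled = true
    file    = 'trace.txt'   // assumed file name, pick any path you like
    fields  = 'task_id,name,status,exit,workdir'
}
```

The `workdir` column points at each task's work directory, where `.command.sh`, `.command.out` and `.command.err` can be inspected alongside `.nextflow.log` in the launch directory.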
Sander Bervoets
@Biocentric
Aug 17 2018 07:52
@tbugfinder yeah I checked. I increased the memory for salmon quant to 16 GB, and that amount appears in the Batch dashboard: vCPUs 8 Memory 16384 MiB. It took forever before the job got allocated and I had to restart the system before that point, so I can't say yet if it works or not. I'll get back to you.
spaceturtle
@spaceturtle
Aug 17 2018 08:43
@MaxUlysse @tbugfinder Thanks for the answers.
Hi all, another question: I am now merging several bam files into one file. When I do that, the output file gets sent into the output channel several times, but it is only needed once. How can I avoid that?
Karin Lagesen
@karinlag
Aug 17 2018 08:53
@spaceturtle how are you merging files?
spaceturtle
@spaceturtle
Aug 17 2018 09:01
@karinlag I built a pipeline to generate bam files from multiple lane data of one sample. For each lane, I generated one bam file. Then I want to merge all bam files from the same sample into one.
tbugfinder
@tbugfinder
Aug 17 2018 09:28
@spaceturtle As far as I understand your question, you're looking for .collect() https://www.nextflow.io/docs/latest/operator.html#operator-collect
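A minimal sketch of how this could look for per-lane BAM files, written in the DSL1 syntax used elsewhere in this chat. The sample IDs, file names and the use of `samtools merge` are assumptions for illustration. `.collect()` gathers everything in a channel into a single list, which fits when the channel holds one sample only; with several samples in the same channel, `groupTuple()` collects the files per sample ID so the merge process runs once per sample.

```nextflow
// Hypothetical input: one BAM per lane, each tagged with its sample ID.
lane_bams = Channel.from(
    ['sampleA', file('sampleA_L001.bam')],
    ['sampleA', file('sampleA_L002.bam')],
    ['sampleB', file('sampleB_L001.bam')],
    ['sampleB', file('sampleB_L002.bam')]
)

process mergeBams {
    input:
    // groupTuple() emits one item per sample: [ sample_id, [ bam, bam, ... ] ]
    set sample_id, file(bams) from lane_bams.groupTuple()

    output:
    set sample_id, file("${sample_id}.merged.bam") into merged_bams

    script:
    // ${bams} expands to the space-separated list of staged lane files
    """
    samtools merge ${sample_id}.merged.bam ${bams}
    """
}

merged_bams.subscribe { println it }
```

Because each sample reaches the process as a single grouped item, the merged file is emitted into the output channel once per sample rather than once per lane.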
Karin Lagesen
@karinlag
Aug 17 2018 10:41
@spaceturtle are you merging them using something like picard mergesamfiles, or another way?
Have to admit that when you said merge, that is what I would think of :)
Rad Suchecki
@rsuchecki
Aug 17 2018 12:17
@alperyilmaz You might have solved your issues with AWS by now, but if not, you could adapt this; note that it is for AWS EC2, not AWS Batch.
Rad Suchecki
@rsuchecki
Aug 17 2018 12:43
@LukeGoodsell as long as you don't use glob matching of file names when declaring the outputs, declaring the output file(s) is all you need. No need to rename. This holds whether it is a single file or the file is part of a set.
process blah {
    input:
    file x from ch1

    output:
    file x into ch2
}
LukeGoodsell
@LukeGoodsell
Aug 17 2018 12:55

Thanks, @rsuchecki. It seems ‘rellink’ stageInMode doesn’t play nice with ‘move’ stageOutMode:

#!/usr/bin/env nextflow

in_ch = Channel.from(file("test.txt"))

process test {
    storeDir 'out'
    stageInMode 'rellink'
    stageOutMode 'move'

    input:
    file x from in_ch

    output:
    file x into out_ch

    shell:
    '''
    echo "hi"
    '''
}

out_ch.subscribe { println(it) }

Output:

$ rm -rf out; ./test.nf 
N E X T F L O W  ~  version 0.31.0-SNAPSHOT
Launching `./test.nf` [dreamy_cajal] - revision: 6084d3c68f
[warm up] executor > local
[b3/ac3e30] Submitted process > test (1)
ERROR ~ Error executing process > 'test (1)'

Caused by:
  Missing output file(s) `test.txt` expected by process `test (1)`

Command executed:

  echo "hi"

Command exit status:
  0

Command output:
  hi

Work dir:
  /pipeline_runs/volume_6/l.goodsell/tmp/nf_channel/work/b3/ac3e309a420253f0a74c0b89d3ed7f

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Rad Suchecki
@rsuchecki
Aug 17 2018 13:11
Curious why you want to adjust default stageIn/Out modes?
Rad Suchecki
@rsuchecki
Aug 17 2018 13:25
I think stageOutMode only applies if you use scratch true.
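A minimal sketch of that combination, adapted from the `test` process above and not verified here. It assumes the task writes a new file (the hypothetical `out.txt`) rather than re-emitting its input, so there is something in the scratch directory to stage out.

```nextflow
in_ch = Channel.from(file("test.txt"))

process test {
    scratch true          // run the task in a node-local scratch directory
    stageOutMode 'move'   // move outputs from scratch to the work dir instead of copying

    input:
    file x from in_ch

    output:
    file 'out.txt' into out_ch

    shell:
    '''
    echo "hi" > out.txt
    '''
}

out_ch.subscribe { println(it) }
```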
Alper Yilmaz
@alperyilmaz
Aug 17 2018 14:11
@rsuchecki, thanks for the link. I was trying another set of instructions from @apeltzer, using AWS Batch. If that fails, I'll test the instructions at the link you provided. Thanks.
LukeGoodsell
@LukeGoodsell
Aug 17 2018 14:36
@rsuchecki : I have very large output files; copying them wastes time and space. However, I’ve found that if the output is declared as a val instead of a file, but the downstream process redeclares it as a file, I can pass the inputs into the output channel without duplicating the file, and while working with both ‘rellink’ stageInMode and ‘move’ stageOutMode. This also prevents the input file from being stored to the new output directory, which is nice. See https://gist.github.com/LukeGoodsell/c26232093df7f95f045508cb49f7357e
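A rough sketch of the pattern as described in the message above, in the same DSL1 syntax. This is a reading of the description, not the content of the linked gist; the process and channel names are hypothetical.

```nextflow
in_ch = Channel.from(file("test.txt"))

process upstream {
    input:
    file x from in_ch

    output:
    // Declared as a value, so the path is passed on as-is
    // and nothing is staged out or copied for this output.
    val x into passed_ch

    shell:
    '''
    echo "hi"
    '''
}

process downstream {
    input:
    // Redeclared as a file, so it is staged into this task's work dir again.
    file y from passed_ch

    script:
    """
    ls -l ${y}
    """
}
```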
Mike Smoot
@mes5k
Aug 17 2018 18:17
Hi @pditommaso just letting you know that I was able to successfully run a nextflow pipeline using the k8s executor in an AWS EKS cluster with an EFS partition and the efs-provisioner. The only real hiccup I ran into was that nextflow kuberun expects a directory name with my username to exist in the persistent volume claim. I had to create this dir manually. Is there a way to avoid this? It's not a big deal because I expect that if we go this route for production I'll run from within a pod.