These are chat archives for nextflow-io/nextflow

25th Sep 2018
misssoft
@misssoft
Sep 25 2018 11:28
docker@ubuntu-docker:~/Nextflow/test_azure$ nextflow kuberun https://github.com/oxfordmmm/rnaseq-nf -v "azurefile5t:/mnt/nextflow" -profile azurekube
Launcher pod spec file: .nextflow.pod.yaml
Pod started: shrivelled-pauling
N E X T F L O W  ~  version 0.31.1
Launching `oxfordmmm/rnaseq-nf` [shrivelled-pauling] - revision: 2cd0c74e93 [master]
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /mnt/nextflow/projects/oxfordmmm/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /mnt/nextflow/projects/oxfordmmm/rnaseq-nf/data/ggal/*_{1,2}.fq
 outdir       : results

[warm up] executor > k8s
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: /mnt/nextflow/docker/work/singularity
Pulling Singularity image docker://nextflow/rnaseq-nf@sha256:e221e2511abb89a0cf8c32f6cd9b125fbfeb7f7c386a1f49299f48d7735faacd [cache /mnt/nextflow/docker/work/singularity/nextflow-rnaseq-nf@sha256-e221e2511abb89a0cf8c32f6cd9b125fbfeb7f7c386a1f49299f48d7735faacd.img]
ERROR ~ Error executing process > 'fastqc (FASTQC on ggal_liver)'

Caused by:
  Failed to pull singularity image
  command: singularity pull --name nextflow-rnaseq-nf@sha256-e221e2511abb89a0cf8c32f6cd9b125fbfeb7f7c386a1f49299f48d7735faacd.img docker://nextflow/rnaseq-nf@sha256:e221e2511abb89a0cf8c32f6cd9b125fbfeb7f7c386a1f49299f48d7735faacd > /dev/null
  status : 127
  message:
    bash: singularity: command not found



 -- Check '.nextflow.log' file for details
Oops .. something went wrong
Hi @pditommaso, I made some progress with kuberun in Azure, but got the above error. The PVC has no problem; it has all the folders created, like projects, work, etc., but singularity is not there. Should it be there?
Paolo Di Tommaso
@pditommaso
Sep 25 2018 11:35
Nope, K8s uses Docker containers
You should remove the singularity.enabled flag you put in the azure profile
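For illustration, a minimal sketch of what the azurekube profile could look like after the change (the contents below are assumed; only the claim name and mount path appear in the kuberun command above):
profiles {
    azurekube {
        process.executor     = 'k8s'
        k8s.storageClaimName = 'azurefile5t'   // the PVC passed with -v on the command line
        k8s.storageMountPath = '/mnt/nextflow'
        // singularity.enabled = true          // <-- remove this: the k8s executor runs Docker containers
    }
}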
Anthony Underwood
@aunderwo
Sep 25 2018 11:51

Hi. So close to having AWS Batch fully working :) However, there are a couple of issues:
1) most of my processes are completing successfully from Nextflow's perspective, but in Batch they are marked as failed since .command.out, .command.err can't be written
2) more importantly, one process using the SPAdes assembler is failing even though it works OK locally with Docker

I think both of these may be due to scratch space. Should I set this to 'auto'??

The logs are

Sep-25 04:06:15.277 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'spades_assembly (ERR668456)' -- Cause: java.nio.file.NoSuchFileException: /tmp/temp-s3-16680929744549340266/.command.log
Sep-25 04:06:15.281 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'spades_assembly (ERR668456)'

Caused by:
  Process `spades_assembly (ERR668456)` terminated with an error exit status (1)

Command executed:

  spades.py --pe1-1 ERR668456.notCombined_1.fastq.gz --pe1-2 ERR668456.notCombined_2.fastq.gz --pe1-m ERR668456.extendedFrags.fastq.gz --only-assembler  -o . --tmp-dir /tmp/ERR668456_assembly -k 21,33,43,53,63,75 --threads 1

Command exit status:
  1

Command output:


  ===== Assembling started.


  == Running assembler: K21

    0:00:00.000     4M / 4M    INFO    General                 (main.cpp                  :  74)   Loaded config from /tmp/nxf.Rufsb7IWQ9/K21/configs/config.info
    0:00:00.000     4M / 4M    INFO    General                 (memory_limit.cpp          :  49)   Memory limit set to 7 Gb
    0:00:00.001     4M / 4M    INFO    General                 (main.cpp                  :  87)   Starting SPAdes, built from N/A, git revision N/A
 ........

    0:00:18.871     4M / 260M  INFO    General                 (file_limit.hpp            :  32)   Open file limit set to 1024
    0:00:18.872     4M / 260M  INFO    General                 (kmer_splitters.hpp        :  89)   Memory available for splitting buffers: 2.33203 Gb
    0:00:18.872     4M / 260M  INFO    General                 (kmer_splitters.hpp        :  97)   Using cell size of 67108864


  == Error ==  system call for: "['/home/bio/.linuxbrew/Cellar/spades/3.12.0/bin/spades-core', '/tmp/nxf.Rufsb7IWQ9/K21/configs/config.info']" finished abnormally, err code: -9
Paolo Di Tommaso
@pditommaso
Sep 25 2018 11:57
Should I set this to 'auto'??
what do you mean?
Anthony Underwood
@aunderwo
Sep 25 2018 12:00
From the docker scope docs
temp    Mounts a path of your choice as the /tmp directory in the container. Use the special value auto to create a temporary directory each time a container is created.
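For reference, that setting lives in the docker scope of nextflow.config; a minimal sketch would be:
docker.enabled = true
docker.temp    = 'auto'   // mount a fresh temporary directory as /tmp in each container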
Paolo Di Tommaso
@pditommaso
Sep 25 2018 12:00
(sorry busy)
Anthony Underwood
@aunderwo
Sep 25 2018 12:00
No worries
Little by little - will get there
Paolo Di Tommaso
@pditommaso
Sep 25 2018 12:42
@aunderwo docker scope is not used with Batch executor
they are marked as failed since .command.out, .command.err can't be written
this is a signal that something is still wrong with the IAM settings
SPAdes assembler is failing even though
hard to say, the reported error is very strange; you may try to google it or report it directly to the SPAdes authors
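For comparison, an AWS Batch setup is driven by the process and aws scopes rather than the docker scope; a rough sketch, with placeholder queue, image, bucket and CLI path:
profiles {
    awsbatch {
        process.executor  = 'awsbatch'
        process.queue     = 'my-batch-queue'          // placeholder queue name
        process.container = 'my-org/my-image:latest'  // placeholder container image
        workDir           = 's3://my-bucket/work'     // placeholder S3 work dir
        aws.region        = 'eu-west-1'
        aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'   // aws cli installed on the custom AMI
    }
}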
Alexander Peltzer
@apeltzer
Sep 25 2018 14:11

Maybe somebody has an idea on this one: I'm basically trying to group samples based on their prefix. All samples have shared IDs, but they were sequenced on different lanes of a sequencer. For example, I have something like:

I18R036a16_01_S16_L002_R1_001.fastq.gz
I18R036a16_01_S16_L002_R1_001.fastq.gz
I18R036a14_01_S14_L001_R1_001.fastq.gz
I18R036a14_01_S14_L002_R1_001.fastq.gz

My intent was to have a function that extracts everything from the start up to the "_L" to get a unique ID that is independent of the sequencing lane, use this function in a map call on my input channel containing all samples, and then run the groupTuple operator on this to get all samples with a shared prefix grouped together. It doesn't seem to work as expected, but I'm probably just missing something ;-)

Function is here:
def extract_lanes(ArrayList name, String regex) {
  return name.findAll { it ==~ regex }
}
and this is the mapping phase
if("$params.mergeLanes"){
    raw_reads_fastqc
    .map{ it -> [ extract_lanes(it, params.mergeRegex), it ] }
    .groupTuple()
    .set { raw_grouped_fastqs }
}
Alexander Peltzer
@apeltzer
Sep 25 2018 14:17
Seems like it groups them, but doesn't group them based on the regular expression ("^.*?(?=_L)")... so I end up having each individual sample in a separate group
e.g.
FastQs to process: [] [[I18R036a16_01_S16_L002_R1_001, [/path/RAW/I18R036a16_01_S16_L002_R1_001.fastq.gz]], [I18R036a14_01_S14_L001_R1_001, [/path/RAW/I18R036a14_01_S14_L001_R1_001.fastq.gz]], (and so on)
Paolo Di Tommaso
@pditommaso
Sep 25 2018 14:31
how is extract_lanes defined?
Alexander Peltzer
@apeltzer
Sep 25 2018 14:32
Function is here:
def extract_lanes(ArrayList name, String regex) {
  return name.findAll { it ==~ regex }
}
Paolo Di Tommaso
@pditommaso
Sep 25 2018 14:33
and how is raw_reads_fastqc defined?
Alexander Peltzer
@apeltzer
Sep 25 2018 14:35
(I ran with the --singleEnd option; the relevant code is here: https://github.com/apeltzer/RNAseq/blob/898676c9c69a2d17c4b367fffb05e257a58e1025/main.nf#L231)
/*
 * Create a channel for input read files
 */
...
        Channel
            .from(params.readPaths)
            .map { row -> [ row[0], [file(row[1][0])]] }
            .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" }
            .into { raw_reads_fastqc; raw_reads_trimgalore }
...
(I removed a couple of lines, as they perform the input handling for PE data)
Paolo Di Tommaso
@pditommaso
Sep 25 2018 14:37
some structure is not matching
each is defined as
[ row[0], [file(row[1][0])]]
that is List[ String, List[ File ] ]
then you have
    raw_reads_fastqc
    .map{ it -> [ extract_lanes(it, params.mergeRegex), it ] }
and to extract_lanes you are passing it, which is List[ String, List[ File ] ]
but I guess you want to work on the first element of the pair, right?
or the file(s)?
Alexander Peltzer
@apeltzer
Sep 25 2018 14:45
The first element would be enough, I guess
So I can extract all files with the same prefix and merge them together
Paolo Di Tommaso
@pditommaso
Sep 25 2018 14:49
if you need to check the first element it would be it[0] ==~ params.mergeRegex
if you want to work on the files: files.collect { it.name }.findAll { it ==~ regex }
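A minimal sketch of how the grouping could then be wired up (the regex and channel names come from the snippets above; the helper extract_prefix is hypothetical):
// Hypothetical helper: pull the lane-independent prefix out of a sample ID,
// e.g. 'I18R036a14_01_S14_L001_R1_001' -> 'I18R036a14_01_S14'
def extract_prefix(String name, String regex) {
    def m = name =~ regex                  // e.g. regex = '^.*?(?=_L)'
    return m.find() ? m.group() : name     // fall back to the full name if no match
}

raw_reads_fastqc
    .map { id, files -> [ extract_prefix(id, params.mergeRegex), files ] }
    .groupTuple()                          // all lanes sharing the prefix end up in one tuple
    .set { raw_grouped_fastqs }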
Alexander Peltzer
@apeltzer
Sep 25 2018 15:27
Awesome - will try that out!
misssoft
@misssoft
Sep 25 2018 17:04
@pditommaso just a quick update: that test pipeline in Azure K8s is working. In summary, I needed to create the folders /mnt/nextflow/projects and /mnt/nextflow/docker so that the nextflow pod can be deployed successfully.
docker@ubuntu-docker:~/Nextflow/test_azurekube$ nextflow kuberun https://github.com/oxfordmmm/rnaseq-nf -v "azurefile5t:/mnt/nextflow" -profile azurekube
Launcher pod spec file: .nextflow.pod.yaml
Pod started: sad-easley
N E X T F L O W  ~  version 0.31.1
Pulling oxfordmmm/rnaseq-nf ...
 downloaded from https://github.com/oxfordmmm/rnaseq-nf.git
Launching `oxfordmmm/rnaseq-nf` [sad-easley] - revision: d3a5ebc6cc [master]
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /mnt/nextflow/projects/oxfordmmm/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /mnt/nextflow/projects/oxfordmmm/rnaseq-nf/data/ggal/*_{1,2}.fq
 outdir       : results

[warm up] executor > k8s
[c0/48f540] Submitted process > fastqc (FASTQC on ggal_liver)
[b2/3a7c0d] Submitted process > index (ggal_1_48850000_49020000)
[38/991fe9] Submitted process > fastqc (FASTQC on ggal_gut)
[a7/45869e] Submitted process > quant (ggal_gut)
[1e/af9d6f] Submitted process > quant (ggal_liver)
[96/d97b50] Submitted process > multiqc

Done! Open the following report in your browser --> results/multiqc_report.html
Thanks again for the support!
Paolo Di Tommaso
@pditommaso
Sep 25 2018 17:26
Nice! If you are willing to write a quick tutorial to share, it would be a great contribution to the community
Yasset Perez-Riverol
@ypriverol
Sep 25 2018 17:31
Hi all, I'm building a community of Nextflow workflows that can reuse PRIDE (a proteomics database). If you are interested, or have proteomics workflows that you want to run at scale against the database, just let us know. We will be able to provide the data and infrastructure to run all the workflows.
misssoft
@misssoft
Sep 25 2018 18:30
Yes, @pditommaso, definitely happy to share... will let you know when it's complete.
Alexander Peltzer
@apeltzer
Sep 25 2018 18:33

Hi all, I'm building a community of Nextflow workflows that can reuse PRIDE (a proteomics database). If you are interested, or have proteomics workflows that you want to run at scale against the database, just let us know. We will be able to provide the data and infrastructure to run all the workflows.

Nice - have you thought about working with http://nf-co.re and utilizing the skeleton code we provide there? Could be worthwhile for everyone, especially if you are planning to release these anyway...? Just a thought...

Yasset Perez-Riverol
@ypriverol
Sep 25 2018 18:41
@apeltzer more than happy to share the pipelines in nf-co.re
My idea now is to group everyone that is developing pipelines in proteomics and join efforts to move those pipelines into the public, with the use case of reanalyzing public proteomics data. I have seen some pipelines that are already public in the field of proteomics, but they are released for some specific data only or for a local setup. The idea I would like to promote is to take Nextflow pipelines in proteomics and make them available (with this community) as pipelines that enable others to reanalyze data in the PRIDE database. If anyone has developed a pipeline to reanalyze their data, we can try to standardize it to analyze data from others. What do you think, @apeltzer?
Alexander Peltzer
@apeltzer
Sep 25 2018 19:07
I think it's a good idea - why not?
Anthony Underwood
@aunderwo
Sep 25 2018 20:05

When I run on AWS Batch I see that files are staged to directories in /tmp, e.g.

/tmp/nxf.GKHQQ3ZaeG

The SPAdes assembler is failing and it is writing its logs to

/tmp/nxf.GKHQQ3ZaeG/spades.log

This isn't getting copied back to the S3 work dir. Is there a way I can get the intermediate files, such as the log files, from a process that is failing? publishDir won't work since the process has not completed.
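One possible workaround, sketched with illustrative names only: let the task exit zero even when spades.py fails, and declare spades.log as an optional output so it is staged back to the work dir and can be published:
process spades_assembly_debug {
    publishDir "results/spades_logs", mode: 'copy', pattern: 'spades.log'

    input:
    set val(sample_id), file(reads) from assembly_input   // illustrative input channel

    output:
    file 'spades.log' optional true into spades_logs
    file 'contigs.fasta' optional true into assemblies

    script:
    """
    # '|| true' masks the SPAdes exit status so the task completes and its files are staged back
    spades.py --only-assembler -o . -1 ${reads[0]} -2 ${reads[1]} --threads 1 || true
    """
}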