These are chat archives for nextflow-io/nextflow

6th
Feb 2018
Ali Al-Hilli
@Ali_Hilli_twitter
Feb 06 2018 12:02
Hi. I use HPC modules in my Nextflow scripts. It used to work when I loaded them in the script, but now it gives a "module: command not found" error. When I run .command.run by itself, everything works fine. Here is my script:
process qa_and_trim {

    publishDir "${output_dir}/${pair_id}", mode: 'copy'

    // module 'phe/qa_and_trim'

    input:
    set pair_id, file(file_pair) from read_pairs
    val(workflow) from workflow

    output:
    // set val(output_dir) from params.output_dir
    file('qa_and_trim')
    set pair_id, file('*.processed.R*.fastq.gz') into processed_fastqs_channel

    //  make dir only once for the ${output_dir}/${pair_id}

    script:

    """
    module purge
    module use /phengs/hpc_software/Modules/production
    module load phe/qa_and_trim
    mkdir -p ${output_dir}/${pair_id}
    qa_and_trim.py workflow $workflow \$PWD

    """
}
@pditommaso just to say that when I say it used to work, that was at least three months ago.
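For reference, the commented-out line in the script above points at Nextflow's built-in module directive, which asks Nextflow to load environment modules for the task before the script runs; a minimal sketch, reusing the names from the process above:

```groovy
process qa_and_trim {
    // let Nextflow load the environment module itself,
    // rather than calling `module load` inside the script block
    module 'phe/qa_and_trim'

    input:
    set pair_id, file(file_pair) from read_pairs

    script:
    """
    qa_and_trim.py workflow $workflow \$PWD
    """
}
```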
Paolo Di Tommaso
@pditommaso
Feb 06 2018 12:21
so what has changed recently ?
Ali Al-Hilli
@Ali_Hilli_twitter
Feb 06 2018 13:02
@pditommaso Nothing.
Paolo Di Tommaso
@pditommaso
Feb 06 2018 13:02
have you upgraded NF ?
Rickard Hammarén
@Hammarn
Feb 06 2018 13:29

Hi @pditommaso! We are having some trouble running our pipeline with Singularity on our cluster. See the discussion here for our thoughts: SciLifeLab/NGI-RNAseq#193
The error I get is:

Command error:
  ERROR  : Image path doesn't exists
  ABORT  : Retval = 255

@ewels suggested that [this line](https://github.com/SciLifeLab/NGI-RNAseq/blob/f86dd01a2f9de7db08ac2cb74f3908112ed75be4/conf/docker.config#L19) needed to be put in curly brackets to delay execution. I tried that but with no luck.

Paolo Di Tommaso
@pditommaso
Feb 06 2018 13:33
Image path doesn't exists
there's no information about what the resolved image file path is?
Phil Ewels
@ewels
Feb 06 2018 13:35
It’s something to do with the order of execution for that variable that contains the container path I think
Maxime Borry
@maxibor
Feb 06 2018 13:38
Hello,
I was running my NF pipeline and it stopped because of a lack of disk space. After doing some cleanup elsewhere on the disk (not in the work or results directories), I relaunched the pipeline with the -resume flag.
It terminated without any error message, but it only ran the rest of the pipeline on one of the samples. Any idea what might have caused this?
N E X T F L O W  ~  version 0.27.4
Launching `organdiet.nf` [peaceful_monod] - revision: cb53bdc5f8
[warm up] executor > local
[22/92d649] Cached process > fastqc_control (T-blanc)
[4d/b7647a] Cached process > fastqc (T-ULG66-1)
[ed/7bafc9] Cached process > adapter_removal_ctrl (T-blanc)
[14/3b9181] Cached process > fastqc (T-ULG59)
[7f/8fb0d8] Cached process > fastqc (T-ULG82)
[6e/42c41c] Cached process > fastqc (T-ULG66-2)
[5e/4c94cf] Cached process > fastqc (T-ULG92)
[fd/45578c] Cached process > adapter_removal (T-ULG82)
[13/ec8c07] Cached process > adapter_removal (T-ULG66-2)
[66/85903c] Cached process > adapter_removal (T-ULG66-1)
[52/3b5b56] Cached process > adapter_removal (T-ULG92)
[fb/691de9] Cached process > adapter_removal (T-ULG59)
[be/f6cd49] Cached process > ctr_bowtie_db (1)
[3d/4f6dc0] Submitted process > bowtie_align_to_ctrl (T-ULG66-1)
[15/2c35cf] Submitted process > bowtie_align_to_human_genome (T-ULG66-1)
[4f/636691] Submitted process > bowtie_align_to_organellome_db (T-ULG66-1)
[0e/8cea72] Submitted process > extract_mapped_reads (T-ULG66-1)
[4b/52fdeb] Submitted process > multiqc (fastqc_before_trimming/T-ULG66-1_R1)
[68/2f3c15] Submitted process > extract_best_reads (T-ULG66-1)
[cc/0e7a9b] Submitted process > diamond_align_to_nr (T-ULG66-1)
[d0/8041f7] Submitted process > lca_assignation (T-ULG66-1)
[d3/4e2bbb] Submitted process > visual_results (T-ULG66-1)
Paolo Di Tommaso
@pditommaso
Feb 06 2018 13:40
do you mean why some processes have been re-executed ?
Maxime Borry
@maxibor
Feb 06 2018 13:40
So I relaunched it, still the -resume flag, and it's now running on another sample...
No, I mean why the processes that were not yet executed are now run on only one of my samples.
(instead of 5)
Paolo Di Tommaso
@pditommaso
Feb 06 2018 13:41
something wrong in your code
Maxime Borry
@maxibor
Feb 06 2018 13:44
N E X T F L O W  ~  version 0.27.4
Launching `organdiet.nf` [mad_golick] - revision: 9e7443bee9
[warm up] executor > local
[22/92d649] Cached process > fastqc_control (T-blanc)
[14/3b9181] Cached process > fastqc (T-ULG59)
[ed/7bafc9] Cached process > adapter_removal_ctrl (T-blanc)
[6e/42c41c] Cached process > fastqc (T-ULG66-2)
[7f/8fb0d8] Cached process > fastqc (T-ULG82)
[4d/b7647a] Cached process > fastqc (T-ULG66-1)
[5e/4c94cf] Cached process > fastqc (T-ULG92)
[fd/45578c] Cached process > adapter_removal (T-ULG82)
[13/ec8c07] Cached process > adapter_removal (T-ULG66-2)
[52/3b5b56] Cached process > adapter_removal (T-ULG92)
[66/85903c] Cached process > adapter_removal (T-ULG66-1)
[fb/691de9] Cached process > adapter_removal (T-ULG59)
[be/f6cd49] Cached process > ctr_bowtie_db (1)
[11/5d287e] Submitted process > bowtie_align_to_ctrl (T-ULG82)
Maxime Borry
@maxibor
Feb 06 2018 14:01

Could it be that when I build the channel bringing the bowtie DB files, it only brings the bowtie index files once?
I build my bowtie DB like this:

process build_bowtie_db {
    cpus = 12

    input:
    file(read) from my_fastq_files

    output:
    file "my_bowtie_index*" into my_bowtie_index


    """
    sed '/^@/!d;s//>/;N' $read > db.fa
    bowtie2-build --threads ${task.cpus} db.fa my_bowtie_index
    """
}

And I then align like this:

process bowtie_align {

    cpus = 18
    tag "$name"

    input:
        set val(name), file(reads) from trimmed_reads
        file bt_index from my_bowtie_index

    output:
        set val(name), file('*.fastq') into unaligned_reads

    script:

        sam_out = name+".sam"
        fq_out = name+"_unal.fastq"
        metrics = name+".metrics"
        """
        bowtie2 -x my_bowtie_index -U $reads --no-sq --un $fq_out 2> $metrics
        """
}
Bioninbo
@Bioninbo
Feb 06 2018 14:03
Hello. I probably missed it, but is there an option to publish only certain files of a directory/process ? The same way that we can select the files to send to other channels.
Paolo Di Tommaso
@pditommaso
Feb 06 2018 14:05
@maxibor how is declared my_fastq_files ?
Alexander Peltzer
@apeltzer
Feb 06 2018 14:06
@Bioninbo Just state which ones you'd like to publish with a regular expression. See here: https://www.nextflow.io/docs/latest/process.html#publishdir
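The selective publishing @apeltzer mentions can be sketched with the pattern option of publishDir; the process name, file names, and params.outdir here are hypothetical:

```groovy
process align {
    // publish only the BAM files; the log files stay in the work dir
    publishDir "${params.outdir}/aligned", mode: 'copy', pattern: '*.bam'

    output:
    file '*.bam' into bam_ch
    file '*.log' into log_ch

    """
    my_aligner reads.fq > sample.bam 2> sample.log
    """
}
```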
Bioninbo
@Bioninbo
Feb 06 2018 14:08
I see! Thanks Alexander!
Maxime Borry
@maxibor
Feb 06 2018 14:14
From a previous process, creating a single fastq file in the my_fastq_files channel
Paolo Di Tommaso
@pditommaso
Feb 06 2018 14:15
if so, replace Channel.fromPath with file(something) and that will fix it
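The difference Paolo is pointing at, sketched with a hypothetical path: Channel.fromPath creates a queue channel whose single item is consumed by the first task that reads it, while file() returns a plain value that any number of tasks can reuse.

```groovy
// queue channel: its one item is consumed by the first process that reads it
db_ch = Channel.fromPath('results/db.fa')

// plain file value: can be passed to every task of every downstream process
db = file('results/db.fa')
```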
Phil Ewels
@ewels
Feb 06 2018 14:19
@pditommaso - regarding @Hammarn's question. We have this line which generates a variable with the docker image dynamically. We use this for docker here and it works fine, but when we try to use it in an evaluated string here with container = { "docker://$wf_container" } it doesn't work. It just returns an empty (null?) value, which then gives an empty filename for the Singularity image.
Any ideas why? It seems that it's not being executed in the correct order for some reason
Maxime Borry
@maxibor
Feb 06 2018 14:29
I'm not sure I'm following @pditommaso ...
How do I create the Channel then ?
So far, I create it using the Channel.fromFilePairs method, which sends the file pair to my first process; that process merges them into one file, which goes into the my_fastq_files channel.
But I still need to read my FilePairs reads from the disk ?
For reference, here is the snippet of my code:
Channel
    .fromFilePairs(params.ctrl, size: 2)
    .ifEmpty { exit 1, "Cannot find any reads matching: ${params.ctrl}"}
    .into { control_read_fastqc; control_read_trimming }

process control_trimming {

    cpus = 18
    tag "$name"

    input:
        set val(name), file(reads) from control_read_trimming

    output:
        set val(name), file('*.collapsed.fastq') into my_fastq_files


    script:
        """
        AdapterRemoval --basename $name --file1 ${reads[0]} --file2 ${reads[1]} --trimns --trimqualities --collapse --threads ${task.cpus}
        rename 's/(.collapsed)/\$1.fastq/' *
        """
}
Maxime Borry
@maxibor
Feb 06 2018 14:34
All right, for reference, I solved the issue. I needed to add a .collect() in the bowtie_align process inputs, e.g. my_bowtie_index.collect()
Thanks @pditommaso for your help !
And thanks @ewels for your very clear NF pipelines, it helps to read others' code!
(thinking very hard of creating a NF snippets GitHub repository)
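The .collect() fix works because it gathers the queue channel's items into a single list, emitted as one value-like item, so every alignment task receives the index files instead of only the first task. A sketch of the changed input block in the process from earlier:

```groovy
process bowtie_align {
    input:
        set val(name), file(reads) from trimmed_reads
        // collect() gathers all index files into one value channel item,
        // so each sample's task gets the index instead of only the first
        file bt_index from my_bowtie_index.collect()

    output:
        set val(name), file('*.fastq') into unaligned_reads

    script:
        """
        bowtie2 -x my_bowtie_index -U $reads --un ${name}_unal.fastq
        """
}
```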
Phil Ewels
@ewels
Feb 06 2018 14:39
:+1:
We intended to start noting down snippets and helpful docs and even made a start here: https://github.com/SciLifeLab/NGI-NextflowDocs
Haven't ever got around to actually putting anything useful there though
Maxime Borry
@maxibor
Feb 06 2018 14:43
I'm not sure I'm following NF best practices yet ;)
But I definitely think reference snippets would help a lot with the adoption of NF. A bit like what you propose with Clusterflow :)
Phil Ewels
@ewels
Feb 06 2018 14:47
I made this the other day which you might find helpful: https://github.com/SciLifeLab/NGI-NFcookiecutter
Ido Tamir
@idot
Feb 06 2018 14:48
hi, if I do my own error handling in a bash script within the process shell, is nextflow somehow catching failed executions before my script?
eg:
   shell:
   '''
   n=0
   while [ $n -le 15 ]
   do
      curl -o download.bam --retry 10 --continue-at - -O -J -H "X-Auth-Token: $token" "https://api.gdc.cancer.gov/data/${file}"
      excode=$?
      n=$(( n+1 ))
      if [ "$excode" -eq 18 ]; then
        echo "$(date) restarting because of premature end of download $n : $file" >> download.log
      fi
I think nextflow caught the first exit code 18 from curl itself and terminated the process
Caspar
@caspargross
Feb 06 2018 14:52

Hey! Newbie here. What would be the best approach to create a pipeline to compare different tools in some processes?
Should I create conditional statements in a single process which runs different scripts depending on an input parameter?

Or would it be better to assign individual process elements for the different tools?
At the start (file input) and at the end of the pipeline I want to use some processes which are agnostic to the chosen tools.

Phil Ewels
@ewels
Feb 06 2018 14:59
@idot - you can tell nextflow that non-zero exit codes are allowed: https://www.nextflow.io/docs/latest/process.html#validexitstatus
This may fix it for you.
or better still, why not just let nextflow handle the retries? https://www.nextflow.io/docs/latest/process.html#errorstrategy
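The two directives Phil links can be sketched like this (curl's exit code 18 means a partial transfer; the process body and maxRetries value are hypothetical):

```groovy
process download {
    // treat a normal exit and curl's "partial file" code 18 as success
    validExitStatus 0, 18

    // or let Nextflow re-run the whole task up to 3 times on failure
    errorStrategy 'retry'
    maxRetries 3

    """
    curl -o download.bam --retry 10 "https://api.gdc.cancer.gov/data/${file}"
    """
}
```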
Ido Tamir
@idot
Feb 06 2018 15:01
@ewels will nextflow continue with the loop? I can't let nextflow handle the retries because it submits a new process in a new work directory and the download starts again from 0. I tried it already.
Phil Ewels
@ewels
Feb 06 2018 15:02
@caspargross - depends on the tools! Both should work, personally I would use different processes though. eg. we choose which aligner to use here: https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf#L578
@idot - I see, yes then you have to use your approach :+1:
I'm not totally sure how nextflow will handle the validexitstatus thing, but I guess it should allow the script to keep running
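The pattern Phil suggests for comparing tools, separate processes gated on a parameter, can be sketched with the when directive (tool names and channels here are hypothetical):

```groovy
params.aligner = 'bowtie2'

process align_bowtie2 {
    // only runs when this aligner was selected
    when: params.aligner == 'bowtie2'

    input:  set val(name), file(reads) from reads_bowtie2
    output: file '*.bam' into bam_bowtie2

    "bowtie2 -x index -U $reads | samtools view -b - > ${name}.bam"
}

process align_star {
    when: params.aligner == 'star'

    input:  set val(name), file(reads) from reads_star
    output: file '*.bam' into bam_star

    "STAR --genomeDir index --readFilesIn $reads"
}
```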
Ido Tamir
@idot
Feb 06 2018 15:06
thanks @ewels
Caspar
@caspargross
Feb 06 2018 15:17
@ewels thanks for the reference! Looks good I will do something similar :)
Thomas Zichner
@zichner
Feb 06 2018 16:06

Hi!
I am trying to access a NF pipeline in a local gitlab installation, but unfortunately, NF cannot find it.

Here is what I tried:
In ~/.nextflow/scm I specified server, platform, and token.
After executing the pipeline, I can see in .nextflow.log that NF tried to access the url https://localgitserver/api/v3/projects/PROJECTNAME, which is the correct address.
And I am able to do curl --header "PRIVATE-TOKEN: <same token as in the scm file>" https://localgitserver/api/v3/projects/PROJECTNAME, getting back a json file with the project information. So, in principle it seems to work.

Do you have any idea what the problem could be?
Thank you very much!
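For reference, the ~/.nextflow/scm file Thomas describes takes roughly this shape (the provider name, server URL, and token are hypothetical; see the Nextflow pipeline-sharing docs for the exact fields):

```groovy
providers {
    mygitlab {
        server   = 'https://localgitserver'
        platform = 'gitlab'
        token    = 'my-secret-token'
    }
}
```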

Jean-Christophe Houde
@jchoude
Feb 06 2018 16:29
hi! I'm trying to run a NF pipeline on an OSX machine. I have some issues because some of the processes in the pipeline use bash features which are not supported in the default OSX bash version (3.something). I upgraded my bash using Homebrew, but when NF generates the .command.sh file, it still uses the #!/bin/bash shebang. Is there any way to tell NF to generate the bash files with another shebang?
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:37
NF is developed on a mac, what exactly is the problem ?
@zichner have you specified user, password and token as specified here ?
Jean-Christophe Houde
@jchoude
Feb 06 2018 16:39
For example, I'm trying to use the declare -A call, to declare an associative array. However, this has some issues with the bash 3.x version that is installed by default on OSX 10.12. So I installed bash 4.4 using Homebrew, but by default it installs to /usr/local/bin/bash.
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:40
ahh, I see
Jean-Christophe Houde
@jchoude
Feb 06 2018 16:40
I updated all my settings in the Terminal and environment, and that bash is correctly used. However, when NF generates the .command.sh files, it still puts the old shebang
So I don't know if there is a way to specify it in a setting, or if there is any other way to change that
Thomas Zichner
@zichner
Feb 06 2018 16:40
I did specify the token and tried with and without the user name, but did not put in the password, since my assumption was that it is not needed if you have a token
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:41
you can override it with a setting in the nextflow.config file
Jean-Christophe Houde
@jchoude
Feb 06 2018 16:41
I saw that I can use the scripts à la carte feature, but I would have liked to not have to wrap all my scripts with that.
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:41
process.shell = ['/usr/local/bin/bash', '-uex']
Jean-Christophe Houde
@jchoude
Feb 06 2018 16:41
Perfect! Thanks!
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:42
@zichner bad assumption .. :)
the token is needed by the GitLab API, the user name and password by git
Thomas Zichner
@zichner
Feb 06 2018 16:43
Ok. Good to know. Is there a possibility to use a key pair instead?
(I can do a git clone ... on the machine of interest without the need of a password)
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:44
it would be useful, but it's not an implemented feature, we welcome a PR for that
Thomas Zichner
@zichner
Feb 06 2018 16:45
Ok. Thanks for the information.
In which source file is the git access implemented?
(but don't worry if you don't know it by heart, I can find it myself and have a look)
Thanks anyways!
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:47
all that logic is in the AssetManager
Thomas Zichner
@zichner
Feb 06 2018 16:47
Thanks. I will have a look when I find some time.
Paolo Di Tommaso
@pditommaso
Feb 06 2018 16:47
:+1: