Johannes Alneberg
@alneberg
nextflow.util.MemoryUnit did the trick
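For reference, a minimal sketch of what that class can do; the '8 GB' value and the surrounding usage are made up for illustration:

import nextflow.util.MemoryUnit

def mem = new MemoryUnit('8 GB')   // parse a human-readable size string
println mem.toBytes()              // raw number of bytes
println mem.toGiga()               // size expressed in gigabytes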
Alaa Badredine
@AlaaBadredine_twitter
oh neat
Johannes Alneberg
@alneberg
Thank you both!
Alaa Badredine
@AlaaBadredine_twitter
you're welcome !
Combiz
@combiz_k_twitter
I'm having some trouble finding output files when NF is run via Singularity. I write a file called 'test.csv' and NF gives "Missing output file(s) test.csv expected by process". If I cd to the NF workdir /rdsgpfs/general/ephemeral/user/ck/ephemeral/TestNF/work/f2/aca9181e283b109ffe55dc5e73d66a I can see that test.csv was produced. The file is saved to the workdir in R using write.table(df, "test.csv").
Steve Frenk
@sfrenk

I've just started playing around with DSL-2 and I'm trying to pass the output of a process into a new, named channel. I need to join this output channel with another channel further down the workflow, so chaining operators directly from the process call doesn't work. I have a script that does something like this:

process1(parameters)

outputChannel = process1.out
    .ifEmpty { error "Stuff not produced" }
    .map { <do something> }

But I get the error:

nextflow.Session - Session aborted -- Cause: No signature of method: nextflow.script.ChannelArrayList.ifEmpty() is applicable for argument types: (Script_48650d62$_runScript_closure7$_closure12) values: [Script_48650d62$_runScript_closure7$_closure12@749f539e]

What am I doing wrong?
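For what it's worth, that ChannelArrayList error typically means the process declares more than one output, so process1.out is a list of channels rather than a single channel. Selecting one output, by index or via a named emit:, gives a channel that ifEmpty can be applied to. A hedged sketch with made-up names:

process1(parameters)

outputChannel = process1.out[0]            // or process1.out.myOutput, given `emit: myOutput`
    .ifEmpty { error "Stuff not produced" }
    .map { it }                            // <do something>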

Steve Frenk
@sfrenk
Also, unrelated question - what's the current status of the potential DSL-2 unit testing feature?
Michael L Heuer
@heuermh
@lindenb @pditommaso Curious if you might write up how your extensions work, and what the right way to do extensions might be; I've wanted to do something similar in the past, but adding new dependencies to Nextflow isn't desirable for various reasons
Paolo Di Tommaso
@pditommaso
file splitting? Extend this, Pierre is using a different approach, consisting of a helper method that returns a closure to do the parsing
Stephen Kelly
@stevekm

@taylor.f_gitlab

Hi all, pretty sure I've scoured the docs with no results, but is there any syntax for having a file object work like a non-consumable value? For instance, a reference fasta that is used multiple times throughout a pipeline. Is there no better way than using .fromPath() each time?

Channel.fromPath('genome.fa').into { ref_fasta1; ref_fasta2; ref_fasta3; ... etc. }

I think in the new DSL2 for Nextflow you no longer have to do this, you can just set it once and use it repeatedly.
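For illustration, a sketch of both approaches (the file and channel names are hypothetical): a bare file object in DSL-1, or a value channel in DSL-2, can be read any number of times without .into{}:

// DSL-1 style: a plain file object can be referenced by many processes
ref_fasta = file('genome.fa')

// DSL-2 style: a value channel can likewise be consumed repeatedly
ref_fasta_ch = Channel.value(file('genome.fa'))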

spaceturtle
@spaceturtle

Could someone help me to figure out the problem? My code is like this:

pair = [:]
outChannel = Channel.create()
inChannel.subscribe onNext: {
        if (pair.containsKey(it)) {
            outChannel.bind(it)
        }
        else {
            pair[it] = null
        }
    }
    onComplete: {
        outChannel.close()
    }

outChannel.subscribe { println "$it" }

I have checked that values were correctly bound to outChannel, but there is no output from outChannel.subscribe. Anything wrong?
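One thing to check in the snippet above: onNext: and onComplete: have to be passed to the same subscribe call, separated by a comma. Written as two separate statements, the onComplete handler is never registered, so outChannel.close() never runs. A sketch of the corrected call:

inChannel.subscribe onNext: {
        if (pair.containsKey(it)) {
            outChannel.bind(it)
        }
        else {
            pair[it] = null
        }
    }, onComplete: {
        outChannel.close()
    }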

Michael L Heuer
@heuermh
@pditommaso Sorry, should have looked more closely before asking, I thought perhaps @lindenb had come up with a way of extending Nextflow to add "third-party" functionality. His branch adds htsjdk as a dependency to Nextflow itself.
Stijn van Dongen
@micans
@spaceturtle that is not a self-contained example, is it? What is that code doing, what is the problem it is solving?
Abhinav Sharma
@abhi18av

Hello everyone :)

I was trying to find the JavaDoc or GroovyDoc for the project but I've not been able to do so. Could anyone here point me in the right direction please?

Nabil-Fareed Alikhan
@happykhan

Hi everyone.

I've been trying to set up a chain of processes: bcl2fastq > fastp > multiqc.

I've been going over the docs and I can't seem to figure out how to scoop out the demultiplexed reads from bcl2fastq, organize them into read pairs, and then pipe them into another process.

A lot of examples use fastqc (working with each fastq.gz individually, which is fair enough)
Nabil-Fareed Alikhan
@happykhan
fastq_output.flatMap().map { file ->
        if ("${file}".contains("_R1_") || "${file}".contains("_R2_")) {
            def key_match = file.name.toString() =~ /(.+)_R\d+_001\.fastq\.gz/
            def key = key_match[0][1]
            return tuple(key, file)
        }
    }
    .groupTuple()
    .into { read_files_fastqc; read_files_fastp }
Figured it out, I guess; it made sense to use flatMap to tidy up the reads ...
mmatthews06
@mmatthews06
Hey all, is there documentation on a recommended way of running Nextflow with a debugger, like in IntelliJ, for development purposes, to set breakpoints and whatnot? I've only just started looking for that specifically, but I would've thought I'd have come across it by now.
Riccardo Giannico
@giannicorik_twitter
@happykhan Hi, I believe you are searching for this construct here:
:point_up: June 4, 2019 12:39 PM
Combiz
@combiz_k_twitter
According to the HPC/QMUL docs on NF: "Using the SGE executor for parallel jobs causes the master job to hang until it is killed by the scheduler for exceeding walltime. This is due to Apache Ignite not being able to communicate to other pipeline scripts submitted as separate jobs.". Is this generally true of using SGE? Or is it their particular HPC config that means NF can only be used for serial jobs with SGE?
Stephen Kelly
@stevekm

@happykhan

how do I scoop out the demultiplexed reads from bcl2fastq, organize them into read pairs

I do not do this inside Nextflow, I separate my demultiplexing pipeline from the rest of my analysis. I run a script on the demultiplexing output to coordinate the sample R1 R2 pairs into a new samplesheet as the input for the analysis pipeline.

Demultiplexing pipeline: https://github.com/NYU-Molecular-Pathology/demux-nf

downstream analysis pipeline: https://github.com/NYU-Molecular-Pathology/NGS580-nf

samplesheet generation (parsing of the R1 R2 pairs) happens here:
https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/4986e0a6a5eb9fec3e5016c8de29b60d5044df96/Makefile#L170

using this script:
https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/4986e0a6a5eb9fec3e5016c8de29b60d5044df96/generate-samplesheets.py

If you wanted to do it all inside one pipeline, then you might want to use some of the functions of that script to do the SampleID-R1-R2 pairing and then output to Nextflow in a new channel. Or, if you are good with regex, you might be able to do it natively in a Nextflow channel .map or something like that.

@combiz_k_twitter I have used Nextflow without issue on SGE. I am not sure what exactly they are referring to with that quote. A lot of HPC admins get really hung up on the idea of using array jobs for everything, and have trouble understanding that Nextflow manages the job dependencies itself and submits all the jobs individually.
I am not sure what you mean by "NF can only be used for serial jobs with SGE"
Stephen Kelly
@stevekm
I have never used Apache Ignite, and I am not clear what it has to do with anything in that situation. If Nextflow can communicate with SGE and submit jobs, what does Apache Ignite have to do with it? Also, what are they referring to as the "master job"? If you are submitting the parent Nextflow process in its own SGE job, then you should just set an adequate time limit on the job's execution. I typically give mine 5 days on our current SLURM system.
banjosnapper
@banjosnapper

Hi there, I am having an issue trying to use a Perl script within my Nextflow workflow. Is it possible to call a script from within Nextflow, or do you have to write the script inside the process?

I have tried both ways and have so far had no success.
I am trying to convert the gene_id values in a 'stringtieMerged.gtf' (the default StringTie output) to the miRBase names that I have.
The Perl script works outside of Nextflow, but I then have issues when placing it into the workflow. I have provided the code below.

This is the process that creates the merged list

process createList {
        module 'stringtie'
        publishDir "$baseDir/../output/stringtieGTF", mode: 'copy'
        tag "${listGTF}"
        errorStrategy { task.exitStatus == 0 ? 'retry' : 'terminate' }
        maxRetries 3
        maxErrors -1

        input:
        file listGTF from listGTF.collect()
        file gff from gffFile

        output:
        file "mergeList.txt" into mergeList
        file "stringtieMerged.gtf" into stringtieMerged

        script:
        """
        touch mergeList.txt
        ls -1 $listGTF > mergeList.txt
        stringtie --merge -o stringtieMerged.gtf -G ${gff} mergeList.txt
        """
}

This is the Perl script, which works perfectly fine outside of the workflow, but here I get either exit status 25 or 2:

process swapID {
        publishDir "$baseDir/../output/stringtieGTF", mode: 'copy'

        input:
        file "stringtieMerged.gtf" from stringtieMerged
        file gff from gffFile

        output:
        file "temp.gtf" into stringtieMergedID

        script:
        """
        #!/usr/bin/env perl

        my \$gff = "mmuChr.gff3";
        my \$gtf = "stringtieMerged.gtf";

        # open the GFF3 from miRBase and build a lookup
        my %lookup;   # key-value object

        open FPIN, "<".\$gff or die;   # open file for reading

        while (<FPIN>) {   # loop over each line in turn
            # + matches at least once, * matches zero or more times
            if (/ID\\=([^;]+);.*Name\\=([^;]+)[\\t \\r\\n\\f;]+/) {
                my (\$id, \$name) = (\$1, \$2);
                # not strictly needed, but checks a value isn't in the file twice
                die if (exists \$lookup{\$id});
                \$lookup{\$id} = \$name;
            }
        }
        close FPIN;

        # open the GTF and create a new temp file with substituted names
        open FPIN, "<".\$gtf or die;
        open FPOUT, ">temp.gtf" or die;   # output to a temp file

        while (my \$line = <FPIN>) {
            if (\$line =~ /; transcript_id \"(MI[^\"]+)\";/) {
                my \$id = \$1;
                die \$id unless (exists \$lookup{\$id});
                my \$id2 = \$lookup{\$id};
                # make the substitution
                \$line =~ s/gene_id \"[^\"]+\"/gene_id "\$id2"/;
            }
            print FPOUT \$line;
        }
        close FPIN;
        close FPOUT;
        """
}
Any help would be much appreciated
Alaa Badredine
@AlaaBadredine_twitter
@banjosnapper have you tried calling your script from Nextflow? Like:
process swapID {
        publishDir "$baseDir/../output/stringtieGTF", mode: 'copy'

        input:
        file "stringtieMerged.gtf" from stringtieMerged
        file gff from gffFile

        output:
        file "temp.gtf" into stringtieMergedID

        script:
        """
        perl script.pl
        """
}
You can directly call your Perl script within the script block of Nextflow. You don't actually have to write Perl code inside Nextflow; just give the path to your script, declare the right inputs and outputs, and it should work.
banjosnapper
@banjosnapper
@AlaaBadredine_twitter I get an error saying that my script.pl does not exist. Should I put my script in a particular place to be able to call it? It is in the same directory as main.nf.
Alaa Badredine
@AlaaBadredine_twitter
what's your script name ?
banjosnapper
@banjosnapper
I called it 'parseme.pl'
Alaa Badredine
@AlaaBadredine_twitter
replace script.pl by your actual script name
banjosnapper
@banjosnapper
That is what I did.
Alaa Badredine
@AlaaBadredine_twitter
OK, so it would be perl /path/to/your/script/parseme.pl
You have to give the full path of your script
You can define it as a variable
banjosnapper
@banjosnapper
Okay I will try and give it the full path
Alaa Badredine
@AlaaBadredine_twitter
somewhere in nextflow
parseme = "/full/path/to/parseme.pl"
and then call it: perl $parseme
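Putting the suggestion together, a hedged sketch (the variable name and path are illustrative; $baseDir is Nextflow's built-in variable for the directory of the main script):

parseme = "$baseDir/parseme.pl"   // or any absolute path

process swapID {
        input:
        file "stringtieMerged.gtf" from stringtieMerged

        output:
        file "temp.gtf" into stringtieMergedID

        script:
        """
        perl $parseme
        """
}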
banjosnapper
@banjosnapper
Thank you, that worked. However, I am still getting an exit status 2 error when executing:
Caused by:
  Process `swapID` terminated with an error exit status (2)

Command executed:

  perl /scratch/c.c1860369/nextFlow/bin/parseme.pl

Command exit status:
  2

Command output:
  (empty)

Command error:
  Died at /scratch/c.c1860369/nextFlow/bin/parseme.pl line 10.
banjosnapper
@banjosnapper
Okay, so I worked out the issue. It was because I did not give an absolute path to mmuChr.gff3 in the Perl script. Many thanks for your help @AlaaBadredine_twitter
Alaa Badredine
@AlaaBadredine_twitter
@banjosnapper no problem
mmatthews06
@mmatthews06
@pditommaso or anyone else, did anybody catch my question about attaching a debugger to Nextflow? Or running Nextflow in IntelliJ in debug mode, to set breakpoints, etc.? I'm getting back to work on that, and just thought I'd ask for any quick hints, since I assume somebody has already done it.
Paolo Di Tommaso
@pditommaso
it's possible for nextflow runtime development, but I guess you want it for nextflow scripts
mmatthews06
@mmatthews06
No, runtime development. I'm mucking around Nextflow internals for the time being.
Paolo Di Tommaso
@pditommaso
then it's straightforward, use ./launch.sh -remote-debug run .. etc
mmatthews06
@mmatthews06
Ah, alright, I'll try that. Thanks!
Paolo Di Tommaso
@pditommaso
Workflow components in the pipeline :sunglasses:
Stephen Kelly
@stevekm
@banjosnapper put your scripts in a directory called bin adjacent to the main nextflow script; example here: https://github.com/stevekm/nextflow-demos/tree/1238d0c444f388cb1ee79c351a57610e03e4bbb6/R-Python
as long as the scripts are executable, you can invoke them directly from within your task
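As a sketch of that layout (names hypothetical): Nextflow adds the bin directory next to the main script to the task PATH, so an executable script there can be called by name with no path at all:

// project layout:
//   main.nf
//   bin/parseme.pl   <- chmod +x, first line: #!/usr/bin/env perl

process swapID {
        input:
        file "stringtieMerged.gtf" from stringtieMerged

        output:
        file "temp.gtf" into stringtieMergedID

        script:
        """
        parseme.pl
        """
}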