evanbiederstedt
@evanbiederstedt

@pditommaso
this is just great, isn't it?
https://twitter.com/yokofakun/status/1159468857934929922

Confirmed, this is great

evanbiederstedt
@evanbiederstedt

https://gitter.im/nextflow-io/nextflow?at=5d4802a1475c0a0feb021c1b

Could you give us a few examples @pditommaso to illustrate this point? It still feels a bit abstract without some concrete examples.

Pierre Lindenbaum
@lindenb
@evanbiederstedt @pditommaso thanks! I'm still not convinced it's the right way to implement the idea: 1) one can still use a bash script to extract the data, or 2) it would be better to use a 'static' method, something like: Channel.fromMap("my.bam").flatMap(Htsjdk::extractSamples)
Taylor Falk
@taylor.f_gitlab
Is there an easy way to pass two channel outputs and some associated strings to a process? I'm currently trying output = Channel.from(['filter1', channel1_out], ['filter2', channel2_out]) but this only returns the blank DataflowQueue(queue=[]) string inside the process.
Stijn van Dongen
@micans
@taylor.f_gitlab that would mean you'd have channels in a channel. What are you trying to achieve? What's in channel1_out and channel2_out?
Taylor Falk
@taylor.f_gitlab
@micans those channels contain singular files, so I am really just trying to run the next process on each output from channel1 and channel2, and passing along the correct filter string. How do I move those files into one channel, concat?
Stijn van Dongen
@micans

@taylor.f_gitlab do you have a source process? Normally you'd see something like

output: set val('filter'), file('*.txt') into channel1

then if there are multiple files in that output that you want to flatten you can use transpose().
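
A minimal sketch of that pattern for two sources (the process and file names here are hypothetical): each process tags its files with its filter string, and mix() then merges the tagged outputs into a single channel of (filter, file) pairs for the downstream step.

process makeFiltered1 {
    output:
    set val('filter1'), file('*.txt') into channel1_out

    """
    echo hello > result1.txt
    """
}

process makeFiltered2 {
    output:
    set val('filter2'), file('*.txt') into channel2_out

    """
    echo world > result2.txt
    """
}

// merge both tagged outputs into one channel of (filter, file) pairs
channel1_out
    .mix(channel2_out)
    .set { filtered_all }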

Taylor Falk
@taylor.f_gitlab
Oh that's a good idea, let me try it that way.
Gabriel Abud
@G_Abud_twitter
I'm getting the error: fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied when trying to run a pipeline on AWS Batch. Pretty sure it's due to the s3 access of the workDir but I've double checked and I am the owner of that bucket. Any ideas?
Combiz
@combiz_k_twitter
Hi, this may be more of a Singularity question than a Nextflow one. I'm trying to run a test hello-world Nextflow script with -with-singularity image.sif on an HPC cluster. There seems to be an issue with mounting: the short script runs but produces the error Command error: Fatal error: cannot create 'R_TempDir'. Any ideas? My guess is the Singularity container can't write files to the HPC filesystem? Thanks for any pointers.
Combiz
@combiz_k_twitter
Ok so the script now runs with containerOptions '-B $PWD:/tmp' in the process
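
As a minimal sketch of where that option lives (the process name, image, and R call are hypothetical), the directive goes inside the process definition:

process sayHello {
    container 'image.sif'
    // bind the current working directory over /tmp inside the container,
    // so the tool can create its temporary files there
    containerOptions '-B $PWD:/tmp'

    """
    Rscript -e 'print("hello")'
    """
}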
Johannes Alneberg
@alneberg

A question regarding input of memory specifications. Using the config

params {
  memParam = 7.GB
}

process {
  withName:sayHello {
    memory = {2 * params.memParam}
  }
}

works fine when memParam is not used, but when it is: nextflow run main.nf -c base.conf --memParam 8.GB, I get either

ERROR ~ Error executing process > 'sayHello (3)'

Caused by:
  No signature of method: java.lang.Integer.multiply() is applicable for argument types: (java.lang.String) values: [8.GB]
Possible solutions: multiply(java.lang.Character), multiply(java.lang.Number)

or

Error executing process > 'sayHello (1)'

Caused by:
  Not a valid 'memory' value in process definition: 8.GB8.GB

depending on the order of the multiplication. What am I missing? (I'm on Nextflow version 19.04.1 build 5072)

Paolo Di Tommaso
@pditommaso
umm, because when passed on the command line the string 8.GB does not get parsed into a memory unit object
therefore for the interpreter it's a string
Johannes Alneberg
@alneberg
Is it possible to work around it?
Paolo Di Tommaso
@pditommaso
you may try 2 * (params.memParam as nextflow.util.MemUnit)
quite ugly
Johannes Alneberg
@alneberg
Well, better ugly than broken, I'll give it a try
it throws unable to resolve class nextflow.util.MemUnit
Alaa Badredine
@AlaaBadredine_twitter
not Util ?
Johannes Alneberg
@alneberg
Same error using Util I'm afraid
Alaa Badredine
@AlaaBadredine_twitter
did you put the parentheses?
Johannes Alneberg
@alneberg
Yes:
process {
  withName:sayHello {
    memory = {2 * (params.memParam as nextflow.util.MemUnit)}
  }
}
Alaa Badredine
@AlaaBadredine_twitter
nextflow.util.MemUnit()
maybe he meant it like that?
Johannes Alneberg
@alneberg
That gave me some syntax error: expecting ')', found '(' @ line 7, column 60.
I think I fixed it
Alaa Badredine
@AlaaBadredine_twitter
nice
Johannes Alneberg
@alneberg
nextflow.util.MemoryUnit did the trick
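
Putting the thread together, the working config looks like this; the cast is needed because a command-line override such as --memParam 8.GB arrives as the string '8.GB' rather than a memory unit:

params {
  memParam = 7.GB
}

process {
  withName:sayHello {
    // cast before multiplying, so a string override becomes a MemoryUnit
    memory = { 2 * (params.memParam as nextflow.util.MemoryUnit) }
  }
}

With nextflow run main.nf -c base.conf --memParam 8.GB this resolves to a 16 GB request.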
Alaa Badredine
@AlaaBadredine_twitter
oh neat
Johannes Alneberg
@alneberg
Thank you both!
Alaa Badredine
@AlaaBadredine_twitter
you're welcome!
Combiz
@combiz_k_twitter
I'm having some trouble finding output files when NF is run via Singularity. I write a file called 'test.csv' and NF gives "Missing output file(s) test.csv expected by process". If I cd to the NF workdir /rdsgpfs/general/ephemeral/user/ck/ephemeral/TestNF/work/f2/aca9181e283b109ffe55dc5e73d66a I can see the test.csv was produced. (The file is saved to the workdir in R using write.table(df, "test.csv").)
Steve Frenk
@sfrenk

I've just started playing around with DSL-2 and I'm trying to pass the output of a process into a new, named channel. I need to join this output channel with another channel further down the workflow, hence chaining operators directly from the process call doesn't work. I have a script that does something like this:

process1(parameters)

outputChannel = process1.out
    .ifEmpty { error "Stuff not produced" }
    .map { <do something> }

But I get the error:

nextflow.Session - Session aborted -- Cause: No signature of method: nextflow.script.ChannelArrayList.ifEmpty() is applicable for argument types: (Script_48650d62$_runScript_closure7$_closure12) values: [Script_48650d62$_runScript_closure7$_closure12@749f539e]

What am I doing wrong?
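
One likely cause, as a hedged sketch: in DSL2, process1.out is the list of all output channels of the process, so an operator has to be chained on a single channel selected from that list, e.g. by index (assuming the output you want is the first one):

process1(parameters)

// process1.out is a list of output channels; select one before chaining
outputChannel = process1.out[0]
    .ifEmpty { error "Stuff not produced" }
    .map { it }   // <do something>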

Steve Frenk
@sfrenk
Also, unrelated question - what's the current status of the potential DSL-2 unit testing feature?
Michael L Heuer
@heuermh
@lindenb @pditommaso Curious if you might write up how your extensions work, and what the right way to do extensions might be; I've wanted to do something similar in the past, but adding new dependencies to Nextflow isn't desirable for various reasons
Paolo Di Tommaso
@pditommaso
file splitting? Extend this; Pierre is using a different approach, consisting of a helper method that returns a closure which does the parsing
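
As a hypothetical sketch of that approach (the helper name and its parsing logic are invented here), the helper returns a closure that flatMap then applies to every file in the channel:

// hypothetical helper: returns a closure that parses one BAM file
// and yields the sample names found in it
def extractSamples() {
    return { bam ->
        // real parsing (e.g. via htsjdk) would go here
        [ "${bam.baseName}_sample1", "${bam.baseName}_sample2" ]
    }
}

Channel.fromPath('*.bam')
    .flatMap( extractSamples() )
    .subscribe { println it }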
Stephen Kelly
@stevekm

@taylor.f_gitlab

Hi all, pretty sure I've scoured the docs with no results, but is there any syntax to have a file object work like a non-consumable value? For instance, a reference fasta that is used multiple times throughout a pipeline. Is there no better way than using .fromPath() each time?

Channel.fromPath('genome.fa').into { ref_fasta1; ref_fasta2; ref_fasta3, .... etc. }

I think in the new DSL2 for Nextflow you no longer have to do this; you can just set it once and use it repeatedly.
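
A value channel also helps in DSL1 here: it is read rather than consumed, so one channel can feed several processes without the .into fan-out. A minimal sketch (the process names are hypothetical):

ref_fasta = Channel.value( file('genome.fa') )

process align {
    input:
    file(ref) from ref_fasta

    """
    echo aligning against ${ref}
    """
}

process stats {
    input:
    file(ref) from ref_fasta

    """
    echo computing stats for ${ref}
    """
}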

spaceturtle
@spaceturtle

Could someone help me to figure out the problem? My code is like this:

pair = [:]
outChannel = Channel.create()
inChannel.subscribe onNext: {
    if( pair.containsKey(it) ) {
        outChannel.bind(it)
    }
    else {
        pair[it] = null
    }
}, onComplete: {
    outChannel.close()
}

outChannel.subscribe { println "$it" }

I have checked that values were correctly bound to outChannel, but there is no output from outChannel.subscribe. Anything wrong?

Michael L Heuer
@heuermh
@pditommaso Sorry, should have looked more closely before asking, I thought perhaps @lindenb had come up with a way of extending Nextflow to add "third-party" functionality. His branch adds htsjdk as a dependency to Nextflow itself.
Stijn van Dongen
@micans
@spaceturtle that is not a self-contained example, is it? What is that code doing, what is the problem it is solving?
Abhinav Sharma
@abhi18av

Hello everyone :)

I was trying to find the JavaDoc or GroovyDoc for the project but I've not been able to. Could anyone point me in the right direction, please?

Nabil-Fareed Alikhan
@happykhan

hi everyone.

I've been trying to set up a chain of processes: bcl2fastq > fastp > multiqc.

I've been going over the docs and I can't seem to figure out how to scoop the demultiplexed reads out of bcl2fastq, organize them into read pairs, and then pipe them into another process.

a lot of examples use fastqc (working with each fastq.gz individually, which is fair enough)
Nabil-Fareed Alikhan
@happykhan
fastq_output.flatMap().map { file ->
        if( "${file}".contains("_R1_") || "${file}".contains("_R2_") ) {
            def key_match = file.name.toString() =~ /(.+)_R\d+_001\.fastq\.gz/
            def key = key_match[0][1]
            return tuple(key, file)
        }
    }
    .groupTuple()
    .into { read_files_fastqc; read_files_fastp }
Figured it out I guess; made sense to use flatMap to tidy up the reads ...
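
For reference, Nextflow's built-in Channel.fromFilePairs factory does this grouping directly; a sketch assuming the same _R1/_R2 naming convention as above (the glob path is hypothetical):

Channel
    .fromFilePairs('*_R{1,2}_001.fastq.gz')
    .into { read_files_fastqc; read_files_fastp }

This emits tuples of (sample_id, [R1, R2]) without the manual regex.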
mmatthews06
@mmatthews06
Hey all, is there documentation on a recommended way of running Nextflow with a debugger, like in IntelliJ, for development purposes, to set breakpoints and whatnot? I've only just started looking for that specifically, but I would've thought I'd have come across it by now.
Riccardo Giannico
@giannicorik_twitter
@happykhan Hi, I believe you are searching for this construct here:
:point_up: June 4, 2019 12:39 PM
Combiz
@combiz_k_twitter
According to the HPC/QMUL docs on NF: "Using the SGE executor for parallel jobs causes the master job to hang until it is killed by the scheduler for exceeding walltime. This is due to Apache Ignite not being able to communicate to other pipeline scripts submitted as separate jobs.". Is this generally true of using SGE? Or is it their particular HPC config that means NF can only be used for serial jobs with SGE?
Stephen Kelly
@stevekm

@happykhan

how do I scoop the demultiplexed reads out of bcl2fastq and organize them into read pairs

I do not do this inside Nextflow, I separate my demultiplexing pipeline from the rest of my analysis. I run a script on the demultiplexing output to coordinate the sample R1 R2 pairs into a new samplesheet as the input for the analysis pipeline.

Demultiplexing pipeline: https://github.com/NYU-Molecular-Pathology/demux-nf

downstream analysis pipeline: https://github.com/NYU-Molecular-Pathology/NGS580-nf

samplesheet generation (parsing of the R1 R2 pairs) happens here:
https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/4986e0a6a5eb9fec3e5016c8de29b60d5044df96/Makefile#L170

using this script:
https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/4986e0a6a5eb9fec3e5016c8de29b60d5044df96/generate-samplesheets.py

if you wanted to do it all inside one pipeline, then you might want to use some of the functions of that script to do the SampleID-R1-R2 pairing and then output to Nextflow in a new channel. Or, if you are good with regex, you might be able to do it natively in a Nextflow channel .map or something like that.

@combiz_k_twitter I have used Nextflow without issue on SGE. I am not sure what exactly they are referring to with that quote. A lot of HPC admins get really hung up on the idea of using array-jobs all the time for everything and have trouble understanding that Nextflow is managing the job dependency itself and submitting all the jobs individually.