Farshad Madani
@fmadani:matrix.org
[m]
Nextflowers :)
Is there any way to configure this nextflow message to show a process completion time?
1 reply
Brandon Cazander
@brandoncazander
Has anyone had success with composing workflows from multiple pipelines that are in different repositories?
praveenkorepupk
@praveenkorepupk
Hi @pditommaso, I was running parallel processes using AWS Batch. Because the jobs run in parallel, each one was overwriting the other jobs' folders, so I included an R script that creates a separate PATH for each job based on its driver gene name. This R script generates .png files, and since I run 10,000 driverGene processes in parallel, the publish dir now contains 10,000 folders that each hold .png files. I need to collect all the .png files into one directory. Can you please suggest the best way to solve this?
1 reply
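
For reference, a minimal sketch of one way to gather all the images into a single place: let every per-gene task publish into the same publishDir, so the task work directories stay separate but the published .png files land together. The process name, params.outdir and plot_gene.R are made up here; only the publishDir pattern is the point.

process PLOT_DRIVER_GENE {
    // every task publishes into the same folder; per-gene file names avoid collisions
    publishDir "${params.outdir}/plots", mode: 'copy'
    input:
        val gene
    output:
        path "*.png"
    script:
        """
        Rscript plot_gene.R ${gene}
        """
}
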
Guillaume Noell
@gn5
Hi @pditommaso, I've successfully run channel.sql.fromQuery("SELECT * FROM CSVREAD('test.csv') where foo>=2;"). Is it also possible to use an H2 CSV table for sqlInsert?
John Ma
@JohnMCMa
Hi, I just want to confirm one thing: if subworkflow is not executed, will subworkflow.out always be an empty array? If that's not always the case, how can I detect whether a subworkflow has been executed?
emily-kawabata
@emily-kawabata
Hi everyone, is there a way to parallelize a specific command in a process? I am using prokka in one of the processes and am looking for a way to parallelize this step. Any advice would be greatly appreciated.
3 replies
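
For reference, the usual pattern for parallelising a single tool inside a process is to request CPUs with the cpus directive and pass task.cpus to the tool's own threads option (prokka exposes --cpus). A sketch with made-up process and file names:

process PROKKA_ANNOTATE {
    cpus 8                          // CPUs requested for each task
    input:
        path assembly
    output:
        path "prokka_out"
    script:
        """
        prokka --cpus ${task.cpus} --outdir prokka_out ${assembly}
        """
}
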
Steven P. Vensko II
@spvensko_gitlab

I am trying to run some processes using google-lifesciences. It appears the GATK-4.1.4.1 Docker image from the Broad is causing issues:

Error executing process > 'lens:procd_fqs_to_procd_alns:raw_alns_to_procd_alns:bams_to_base_qual_recal_w_indices:gatk_index_feature_file (Homo_sapiens_assembly38.dbsnp138.vcf.gz)'

Caused by:
  Process `lens:procd_fqs_to_procd_alns:raw_alns_to_procd_alns:bams_to_base_qual_recal_w_indices:gatk_index_feature_file (Homo_sapiens_assembly38.dbsnp138.vcf.gz)` terminated with an error exit status (9)

Command executed:

  gatk IndexFeatureFile  -I Homo_sapiens_assembly38.dbsnp138.vcf.gz

Command exit status:
  9

Command output:
  (empty)

Command error:
  Execution failed: generic::failed_precondition: pulling image: docker pull: running ["docker" "pull" "broadinstitute/gatk:4.1.4.1"]: exit status 1 (standard error: "failed to register layer: Error processing tar file(exit status 1): write /opt/miniconda/lib/python3.6/__pycache__/selectors.cpython-36.pyc: no space left on device\n")

Work dir:
  gs://spvensko/work/2a/c230e037330eac63743b8b6e44817a

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

I've tried upping the lifeSciences.bootDiskSize but that doesn't seem to help. Any tips?

4 replies
John Ma
@JohnMCMa

Hi,

I have the following code:

John Ma
@JohnMCMa
process COMBINE_METRICS {
    input:
        path gex_metrics, stageAs: 'gex.txt'
        path adt_metrics, stageAs: 'adt.txt'
        val output_dir
    output:
        path "metrics_summary.csv", emit: metrics_csv
    publishDir "${output_dir}", mode: 'copy'
    exec:
        for (one_line in file('gex.txt').readLines()) {
            doThing()
        }
}

I have switched the file reference in the exec block between 'gex.txt' and gex_metrics, but in both cases I get a "No such file" error that refers to a nonexistent file under output_dir (in this case, "${output_dir}/gex.txt"). I'm using DSL2.

Can anyone think of a solution for this? Thanks!
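
One thing worth checking: inside an exec block, a relative path given to file() is not resolved against the task work directory where stageAs puts the inputs. A sketch (untested, and only a guess at the root cause of the error above) that resolves the staged copy explicitly through task.workDir; only the exec section is shown, the rest of the process stays as above:

    exec:
        // look up the staged file inside this task's work directory
        def gex = task.workDir.resolve('gex.txt')
        gex.readLines().each { one_line ->
            doThing()
        }
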

Michael Bale
@michaelbale

Hi all, I'm having trouble getting nextflow to email me on completion of the pipeline -- specifically, the workflow.onComplete handler is either not working or I have a weird error.
My config file where the workflow.onComplete is housed has:

workflow.onComplete {

    sendMail(
        to: ${params.recipient},
        subject: 'pipeline execution: ${params.name}'
        '''
        Pipeline execution summary
        ---------------------------
        Completed at: ${workflow.complete}
        Duration    : ${workflow.duration}
        Success     : ${workflow.success}
        workDir     : ${workflow.workDir}
        exit status : ${workflow.exitStatus}
        Error report: ${workflow.errorReport ?: '-'}
        '''


    )
}

And I'm getting either a compilation error at the open brace for workflow.onComplete or an error that says

Unknown method invocation `onComplete` on ConfigObject type -- Did you mean?
  compute

Any help you can provide would be greatly appreciated!

1 reply
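
For reference: the ConfigObject error suggests the config parser does not accept the method-call form of the handler. One thing to try (a sketch, not verified against this setup) is declaring the handler in the pipeline script (e.g. main.nf) instead, with double-quoted strings so the ${...} variables interpolate, a comma after the subject, and the message text passed as the body argument. It assumes params.recipient and params.name are defined:

workflow.onComplete {
    sendMail(
        to: params.recipient,
        subject: "pipeline execution: ${params.name}",
        body: """
        Pipeline execution summary
        ---------------------------
        Completed at: ${workflow.complete}
        Duration    : ${workflow.duration}
        Success     : ${workflow.success}
        workDir     : ${workflow.workDir}
        exit status : ${workflow.exitStatus}
        Error report: ${workflow.errorReport ?: '-'}
        """
    )
}
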
Ben R. Alexander
@benralexander

Hello @pditommaso and the rest of the nextflow community,

I am a big fan of the approach taken by Nextflow, and I am actively pushing people to adopt it as the principal data flow control language for our genetics community at the major pharmaceutical firm where I work. Our local high-performance UNIX cluster is entirely inadequate to the multitude of algorithms that we need to run, however, so running on a cloud provider is essential. Lately I’ve run up against a challenge when running with the AWS Batch mode, and I’m hoping that someone can suggest a workaround.

In our module-based, DSL2-driven environment I've implemented a structure that I thought would be flexible, using some initial processes to gather (and check for the existence of) files, and then making symbolic links to those files wherever they originate so that I can access them in subsequent steps managed by Nextflow. I've been using an approach along these lines:

process gather_essaential_files {
    input:
        val genotype_bim_name
        val genotype_bed_name
        val genotype_fam_name
        val working_dir
    output:
        path("$genotype_bim_name"), optional: true, emit: geno_bim
        path("$genotype_bed_name"), optional: true, emit: geno_bed
        path("$genotype_fam_name"), optional: true, emit: geno_fam
    script:
        """
        ln -s $working_dir/$genotype_bim_name $genotype_bim_name
        ln -s $working_dir/$genotype_bed_name $genotype_bed_name
        ln -s $working_dir/$genotype_fam_name $genotype_fam_name
        """
}

After a process such as this one I can treat all the files in subsequent processes as if they're local, and the logic all seems very clean. I find that this approach runs wonderfully in my local environment, and making those file links is naturally super fast. When I try to run it on AWS Batch, however, I don't get file links; instead, those files each get uploaded to the S3 bucket where Nextflow is managing its process-specific work directories. This uploading/copying approach would make sense to me if we were dealing with S3-based data (given that S3 isn't mounted like a real file system), but we aren't. I'm using AWS FSx to provide access to those files, so they act like any other local files. Here's my question: is there some parameter/trick that would allow me to actually create symbolic links instead of copying the files every time they are referenced? We use lots of large data files and we want to run in a massively parallel environment, and if we are forced to copy those data files every time a process needs them, the whole approach becomes less viable. I would love a configuration parameter that would let me say "link-instead-of-copying" in the context of the AWS Batch profile. Does anybody have a recommendation for me?

Thanks for any insights, Ben

1 reply
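
For reference, one direction sometimes used with a shared file system on Batch (an untested sketch; it assumes the FSx file system is already mounted on the Batch compute hosts, here at a hypothetical /fsx): expose the host mount to the task containers with aws.batch.volumes and pass the file locations around as val strings, so Nextflow never stages them to S3.

// nextflow.config
aws {
    batch {
        // mount the host's /fsx path into every task container, read-write
        volumes = ['/fsx:/fsx:rw']
    }
}

Processes would then refer to /fsx/... paths directly in their scripts; the trade-off is that Nextflow no longer tracks those files as path inputs for staging, caching or resume.
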
Vlad Kiselev
@wikiselev
Hi all! Maybe a naive question, but is it possible to pass a value from a shell script to an output tuple?
Vlad Kiselev
@wikiselev
thanks, all sorted!
1 reply
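
For anyone searching later, a minimal sketch of one way to do this with the env output qualifier (process, file and variable names are made up): any shell variable set in the script can be captured into the output tuple.

process COUNT_LINES {
    input:
        tuple val(sample_id), path(reads)
    output:
        tuple val(sample_id), env(N_LINES)
    script:
        """
        # the shell variable set here is captured by env(N_LINES) above
        N_LINES=\$(wc -l < ${reads})
        """
}
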
Matthew
@SamThePsychoticLeprechaun

Hi, I've been trying to port my codebase to DSL2, but I have hit a snag. I have a bunch of paired reads that I put in a channel using fromFilePairs and chunk using splitFastq. I pass this to Trimmomatic and from that process I emit tuples of ID, and the two paired file chunks that a given instance has trimmed (tuple val(id), file("${id}_trim_1P.fq"), file("${id}_trim_2P.fq")).

I previously was running the following to merge the fq files of forward & reverse chunks:

trimmomatic.out.trimmed_reads.collectFile() { item ->
    [ "${item[0]}_trim_1P.fq", item[1] ]
}.set{ collected_trimmed_first_reads }
trimmomatic.out.trimmed_reads.collectFile() { item ->
    [ "${item[0]}_trim_2P.fq", item[2] ]
}.set{ collected_trimmed_second_reads }
collected_trimmed_first_reads.merge(collected_trimmed_second_reads)
                                     .map { item -> tuple(item[0].simpleName, item[0], item[1]) }
                                     .set{ collected_trimmed_reads }

But in DSL2, with merge deprecated (I think actually removed as it fails to run after the deprecation warning), I'd like to move to the suggested new pattern using join. However, I cannot figure out what the pattern is that allows join to behave like merge.

I thought maybe to add an integer "key" to each element of the collected file lists produced, but I could only think of doing that in terms of merge too.

The next step in my workflow absolutely requires these chunks to be collected back up, so I can't sidestep this issue.

Thanks in advance for any suggestions on how I can do this better!
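
A sketch of one way to give the two collectFile outputs a matching key so that join can pair them, assuming the file names produced above (i.e. the key is everything before _trim_1P.fq / _trim_2P.fq):

collected_trimmed_first_reads
    .map { f -> tuple(f.name - '_trim_1P.fq', f) }            // key by sample id
    .join(
        collected_trimmed_second_reads
            .map { f -> tuple(f.name - '_trim_2P.fq', f) }
    )
    .set { collected_trimmed_reads }                           // emits (id, first, second)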

lfearnley
@lfearnley
Could be a 30s question - is it possible to get the trace log to report the name of the host executing each task, or am I best to run hostname within the process?
Reece Hart
@reece

Are there alternatives to installing nextflow with curl get.nextflow.io | bash?

Although the curl-pipe-sh practice has become commonplace, it's quite inappropriate for a controlled environment.
(Let's please not debate this here. I'm interested only in answers to the narrow question about whether alternatives exist.)

2 replies
Moritz E. Beber
@Midnighter
You can always download the bash script and inspect it before running or what do you mean?
1 reply
James Fellows Yates
@jfy133:matrix.org
[m]
Does anyone know a way to customise or change the colours for the nextflow console?
OS Dark mode (on Ubuntu 20.04) makes the console pretty unreadable (particularly output...)
Filipe Alves
@FilAlves
@jfy133:matrix.org Why don't you use vscode IDE?
1 reply
James Fellows Yates
@jfy133:matrix.org
[m]
(I do use VSCod(ium), but I'm looking with the console to make very small prototypes to test certain concepts, I'm not actually developing anything in the console)
Filipe Alves
@FilAlves
Try searching how to customise a groovy console.
I found this online https://stackoverflow.com/questions/47893514/how-to-change-the-font-of-groovyconsole
Hope this helps
1 reply
Pablo Riesgo-Ferreiro
@priesgo

Hi people, I stumbled upon an issue about which I am clueless. I have this small R code that I run with Rscript and that uses the sequenza library.

Rscript -e 'test <- sequenza::sequenza.extract("${seqz}", verbose = TRUE);'

The above fails when ${seqz} contains the absolute or relative path to a symbolic link, but if it has the "real" path to the file it works. Does someone have a hypothesis for what may be happening?

Nah don't listen to me, this is something else....
Pablo Riesgo-Ferreiro
@priesgo
Same file copied in different locations works or does not work... but consistently the one that works always works... agggh
doesn't seem like a nextflow issue, sorry for the noise
Luca Cozzuto
@lucacozzuto
dear all
sometimes I stumble on a problem with R libraries
using nextflow + singularity
it is like R is looking for libraries in the user space instead of inside the container
I put this in my nextflow.config but it is not helping
env {
    PYTHONNOUSERSITE = 1
    R_PROFILE_USER   = "/.Rprofile"
    R_ENVIRON_USER   = "/.Renviron"
}
any clue?
1 reply
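
For reference, a frequent cause of this is the user's home directory, with its personal R library, being bind-mounted into the container by default, so R picks up user-space packages. A hedged config sketch that keeps $HOME out of the container (untested; --no-home can break tools that expect a home directory):

singularity {
    enabled    = true
    autoMounts = true
    runOptions = '--no-home'   // do not bind-mount $HOME, so R cannot see user-space libraries
}
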
Luca Cozzuto
@lucacozzuto
the script is also running with the parameter --vanilla
Cagatay Aydin
@kmotoko_gitlab
Hello People,
We have parallel workers that consume messages coming from a server and then run nextflow run. The problem is, if the number of messages is high, these parallel workers initialize Nextflow at almost the same time. This causes the following error to occur frequently: Can't lock file: /home/myhomedir/.nextflow/history -- Nextflow needs to run in a file system that supports file locks. I suspect this happens because when one Nextflow process puts a lock on $HOME/.nextflow/history, another Nextflow process tries to lock the same file before it is released by the former. Is this intended behaviour? Any ideas how to handle this properly without dirty workarounds?
Luca Cozzuto
@lucacozzuto
sorry are you running nextflow in parallel?
Cagatay Aydin
@kmotoko_gitlab
yes, each worker (12 in total) executes a nextflow process independently; if there is more than one job in the queue, they could be running in parallel

I checked the source code, it is raised from here in modules/nextflow/src/main/groovy/nextflow/util/HistoryFile.groovy:

            try {
                while( true ) {
                    lock = fos.getChannel().tryLock()
                    if( lock ) break
                    if( System.currentTimeMillis() - ts < 1_000 )
                        sleep rnd.nextInt(75)
                    else {
                        error = new IllegalStateException("Can't lock file: ${this.absolutePath} -- Nextflow needs to run in a file system that supports file locks")
                        break
                    }
                }
                if( lock ) {
                    return action.call()
                }
            }

The problem is, it tries to lock for a sec (if I'm reading Java correctly) and then quits if it can't. Am I not supposed to run multiple nextflow processes in parallel?

Cagatay Aydin
@kmotoko_gitlab
One workaround could be to use a different .nextflow/history file path for each worker, but apparently it is hardcoded in the same file.
Cagatay Aydin
@kmotoko_gitlab
Also note that the error string is not entirely correct: the file system does support file locks in my case, it is just that the file itself is locked by another Nextflow process within a tiny time window
Luca Cozzuto
@lucacozzuto
well I think you are going a bit against the nextflow philosophy here
you should use nextflow to parallelize more than parallelize nextflow
or maybe you can do a nextflow of nextflows...
Cagatay Aydin
@kmotoko_gitlab
I'm not sure, but that might not be possible in our case, because the workers consume messages from another server, prepare the arguments for the nextflow command, and then call nextflow with those arguments
Luca Cozzuto
@lucacozzuto
but then you have an orchestrator that is not working as an orchestrator