Pierre Lindenbaum
@lindenb
@Midnighter that worked ! thanks ! :-)
1 reply
Luca Cozzuto
@lucacozzuto
Dear all, I got this error when running my pipeline on AWS with resume, and because of it the resume is not working.
Have you faced something similar?
nextflow.file.FileHelper - Can't check if specified path is NFS (1): /mop2-bucket-1/scratch
Luca Cozzuto
@lucacozzuto
@pditommaso no help? :(
1 reply
Tim Dudgeon
@tdudgeon
I'm not clear on the use of publishDir with DSL2. This announcement suggested that there would be improvements here, and I found this that seems to provide some mechanism, but I was assuming that you should be able to define a process that does not need to worry about publishDir and let the workflow choose which outputs to publish? Is this possible? Any examples?
Steven P. Vensko II
@spvensko_gitlab

Is it possible to use a .count() channel as the size parameter for groupTuple?

I currently have:

bcftools_index_somatic.out.vcfs_w_csis.groupTuple(by: [0, 1, 2, 3], size: extract_chroms_from_bed.out.chroms_list.count()).set{ vcfs_by_patient }

But I get the error:

Value 'DataflowVariable(value=null)' cannot be used in in parameter 'size' for operator 'groupTuple' -- Value don't match: class java.lang.Integer

Is it possible to convert the .count() channel into something consumable by size:?

1 reply
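A possible workaround (a hedged, untested sketch): groupTuple's size: needs a literal integer, but groupKey() can carry a per-key group size taken from another channel, after combining each item with the count. The tuple layout below is an assumption based on the by: [0, 1, 2, 3] grouping.

```nextflow
// Sketch (untested): attach the expected group size to the key itself.
def chrom_count = extract_chroms_from_bed.out.chroms_list.count()

bcftools_index_somatic.out.vcfs_w_csis
    .combine( chrom_count )               // appends the count to each item
    .map { row ->
        def n   = row[-1]                 // the count appended by combine
        def key = groupKey( row[0..3], n )
        [ key ] + row[4..-2]              // key plus the remaining fields
    }
    .groupTuple()
    .set { vcfs_by_patient }
```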
KyleStiers
@KyleStiers
Does anybody know of resources with more in-depth details on implementation patterns of the sql plugin (https://github.com/nextflow-io/nf-sqldb). I've got it working for basic functionality, but it would be very useful to expand on the documentation a bit or add some examples of how to push/pull to/from a database within a process for example.
3 replies
Alexander Toenges
@atpoint:matrix.org
[m]
I remember there was an inbuilt nf way to display all params in a tabular output. Not $params, because that gives a comma-separated map, but something that returns a table. Any ideas?
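Failing a built-in, a hand-rolled fallback is easy to sketch (hedged; column width is arbitrary):

```nextflow
// Hedged sketch: print params as an aligned two-column table.
params.each { name, value ->
    println String.format('%-30s %s', name, value)
}
```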
Haruka Ozaki
@yuifu
Hi! I sometimes get a "file not found" error when using Channel.fromFilePairs with URLs.
Is there any way to set a timeout duration?
Vlad Kiselev
@wikiselev
What is a common way of making a process optional? (e.g. the process execution depends on a boolean parameter). So far I've only come up with an idea of using an empty channel for this. Any other ideas?
And of course I found it after posting: https://nextflow-io.github.io/patterns/index.html#_solution_18
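That pattern can be sketched in DSL2 like this (hedged; FOO and params.run_foo are made-up names), with the when: directive as the per-process alternative:

```nextflow
// Hedged sketch (DSL2): gate a process behind a boolean parameter.
// FOO and params.run_foo are hypothetical names.
workflow {
    ch_in = Channel.fromPath( params.input )
    if( params.run_foo ) {
        FOO( ch_in )
    }
}
```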
Reed Bell
@utlandr
Does Nextflow support pulling images from a private registry for the google-lifesciences executor? I keep getting a pull access denied error here (running docker pull separately works fine).
Reed Bell
@utlandr
Update: Totally supported. It appears you need to have docker.fixOwnership = true (and install procps in your image otherwise Nextflow will complain that you don't have ps installed). Best guess is that the mismatch in the ids for the owner of the image manifest file was preventing it from being accessed. Haven't tried it but a quick peek at the source code suggests that setting NXF_OWNER will also make things work.
2 replies
Dennis Hendriksen
@dennishendriksen
We are running a Nextflow workflow with Singularity containers and the SLURM executor. Conceptually I am struggling:
  • the overhead for starting and stopping a job in SLURM is very high
  • I can only use one container per process / looking at nf-core modules Nextflow seems to prefer small processes
Dennis Hendriksen
@dennishendriksen
In my use case I have an elegant workflow with parallelization that in practice performs horribly.
3 replies
I was wondering if anyone else recognizes this struggle. How do you deal with this issue?
ericart89
@ericart89
Hello, does anyone know if it is possible to apply "resume" only to certain processes instead of all of them?
1 reply
Moritz E. Beber
@Midnighter
Does anyone know if nextflow uses log4j? There is a security vulnerability https://www.lunasec.io/docs/blog/log4j-zero-day/
9d0cd7d2
@9d0cd7d2:matrix.org
[m]
hi! I'm trying to run some processes with Singularity containers. Is it possible to bind internal container directories to outside dirs on the host?
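One way (a hedged sketch; the paths are placeholders) is to pass bind mounts through the singularity scope in nextflow.config:

```nextflow
// Hedged sketch: bind a host directory into the container via
// Singularity's -B option. Both paths are placeholders.
singularity {
    enabled    = true
    autoMounts = true
    runOptions = '-B /host/data:/container/data'
}
```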
Brandon Cazander
@brandoncazander
Is there a way to specify a timeout for how long processes stay in a pending state? For example, I want to optimistically use one queue that has unused capacity, but if jobs are pending for x minutes, I would like them to fail and retry on another queue.
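A partial workaround seen in the wild (hedged sketch; queue names are placeholders, and note it reacts to task failure, not to time spent pending) is to retry on a different queue:

```nextflow
// Hedged sketch: first attempt goes to the cheap queue, a retry goes
// elsewhere. This does not fire on pending time, only on failure.
process foo {
    queue { task.attempt == 1 ? 'spot-queue' : 'ondemand-queue' }
    errorStrategy 'retry'
    maxRetries 1

    script:
    """
    echo hello
    """
}
```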
Paolo Di Tommaso
@pditommaso
@Midnighter nextflow uses logback
4 replies
Jeremy Leipzig
@leipzig
I'm having trouble with google life sciences.
Command error:
  Execution failed: generic::failed_precondition: while running "nf-2f69f94540149df9eda94c49022f51cc-main": unexpected exit status 1 was not ignored
the bucket is being used as a work directory but something is not producing .command.out, .command.err, .command.trace
i'm using nextflow-21.10.5; any workflow does this, including hello-world
Jeremy Leipzig
@leipzig
one clue might be
gsutil cat gs://mygcpbucket/nextflow/f4/72c513e7a923fb0c80b30fc74c669d/google/logs/output
/bin/bash: /nextflow/f4/72c513e7a923fb0c80b30fc74c669d/.command.log: Permission denied
+ trap 'err=$?; exec 1>&2; gsutil -m -q cp -R /nextflow/f4/72c513e7a923fb0c80b30fc74c669d/.command.log gs://truwl-internal-inputs/nextflow/f4/72c513e7a923fb0c80b30fc74c669d/.command.log || true; [[ $err -gt 0 || $GOOGLE_LAST_EXIT_STATUS -gt 0 || $NXF_DEBUG -gt 0 ]] && { ls -lah /nextflow/f4/72c513e7a923fb0c80b30fc74c669d || true; gsutil -m -q cp -R /google/ gs://truwl-internal-inputs/nextflow/f4/72c513e7a923fb0c80b30fc74c669d; } || rm -rf /nextflow/f4/72c513e7a923fb0c80b30fc74c669d; exit $err' EXIT
+ err=1
+ exec
+ gsutil -m -q cp -R /nextflow/f4/72c513e7a923fb0c80b30fc74c669d/.command.log gs://truwl-internal-inputs/nextflow/f4/72c513e7a923fb0c80b30fc74c669d/.command.log
+ [[ 1 -gt 0 ]]
+ ls -lah /nextflow/f4/72c513e7a923fb0c80b30fc74c669d
total 40K
drwxr-xr-x 3 root root 4.0K Dec 10 23:51 .
drwxr-xr-x 3 root root 4.0K Dec 10 23:51 ..
-rw-r--r-- 1 root root 3.3K Dec 10 23:51 .command.log
-rw-r--r-- 1 root root 5.3K Dec 10 23:51 .command.run
-rw-r--r-- 1 root root   36 Dec 10 23:51 .command.sh
drwx------ 2 root root  16K Dec 10 23:50 lost+found
+ gsutil -m -q cp -R /google/ gs://mygcpbucket/nextflow/f4/72c513e7a923fb0c80b30fc74c669d
1 reply
seems weird to see that generic permissions error
pmtempy
@pmtempy

I was running a Nextflow job with about 2k tasks on AWS Batch. Unfortunately, the Docker container for one of the processes contained an error (Exception in thread "Thread-1" java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper), and I had to kill the nextflow run. I guess I must have hit CTRL+C twice, because while the interactive nextflow CLI progress stopped, I'm still left with thousands of RUNNABLE jobs in AWS Batch.

Is there any quick way to remove them without potentially affecting other nextflow runs using the same compute queue?
How can I avoid similar issues in the future? I.e. how should I properly cancel a running nextflow run and make it clean up its jobs in Batch?

1 reply
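A possible cleanup sketch (hedged: the queue name and the nf- job-name prefix are assumptions; verify the filter matches only the orphaned run before terminating anything). To avoid it next time, pressing CTRL+C once and letting Nextflow shut down on its own should let it clean up its jobs.

```shell
# Hedged sketch: terminate leftover RUNNABLE jobs from a dead run.
# Queue name and job-name prefix are placeholders; check the filter
# matches only the orphaned run first.
aws batch list-jobs --job-queue my-queue --job-status RUNNABLE \
    --query 'jobSummaryList[?starts_with(jobName, `nf-`)].jobId' \
    --output text | tr '\t' '\n' |
while read -r job_id; do
    [ -n "$job_id" ] && aws batch terminate-job \
        --job-id "$job_id" --reason "orphaned nextflow run"
done
```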
Yasset Perez-Riverol
@ypriverol
hi all, i have a code like this:
ch_spectra_summary.map { tuple_summary ->
                         def key = tuple_summary[0]
                         def summary_file = tuple_summary[1]
                         def list_spectra = tuple_summary[1].splitCsv(skip: 1, sep: '\t')
                         .flatten{it -> it}
                         .collect()
                         return tuple(key.toString(), list_spectra) 
                        }
                   .groupTuple()
                   .set { ch_spectra_tuple_results}
is returning something like
[supp_info.mzid.gz, [[supp_info.mzid, 2014-06-24, Velos005137.mgf, MGF, Velos005137.mgf, ftp://ftp.ebi.ac.uk/pride-archive/2014/06/PXD001077/Velos005137.mgf]]]
but I would like to select only the last column of the CSV
ftp://ftp.ebi.ac.uk/pride-archive/2014/06/PXD001077/Velos005137.mgf
the result should be:
[supp_info.mzid.gz, [ftp://ftp.ebi.ac.uk/pride-archive/2014/06/PXD001077/Velos005137.mgf]]
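A possible tweak (hedged sketch, untested): since splitCsv on the file yields a list of rows, collect can take the last field of each row directly:

```nextflow
// Hedged sketch: keep only the last CSV column (the FTP URL) per row.
ch_spectra_summary
    .map { tuple_summary ->
        def key  = tuple_summary[0]
        def urls = tuple_summary[1]
                       .splitCsv( skip: 1, sep: '\t' )
                       .collect { row -> row[-1] }   // last column only
        tuple( key.toString(), urls )
    }
    .groupTuple()
    .set { ch_spectra_tuple_results }
```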
Sofia Stamouli
@sofstam

Hello,

I have the following python script in a nextflow process.

process update_image {

    script:
    """
    #!/usr/bin/env python3
    import os, subprocess

    subprocess.check_call(['singularity', 'pull', 'docker://busybox'])
    """
}

Singularity is installed and is in the $PATH. The config file looks like:

singularity {
    singularity.enabled = true 
    singularity.autoMounts = true
}

However, I get the error: No such file or directory: 'singularity'. Any ideas what might be wrong here?

ChillyMomo
@ChillyMomo709
Hi Nextflow community, I was wondering what exactly determines what's cached and what's not? It seems there are some processes of mine that always start when I resume the pipeline, even though the process has finished before.
ChillyMomo
@ChillyMomo709

@sofstam Try the following:

singularity {
    enabled = true
    autoMounts = true
}
Alex Mestiashvili
@mestia
is there a way to assign the output of a process to a variable which can be evaluated later in the workflow?
4 replies
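In DSL2 this falls out naturally (hedged sketch; FOO and BAR are made-up process names): a process's output channel can be stored and reused later in the workflow body:

```nextflow
// Hedged sketch: FOO and BAR are hypothetical processes.
workflow {
    FOO( Channel.fromPath( params.input ) )
    def foo_results = FOO.out    // a channel, usable anywhere below
    BAR( foo_results )
}
```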
Brandon Cazander
@brandoncazander

I have a regular expression in my parameters

params {
    normal_name = /^NF-.*-3.*/
}

that I use to match in my workflow elsewhere

def split_normal = branchCriteria {
    normal: it.name =~ params.normal_name
}

I'm trying to override this parameter as a CLI argument with --normal_name '/^NF-.*-4.*/' but then it's treated as a string in the workflow instead. Is there a good way to handle this, perhaps by compiling the parameter in the workflow?

1 reply
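One option (hedged sketch): pass the pattern without the surrounding slashes, since CLI params always arrive as strings, and either let Groovy's =~ take the String directly or compile it once up front:

```nextflow
// Hedged sketch: run with e.g.  --normal_name '^NF-.*-4.*'
// (no surrounding slashes), then compile the string explicitly.
import java.util.regex.Pattern

def normal_pat = Pattern.compile( params.normal_name.toString() )

def split_normal = branchCriteria {
    normal: it.name =~ normal_pat
}
```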
ramakrishnas
@ramakrishnas

Hi, I'm trying to get the sarek pipeline to work on our HPC cluster.

here is my command
nextflow run nf-core/sarek -r 2.7.1 -profile singularity -c nfcore_ui_hpc.config --input '/Users/rsompallae/projects/janz_lab_wes/fastqs_1.tsv' --genome mm10

and the error I get is

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (malloc) failed to allocate 32 bytes for AllocateHeap

An error report file with more information is saved as:

I feel like this has to do with some parameter adjustment but I'm not sure how to fix it.

Thanks in advance for your help
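That message comes from the JVM running Nextflow itself, not from a task. One hedged thing to try is tuning its heap via NXF_OPTS (the sizes below are examples) and launching from a node with more free memory:

```shell
# Hedged sketch: set the heap of the Nextflow head process; sizes are examples.
export NXF_OPTS='-Xms500m -Xmx2g'
nextflow run nf-core/sarek -r 2.7.1 -profile singularity -c nfcore_ui_hpc.config \
    --input '/Users/rsompallae/projects/janz_lab_wes/fastqs_1.tsv' --genome mm10
```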

KyleStiers
@KyleStiers

I would like to use split -l 1 on an input file and then emit each small file out on its own with a tuple maintaining metadata for the initial file that was split, instead of having all of them contained in that one field of the tuple.

something like this:

process split {

    input:
    tuple path(run), val(plateid), path(file) from ex_ch

    output:
    tuple path(run), val(plateid), path("file_*") into parse_ch

    script:
    """
    tail -n +17 $file  > sample_lines.csv
    split -l 1 -d sample_lines.csv file_
    """
}

But this should ideally emit the number of lines in sample_lines.csv as tasks. With this setup they're all caught into a single channel and you get a tuple that looks like:

['/path/to/run', 'plate_id', 'file_1, file_2, file_3, file_4']

Anyone have a quick way to do this? Maybe it's just a .multimap / .map but I can't seem to get it right.

1 reply
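One operator worth trying here (hedged sketch): transpose, which unrolls a list held in one tuple slot into one emission per element:

```nextflow
// Hedged sketch: turn  [run, plateid, [file_1, file_2, ...]]
// into one  [run, plateid, file_N]  emission per split file.
parse_ch
    .transpose()
    .set { per_file_ch }
```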
Tobias Neumann
@t-neumann
Hi - I have a script that demultiplexes a fastq file based on an input barcodes file. Now I want to start one mapping process for each demultiplexed fastq file. I do this by writing a CSV file with the file locations for each demultiplexed file, which I then supply to a process that reads a channel based upon the .splitCsv operator. Now this works nicely when running locally or on a cluster, but now I tried to move this to AWS S3. Is there a way to retrieve the s3 directory location where the files are stored and put it in the CSV file? Or how would I approach this?
David Mas-Ponte
@davidmasp
Hello all, I am not sure if this is a silly question. If I have a process that generates 10 files in the same folder, can I somehow transform them into a channel that emits each file separately? I can get a list of the files like file "*.rds" into group_chunks_list. I tried then fromList but without luck. I am not sure if I am missing something.
5 replies
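A hedged sketch of one answer: when an output declares a glob, the channel emits the matched files as a single list, and flatten splits it into one emission per file:

```nextflow
// Hedged sketch: emit each .rds file on its own.
group_chunks_list
    .flatten()
    .set { per_file_ch }
```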
anoronh4
@anoronh4
I sometimes get the error Error: Could not find or load main class run immediately when running nextflow, and no files are produced by nextflow. What does this error actually mean? I can see that the .nf script I am calling is accessible and the nextflow executable is also accessible.