chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
nextflow.enable.dsl=2

process runFacets {
    publishDir 'results'
    container 'repo/image:v1'

    input:
        path hapmapvcf
        path normalbam
        path tumorbam

    script:
        samplename = tumorbam.toString()
        """
        snp-pileup-wrapper.R \
            --snp-pileup-path /usr/bin/snp-pileup \
            --vcf-file $hapmapvcf \
            --normal-bam $normalbam \
            --tumor-bam $tumorbam \
            --output-prefix $samplename

        run-facets-wrapper.R \
            --counts-file $samplename.snp_pileup.gz \
            --sample-id $samplename \
            --purity-cval 1000 \
            --cval 500 \
            --everything \
            -fl /usr/local/lib/R/site-library/ \
            -D /host/data/facets/
        """
}
I just tried simplifying things even more and it's still running into trouble
Anton Loukianov
@svyl:matrix.org
[m]
Oh easy
So groovy actually interprets $samplename.snp_pileup as ${samplename.snp_pileup}. You want ${samplename}.snp_pileup.gz
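A minimal sketch of the corrected lines inside the script block above (variable names as in the snippet):

```
# ${samplename}.snp_pileup.gz — the braces delimit the variable, so Groovy
# no longer parses ".snp_pileup" as a property access on samplename
run-facets-wrapper.R \
    --counts-file ${samplename}.snp_pileup.gz \
    --sample-id ${samplename}
```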
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
ok looks like you're right about that part
now the issue is that it's not finding file paths in the docker container
my user id and gid are different than 1000:1000
so i don't know if that plays a role
Anton Loukianov
@svyl:matrix.org
[m]
They should be staged in properly. I dunno. Which paths is it not finding?
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
Error executing process > 'runFacets (1)'

Caused by:
  Process `runFacets (1)` terminated with an error exit status (1)

Command executed:

  snp-pileup-wrapper.R             --snp-pileup-path /usr/bin/snp-pileup             --vcf-file hapmap_3.3.hg38.vcf.gz             --normal-bam blood.bam             --tumor-bam biopsy.bam             --output-prefix biopsy

  run-facets-wrapper.R             --counts-file biopsy.snp_pileup.gz             --sample-id biopsy             --purity-cval 1000             --cval 500             --everything             -fl /usr/local/lib/R/site-library/             -D /host/data/facets/

Command exit status:
  1

Command output:
  Failed to read VCF file: No such file or directory
Anton Loukianov
@svyl:matrix.org
[m]
Now when you go to the work directory, do you see any files?
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
yea
there are a few links i think
Anton Loukianov
@svyl:matrix.org
[m]
That's the default stage-in mode
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
i think docker doesn't like soft links
Anton Loukianov
@svyl:matrix.org
[m]
Maybe... Can you replace that command with stat?
Or put it first?
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
hmm i think i made a stupid path error
something's happening now
yea it looks like it's doing snp-pileup
yikes, thanks for the help
i'll probably be back soon
Anton Loukianov
@svyl:matrix.org
[m]
:)
Good luck!
chocolatebarbrain
@chocolatebarbrain:matrix.org
[m]
thank you
Phanindra Balaji
@ItachiBal
Hi, I have a question regarding the nextflow channels. It is easy to retrieve the name of the file(sample_ID) from the "fromFilePairs" channel factory method. But I have many BAM files and I am not able to retrieve the name or sample_ID. I tried all the other channel methods but they just create a channel with the files. Is there any way to get the "sample_ID/name" or is it only for paired-end reads? Thanks in advance.
Juan MONROY-NIETO
@jmonroynieto
As far as I know, this is only something that file-pair channels do. If you need to recover your sample id, consider either passing a tuple with the value from an upstream process, or accessing the path as text after the [shell/script] declaration but before the multiline script string. This can be done with the .toString() method, and then you can manipulate the string.
process only_has_file {
    input:
    path x from ch

    output:
    tuple path("result.txt"), val(sample_id) into output_ch

    shell:
    sample_id = x.toString()
    sample_id = sample_id - "string to subtract"
    """
    echo "these are your results" > result.txt
    """
}
Laurence E. Bernstein
@lebernstein
@ignaciot This sounds suspiciously like a problem using multiple inputs but not joining them so they are not synchronized.
Juan MONROY-NIETO
@jmonroynieto

I have a process monitoring progress to help me resume more cleanly (I am deleting files and starting with a list so no way for nf to know that I no longer want to process those files). I have a problem setting the publishDir directive: the files that I am looking for stay buried in the working directory. This is my process:

process monitor {
    executor 'local'
    publishDir "monitor", pattern: "*.ok", mode:'move'
    output:
    path "*.ok" optional true into null_ch
    exec:
    SRA_ok.collectFile(name: "sra.ok", newLine: true)
    FASTQ_ok.collectFile(name: "fastq.ok", newLine: true)
    KRAKEN_ok.collectFile(name: "fastq.ok", newLine: true)
    isMOLTYPE_ok.collectFile(name: "molType.ok", newLine: true)
}

What could I do to have the file moved to my expected directory?
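For what it's worth, publishDir only picks up files that match a declared output: and actually exist in the task work directory — files written by collectFile live elsewhere. One possible workaround (a sketch, assuming the directory layout fits) is collectFile's storeDir parameter:

```groovy
// storeDir writes the merged file straight to a target directory,
// so it never needs to round-trip through publishDir.
SRA_ok.collectFile(name: 'sra.ok', newLine: true, storeDir: 'monitor')
FASTQ_ok.collectFile(name: 'fastq.ok', newLine: true, storeDir: 'monitor')
```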

brusselsproutaficionado
@brusselsproutaficionado:matrix.org
[m]
Hi, I was wondering how to correctly pipe output from one process to another
I have this toy code I was adapting to dsl2

```groovy
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

process splitSequences {
    publishDir 'results/orig'

    input:
    path dotfa

    output:
    path 'seq_*'

    """
    csplit $dotfa '%^>%' '/^>/' '{*}' -f seq_
    """
}

process revRecords {
    publishDir 'results/rev'

    input:
    path x

    output:
    path 'rev_*'

    script:
    samplename = x.toString()
    """
    cat $x | rev > rev_${samplename}
    """
}

workflow {
    dotfa = Channel.fromPath('./*.fa')
    splitSequences(dotfa) | revRecords
}
```

sorry got cut off
The error is that it only processes the first record in the revRecords step
from splitSequences
results/orig has two files seq_00 and seq_01
results/rev has only 1 file that is named rev_seq_00, but it looks like it actually has the results of the reverse of seq_01
It seems like revRecords is attempting to process the entire outputs of splitSequences in one go?
Ok it definitely seems like it's trying to do it in one go
i just changed the command to wc -l, and it's giving me 61 seq_00, 421 seq_01 and 490 total
Ok i just figured out what was missing, the splitSequences.out.seq needed a .flatten() at the end
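For readers hitting the same thing, a sketch of the corrected workflow block (the processes stay as above):

```groovy
workflow {
    dotfa = Channel.fromPath('./*.fa')
    // flatten() splits the single [seq_00, seq_01, ...] emission into
    // one item per file, so revRecords runs once per sequence file.
    splitSequences(dotfa).flatten() | revRecords
}
```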
in dsl2, how do you declare memory usage for a process?
sorry dumb question, it's just memory = 64.GB in the main process
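For completeness, a sketch of where that directive sits (the values here are illustrative):

```groovy
process bigJob {
    memory 64.GB    // per-task memory request
    cpus 4          // often set alongside memory

    script:
    """
    echo running
    """
}
```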
Pierre Lindenbaum
@lindenb

@Itachibal

Hi, I have a question regarding the nextflow channels. It is easy to retrieve the name of the file(sample_ID) from the "fromFilePairs" channel factory method. But I have many BAM files and I am not able to retrieve the name or sample_ID. I tried all the other channel methods but they just create a channel with the files. Is there any way to get the "sample_ID/name" or is it only for paired-end reads? Thanks in advance.

create a process that extract the samples using the read-groups

cat ${bamlist} | while read bam; do samtools view -H "\${bam}" | grep "^@RG" | tr "\t" "\\n" | grep "^SM:" -m1 | cut -d ':' -f 2 | tr "\n" "," >> out.csv && echo \${bam} >> out.csv; done
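Sketched as a process (the process name is hypothetical; bamlist is assumed to be a file of BAM paths, one per line):

```groovy
// Hypothetical wrapper: pulls the first SM tag from each BAM's @RG
// header lines and writes "sample,bam" rows to out.csv.
process extractSampleIds {
    input:
    path bamlist

    output:
    path 'out.csv'

    """
    while read bam; do
        sm=\$(samtools view -H "\$bam" | grep '^@RG' | tr '\\t' '\\n' | grep -m1 '^SM:' | cut -d: -f2)
        echo "\$sm,\$bam" >> out.csv
    done < ${bamlist}
    """
}
```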
Phanindra Balaji
@ItachiBal
@lindenb Thanks a lot. :)
Haruka Ozaki
@yuifu

Hi, is it possible to specify the name of .nextflow.log, or to make a copy of .nextflow.log, when executing nextflow run (via some option)?

We have difficulty pinpointing problems when supporting end-users, who often overwrite .nextflow.log (and .nextflow.log.1, ..., 9) over rounds of troubleshooting. Asking them to cp .nextflow.log somewhere every time they execute nextflow run invites human error.
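If it helps: nextflow accepts a top-level -log option (given before the run subcommand) that sets the log path directly, which might sidestep the copy step entirely — worth verifying against your Nextflow version:

```
# Write the run log to a unique, per-invocation path instead of ./.nextflow.log
nextflow -log "logs/run_$(date +%Y%m%d_%H%M%S).log" run main.nf
```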

Juan MONROY-NIETO
@jmonroynieto
It's a little hacky, but you could try to create a process that does tail -F ${launchDir}/.nextflow.log >> \$(whoami)_\$(randomchar).log and then use workflow.onComplete or workflow.onError to instruct it to copy that new log file to a safe location before the process gets shut down. I found out about the -F option here
Cedric
@Puumanamana
Hi,
Is it possible to configure a Nextflow process so that it runs in a conda environment within a docker container?
Philip Reiner Kensche
@vinjana
Hi. Is there a way to set the cluster job name/identifier to specific values? The only thing that might work is maybe clusterOptions, but I was wondering whether there is a way that is independent of the cluster-system (like memory etc.). We use LSF where this would be bsub -J $clusterJobName. We'd use the name to select cluster jobs by job-type, for statistics and monitoring.
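Absent a cluster-agnostic directive, the clusterOptions route mentioned above would look roughly like this on LSF (the process and job names are illustrative):

```groovy
process countVariants {
    // LSF-specific: passed through verbatim to bsub, so -J sets the job name
    clusterOptions '-J countVariants'

    script:
    """
    echo counting
    """
}
```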
Haruka Ozaki
@yuifu
@jmonroynieto Thanks for your advice!! We will try it.