Alaa Badredine
@AlaaBadredine_twitter
nice, good luck
Alaa Badredine
@AlaaBadredine_twitter
@pditommaso could you please help me with a problem ?
I tried to post on the Google forum but I didn't receive any reply so far. I have a process that copies information from one place to another. This process does not resume when I relaunch the pipeline. What could be the possible reasons? How can I investigate further?
Stijn van Dongen
@micans
@AlaaBadredine_twitter nextflow should just look at its inputs to decide whether to rerun a process or not. Maybe it needs an output as well? In that case you could create a dummy output file after the rsync. (The documentation says: "The caching feature generates a unique key by indexing the process script and inputs. This key is used to identify univocally the outputs produced by the process execution.")
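A minimal sketch of that suggestion (process and channel names are hypothetical; assumes the copy step currently declares no output):

```nextflow
// Hypothetical sketch: declare a dummy output so Nextflow can map the
// cache key (process script + inputs) to a concrete result and skip
// the process on -resume when nothing has changed.
process copyData {
    input:
    file src from src_ch          // src_ch is a hypothetical channel

    output:
    file 'copy.ok' into done_ch   // dummy marker file

    script:
    """
    rsync -a ${src}/ /path/to/target/
    touch copy.ok
    """
}
```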
Alaa Badredine
@AlaaBadredine_twitter
@micans the process does take an input and gives a dummy output after the copy. I tried an rsync and a cp but neither of them resumes
the process before it creates directories with a mkdir, maybe that's the reason it's failing?
however, the processes that run in parallel are cached, so that might not be the issue ...
Stijn van Dongen
@micans
you wrote When I do an rsync or a copy, the process is not cached and it redoes the analysis - It's not clear to me what happens. There is an analysis bit and an rsync bit. What is and is not and should and should not be redone?
Alaa Badredine
@AlaaBadredine_twitter
by analysis I meant rerun the whole analysis pipeline
it starts by creating directories then copying the files in a particular path, here's an example
(image attached: image.png)
so the copy part is not cached
I mean it is cached, I believe, but somehow it's not able to recognize the index
and that's the script block inside the copy:
    script:
        """
        cp ${nfPath} ${toScripts}
        cp ${params.sampleSheet} ${toRaw}/SampleSheet.csv
        cp -r \$(dirname "${workflow.scriptFile}")/bin ${toScripts}
        echo ${params.userName} > ${toLogs}/user.txt

        echo ${capture} > capture.txt
        echo ${dbsnp} > dbsnp.txt
        echo ${knownSites1} >> dbsnp.txt
        echo ${knownSites2} >> dbsnp.txt

        cp --verbose -rf ${fromNAS}/* ${toRaw} 2> copy.log

        #rsync -a ${fromNAS}/ ${toRaw} --log-file=rsync.log
        #rsync --version > rsync_version.txt

        $lbzip2 -n 10 --best copy.log
        #$lbzip2 -n 10 --best rsync.log

        touch copyProcess.ok
        """
Stijn van Dongen
@micans
Alright, it's hard to tell from here. I'd try to make a small toy example if that's possible.
Alaa Badredine
@AlaaBadredine_twitter
I shall do that
thanks
Stijn van Dongen
@micans
yw, good luck!
Stijn van Dongen
@micans

@pditommaso as an alternative to this pattern: http://nextflow-io.github.io/patterns/index.html#_problem_19
I've made https://github.com/micans/nextflow-idioms/blob/master/ab-abc-until.nf
It uses

ch_skipB.until {  params.doB }.set { ch_AC }
ch_doB.until { !params.doB }.set { ch_AB }

Instead of

(ch_AC, ch_AB) = ( params.doB ? [Channel.empty(), ch_doB] : [ch_skipB, Channel.empty()] )

This made me wonder; first if there is a drawback to using until like this, and second, the example uses into { ch_doB; ch_skipB } just before. I am envisioning into syntax extended like this:

into { .until{  params.doB }.set{ ch_AC }; 
       .until{ !params.doB }.set{ ch_AB }
     }

I'm not sure it's possible or that useful, but wanted to document the thought.
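Putting the two snippets together as a condensed sketch (DSL-1; channel names follow the message above):

```nextflow
params.doB = false

Channel.from(1, 2, 3).into { ch_doB; ch_skipB }

// until closes a channel as soon as the predicate returns true, so with
// a constant predicate exactly one of the two branches stays open:
ch_skipB.until {  params.doB }.set { ch_AC }  // flows only when doB is false
ch_doB.until   { !params.doB }.set { ch_AB }  // flows only when doB is true
```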

Riccardo Giannico
@giannicorik_twitter
@AlaaBadredine_twitter do you see this file being created: ${toRaw}/SampleSheet.csv? I'm wondering if you have write permissions under ${toRaw}.
Does the .nextflow.log file report any error? Does it report whether the process you need has been "submitted" or "cached"?
Alaa Badredine
@AlaaBadredine_twitter
@giannicorik_twitter yes I do; after all, the system we have here runs only under root, so we don't have issues with read/write permissions
a bit weird tbh but that's how it is
Michael Adkins
@madkinsz
Hi! I have a question about parsing variables from an input channel made with fromPath. I'm interested in parsing project names from a path, in order to do operations on entire projects. Here's kind of a pseudocode pipeline: https://hastebin.com/raw/tutifepuxa
I can't really find any clear documentation about how to parse variables from paths
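One common approach (a sketch, assuming a hypothetical layout like /data/<project>/<sample>.fastq) is to derive the project name from each file's parent directory with map:

```nextflow
// Pair each file with its project name, taken from the parent directory.
Channel
    .fromPath('/data/*/*.fastq')
    .map { f -> tuple(f.getParent().getName(), f) }   // [project, file]
    .set { samples_by_project }
```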
danchubb
@danchubb
Hi, I'm enjoying getting the hang of nextflow but I've hit a brick wall with what is hopefully a simple problem. When parsing a csv file using splitCsv(), how do you access columns where the header has spaces? e.g. if the column is filename then it is ${row.filename}; what if it is "file name"? Backticks? Quotes? Thanks a lot - Dan
Stijn van Dongen
@micans
@danchubb I had a quick try, this seems to work:
#!/usr/bin/env nextflow
Channel
    .from( 'alpha,beta,gam ma\n10,20,30\n70,80,90' )
    .splitCsv(header: true)
    .subscribe { row ->
       println "${row.alpha} - ${row.beta} - ${row.'gam ma'}"
    }
(edited to show it's easy to test a small snippet -- this was taken from the splitCsv documentation)
danchubb
@danchubb
great, thanks a lot for the help.
Michael Adkins
@madkinsz
Does nextflow deliberately exclude input files when matching output files? e.g. I'm getting an error: Missing output file(s) *.fastq expected by process merge_nextseq_lanes when calling a script that renames files in place. There are many .fastq files in the working directory but it cannot find any?
Paolo Di Tommaso
@pditommaso
input file names are not captured by globs
Michael Adkins
@madkinsz
Is there a way to make them captured?
Or is calling a script to combine/rename some of the files bad practice? I want to take a collection, rename a small subset or combine some, then pass all the resulting files as a new channel
Paolo Di Tommaso
@pditommaso
not a good idea, a task should produce its own outputs
Michael Adkins
@madkinsz
Okay, I don't understand how preprocessing tasks are supposed to work then. I have a tool that needs to operate on all of the fastq files but some of the fastqs require preprocessing first.
Paolo Di Tommaso
@pditommaso
you can have the pre-proc task getting some of the fastqs, and another task getting all the fastqs + the output of the pre-proc
makes sense?
Michael Adkins
@madkinsz
That does make sense but I don't know how to exclude the ones that would be preprocessed from the all fastqs.
Since preprocessing requires all the fastqs to be collected so that some can be merged
Paolo Di Tommaso
@pditommaso
glob pattern? csv file? you should have a criterion to express that
Michael Adkins
@madkinsz
You're right. That should be reasonable, I'll look into that. Thank you.
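A hedged sketch of that split (the glob criterion, paths, and channel names are made up for illustration):

```nextflow
// Files that need lane-merging, selected by a hypothetical glob; the
// rest go straight through untouched.
to_merge = Channel.fromPath('data/*_L00?_R1.fastq').collect()
ready    = Channel.fromPath('data/single_lane/*.fastq')

process merge_nextseq_lanes {
    input:
    file fastqs from to_merge

    output:
    file 'merged.fastq' into merged_ch

    script:
    """
    cat ${fastqs} > merged.fastq
    """
}

// The downstream task then receives the untouched files plus the
// pre-processed output.
all_fastq_ch = ready.mix(merged_ch)
```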
Paolo Di Tommaso
@pditommaso
@micans the alternative may work (haven't tried); the second proposal looks too creative ..
Michael Adkins
@madkinsz
Can you use the channel factory / builder in the input/output parts of a process?
The connection between those two syntactic forms is kind of unclear
Stijn van Dongen
@micans
@pditommaso the alternative works ... (pretty sure, tested it). It's not that creative ... it unleashes huge possibilities :grin: ... I found the need to introduce extra channel names a little bit annoying ... so I was thinking about ways to get an implicit channel into into.
Paolo Di Tommaso
@pditommaso
you can create as many Channel.fromPath('foo*.fastq') as you need
I found the need to introduce extra channel names a little bit annoying
I understand, but DSL-2 won't require creating channel duplicates anymore
Michael Adkins
@madkinsz
@pditommaso but how do you use that within a process rather than at the head of a .nf file?
Stijn van Dongen
@micans
Cool @pditommaso I'll check it out. I think these extra names may be because of the a(b)c optional b process rather than into duplication, but will check for sure.
Paolo Di Tommaso
@pditommaso
ch1 = Channel.fromPath('*.fasta')
ch2 = Channel.fromPath('*.fasta')

process foo {
  input: 
  file x from ch1
  .. 
}

process bar {
  input: 
  file x from ch2
  .. 
}
@micans check it out! I need your feedback to move it on :D
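For comparison, a hedged sketch of the DSL-2 version mentioned above (syntax per later Nextflow releases), where the duplicate channels disappear because one channel can feed several processes:

```nextflow
nextflow.enable.dsl = 2

process foo {
    input: path x
    script: "echo foo ${x}"
}

process bar {
    input: path x
    script: "echo bar ${x}"
}

workflow {
    ch = Channel.fromPath('*.fasta')
    foo(ch)   // DSL-2 forks the channel automatically,
    bar(ch)   // so no ch1/ch2 duplication is needed
}
```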