Daniel Standage
@standage
/usr/local/bin/{python,python2,pip} etc were all symlinks to a framework install on the system
Tim Head
@betatim
:-/
Daniel Standage
@standage
that python setup was trying to run gcc-4.2 to compile some of the code
must've been a default, because it wasn't specified anywhere in our setup.py or anything
brew install python followed by brew link python did the trick
...aaaaaand then I removed gcov
make coverage.xml (or whatever it's called) was segfaulting
not a long term solution, but...
Tim Head
@betatim
humm
weirdo
I think I've gotten to the bottom of why the coverage has dropped: https://github.com/gcovr/gcovr/issues/140#issuecomment-243157655 currently my diagnosis is that at some point jenkins stopped using a special version of gcovr
I mean, there is a GCOVRURL in the Makefile that looks like it should be used by pip to install gcovr, but it isn't actually used anywhere. And from the branch name it sounds like it does something with unreachable branches, which is the command-line argument that seems to "fix" things
maybe not the most coherent sentence ...
Daniel Standage
@standage
khmer has definitely installed custom package versions before, hosted at ci.oxli.org
maybe it was something like that
Kevin Murray
@kdmurray91
Our manuscript for kWIP is finally out. Thanks to all here for your help getting khmer to play nice! http://biorxiv.org/content/early/2016/09/16/075481
Philipp Schiffer
@evolgenomology

Hi guys! (sorry found this room only after asking this by email)

After a long while I am returning to khmer for a new genome, but find myself a bit confused about the best approach to digital normalisation (I want ~100x) with your pipeline v.2.0.
One thing is that filter-abund.py apparently needs the k-mer hash table, but normalize-by-median.py appears to lack the --savehash option now.
Would/could you maybe point me to the most recent example of the workflow? That would be very helpful.

Cheers

Philipp
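For reference, a small sketch of what the two-step diginorm workflow looks like in khmer 2.0, where (as I understand it) --savehash was renamed to --savegraph; the option names here are assumptions, so verify them against `normalize-by-median.py --help` on your install:

```python
import shlex

def diginorm_commands(reads, ksize=20, cutoff=100, graph="reads.ct"):
    """Build the two-step diginorm pipeline as command strings.

    Assumes khmer 2.0 option names (--savegraph replacing the old
    --savehash); illustrative only, check your installed version.
    """
    normalize = [
        "normalize-by-median.py",
        "-k", str(ksize),
        "-C", str(cutoff),
        "--savegraph", graph,
        reads,
    ]
    # normalize-by-median.py writes <reads>.keep, which filter-abund.py
    # then trims using the countgraph saved in the first step
    trim = ["filter-abund.py", "-V", graph, reads + ".keep"]
    return [shlex.join(normalize), shlex.join(trim)]
```

The point is simply that the table normalize-by-median.py saves is the one filter-abund.py consumes, so the two scripts are chained through the saved countgraph file.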

hmyan90
@hmyan90
Hello, can anyone please explain the file dib-lab/khmer/data/100k-filtered.fa? What does the data mean?
850:2:1:1118:7944/1
TTAATTTTGGAAACCCTGCAATAAAGTCACAACATTGC
I am collecting some DNA sequence data for personal testing and want to use this data. Thank you!
Daniel Standage
@standage
Hi @hmyan90! The provenance of this data is not documented. Given the read ID, I would have assumed it's real data from an Illumina sequencer, and I wouldn't be surprised if it's from something like E. coli. But I can't confirm that. Really, the purpose of this file as far as khmer is concerned is to make sure the software handles Fastq and Fasta files correctly.
Hope this helps!
hmyan90
@hmyan90
Hi, thank you for replying! I have figured out that it is from the NCBI database, and NCBI has exactly the data I need. @standage
MessyaszA
@MessyaszA
Hello, does anyone know if the latest version of khmer still has a functioning filter-below-abund.py script in the sandbox directory? I am trying to run the khmer/sandbox/filter-below-abund.py script in order to trim off high-abundance kmers for a metagenome assembly.

And this is the error I receive:
Traceback (most recent call last):
  File "/local/cluster/khmer-legacy/sandbox/filter-below-abund.py", line 49, in <module>
    main()
  File "/local/cluster/khmer-legacy/sandbox/filter-below-abund.py", line 22, in main
    ht = khmer.load_counting_hash(counting_ht)
AttributeError: 'module' object has no attribute 'load_counting_hash'

I am concerned that I am getting this error because the filter-below-abund.py script is no longer part of the khmer pipeline.

The newest installed khmer in our Linux /local/cluster/bin is version 2.0+103.g8300de0, but the filter-below-abund.py script did not show up after the installation.

The script I used came from an older installation of khmer in /local/cluster.

(The python version I am using: Python 2.7.14, and the OS Version:
Linux 3.10.0-693.11.6.el7.x86_64 x86_64)

I wanted to know if anyone would know why I am getting this error, if the filter-below-abund.py script should be included in installations of the latest khmer version, and if this script is still functioning.

Daniel Standage
@standage
Hi @MessyaszA. The error message makes me think that there is some kind of conflict or confusion regarding the two different khmer versions installed on the cluster.
Perhaps the filter-abund.py or the filter-abund-single.py scripts from the newer version can satisfy your needs?
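For what it's worth, the AttributeError above is consistent with the khmer 2.x API rename from "counting hash" to "countgraph". A small compatibility shim could pick whichever loader the installed version exposes; the newer name (load_countgraph) is assumed here from newer sandbox scripts, so verify it locally:

```python
from types import SimpleNamespace

def resolve_counting_loader(module):
    """Return whichever counting-table loader this khmer build provides.

    Tries the assumed khmer 2.x name first, then the legacy 1.x name.
    """
    for name in ("load_countgraph", "load_counting_hash"):
        loader = getattr(module, name, None)
        if loader is not None:
            return loader
    raise AttributeError("no counting-table loader found in " + module.__name__)

# usage sketch with a stand-in module; with the real package: import khmer
fake_khmer = SimpleNamespace(__name__="khmer", load_countgraph=lambda path: path)
load = resolve_counting_loader(fake_khmer)
```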
MessyaszA
@MessyaszA
@standage I'm not sure if filter-abund.py and filter-abund-single.py would satisfy my needs. Those scripts trim low abundance kmers, but for metagenome assembly I see that the opposite is recommended - trimming high abundance kmers. When looking at the newer version I don't see any commands in those scripts that would allow me to trim high abundance kmers rather than low abund. kmers. I'm also wondering if anyone has a recommendation for metagenome assembly that would either skip this step or use a different method.
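To make the distinction concrete, here is a plain-Python sketch of the high-abundance trimming idea behind filter-below-abund: truncate each read at the first k-mer whose count reaches a cutoff. Exact Counter tallies stand in for khmer's probabilistic countgraph, so this is illustrative only, not a drop-in replacement for the script:

```python
from collections import Counter

def trim_high_abund(reads, ksize, cutoff):
    """Truncate each read at its first k-mer with count >= cutoff.

    Simplified sketch: exact in-memory counts instead of khmer's
    memory-efficient countgraph.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - ksize + 1):
            counts[read[i:i + ksize]] += 1
    trimmed = []
    for read in reads:
        cut = len(read)
        for i in range(len(read) - ksize + 1):
            if counts[read[i:i + ksize]] >= cutoff:
                cut = i  # drop from the offending high-abundance k-mer onward
                break
        trimmed.append(read[:cut])
    return trimmed
```

Flipping the comparison to `< cutoff` would give the low-abundance trimming that filter-abund.py performs, which is why the two scripts are mirror images of each other.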
Daniel Standage
@standage
I know my colleagues have used the variable coverage trimming options on transcriptome and metagenome data. That is not something I have worked much with. Let me ping @ctb and see what he has to say.
C. Titus Brown
@ctb
yo
hi @MessyaszA we recommended doing two things in the past but things have changed
so the two old things were -
  • trim low abundance k-mers with variable coverage approach
  • trim high abundance k-mers b/c of partitioning etc
with newer assemblers like megahit, I would say
  • assemble with megahit if you can! it will work except for really really big metagenomes.
C. Titus Brown
@ctb
if you really want to do k-mer trimming to reduce memory requirements prior to assembly, then the instructions here https://peerj.com/preprints/890/ are what I would suggest. we use this a lot, but not for assembly specifically.
it basically comes down to using either 'filter-abund.py -V' or 'trim-low-abund.py -V' from khmer.
but honestly I don't think you need to do any trimming prior to metagenome assembly unless you are trying to lower memory requirements, which megahit probably won't need.
HTH!
If you want to ask some more questions, please just file an issue at github.com/dib-lab/khmer and we will help you there! I don't get notifications from gitter :(
MessyaszA
@MessyaszA
@standage @ctb thank you for the advice!
Brad Langhorst
@bwlang
I’d like to get a list of the most abundant kmers in my sample set… what’s the right script for this? find-knots?
i should mention… i’ve already created countgraphs with load-into-counting and prepared abundance histograms...
[image: k-mer abundance histogram]
I want to find out the identity of sequences in that 31k+ category.
(these are 6-mers)
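As a khmer-free sanity check of the underlying idea (tally every k-mer and inspect the high tail), something like this plain-Python sketch can identify which 6-mers sit in a high-abundance bin; for real Fasta/Fastq data you would stream reads from the file and, at scale, let a khmer countgraph do the counting instead:

```python
from collections import Counter

def top_kmers(reads, ksize=6, min_count=1):
    """Tally every k-mer across the reads and return (kmer, count) pairs
    at or above min_count, most abundant first.

    Illustrative sketch with exact in-memory counts; not a khmer API.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - ksize + 1):
            counts[read[i:i + ksize]] += 1
    return [(kmer, n) for kmer, n in counts.most_common() if n >= min_count]
```

Setting min_count to the left edge of the bin of interest (e.g. 31000) would list exactly the k-mers behind that spike in the histogram.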
Brad Langhorst
@bwlang
i’m trying partition_graph -> find_knots now… please let me know if that’s the wrong path… thanks!