Benjamin Redelings
@bredelings
So I'm not completely happy about the 0.26 second response time from my laptop to devapi, but I guess I have to blame "the internet".
Yan Wong
@hyanwong
Nancy Moran at UT Austin has been in touch with OneZoom about the weirdness of the bacterial tree, which we take straight from the OpenTree. She recommends Hug et al as a good backbone (https://www.nature.com/articles/nmicrobiol201648/). I'm asking her for a newick, but apart from adding it as another study, is there anything I need to do for studies like this that are likely to bring about major, deep rearrangements to the current OpenTree topology for an entire domain?
Jonathan A Rees
@jar398
It's a mess, I agree. In Open Tree history, prokaryotes were a triage victim; we were initially using NCBI and one of the PIs said we should use SILVA instead. But we had to keep NCBI to the extent possible because source tips had already been matched to it. One of her postdocs did an initial alignment of the two, and then I rewrote that code. I felt I was spending way too long on it and cut bait, even though it's a very interesting problem.
  • Before proceeding it's important to clarify which is most important: tip assignments for phylogeny synthesis, or a classification with stable names for higher taxa that can be used for annotation and exploration. I think the tips are the main thing since there are no major consequences to a higher classification change (higher taxa aren't referenced by many source trees).
  • A Newick is not enough. In these groups you have to have a reference to one genbank record per group, at the very least, as a sanity check. Ideally all the 'type material' in NCBI would be represented, and the reference sequences for clusters.
Mark T. Holder
@mtholder
I certainly agree that the bacterial portion of the tree is not in good shape. Before our hack, @bredelings was working on (and pretty much done with) an enhancement to our synthesis pipeline that can use our https://github.com/OpenTreeOfLife/script-managed-trees repo
Ben is pretty much done with his optimization of the TNRS (that work was triggered by a need to make our software stack slimmer after the hack)
So, we should be able to get back to building a synth with "script-managed-trees". The general use case for that was huge trees that are impractical to deal with in the curation context.
The reason I bring that up here is that our intent was to update the 1 tree that is in https://github.com/OpenTreeOfLife/script-managed-trees to be the latest tree from https://gtdb.ecogenomic.org/
The GTDB effort seems very impressive to me, and they have a lot of the associations between taxa and accessions in other DB's that provide a lot of the info that would be good for updating the taxonomy.
Mark T. Holder
@mtholder
The massive polyphyly of a lot of bacterial taxa means that our current synth pipeline does not work very well in the context of a taxonomy with a high proportion of groups that are contested by the phylogenetic inputs.
So building the synth with just an (older) GTDB tree and no changes to our taxonomy was pretty slow (a couple of hours) and produced an even larger number of egregiously large polytomies.
I suspect that we'd have a similar issue with the Hug et al tree, but I'm happy to try it.
Benjamin Redelings
@bredelings
Part of the issue with bacteria in our current synth tree is that major groups (e.g. 'Proteobacteria') are missing because of conflicts with some input trees.
Benjamin Redelings
@bredelings
For example, pg_2448 breaks Euryarchaeota, Firmicutes, Proteobacteria, Delta-proteobacteria, Beta-proteobacteria, Alpha-proteobacteria, Spirochaetes, etc.
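To make "breaks" concrete, here is a minimal sketch of one way to test whether an input tree contradicts a taxon: the taxon's sampled tips should form a clade. This is an illustration only, not the Open Tree synthesis code. The nested-tuple tree format, the function names, and the toy tip labels are all assumptions, and the check assumes a fully resolved tree (an unresolved polytomy would be reported as a break here, which a real conflict analysis would treat differently).

```python
def tips(node):
    """Return the set of tip labels under a node (a str tip or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    labels = set()
    for child in node:
        labels |= tips(child)
    return labels


def is_broken(tree, taxon_tips):
    """True if the taxon's tips that occur in the tree do not form a clade."""
    present = set(taxon_tips) & tips(tree)
    if len(present) < 2:
        # Fewer than two sampled members: the tree cannot contradict the taxon.
        return False
    node = tree
    # Walk down to the smallest subtree that still holds every sampled member.
    while not isinstance(node, str):
        narrower = [child for child in node if present <= tips(child)]
        if not narrower:
            break
        node = narrower[0]
    # The taxon is intact only if that subtree contains nothing else.
    return tips(node) != present


# Hypothetical toy tree: taxon A is split by B1 and C1.
toy_tree = ((("A1", "A2"), "B1"), ("A3", "C1"))
print(is_broken(toy_tree, {"A1", "A2", "A3"}))
print(is_broken(toy_tree, {"A1", "A2"}))
```

In the toy tree the three "A" tips only come together at the root, alongside B1 and C1, so the first call reports a break; the pair A1, A2 forms its own clade, so the second does not.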
Yan Wong
@hyanwong
Thanks for the info @mtholder and @bredelings. It would be great to get bacteria sorted somehow, but I don't know the best way forward, given disagreements. Anything I can ping back to Nancy?
Benjamin Redelings
@bredelings
It looks like using some of these bacteria trees to update OTT might be quite helpful.
I'm curious how much of the broken-taxa-in-bacteria issue is misplaced taxa in OTT.
The other study that breaks Proteobacteria is pg_2542. The problem seems to be that OTT places Klebsiella variicola (https://tree.opentreeoflife.org/taxonomy/browse?id=20649) in Bacteroidetes > Flavobacteria, whereas all the other Klebsiella are placed in Proteobacteria > Gammaproteobacteria. Maybe we could use one of the Bacterial trees as a backbone in OTT construction?
At least now it is a lot easier to figure out why these taxa are getting broken.
Mark T. Holder
@mtholder
I do think that it is a deep taxonomy issue, but i'm not sure that it is easy to fix
I guess the general message to Nancy would be: we are willing to try putting the Hug et al tree in as a highly ranked tree. I fear that we are going to need to devise some way to either incorporate info from trees into a taxonomy build, or add some special preprocessing steps to the synth pipeline to deal with the pervasive issues with taxonomic names.
we are definitely open to working with interested parties on a long term fix, but I'm not super optimistic that a tree or two by itself will clear up the mess.
Mark T. Holder
@mtholder
perhaps if we implement constraints in the synth pipeline and put in some strong constraints for the bacteria it'll improve things (it almost certainly could help with our runtimes)
Yan Wong
@hyanwong
I think the Hug tree was a simple suggestion as she didn't know the intricacies. I agree that it seems unlikely to be an easy fix like this.
Benjamin Redelings
@bredelings
(As another data point, it looks like pg_2448 breaks Euryarchaeota because NCBI places Thermoplasmatales in Euryarchaeota, whereas the phylogeny does not.)
Yan Wong
@hyanwong

She later said:

Both the Segata et al and the Lang et al paper have very decent bacterial trees -- but neither looks anything like the one in OneZoom (i.e. OpenTree). I think a clue for how this happened is a note on the OpenTree page that says root may be arbitrary. Basically what you have is the equivalent of rooting the Eucarya on the internode between humans and chimps, and leaving everything outside the Old World Monkeys as a single node, so the basal branches are gorillas and macaques, and plants and ants end up lumped together.

I will see about getting a better tree, but either the Segata or the Lang one would be a very good start, especially Segata. It is rooted (with Archaea) and is strictly bifurcating (even though the order of splitting of phyla 3 billion years ago is probably impossible to really resolve). However, I didn't see a newick file mentioned, but will check further.

Mark T. Holder
@mtholder
there really is a pretty profound culture clash in terms of how botanists/mycologists/zoologists use taxonomy and how the microbiologists do
(re the point about rootings: We could look into alternative rootings in some automated fashion)
Benjamin Redelings
@bredelings
pg_2448 = Segata tree.
Yan Wong
@hyanwong
I do think that one of the main sources of conflict with deep Eubacterial phylogenies is likely to be rooting.
But there are probably ways of getting a better backbone even with rooting disagreements
Benjamin Redelings
@bredelings
There is a note about possibly incorrect rooting on the Segata tree, but the root is on the branch between Archaea and Bacteria, so it seems right to me. Maybe it got fixed and the note was not removed.
Emily Jane McTavish
@snacktavish
We do have the Hug tree queued for synthesis, but with only 800/3000 taxa mapped.
Yan Wong
@hyanwong
:thumbsup:
(I did look for it in the curator using the search term "Hug" but missed it, sorry)
Emily Jane McTavish
@snacktavish
No prob - the search isn't super granular.
Yan Wong
@hyanwong
What's the story with the GTDB if it ever gets incorporated? Is there any way that can be used as a backbone?
Mark T. Holder
@mtholder
That is my hope, but I'm not an expert on bacterial taxonomy, so I'm certainly open to suggestions on that
Brian O'Meara
@bomeara
ooh, this all reminds me -- I should add big trees to the script managed trees repo
Jonathan A Rees
@jar398
I think first of all, we need a designated type sequence for every bacterial tip. SILVA provides this, and we could capture their reference sequence designations if we wanted to. I don't know about any of the others.
Then we would have a way to align tips, at least, that could be made to be practical, since all of these efforts rest on sequences and the tip-like buckets that they go into. And at that point it would not be too hard to replace one higher taxonomy with another.
Remember that OTT's taxonomy is from SILVA, with NCBI grafted on at various places. So if you really want to understand current 'misplacements' you need to spend some quality time first with SILVA, and then with those NCBI tips that are not covered by SILVA. (I'm not saying it's necessary to understand; that's up to you.)
Jonathan A Rees
@jar398
OTT's report on 'broken' taxa (those in a source that are broken by a higher priority source) needs to be improved. Currently it contains the information required to understand any NCBI issues, but it is all buried among a lot of rows that don't matter. I wrote this code before I had a good understanding of RCC-5 and how to use it.
Disagreements between SILVA and parallel bacterial efforts are scientifically and methodologically interesting - not just quirks of open tree - and could yield a very nice paper / contribution.
(although maybe someone has already done this.)
Jim Allman
@jimallman
@hyanwong, i just wanted to report some progress on an isolated OpenTree feedback tool (live on our dev site) that should be usable within OneZoom and other sites.
if you look at the URL, you'll see that you should be able to open a node-specific comment tool by building the proper URL. you'll need to supply the latest version of the synthetic tree and the assigned node id, like /opentree/feedback/{synth-version}@{synth-nodeid}
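As a rough sketch of building that URL from a viewer like OneZoom: the path pattern comes from the message above, but the host name, the function name, and the example version and node id below are placeholders, not confirmed values.

```python
from urllib.parse import quote

# Assumption: placeholder host; the real dev site address is not given in the chat.
HYPOTHETICAL_BASE = "https://devtree.example.org"


def feedback_url(synth_version, node_id, base=HYPOTHETICAL_BASE):
    """Build {base}/opentree/feedback/{synth-version}@{synth-nodeid}."""
    if not synth_version or not node_id:
        raise ValueError("both synth_version and node_id are required")
    # Percent-encode each part so stray characters cannot break the path;
    # the literal '@' separates version from node id.
    path = "/opentree/feedback/{}@{}".format(
        quote(synth_version, safe=""), quote(node_id, safe="")
    )
    return base.rstrip("/") + path


# Hypothetical version and node id, for illustration only.
print(feedback_url("v1", "ott123"))
```

The resulting string could then be used as the target of a new window or an iframe `src`.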
Jim Allman
@jimallman
This should work in a new browser window or iframe. it might require some cross-domain fiddling if we want to embed it in your main page. Storage and behavior is currently the same as in the OpenTree synth-tree viewer, since it's just a mangled version of that page.
Jim Allman
@jimallman
one interesting question is, do we want to mingle feedback from different tree viewers? currently we skip the work of loading the target node's subtree, but that means we can only index these feedback items by URL (example: OpenTreeOfLife/feedback#478 ) vs. the more specific values used in our synth-tree viewer (example: OpenTreeOfLife/feedback#476 ). as a result, the OneZoom feedback would show up in the OpenTree viewer (which has all the metadata), while OneZoom would only show feedback from itself (having only the URL to match). is this a happy accident? or do we want everyone in the same conversation?
sorry if this is confusing. i'm happy to clarify!