Karen Cranston
@kcranston
Do you just want a different rank or a flag, or do you want them actually moved in the taxonomy (and if so, what is the parent)?
I don't think there is anything in the code to identify those, but it would be a fairly simple regex.
Benjamin Redelings
@bredelings
Hmm, we have an infraspecific flag.
I wonder how that is getting assigned...
Looking at these, I see names like Escherichia coli O145 str. RM9872, Escherichia coli O91, and Xenorhabdus bovienii str. oregonense.
A lot of things that are incertae sedis seem to be getting that way because people have an individual specimen, and classify it only down to the level of family or (if we are lucky) genus.
Since a lot of these things are really specimens, not species, it seems like we should reflect that somehow.
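As a rough illustration of the "fairly simple regex" idea, here is a sketch that flags specimen- or strain-like names. The function name `looks_like_specimen` is hypothetical, and the patterns are guesses drawn from the example names in this thread, not an existing OTT rule; they will also catch legitimate uses of "sp.".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one taxon name per line on stdin and print the
# names that look like specimens/strains rather than described species.
# The patterns below are assumptions based on examples in this discussion.
looks_like_specimen() {
  grep -E \
    -e ' sp\.( |$)' \
    -e ' str\. ' \
    -e '^[A-Z][a-z]+ bacterium ' \
    -e '^[A-Z][a-z]+ [a-z]+ O[0-9]+( |$)'
  # Patterns, in order: "Genus sp." / "Genus sp. X", "... str. ...",
  # "Order bacterium <id>", and serotype-style "Genus species O91".
}
```

For example, `printf '%s\n' "Canis sp." "Homo sapiens" | looks_like_specimen` prints only `Canis sp.`.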
Karen Cranston
@kcranston
$ source git:(irmng-update) ✗ grep "sp\." taxonomy.tsv | wc -l
1128110
$ source git:(irmng-update) ✗ grep infraspecific taxonomy.tsv | wc -l
69623
ott3.2
Benjamin Redelings
@bredelings
I like the idea of an "individual" rank, but that may not fit with existing practice.
Ha.
Yes.
This is partly why including the incertae sedis stuff in the synthetic tree makes it unbrowseable.
I did not realize that it was so high though!
Unfortunately, there are definitely other patterns, such as Oceanospirillales bacterium JGI 02_I10. But maybe those are less important.
Hmm... I wonder if we are getting the sp. mostly through ncbi and/or silva.
Karen Cranston
@kcranston
there are only a couple of thousand in irmng and IF, so most are from either ncbi or silva
I don't have local copies of those - I could check on the server, but I don't know where else they would be from
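One way to check where the "sp." names come from without local copies of each source is to tally the `sourceinfo` column of `taxonomy.tsv` itself. This sketch assumes the OTT layout `uid | parent_uid | name | rank | sourceinfo | uniqname | flags` with `\t|\t` separators and a header row; adjust the field numbers if the file differs. It counts how many "sp." taxa list each source, which is not the same as which source contributed the name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "sp." taxa per source prefix in taxonomy.tsv.
# $1: path to taxonomy.tsv (column layout assumed, see above).
count_sp_by_source() {
  awk -F '\t[|]\t' '
    NR == 1 { next }                        # skip the header row
    $3 ~ / sp\.( |$)/ {                     # name column contains "sp."
      n = split($5, srcs, ",")              # sourceinfo, e.g. "ncbi:1,silva:A1"
      split("", seen)                       # reset per-taxon dedup set
      for (i = 1; i <= n; i++) {
        split(srcs[i], kv, ":")             # keep only the prefix before ":"
        if (!(kv[1] in seen)) { seen[kv[1]] = 1; count[kv[1]]++ }
      }
    }
    END { for (s in count) print s "\t" count[s] }
  ' "$1" | sort -k2,2nr                     # most frequent source first
}
```

Usage: `count_sp_by_source taxonomy.tsv` prints one `source<TAB>count` line per source.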
Benjamin Redelings
@bredelings
Makes sense. I was thinking we would get a lot from ncbi because it's DNA-based. In that context it makes more sense to include records for individuals or specimens.
The internet says that "Canis sp." means "an unspecified species of the genus Canis". However, it seems in practice to be used to denote individual acts of DNA sequencing. Sigh.
Karen Cranston
@kcranston
I wonder if there is some rule like "if there is only one DNA sequence associated with the taxon label"
Benjamin Redelings
@bredelings
I think that probably describes the practical reality...
Hmm.... Isn't SILVA the one that clusters observed sequences, and claims to make "stable clusters"?
That is at least higher-level than a single (genome) sequence.
Karen Cranston
@kcranston
what is the end goal? filter these from OTT, from the synthesis tree, from the visualization?
Jonathan A Rees
@jar398
@bredelings I think 'A sp. B' may be used in many different ways. E.g. sometimes it means a new species has been found (perhaps with many samples) but not yet described. I suspect that the intent, if it says 'sp.', is that some species is being referred to, either a known or unknown one; it's just that whoever wrote it doesn't know which species. And we rarely know anything about the empirical basis; it could be a single specimen that is believed not to belong to any of the other species in the group, and therefore representative of a species not known to the authors of the study (but perhaps independently described elsewhere, or later).
If people are just taking lots of samples and not bothering to identify them to known species, that's a different story, closer to what you were assuming. Then the sample could belong to a species containing some other sample in the same study.
Jonathan A Rees
@jar398
The OTT names on the SILVA clusters are pretty random - they are derived from NCBI heuristically so the name may have very little to do with the cluster. As I keep saying the whole prokaryote/SILVA situation in OTT needs an overhaul. Also regarding SILVA stability, I don't think it's as good as one would like. I asked them to try to re-use reference sequences from one SILVA release to the next, and they said they would; that could help line the clusters in one version up with the clusters in the next, but is no rigorous guarantee (i.e. cluster X in SILVA version N and cluster Y in SILVA version N+1 could both have the same reference sequence, but the clustering could move the 'center' of the cluster so that some sequences in X are not in Y).
Not all SILVA clusters are represented in OTT, which is sort of a problem. This wouldn't be hard to fix, but we'd have to invent names for them (e.g. "SILVA 17 cluster A12345" where 17 is the SILVA version and A12345 is the designated reference sequence for the cluster). If SILVA published diagnostic criteria for its clusters (sequence based obviously) that would also be lovely, because then we'd be able to put sequences that entered Genbank after that clustering into existing clusters.
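To make the stability concern concrete: even when two SILVA releases share a reference sequence, individual sequences can leave that cluster. A sketch of how one might detect this, assuming hypothetical per-release mapping files with one `sequence_id<TAB>reference_sequence_id` line per sequence (this is not a SILVA export format):

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = mapping for release N, $2 = mapping for release N+1.
# For clusters whose reference sequence appears in both releases, print
# "sequence<TAB>old_reference<TAB>new_reference" for sequences that moved
# ("-" means the sequence is absent from release N+1).
moved_sequences() {
  awk -F '\t' '
    FNR == NR { old[$1] = $2; next }      # first file: release N
    { now[$1] = $2; isref[$2] = 1 }       # second file: release N+1
    END {
      for (seq in old) {
        ref = old[seq]
        cur = (seq in now) ? now[seq] : "-"
        # Only compare clusters whose reference survives into release N+1.
        if ((ref in isref) && cur != ref)
          print seq "\t" ref "\t" cur
      }
    }
  ' "$1" "$2" | sort
}
```

An empty result would mean every cluster whose reference sequence was reused kept all of its members.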
Jonathan A Rees
@jar398
Sadly, not all NCBI taxa have GenBank records that belong to SILVA clusters, either simply because there's no 16S sequence or because of SILVA's quality-control process. This inevitably makes creating a taxonomy that reflects both NCBI taxonomy and SILVA clustering a horrible mess.
Mark T. Holder
@mtholder
Hi @bredelings , @kcranston , @jimallman : @snacktavish and I just noticed that the debug logging level on the ws_wrapper is leading to very large logs in /home/opentree/repo/ws_wrapper/ws_wrapper.log on api
I can turn down the logging, but I'll also need to get rid of that old log file.
Looking at it briefly, I think that we could just delete it.
Obviously, I could summarize the info in it before I did that if any of you want some of the info that is stored there.
I'll delete it on Friday if I don't hear a complaint
Karen Cranston
@kcranston
ok from me to delete
Juliana Dániel-Ferreira
@julianadf

Hello,
I don't know if this is the right place to ask, but I have been running into trouble downloading the phylogeny of some land plants with the rotl package, for example:

library(rotl)
taxa_plant <- c("Veronica officinalis", "Vicia cracca", "Vicia sepium")
resolved_names_pl <- tnrs_match_names(taxa_plant, context_name = "Land plants")
Error in check_tnrs(res) : No matches for any of the provided taxa

The search works when I remove the context_name but then I get into trouble with synonyms.
When I go directly to the OTL homepage and search for any of the plant species under the category "Land plants" I get no results. When I do the same search in "all life" it manages to find the species. So it seems "land plants" has some trouble...
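A way to separate a rotl problem from a server-side one is to send the same query straight to the Open Tree TNRS service, with and without the context. This sketch assumes the v3 `tnrs/match_names` endpoint with `names` and `context_name` in a JSON body (to our understanding, the service that rotl's `tnrs_match_names()` calls). The helper names are hypothetical, and names are not JSON-escaped, so they must not contain quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for a match_names request.
# $1: context name ("" to omit it); remaining args: taxon names.
tnrs_payload() {
  local context=$1; shift
  local names="" n
  for n in "$@"; do
    names="${names:+$names,}\"$n\""   # comma-join the quoted names
  done
  if [ -n "$context" ]; then
    printf '{"names":[%s],"context_name":"%s"}' "$names" "$context"
  else
    printf '{"names":[%s]}' "$names"
  fi
}

# Hypothetical helper: POST the payload to the v3 TNRS endpoint (needs network).
tnrs_match() {
  curl -s -X POST https://api.opentreeoflife.org/v3/tnrs/match_names \
    -H 'content-type: application/json' \
    -d "$(tnrs_payload "$@")"
}
```

Comparing `tnrs_match "Land plants" "Veronica officinalis" "Vicia cracca"` with `tnrs_match "" "Veronica officinalis" "Vicia cracca"` should show whether the context alone causes the empty result.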

Karen Cranston
@kcranston
Hi @julianadf - thanks for the report (and asking here is ideal!). We have been doing some taxonomy updates, and also changing the software that processes these queries. I know @mtholder was debugging some problems with contexts, so this might be the source. I'll let him chime in when he comes online.
Mark T. Holder
@mtholder
Hmmm. Thanks for the report, @julianadf. I had thought that we fixed this... I'm traveling most of the day today, so it may be tomorrow before I can fully investigate what's going on.
Benjamin Redelings
@bredelings
OK! So, now when you go to a broken taxon in the tree viewer, it will tell you which trees conflict with the monophyly of that taxon, and point you to a conflict analysis of each tree with the taxonomy.
For example, Microchiroptera has 4 studies that conflict with it. You can type that name into the tree viewer yourself, or click here to see what it looks like: https://tree.opentreeoflife.org/opentree/argus/ottol@603783/Microchiroptera
Nice work on the visualization, @jimallman !
Brian O'Meara
@bomeara
Is it possible for regular users to "fix" the conflict to restore the taxon?
Mark T. Holder
@mtholder
@bomeara yes, but not quickly. If a new tree is added that supports monophyly and outranks the trees that conflict, then the taxon should show up again. But there is a lag between users' actions and when we rebuild the tree.
Brian O'Meara
@bomeara
can users predict a tree rank? [I know taxonomies are ranked manually, not sure about trees]
Mark T. Holder
@mtholder
they can see the order of trees in our "collections" and the order of the collections in our synth algorithm.
but the whole operation is less than crystal clear.
they can also edit the tree order in collections. In principle, a user can fix it. In practice almost no one outside the project would figure it out at this point.
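A toy model of the rule described above, where a supporting tree that outranks the conflicting ones brings the taxon back, might look like the sketch below. It is a deliberate simplification and an assumption, not the synthesis code: each tree, listed from highest to lowest rank, gives one verdict on the taxon, and the first tree with an opinion decides. Interactions among the trees' own conflicts are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical toy model: arguments are per-tree verdicts on one taxon,
# ordered from highest- to lowest-ranked tree: supports | conflicts | silent.
taxon_fate() {
  local verdict
  for verdict in "$@"; do
    case $verdict in
      supports)  echo kept;    return ;;
      conflicts) echo dropped; return ;;
    esac
  done
  echo kept   # no tree has an opinion, so the taxonomy's grouping stands
}
```

Under this model `taxon_fate conflicts supports` gives `dropped`, while moving the supporting tree up (`taxon_fate supports conflicts`) gives `kept`.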
Brian O'Meara
@bomeara
ok, thanks. Just thinking about how much work it would take to restore a conflicted taxon we care about.
Karen Cranston
@kcranston
moderate amount of work if we have a well-curated, non-conflicting tree in the database