API v4 tweaks for a synth tree that deals with _incertae sedis_ taxa #123

Open
mtholder opened this issue Feb 15, 2017 · 2 comments

@mtholder
Member

@bredelings and I are working on the propinquity and otcetera changes needed to support treating incertae sedis taxa correctly. One wrinkle is that the same node can be identified by multiple OTT IDs.

For example, if the taxonomy is:

((A1_ott1,A2_ott2)A_ott3,(B1_ott4,B2_ott5)B_ott6*,(C1_ott7,C2_ott8)C_ott9*); 

with asterisks denoting incertae sedis taxa, and the synth tree is:

((A1,A2)A,((B1,C1)mrcaB1C1,(B2,C2)mrcaB2C2)x);

then the node x could be labeled B_ott6 or C_ott9. We may not have any such cases in a synthetic tree, but we should probably figure out what we are going to do for when they start showing up.

It would be easy to list these synonymies in the annotations file produced by propinquity. It is less clear how they would be dealt with in web services. In particular, several tree-of-life calls return an OTT ID.

Should that field be expanded to be an array of integers, or should we just pick one (e.g. the one with the lowest number) and list the synonyms in an additional field?
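
To make the two options concrete, here is a rough sketch of what each response shape might look like for node x. The field names `ott_ids` and `synonym_ott_ids` are made up for illustration and are not part of any existing API; the "lowest number wins" rule is just the example tie-breaker mentioned above.

```python
from typing import Dict, List


def as_id_array(ott_ids: List[int]) -> Dict[str, List[int]]:
    """Option 1: the ID field becomes an array of integers.

    `ott_ids` is a hypothetical field name.
    """
    return {"ott_ids": sorted(ott_ids)}


def as_primary_plus_synonyms(ott_ids: List[int]) -> Dict[str, object]:
    """Option 2: keep a single integer ID and list the rest separately.

    The lowest-numbered ID is picked as the primary one;
    `synonym_ott_ids` is a hypothetical field name.
    """
    ordered = sorted(ott_ids)
    return {"ott_id": ordered[0], "synonym_ott_ids": ordered[1:]}


# Node x from the example matches both B_ott6 and C_ott9.
option_1 = as_id_array([9, 6])
option_2 = as_primary_plus_synonyms([9, 6])
```

Option 2 keeps the existing field as a single integer, so clients that only read that field would presumably keep working, while option 1 changes the field's type for every caller.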

The larger issue is that any naming scheme in the face of incertae sedis taxa requires some definition of what the OTT IDs mean. My gut instinct would be to say that:

  1. the IDs are interpreted as being versioned by the taxonomy version, and
  2. for any particular version of OTT, the definition of a taxon is taken to be "the clade rooted at the MRCA of all of the included taxa (descendants of the taxon), as long as that node excludes the entire exclude set of the taxon." The exclude set of a focal taxon is "all of the taxa outside of the focal taxon, with the exception of any taxa that are descendants of incertae sedis taxa which are children of any ancestor of the focal taxon." Those incertae sedis taxa represent a "nonexcluded" set for the focal taxon. They can be inside or outside the clade without changing the taxon's name.
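
As a sanity check on that definition, here is a minimal sketch that applies it to the example trees above. It is not how propinquity or otcetera do it; the dict-based tree encoding, the `root` name for the unnamed root, and all function names are assumptions for illustration, and the exclude set follows a literal reading of the wording above.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # parent -> children; tips have no entry

# The example taxonomy and synth tree; "root" is a made-up name for the unnamed root.
TAXONOMY: Tree = {
    "root": ["A_ott3", "B_ott6", "C_ott9"],
    "A_ott3": ["A1_ott1", "A2_ott2"],
    "B_ott6": ["B1_ott4", "B2_ott5"],
    "C_ott9": ["C1_ott7", "C2_ott8"],
}
INCERTAE_SEDIS: Set[str] = {"B_ott6", "C_ott9"}
SYNTH: Tree = {
    "root": ["A", "x"],
    "A": ["A1_ott1", "A2_ott2"],
    "x": ["mrcaB1C1", "mrcaB2C2"],
    "mrcaB1C1": ["B1_ott4", "C1_ott7"],
    "mrcaB2C2": ["B2_ott5", "C2_ott8"],
}


def tips_below(tree: Tree, node: str) -> Set[str]:
    """Return the set of tips descended from node (the node itself if it is a tip)."""
    kids = tree.get(node)
    if not kids:
        return {node}
    out: Set[str] = set()
    for kid in kids:
        out |= tips_below(tree, kid)
    return out


def ancestors(tree: Tree, node: str) -> List[str]:
    """Return the ancestors of node, nearest first."""
    parent = {c: p for p, kids in tree.items() for c in kids}
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def exclude_set(taxonomy: Tree, incertae: Set[str], taxon: str) -> Set[str]:
    """Tips outside the taxon, minus tips of incertae sedis children of its ancestors."""
    nonexcluded: Set[str] = set()
    for anc in ancestors(taxonomy, taxon):
        for child in taxonomy[anc]:
            if child in incertae:
                nonexcluded |= tips_below(taxonomy, child)
    return tips_below(taxonomy, "root") - tips_below(taxonomy, taxon) - nonexcluded


def mrca(tree: Tree, tips: Set[str]) -> str:
    """Walk down from the root while some child still contains every tip."""
    node = "root"
    while True:
        nxt = [k for k in tree.get(node, []) if tips <= tips_below(tree, k)]
        if not nxt:
            return node
        node = nxt[0]


def labels_by_node(taxonomy: Tree, incertae: Set[str], synth: Tree) -> Dict[str, List[str]]:
    """Map each synth node to every internal taxon whose definition it satisfies."""
    out: Dict[str, List[str]] = {}
    for taxon in taxonomy:
        if taxon == "root":
            continue
        node = mrca(synth, tips_below(taxonomy, taxon))
        # The taxon names this node only if the clade holds none of its exclude set.
        if not (tips_below(synth, node) & exclude_set(taxonomy, incertae, taxon)):
            out.setdefault(node, []).append(taxon)
    return out


labels = labels_by_node(TAXONOMY, INCERTAE_SEDIS, SYNTH)
```

Under these assumptions, `labels["x"]` comes out as both `B_ott6` and `C_ott9`, which is the multiple-ID case described above, while node `A` maps only to `A_ott3`.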
@kcranston
Member

Couple of questions:

  • would one option for labelling x be mrcaB1C2 (or other combination of descendant taxa)?
  • is there an assumption that we would like the taxa B_ott6 or C_ott9 as labels on the synthetic tree (i.e. are these 'good' taxon names that happen to have the incertae sedis flag, as opposed to names like "unclassified blah")?

@mtholder
Member Author

We could use the MRCA notation, but I think we still have to communicate to the user that things have changed and that it is now possible for one node in the tree to match more than one OTT definition. Or, at least, it makes sense to me that we'd want to communicate that to users.

In answer to the second point: yes, I was thinking of both B_ott6 and C_ott9 as valid taxa, they just cannot be excluded from intruding on other taxa in the tree. So not tips labeled "unclassified blah".
