Guarantee compatibility between reference package components and rank recommendation #95
Labels: feature request
Taxonomic classification relies on a linear model of evolutionary distance (i.e. branch length, or number of substitutions). The distances between query sequences and reference sequences inferred during phylogenetic placement are influenced by the underlying reference alignment and, therefore, by the MSA trimming process. This causes a conflict when, for example, a model trained on a BMGE-trimmed MSA is used to correct classifications derived from a ClipKit-trimmed MSA.
Potential Solutions
1. When `treesapp assign` is executed, its parameters are compared to those that were used to create the reference package. If there are differences that could influence the phylogeny, the reference package is automatically re-trained. The MSA-trimming software name, mode and parameters would need to be stored. Creating a parser to extract these attributes for each trimming software would be inconvenient, and potentially unstable across software versions.
2. Restrict MSA trimming to `treesapp create/update`. The raw reference leaf sequences would need to be stored in the refpkg so that `treesapp update` and `treesapp train` can access them.
Acceptance criteria
- `--trim_align` and related arguments are removed from all subcommands except `create` and `update`.
- The raw reference sequences are stored in the refpkg by `treesapp create`, including all unused candidate sequences, i.e. records that passed the taxonomic screen and filter as well as the length thresholds.
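The first potential solution hinges on storing the trimming provenance (software name, mode, parameters) with the refpkg and comparing it when `treesapp assign` runs. Below is a minimal sketch of that comparison, assuming a hypothetical `TrimProvenance` record and helper functions; none of these names are part of TreeSAPP, and the mode and parameter values are illustrative only. It deliberately treats any difference as requiring re-training, which avoids having to decide which parameters actually influence the phylogeny.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrimProvenance:
    """Hypothetical record of how a reference package's MSA was trimmed."""
    software: str                                          # e.g. "BMGE" or "ClipKit"
    mode: str = ""                                         # trimming mode, if the tool has one
    params: Dict[str, str] = field(default_factory=dict)   # illustrative parameter names/values


def trimming_differences(stored: TrimProvenance, requested: TrimProvenance) -> List[str]:
    """Return the provenance fields that differ between two trimming setups."""
    diffs = []
    # Software names are compared case-insensitively ("ClipKit" vs "ClipKIT")
    if stored.software.lower() != requested.software.lower():
        diffs.append("software")
    if stored.mode != requested.mode:
        diffs.append("mode")
    if stored.params != requested.params:
        diffs.append("params")
    return diffs


def needs_retraining(stored: TrimProvenance, requested: TrimProvenance) -> bool:
    """Conservative policy: any trimming difference may shift branch lengths, so retrain."""
    return bool(trimming_differences(stored, requested))


if __name__ == "__main__":
    refpkg_trim = TrimProvenance("BMGE", params={"gap_threshold": "0.2"})  # hypothetical parameter
    query_trim = TrimProvenance("ClipKit", mode="smart-gap")
    print(trimming_differences(refpkg_trim, query_trim))  # ['software', 'mode', 'params']
    print(needs_retraining(refpkg_trim, refpkg_trim))     # False
```

A looser policy, such as ignoring parameters known not to affect the trimmed alignment, would only need changes inside `needs_retraining`.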