Annotate domains in refpkg profile HMMs #77

Open
6 tasks
cmorganl opened this issue Mar 5, 2021 · 0 comments
Labels
feature request A request for a new feature unlike one that already exists

Comments

@cmorganl
Collaborator

cmorganl commented Mar 5, 2021

The Achilles heel of gene-centric methods is domains shared with other, functionally unrelated protein families. Because these methods annotate in a vacuum, they are susceptible to classifying non-homologous sequences that would otherwise be correctly assigned to a different sequence or protein family by methods that use large databases (e.g. EggNOG-mapper or methods that use RefSeq). When such shared domains are present in a protein family, the risk of false-positive classifications increases greatly.

After building a reference package (RefPkg), users should be provided with a simple method for annotating these domains in their RefPkg using a sufficiently large database. Based on these annotated domains, treesapp assign may be able to filter out spurious homologous queries, and users would be better informed of their RefPkg's protein structure.

I propose that, through the layer subcommand, users can pass a '--domains' flag that will automatically use the Pfam database to perform HMM-to-HMM alignment, identify protein domains found in the RefPkg's profile HMM used for searching, and annotate those loci as such. The annotated domains can be stored in a new 'domains' attribute of a reference package, and possibly propagated to future updates of the RefPkg.
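As a rough sketch of how the annotated domains might be stored, assuming a hypothetical `DomainAnnotation` record and a stand-in `RefPkgSketch` class (neither exists in TreeSAPP, and the field names, coordinate convention and placeholder accession are illustrative only):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DomainAnnotation:
    """Hypothetical record for one Pfam domain located on a RefPkg profile HMM."""
    accession: str  # placeholder accessions are used in the example below
    name: str
    hmm_start: int  # assumed 1-based, inclusive position on the profile HMM
    hmm_end: int

    def overlap(self, start: int, end: int) -> int:
        """Number of HMM positions shared with the inclusive interval [start, end]."""
        return max(0, min(self.hmm_end, end) - max(self.hmm_start, start) + 1)


@dataclass
class RefPkgSketch:
    """Stand-in for the ReferencePackage class; only the proposed attribute is shown."""
    prefix: str
    domains: List[DomainAnnotation] = field(default_factory=list)

    def domain_coverage(self, start: int, end: int) -> float:
        """Fraction of the inclusive interval [start, end] covered by annotated domains."""
        length = end - start + 1
        if length <= 0:
            raise ValueError("end must be >= start")
        covered = set()
        for dom in self.domains:
            lo, hi = max(dom.hmm_start, start), min(dom.hmm_end, end)
            covered.update(range(lo, hi + 1))  # empty when the intervals do not overlap
        return len(covered) / length


# Placeholder values, not a real annotation
refpkg = RefPkgSketch(prefix="ExampleRefPkg")
refpkg.domains.append(DomainAnnotation("PF00000", "example_domain", 10, 59))
```

Keeping HMM coordinates on each annotation would let later steps ask how much of a query's alignment falls inside a shared domain, without rerunning the HMM-to-HMM search.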

TODO list:

  • Add 'domains' attribute to ReferencePackage class
  • Add '--domains' flag to treesapp layer arguments
  • Add HHsuite to requirements (conda, Docker and relevant Wiki pages)
  • Automate downloading of the latest HHsuite Pfam database to the installation's data/ directory when necessary
  • Develop workflow for searching for and annotating domains in a RefPkg's profile HMM
  • Allow users and/or treesapp assign to filter queries mapped to domains
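The flag and the final filtering item could look roughly like the sketch below. Plain `argparse` is used for illustration and is not meant to mirror TreeSAPP's own parser; only the `--domains` flag name comes from the proposal above. `QueryHit`, `filter_domain_only_hits` and the `max_domain_fraction` threshold are hypothetical.

```python
import argparse
from typing import Iterable, List, NamedTuple, Tuple


def build_layer_parser() -> argparse.ArgumentParser:
    """Illustrative parser exposing only the proposed '--domains' flag."""
    parser = argparse.ArgumentParser(prog="treesapp layer")
    parser.add_argument("--domains", action="store_true", default=False,
                        help="Annotate Pfam domains in the RefPkg's profile HMM.")
    return parser


class QueryHit(NamedTuple):
    """Hypothetical query alignment; assumes hmm_end >= hmm_start (1-based, inclusive)."""
    name: str
    hmm_start: int
    hmm_end: int


def filter_domain_only_hits(hits: Iterable[QueryHit],
                            domains: Iterable[Tuple[int, int]],
                            max_domain_fraction: float = 0.9) -> List[QueryHit]:
    """Keep hits whose aligned HMM region is not almost entirely inside annotated domains."""
    domain_list = list(domains)
    kept = []
    for hit in hits:
        length = hit.hmm_end - hit.hmm_start + 1
        covered = set()
        for d_start, d_end in domain_list:
            lo, hi = max(d_start, hit.hmm_start), min(d_end, hit.hmm_end)
            covered.update(range(lo, hi + 1))  # empty when there is no overlap
        if len(covered) / length <= max_domain_fraction:
            kept.append(hit)
    return kept


args = build_layer_parser().parse_args(["--domains"])
hits = [QueryHit("q1", 1, 100), QueryHit("q2", 15, 55)]
kept = filter_domain_only_hits(hits, domains=[(10, 59)])
```

Here `q2` aligns entirely within the annotated domain and is dropped, while `q1` also covers the rest of the profile and is kept. A fixed fraction is only one possible criterion; the right cutoff would need to be evaluated against real RefPkgs.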
@cmorganl cmorganl added the feature request A request for a new feature unlike one that already exists label Mar 5, 2021
@cmorganl cmorganl added this to To do in v0.12.0 via automation Mar 5, 2021
@cmorganl cmorganl removed this from To do in v0.12.0 Oct 10, 2022