Split reference sequences into different fasta files by their annotation from treesapp purity #86

Open
cmorganl opened this issue Apr 27, 2021 · 0 comments

@cmorganl (Collaborator)

treesapp purity is (mostly) good at indicating whether a reference package will end up classifying off-target homologs. What is missing, however, is any guidance on what to do with the reference package when off-target hits are found.

While not perfect, the sequences of a reference package could be split across FASTA files based on the orthologous group they were classified into. A FASTA file containing all sequences that were not classified would be written as well. With these files, users could concatenate the sequences they believe belong to their targeted protein family and recreate the refpkg.
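
A rough sketch of the splitting step, to make the idea concrete. This is not existing TreeSAPP code: `read_fasta`, `split_by_group`, and the `assignments` dictionary (sequence ID mapped to orthologous group name) are all hypothetical, and how that mapping would be pulled out of the treesapp purity results is left open. It also assumes group names are safe to use as file names.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_group(fasta_path, assignments, out_dir, unclassified="unclassified"):
    """Write one FASTA file per group and return {group: output path}.

    `assignments` is a hypothetical {sequence ID: group name} mapping.
    Sequences whose ID is missing from it go to the unclassified file.
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(fasta_path):
        # The sequence ID is taken to be the first whitespace-delimited token.
        seq_id = header.split()[0] if header.strip() else header
        groups[assignments.get(seq_id, unclassified)].append((header, seq))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for group, records in groups.items():
        out_path = out_dir / f"{group}.fasta"
        with open(out_path, "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
        written[group] = out_path
    return written
```

Users could then concatenate whichever of the per-group files they trust and pass the result back to treesapp create.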

This would be most useful in cases where misannotated, nonhomologous sequences were included in the initial set used by treesapp create.

@cmorganl cmorganl created this issue from a note in v0.11.0 (To do) Apr 27, 2021
@cmorganl cmorganl removed this from To do in v0.11.0 Apr 27, 2021