[FEATURE REQUEST] Preserve prodigal metadata for anvi-export-gene-calls
#2181
Comments
Just a quick note as I'm passing through this: when we do this, we need to think of a way that doesn't require a specific design that locks us in with Prodigal for gene calling. We need a generic design that can keep track of additional features for genes (or genomic regions, nucleotides, or codons) and that can also be populated from Prodigal output.
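As a rough illustration of what a caller-agnostic layout could look like, the sketch below splits a Prodigal FASTA header into the core gene-call fields and a free-form set of extra attributes, then flattens the extras into generic (gene, source, feature, value) rows. The function names, the row layout, and the example header are assumptions made for illustration; none of this is anvi'o code, and coordinates are kept exactly as Prodigal reports them.

```python
from typing import Dict, List, Tuple


def parse_prodigal_header(header: str) -> Tuple[dict, Dict[str, str]]:
    """Split one Prodigal FASTA header into core fields and extra attributes.

    Expects the ' # '-delimited layout Prodigal writes, e.g.
    >contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10;...;gc_cont=0.497
    """
    parts = [p.strip() for p in header.lstrip(">").split(" # ", 4)]
    name, start, stop, strand, attrs = parts

    # Prodigal names each gene <contig name>_<gene number on that contig>.
    contig, _, gene_index = name.rpartition("_")

    core = {
        "contig": contig,
        "gene_index": int(gene_index),
        "start": int(start),    # as reported by Prodigal, not converted
        "stop": int(stop),
        "strand": int(strand),  # 1 or -1
    }

    # Everything after the last ' # ' is a ';'-separated list of key=value pairs.
    extras = dict(
        item.split("=", 1) for item in attrs.rstrip(";").split(";") if item
    )
    return core, extras


def to_feature_rows(gene_id: int, extras: Dict[str, str],
                    source: str = "prodigal") -> List[tuple]:
    """Flatten extras into generic rows that are not tied to one gene caller."""
    return [(gene_id, source, feature, value) for feature, value in extras.items()]


# Hypothetical header, for illustration only.
header = (">contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10;start_type=Edge;"
          "rbs_motif=None;rbs_spacer=None;gc_cont=0.497")
core, extras = parse_prodigal_header(header)
rows = to_feature_rows(0, extras)
```

Because the rows carry a `source` field, output from another gene caller could be stored in the same shape without changing the layout.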
Because I understand this exists in the context of broader changes that need to be done (and that are beyond my current mastery of the anvi'o codebase), here is a temporary pseudo-solution for anyone who ends up here. I wrote this script, which essentially piggybacks on anvi'o's Prodigal caller and response parser:
This won't update your contigs database or otherwise modify any anvi'o functionality. However, if you call it with a FASTA file as input, it will return the same dict anvi'o generates while also keeping the other Prodigal outputs chosen here (e.g. 'gc_cont'). Example run command
Thanks for letting me know about the typo, @Ge0rges. I'm not sure how it survived this long. I guess because no one is using Prodigal. Your temporary workaround is masterful and beautiful. Regarding the original feature request: this has been a difficult one to address because it requires changing the way we keep gene calls in the relevant table by adding a few new columns. That will likely add millions of additional data points to the table, increasing the contigs-db size by a lot while being relevant to only a fraction of the users. A better solution would be to extend that table if
That makes sense. I was wondering whether the revamp mentioned in #2152 is expected to affect the contigs-db, or to involve creating a new type of artifact centered around genomes/MAGs. If the latter, this feature could be relegated to that artifact rather than expanding the contigs-db.
I think it will have to be new, optional tables in contigs-db. We already have the code to mark nucleotide and codon/amino acid positions in contig sequences in contigs-db files, but they are currently not used outside of anvi'o structure. We will have to make them more accessible to mainstream programs :) The best way to get these things done is to have a project in the lab that needs this solution to be in place. That's why there is a delay currently :(
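To make the "optional table" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `gene_features` and its columns are hypothetical and do not correspond to any existing anvi'o schema. The point it shows is that a long-format table is only created when someone actually stores features, so databases that never use it stay the same size.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# (gene id, source program, feature name, value), all illustrative.
FeatureRow = Tuple[int, str, str, str]


def store_gene_features(conn: sqlite3.Connection, rows: Iterable[FeatureRow]) -> None:
    """Create the optional table on first use and append feature rows to it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS gene_features (
               gene_callers_id INTEGER,
               source TEXT,
               feature TEXT,
               value TEXT
           )"""
    )
    conn.executemany("INSERT INTO gene_features VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def get_gene_features(conn: sqlite3.Connection, gene_callers_id: int) -> Dict[str, str]:
    """Return all stored features for one gene as a feature -> value dict."""
    cursor = conn.execute(
        "SELECT feature, value FROM gene_features WHERE gene_callers_id = ?",
        (gene_callers_id,),
    )
    return dict(cursor.fetchall())


# In-memory database standing in for a real file; the values are made up.
conn = sqlite3.connect(":memory:")
store_gene_features(conn, [
    (0, "prodigal", "rbs_motif", "AGGAG"),
    (0, "prodigal", "start_type", "ATG"),
    (1, "prodigal", "gc_cont", "0.497"),
])
features = get_gene_features(conn, 0)
```

Storing one row per feature rather than one column per feature means a new attribute, or a different gene caller, needs no schema change.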
The need
Identifying things like the RBS motif, start codon, etc. can come in handy with gene-aware analyses. Such analyses may become more frequent in the future, especially given the current effort in #2152.
The solution
From @ivagljiva on Discord:
Perhaps this should be relegated to the effort mentioned by @semiller10 in #2152, but I thought it pertinent to bring it up.
Beneficiaries
Anyone doing gene-aware analyses.