
Internal ORFs #103

Open
annebresciani opened this issue Dec 17, 2021 · 2 comments

Comments

@annebresciani

I was wondering why ribotricer does not have an ORF category called "internal". I was comparing results from ribotricer and RiboCode, and I identified an internal ORF in RiboCode that does not seem to be in the ribotricer index at all. Both indexes are built from the same human reference, GRCh38 (Ensembl release 104).

I looked into the ribotricer code, and I can see that there is an ORF type called "internal", but it is never appended (prepare_orfs.py, lines 258 and 340). Could you help me understand the reasoning behind this? Perhaps I am just misunderstanding the code.

I have an example of an internal ORF in transcript ENST00000675536 that is part of RiboCode's index, but not ribotricer's (for that transcript). The amino acid sequence it translates to is found in other transcripts, so the problem is not that the ORF cannot be identified as translated; rather, it is missing from the annotation.
I don't know whether this is a bug or intended, but I would really like to understand the reasoning. Many other ORFs are found in that transcript, so why not this one?
In ribotricer it has the coordinates 89646069_89649410_396 (ENSG00000131165).

Thank you in advance!
I look forward to your answer.
Kind regards,
Anne

@saketkc
Collaborator

saketkc commented Dec 19, 2021

We excluded them because there were so many of them. We should make including them available as a flag.
For now, you can convert the RiboCode index into a ribotricer-compatible one. I'll keep this issue open until we fix it.

@annebresciani
Author

Thank you very much for your swift reply. I am happy to read that I did not misunderstand. For now, I think we will be fine with how it is, but it would definitely be good to have it as an option in the future.
Another idea could be an option to collapse identical ORFs into a single row when the same ORF is identified from several different transcripts. I am currently doing this downstream, but it might be useful for others as well.

Best,
Anne

P.S. Out of curiosity, are there any plans to develop the tool further?
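The downstream collapsing step Anne describes could look roughly like the sketch below. It is a minimal, stdlib-only Python illustration, not code from ribotricer or RiboCode: it assumes a hypothetical record layout with one row per (ORF, transcript) pair carrying `transcript_id`, `chrom`, `strand`, and `blocks` (the ordered genomic (start, end) segments of the ORF), and it treats two ORFs as identical when chromosome, strand, and blocks all match. The function name `collapse_identical_orfs` is likewise hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical row layout (an assumption, not ribotricer's output format):
# one dict per (ORF, transcript) pair with keys
# "transcript_id", "chrom", "strand" and "blocks" (list of (start, end)).
Row = Dict[str, Any]


def collapse_identical_orfs(rows: List[Row]) -> List[Row]:
    """Merge rows that describe the same genomic ORF into a single row.

    Rows count as identical when chromosome, strand and the ordered
    (start, end) blocks all match; the transcripts sharing the ORF are
    collected, in first-seen order, under "transcripts".
    """
    merged: Dict[Tuple[Any, ...], Row] = {}  # dicts preserve insertion order
    for row in rows:
        blocks = tuple(tuple(b) for b in row["blocks"])
        key = (row["chrom"], row["strand"], blocks)
        entry = merged.setdefault(
            key,
            {"chrom": row["chrom"], "strand": row["strand"],
             "blocks": blocks, "transcripts": []},
        )
        # Record each transcript only once per collapsed ORF.
        if row["transcript_id"] not in entry["transcripts"]:
            entry["transcripts"].append(row["transcript_id"])
    return list(merged.values())


if __name__ == "__main__":
    # Placeholder records for illustration only, not real annotation data.
    rows = [
        {"transcript_id": "tx1", "chrom": "chr1", "strand": "+", "blocks": [(100, 250)]},
        {"transcript_id": "tx2", "chrom": "chr1", "strand": "+", "blocks": [(100, 250)]},
        {"transcript_id": "tx3", "chrom": "chr1", "strand": "-", "blocks": [(100, 250)]},
    ]
    for orf in collapse_identical_orfs(rows):
        print(orf["chrom"], orf["strand"], orf["blocks"], orf["transcripts"])
```

Keying on genomic blocks rather than on transcript-specific identifiers means that the same ORF reported on several transcripts collapses into one row even when its transcript-relative coordinates differ between those transcripts.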
