Unannotated ORFs #161
Hi there! I re-ran RiboTricer with a TE annotation (gtf). All the samples ran up until this point. I'm having trouble making sense of this error message; please let me know if you could help! The "index" file warning is not the issue, because the same samples ran with the same warning when the gtf was different. It seems the program fails after "WARNING: no periodic read length found... using cutoff 0.418", but no other warnings or errors are reported. In addition, all the reported files and graphs (except for read_length_dist) are empty, and the metagene_profile tables are populated exclusively with zeroes.
Hi @singhbhavya,
How are your TE elements annotated in the gtf?
Yes, but it depends on how you generate the index (and therefore on what your input gtf was). It would be helpful to know what your annotation (gtf) looks like.
Hi Sir! Please see here - https://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/mm39_rmsk_TE.gtf.gz
Hi @singhbhavya, when I generate the index using the gtf you shared, it looks okay to me:
Have you tried learning the |
My index looks the same as the one you generated; I don't think there's a problem with the index. I'm providing a few more details, in case it's helpful. This is what the bam_summary.txt looks like.
What's super unusual is the ribotricer_protocol, which checks only 4 reads. For the other (gene) annotation, it checks far more than 4.
I have RNA-seq data available to me for these samples, and I'll try learning the cutoff. Do you think the cutoff should be different for TE reads compared to gene reads? Thank you so, so much for your help so far!! It is very appreciated.
Hi @singhbhavya, the read length distribution shows a high enrichment of 29-31 nt reads, which is great. The protocol inference seems a bit concerning (could be a bug). If you point me to the bam file (or a subset of it), I will be happy to take a deeper look. Thanks!
Hi there! I have a question about how RiboTricer handles unannotated sequences.
Purpose: I am trying to run RiboTricer after generating BAM files using SQUIRE (which does both gene and transposable element quantification), and I want to know which TE reads in my ribo-seq data are actively translating.
Problem: When I run ribotricer prepare-orfs, all the candidate ORFs correspond to gene names, which confuses me. I don't see many unannotated regions without names.
Question: Does RiboTricer predict ALL candidate ORFs (including repeat elements)? If so, how does it name them? As long as all the predicted ORFs have a start and end site, that's all I need, since I have a TE annotation. I have a .txt repeatmasker file with specific TE sequences; can I somehow use that within RiboTricer?
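For the last point, one possible route is to turn the RepeatMasker table into GTF lines first, since the thread goes on to work from a TE gtf. Below is a minimal Python sketch, not a ribotricer feature. It assumes the .txt file is a tab-delimited UCSC-style "rmsk" table dump with the standard 17 columns and no header row; the function name `rmsk_to_gtf_lines`, the `_dupN` transcript naming, and the `family_id`/`class_id` attributes are illustrative choices, and whether ribotricer accepts the resulting records is not confirmed here.

```python
import csv
import io

# ASSUMPTION: column order of a UCSC "rmsk" table dump (no header row).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def rmsk_to_gtf_lines(handle, source="rmsk"):
    """Yield one GTF 'exon' line per RepeatMasker hit (hypothetical helper)."""
    copies_seen = {}  # repName -> number of copies emitted so far
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and an optional '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(RMSK_COLUMNS, row))
        # UCSC tables are 0-based half-open; GTF is 1-based inclusive.
        start = int(rec["genoStart"]) + 1
        end = int(rec["genoEnd"])
        # Give each genomic copy of a repeat its own transcript_id.
        n = copies_seen.get(rec["repName"], 0) + 1
        copies_seen[rec["repName"]] = n
        attributes = (
            f'gene_id "{rec["repName"]}"; '
            f'transcript_id "{rec["repName"]}_dup{n}"; '
            f'family_id "{rec["repFamily"]}"; '
            f'class_id "{rec["repClass"]}";'
        )
        yield "\t".join([
            rec["genoName"], source, "exon", str(start), str(end),
            rec["swScore"], rec["strand"], ".", attributes,
        ])


if __name__ == "__main__":
    # Made-up example row, for illustration only.
    example = io.StringIO(
        "585\t1000\t10\t0\t0\tchr1\t100\t200\t-1000\t+\t"
        "L1_Mus3\tLINE\tL1\t1\t100\t0\t1\n"
    )
    for line in rmsk_to_gtf_lines(example):
        print(line)
```

The only non-obvious step is the coordinate shift: the start is incremented by one and the end is left unchanged, so the interval covers the same bases in GTF's 1-based convention.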