p-site offsets and metagene profiles #122

Open
p-levy opened this issue Apr 7, 2022 · 5 comments

Comments

@p-levy

p-levy commented Apr 7, 2022

Hi Saket,

Thanks for developing this great tool! I am surprised to get these results in the psite_offsets.txt output file:

relative lag to base: 33
	lag of 28: 0
	lag of 22: 0
	lag of 33: 0
	lag of 27: 0
	lag of 24: 0
	lag of 32: 0
	lag of 34: 0
	lag of 31: 0
	lag of 30: -1
	lag of 25: 0
	lag of 26: 0
	lag of 29: 0
	lag of 35: 0
	lag of 36: 0
	lag of 23: 0
	lag of 21: 0
	lag of 20: 0

How should I interpret that?

I also find my metagene profiles a bit strange, especially around the stop codons. See this example:

[Screenshot 2022-04-07 at 14 47 40: ribotricer metagene profile]

With RibORF on the same Ribo-seq BAM file, I get profiles like this one, which show a P-site offset of 13 nt at both the start and the stop codon:

[Screenshot 2022-04-07 at 14 47 04: RibORF metagene profile]

Why do you think this is the case, and should I still trust the results in my translating_ORFs.tsv output?

Thanks!
Pierre

@saketkc
Collaborator

saketkc commented Apr 17, 2022

The output indicates relative offsets between different read lengths, with 33 nt reads treated as the base. Ribotricer chooses the relative offsets used to merge the profiles of different read lengths so as to maximize the cross-correlation between them. You can override this behavior by providing custom offsets with the --psite_offsets option.
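To make the cross-correlation idea concrete, here is a minimal sketch in Python. It is not ribotricer's implementation: the function names, the toy profiles, the lag search range (max_lag) and the sign convention (a positive lag means the other profile sits downstream of the base) are all assumptions for illustration.

```python
# Hypothetical sketch: find the lag that best aligns one read length's
# metagene profile with the base read length's profile by maximizing
# their cross-correlation. Not ribotricer's actual code.


def cross_correlation(base, other, lag):
    """Sum of base[i] * other[i + lag] over positions present in both."""
    total = 0.0
    for i, value in enumerate(base):
        j = i + lag
        if 0 <= j < len(other):
            total += value * other[j]
    return total


def best_lag(base, other, max_lag=5):
    """Return the lag in [-max_lag, max_lag] with the highest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(base, other, lag))


# Toy profiles: 'reads_30' is the base profile shifted one position downstream.
base_33 = [0, 0, 5, 1, 1, 5, 1, 1, 5, 1, 1, 0]
reads_30 = [0] + base_33[:-1]
print(best_lag(base_33, reads_30))  # prints 1
```

Because Ribo-seq profiles are roughly 3-nt periodic, lags that differ from the true one by a multiple of 3 can also score fairly well, which is one reason to keep the search range small.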

I am not quite sure why there is bleed-through over the stop codon, but it may arise from annotation issues in the GTF. Could you tell me which annotation this is based on (genome build and GTF version)? In our benchmarking, we find our strategy to have both higher sensitivity and specificity across multiple species.

@p-levy
Author

p-levy commented Apr 19, 2022

Hi Saket,

This was generated from human data.

Genome used: GRCh38.primary_assembly.genome.fa (gencode)
GTF: gencode.v39.annotation.gtf

Here's how I ran ribotricer:

  1. Prepare ORFs
ribotricer prepare-orfs \
	--gtf /plevy/ref/gencode.v39.annotation.gtf \
	--fasta /plevy/ref/GRCh38.primary_assembly.genome.fa \
	--prefix gencode \
	--min_orf_length 24 \
	--start_codons ATG,TTG,CTG,GTG \
	--longest
  2. Detect ORFs
ribotricer detect-orfs \
	--bam sample.bam \
	--ribotricer_index gencode_candidate_orfs.tsv \
	--prefix gencode

Thanks!
Pierre

@saketkc
Collaborator

saketkc commented Apr 19, 2022

Thanks, that looks correct to me. It is actually hard to compare against RibORF's profile without extending ours a few bases beyond the stop codon; this extension (offset_3p) is currently set to 0. I will try to expose this parameter in the CLI. However, the bleed-through in the visualization above does not affect the periodicity calculation, so the results remain interpretable.
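The effect of offset_3p can be pictured with a small sketch that aggregates P-site counts in a window around the stop codon. This is a hypothetical helper, not ribotricer's code: the input format (a stop position plus P-site positions in transcript coordinates), the offset_5p name and its default are assumptions; only offset_3p comes from the discussion above. With offset_3p=0, nothing downstream of the stop codon enters the profile.

```python
from collections import Counter


def stop_codon_metagene(orfs, offset_5p=30, offset_3p=0):
    """Count P-sites by position relative to the stop codon.

    orfs: list of (stop_position, psite_positions) pairs in transcript
    coordinates (hypothetical input format). Positions from -offset_5p to
    +offset_3p (inclusive) are kept; everything else is ignored.
    """
    profile = Counter()
    for stop_pos, psites in orfs:
        for p in psites:
            rel = p - stop_pos
            if -offset_5p <= rel <= offset_3p:
                profile[rel] += 1
    return profile


orfs = [(100, [90, 97, 100, 103, 110])]
print(sorted(stop_codon_metagene(orfs).items()))
# [(-10, 1), (-3, 1), (0, 1)]
print(sorted(stop_codon_metagene(orfs, offset_3p=20).items()))
# [(-10, 1), (-3, 1), (0, 1), (3, 1), (10, 1)]
```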

@polklin

polklin commented Feb 10, 2023

Hello @saketkc,

thanks for this very nice tool !

I am adding a comment to this thread because I also see some strange results in the metagene profiles on my side.
I added a 20-nucleotide offset at the 3' end, as you suggested, so that I can see the P-site offset at the stop codon as well.

I typically observe this kind of plot on my data:
[image: metagene profiles around the start and stop codons]

For the start codon it looks good: I have a peak ~12 nucleotides before the start codon. But Frame 3 (blue) seems to be the dominant frame. I would have expected Frame 1 (orange) instead?

For the stop codon plot, Frame 1 (orange) looks fine. But Frame 2 (green) is much higher and roughly constant across the whole plot.
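To check which frame dominates independently of the plot colours, one can count P-sites per frame relative to the CDS start. The sketch below is a hypothetical helper, not part of ribotricer; frame 0 here means the first nucleotide of a codon, and how this maps onto the "Frame 1/2/3" labels in ribotricer's plots is not assumed. It also shows how a 1-nt error in the P-site offset moves the majority into a different frame.

```python
def frame_fractions(psite_positions, cds_start):
    """Fraction of P-sites in frames 0, 1 and 2 relative to cds_start.

    Python's % returns a non-negative result for a positive divisor,
    so positions upstream of the start codon are also assigned a frame.
    """
    counts = [0, 0, 0]
    for p in psite_positions:
        counts[(p - cds_start) % 3] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]


psites = [100, 103, 106, 101, 109]
print(frame_fractions(psites, cds_start=100))  # [0.8, 0.2, 0.0]
# The same reads with a P-site offset that is 1 nt too large:
print(frame_fractions([p + 1 for p in psites], cds_start=100))  # [0.0, 0.8, 0.2]
```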

I installed ribotricer via conda (https://anaconda.org/bioconda/ribotricer), release 1.3.2.
I am working with the GENCODE human release 19 (hg19):
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz

I prepare the ORFS with:

ribotricer prepare-orfs --gtf gencode.v19.annotation.gtf \
                        --fasta GRCh37.p13.genome.fa \
                        --prefix out \
                        --min_orf_length 30 \
                        --start_codons ATG,CTG,GTG
ribotricer detect-orfs \
             --bam input.bam \
             --prefix 'out_' \
             --ribotricer_index candidate_orfs.tsv \
             --phase_score_cutoff 0.440 \
             --report_all

Thanks,
Paul

@polklin

polklin commented Apr 20, 2023

I have also tried with the test data you're providing: https://www.dropbox.com/s/xr0xsdluuni2b95/ribotricer_test_data_tair10.zip

I detected the translating ORF with the command:

ribotricer detect-orfs --bam bams_unique/SRX219170.bam \
                                      --ribotricer_index index/ribotricer_v44_annotation_longest_candidate_orfs.tsv \
                                      --prefix SRX219170_generated

If I understand the code correctly, the read length with the highest number of reads is taken as the reference (P-site shifts are computed relative to this reference).
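The selection rule described here (the most abundant read length becomes the reference) amounts to the following sketch. It reflects the commenter's reading of the code rather than a confirmed description of ribotricer, and the function name is hypothetical.

```python
from collections import Counter


def reference_read_length(read_lengths):
    """Return the most frequent read length (ties go to the first one seen)."""
    if not read_lengths:
        raise ValueError("no reads")
    return Counter(read_lengths).most_common(1)[0][0]


print(reference_read_length([28, 29, 29, 30, 29, 28]))  # 29
```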

In this test example, 29 nt reads are the most abundant. The metagene profile looks like this:
[image: metagene profile for the test data]

The triplet periodicity looks good, but there is a bias towards Frame 2 (green). I would have expected Frame 1 (orange) to be the dominant frame?

Looking at the generated psite_offsets.txt file, it seems that all other read lengths are shifted towards Frame 2:

	lag of 30: 1
	lag of 28: -1
	lag of 29: 0
	lag of 26: -3
	lag of 31: 1
	lag of 27: -2
	lag of 25: -4
	lag of 24: -5
	lag of 32: 2
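Whether a relative lag changes the reading frame depends only on the lag modulo 3: a lag of -3 (26 nt above) keeps the frame, while lags of -1 and +2 both amount to the same frame change (2 mod 3). A small sketch that parses the lag lines and computes this, assuming the "lag of <length>: <lag>" line format shown above (the helper names are hypothetical, and which direction counts as "towards Frame 2" depends on ribotricer's sign convention, which is not assumed here):

```python
import re

# Matches lines such as "lag of 30: 1" (format taken from psite_offsets.txt above)
LAG_LINE = re.compile(r"lag of (\d+):\s*(-?\d+)")


def parse_lags(text):
    """Map read length -> relative lag for every 'lag of N: M' line."""
    return {int(length): int(lag) for length, lag in LAG_LINE.findall(text)}


def frame_shift(lag):
    """Net frame change (0, 1 or 2) implied by a lag in nucleotides."""
    return lag % 3


offsets_txt = """\
lag of 30: 1
lag of 28: -1
lag of 29: 0
lag of 26: -3
lag of 31: 1
lag of 27: -2
lag of 25: -4
lag of 24: -5
lag of 32: 2
"""

for length, lag in sorted(parse_lags(offsets_txt).items()):
    print(length, lag, frame_shift(lag))
```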

[image]

[image]

Perhaps my understanding is not good enough.

Shouldn't we expect only read lengths with good periodicity and a majority of reads in Frame 1 to be kept, or used as the reference for P-site alignment?
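The selection rule proposed here could be sketched as follows. This is a hypothetical filter, not what ribotricer currently does: it assumes per-read-length frame fractions have already been computed after applying each length's P-site offset, and both the 0.5 threshold and the toy fractions are arbitrary illustrations.

```python
def select_read_lengths(fractions_by_length, min_in_frame=0.5):
    """Keep read lengths whose in-frame (frame 0) fraction is the largest of
    the three frames and at least min_in_frame. Return them sorted with the
    most strongly in-frame length first (a candidate reference)."""
    kept = [
        length
        for length, (f0, f1, f2) in fractions_by_length.items()
        if f0 >= min_in_frame and f0 >= max(f1, f2)
    ]
    return sorted(kept, key=lambda length: fractions_by_length[length][0],
                  reverse=True)


# Illustrative values only, not taken from any dataset in this thread.
fractions = {
    28: [0.62, 0.25, 0.13],
    29: [0.71, 0.18, 0.11],
    31: [0.35, 0.33, 0.32],  # weak periodicity
    32: [0.20, 0.60, 0.20],  # dominant frame is not frame 0
}
print(select_read_lengths(fractions))  # [29, 28]
```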
