Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot visualise rRNA units from gff file #38

Closed
Edouard94 opened this issue Apr 29, 2024 · 3 comments
Closed

Cannot visualise rRNA units from gff file #38

Edouard94 opened this issue Apr 29, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Edouard94
Copy link

Hello,

I am trying to visualise my rRNA units with the following script:

# Limited Range Features

from Bio.SeqFeature import SeqFeature
from pygenomeviz import Gff, GenomeViz
import matplotlib.pyplot as plt

# Load GFF file
gff_file = "Gb_microsporidia_assemblyGFF.gff"
gff = Gff(gff_file, min_range=1, max_range=170000)

# Plot min-max range GFF features with sublabel
gv = GenomeViz(fig_track_height=0.7, tick_style="bar")
track = gv.add_feature_track(name=gff.name, size=gff.range_size, start_pos=gff.min_range, labelsize=0)
track.add_gff_features(gff, feature_type="CDS", facecolor="skyblue", linewidth=0.5, label_type="gene", labelvpos="top")
track.add_gff_features(gff, feature_type="rRNA", facecolor="lime", linewidth=0.5, label_type="product", labelvpos="top", patch_kws=dict(hatch="/"))
track.set_sublabel(f"{gff.name}\n\n{gff.min_range} - {gff.max_range} bp", ymargin=1.0)

fig = gv.plotfig()

# Save the figure
plt.savefig("genome_range_plot.png", dpi=1000)

but it was not working as I think my gff format is no quite the one expected. Here are the location of the rRNA units in the gff file:

CONTIG_177_POLISHED	EMBL/GenBank/SwissProt	rRNA	162154	166014	.	-	.	note "rRNA unit" 
CONTIG_181_POLISHED	EMBL/GenBank/SwissProt	rRNA	4102	7964	.	-	.	note "rRNA unit" 
CONTIG_293_POLISHED	EMBL/GenBank/SwissProt	rRNA	1	734	.	-	.	note "rRNA unit"

Any idea on how I could plot them with the track.add_gff_features or feature.qualifiers.get functions.

Thank you for your help!

@moshi4
Copy link
Owner

moshi4 commented Apr 30, 2024

Hi @Edouard94,

I'm not sure what visualization results you expect to get. Were you expecting the three rRNAs listed in the example GFF file to be plotted on one track ranging from 0 - 170,000bp?

If the above assumption is correct, then the idea of ​​plotting rRNAs present in different contigs on one track is wrong, so it is natural that the plot will not be as you expected.

@moshi4 moshi4 added the question Further information is requested label May 2, 2024
@Edouard94
Copy link
Author

Edouard94 commented May 4, 2024

Hello @moshi4,

Thank you for your reply, yes I can understand that having multiple rRNAs units is wrong; I wanted to plot this surprising annotation and try to visualise the placement of these units along my genome.

But I also retry to plot only one rRNA unit (located between 1 and 1000), and again the expected 'lime' unit is not produced, is this because of my gff format file? (see my first comment)

I used the same code as before, and just changed the range from 1 to 1000 bp.

genome_range_plot

@moshi4
Copy link
Owner

moshi4 commented May 4, 2024

Sorry, I don't quite understand what you are trying to do, so it is difficult to answer. At least the GFF file is reading correctly, so I think the format is correct.
It seems to me that you are not understanding the previous comment correctly, so please take a look at the following code example to try to understand what I mean.

Code Example

GCF_000959065.1.gff.gz

from pygenomeviz import Gff, GenomeViz

gff_file = "GCF_000959065.1.gff.gz"
gff = Gff(gff_file)

gv = GenomeViz()

for seqid, size in gff.get_seqid2size().items():
    track = gv.add_feature_track(seqid, size)
    track.set_sublabel()
    cds_features = gff.get_seqid2features("CDS")[seqid]
    track.add_features(cds_features, facecolor="skyblue", plotstyle="arrow")
    rrna_features = gff.get_seqid2features("rRNA")[seqid]
    track.add_features(rrna_features, facecolor="lime", plotstyle="arrow")

gv.savefig("example01.png")

example01.png

example01

from pygenomeviz import Gff, GenomeViz

gff_file = "GCF_000959065.1.gff.gz"
target_seqid = "NZ_LAEX01000007.1"
gff = Gff(gff_file, target_seqid=target_seqid, min_range=50000, max_range=100000)

gv = GenomeViz()

track = gv.add_feature_track(target_seqid, gff.range_size, start_pos=gff.min_range)
track.set_sublabel()
cds_features = gff.extract_features("CDS")
track.add_features(cds_features, facecolor="skyblue", plotstyle="arrow")
rrna_features = gff.extract_features("rRNA")
track.add_features(rrna_features, facecolor="lime", plotstyle="arrow")

gv.savefig("example02.png")

example02.png

example02

@moshi4 moshi4 closed this as completed May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants