Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate the direction of data obtained from gff file #78

Closed
cabraham03 opened this issue May 17, 2024 · 4 comments
Closed

Calculate the direction of data obtained from gff file #78

cabraham03 opened this issue May 17, 2024 · 4 comments

Comments

@cabraham03
Copy link

How could I plot a group of genes in the same directions using gggenes and ggplot2 ?

I have a data obtained from a .gff file, it was generated with prokka, then it was imported with gggenomes::read_feats function, and get the genes that I want to plot, it generate the next data:

df
# A tibble: 12 × 11
   file_id    seq_id        start    end strand feat_id        locus_tag      orientation gene     cluster       position
   <chr>      <chr>         <int>  <int> <chr>  <chr>          <chr>                <int> <chr>    <chr>            <int>
 1 Genome-300 Genome-300_1 199743 201911 -      IMEHDJCA_00189 IMEHDJCA_00189          -1 geneE    cluster_00365   200827
 2 Genome-300 Genome-300_1 201914 203275 -      IMEHDJCA_00190 IMEHDJCA_00190          -1 geneD    cluster_01313   202594
 3 Genome-300 Genome-300_1 203272 205377 -      IMEHDJCA_00191 IMEHDJCA_00191          -1 geneB    cluster_00403   204324
 4 Genome-300 Genome-300_1 205817 206176 +      IMEHDJCA_00192 IMEHDJCA_00192           1 gene1033 cluster_06099   205996
 5 Genome-300 Genome-300_1 206268 206663 +      IMEHDJCA_00193 IMEHDJCA_00193           1 geneC    cluster_05858   206465
 6 Genome-300 Genome-300_1 206686 222306 +      IMEHDJCA_00194 IMEHDJCA_00194           1 geneA    cluster_00001   214496
 7 Genome-320 Genome-320_8 123699 139310 -      ILCJGNBA_01570 ILCJGNBA_01570          -1 geneA    cluster_00001   131504
 8 Genome-320 Genome-320_8 139333 139728 -      ILCJGNBA_01571 ILCJGNBA_01571          -1 geneC    cluster_05858   139530
 9 Genome-320 Genome-320_8 139820 140179 -      ILCJGNBA_01572 ILCJGNBA_01572          -1 gene1033 cluster_06099   139999
10 Genome-320 Genome-320_8 140619 142724 +      ILCJGNBA_01573 ILCJGNBA_01573           1 geneB    cluster_00403   141671
11 Genome-320 Genome-320_8 142721 144082 +      ILCJGNBA_01574 ILCJGNBA_01574           1 geneD    cluster_01322   143401
12 Genome-320 Genome-320_8 144085 146253 +      ILCJGNBA_01575 ILCJGNBA_01575           1 geneE    cluster_00365   145169

plot

ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
    geom_gene_arrow(aes(forward = orientation), arrow_body_height = unit(7, "mm"), 
                    arrowhead_height = unit(7, "mm"), 
                    arrowhead_width = unit(5.7, "mm")) + 
    scale_fill_manual(values = c("orange", "blue", "red",  "green", "yellow", "#878787")) +
    facet_wrap(~ file_id, scales = "free", ncol = 1) +
    scale_y_discrete(position="right") +
    gggenes::theme_genes() +
    theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
          panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
          axis.title.y= element_blank(), 
          axis.text.y = element_text(size = 14, # sample name size
                                     family="Times New Roman", 
                                     face="bold"),
          plot.margin = unit(c(t=0.3, b=0.3, r=0.3, l=0.001), "mm") 
    )

genes2

When I generate the plot, the genes of each genome are in different direction, is any way to recalculate the position for genes in Genome-320 and be in the same direction than Genome-300 ??

any suggestion ?
thanks

@wilkox
Copy link
Owner

wilkox commented May 20, 2024

Hi @cabraham03, do you want to change the direction of individual genes (this is controlled by the forward aesthetic), or reverse the direction of the entire locus shown in Genome-320 so that it resembles Genome-300?

@cabraham03
Copy link
Author

cabraham03 commented May 27, 2024

Hello.
not for a single gene, all the genes in the second plot (Genome-320), I found this solution, but I'm not sure:
if I have a full length of 150 bp (just an example), and I have 2 genes, geneA start at 10 (start1) and end at 50 bp (end1), and geneB start at 80 (start1) and end at 130 bp (end1); so, to calculate the genes direction will be:
start2 = (total length - end1) +1
end2 = (total length - start1) + 1

so
geneA
start2= (150-50) + 1=101
end2 = (150-10)+1 = 141

geneB
start2 = (150 - 130) + 1 = 21
end2 = (150 - 80) + 1 = 71

is it ok ?
thanks !!!

@wilkox
Copy link
Owner

wilkox commented May 28, 2024

I think the best way to do it would be to draw the genomes separately with the x-axis reversed for Genome-320, then composite the plots with patchwork. This way you avoid showing incorrect coordinates for the genes.

# Libraries
library(tidyverse)
library(gggenes)
library(patchwork)

# Data
df <- tribble(
  ~file_id,     ~seq_id,        ~start, ~end,   ~strand, ~feat_id,         ~locus_tag,       ~orientation, ~gene,      ~cluster,        ~position,
  "Genome-300", "Genome-300_1", 199743, 201911, "-",     "IMEHDJCA_00189", "IMEHDJCA_00189", -1,           "geneE",    "cluster_00365", 200827,
  "Genome-300", "Genome-300_1", 201914, 203275, "-",     "IMEHDJCA_00190", "IMEHDJCA_00190", -1,           "geneD",    "cluster_01313", 202594,
  "Genome-300", "Genome-300_1", 203272, 205377, "-",     "IMEHDJCA_00191", "IMEHDJCA_00191", -1,           "geneB",    "cluster_00403", 204324,
  "Genome-300", "Genome-300_1", 205817, 206176, "+",     "IMEHDJCA_00192", "IMEHDJCA_00192",  1,           "gene1033", "cluster_06099", 205996,
  "Genome-300", "Genome-300_1", 206268, 206663, "+",     "IMEHDJCA_00193", "IMEHDJCA_00193",  1,           "geneC",    "cluster_05858", 206465,
  "Genome-300", "Genome-300_1", 206686, 222306, "+",     "IMEHDJCA_00194", "IMEHDJCA_00194",  1,           "geneA",    "cluster_00001", 214496,
  "Genome-320", "Genome-320_8", 123699, 139310, "-",     "ILCJGNBA_01570", "ILCJGNBA_01570", -1,           "geneA",    "cluster_00001", 131504,
  "Genome-320", "Genome-320_8", 139333, 139728, "-",     "ILCJGNBA_01571", "ILCJGNBA_01571", -1,           "geneC",    "cluster_05858", 139530,
  "Genome-320", "Genome-320_8", 139820, 140179, "-",     "ILCJGNBA_01572", "ILCJGNBA_01572", -1,           "gene1033", "cluster_06099", 139999,
  "Genome-320", "Genome-320_8", 140619, 142724, "+",     "ILCJGNBA_01573", "ILCJGNBA_01573",  1,           "geneB",    "cluster_00403", 141671,
  "Genome-320", "Genome-320_8", 142721, 144082, "+",     "ILCJGNBA_01574", "ILCJGNBA_01574",  1,           "geneD",    "cluster_01322", 143401,
  "Genome-320", "Genome-320_8", 144085, 146253, "+",     "ILCJGNBA_01575", "ILCJGNBA_01575",  1,           "geneE",    "cluster_00365", 145169
)

# Convert orientation to logical
df$orientation <- ifelse(df$orientation == 1, TRUE, FALSE)

# Plot
base <- ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
  scale_fill_manual(values = c("orange", "blue", "red",  "green", "yellow", "#878787")) +
  scale_y_discrete(position = "right") +
  gggenes::theme_genes() +
  theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
        panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
        axis.title.y= element_blank(), 
        axis.text.y = element_text(size = 14, # sample name size
                                   family="Times New Roman", 
                                   face="bold")
  )

genome300 <- base + geom_gene_arrow(
  data = df[which(df$file_id == "Genome-300"), ],
  aes(forward = orientation),
  arrow_body_height = unit(7, "mm"),
  arrowhead_height = unit(7, "mm"),
  arrowhead_width = unit(5.7, "mm")
)

genome320 <- base + geom_gene_arrow(
  data = df[which(df$file_id == "Genome-320"), ],
  aes(forward = orientation),
  arrow_body_height = unit(7, "mm"),
  arrowhead_height = unit(7, "mm"),
  arrowhead_width = unit(5.7, "mm")
) + scale_x_continuous(transform = "reverse")

(genome300 / genome320) + plot_layout(guides = "collect")

Created on 2024-05-28 with reprex v2.1.0

@cabraham03
Copy link
Author

Thanks So Much !!!

@wilkox wilkox closed this as completed May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants