Skip to content

A Novel Approach to Codon Optimization Based on Machine Learning and Genetic Algorithm

License

Notifications You must be signed in to change notification settings

Gumgo91/CodonHumanization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CodonHumanization: A Novel Approach to Codon Optimization Based on Machine Learning and Genetic Algorithm

Table of Contents

Abstract

Codon optimization is a molecular biological technique that modulates codon usage and modifies DNA or RNA sequence to enhance protein expression. However, for human, current codon optimization methods have certain limitations, such as potential alterations in protein structure and increased immunogenicity. In this study, we present a new approach to codon optimization utilizing machine learning and genetic algorithm, termed "codon humanization". The utilization of natural language processing (NLP) can enable the detection of latent patterns in human DNA sequences, which can subsequently be employed for the humanization of sequences through directed evolution using genetic algorithms. This approach has the potential to improve safety and efficacy in gene delivery to humans, such as gene therapy or RNA vaccines.

Codon humanization can be used via an pretrained model. please In addition, all materials for learning and reproduction of genetic algorithms are provided.

Installation

$ git clone https://github.com/Gumgo91/CodonHumanization.git

After cloning the repository by executing the above command, you can use it as follows.

from codonhumanization import CodonHumanizer

ch = CodonHumanizer()

sequence = "ATGATGATG" # example sequence
sequence_pool = ch.evolution(sequence)
print(sequence_pool[0])

Method

For the purpose of implementing codon humanization, a dataset was compiled and a classification model was developed. The binary classification model can differentiate between a random sequence and a human sequence when presented with a DNA sequence. By employing directed evolution via genetic algorithms on a DNA sequence, the corresponding amino acid sequence remains unchanged, but the DNA sequence is transformed to become more human-like. The figure below shows the sequential processes in which a binary classification model is trained, and a given sequence is humanized using a genetic algorithm. Figure 1

About

A Novel Approach to Codon Optimization Based on Machine Learning and Genetic Algorithm

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published