Analysis of DNA Sequence Classification Using Neural Networks - Bioinformatics Course Project - Winter 2022

arminZolfaghari/DNA-Sequence-Classification

DNA-Sequence-Classification

Analysis of DNA Sequence Classification Using Neural Networks. This project is the implementation of this research article.

In the general computational context of biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been applied to this task successfully in recent years. Identifying and classifying viruses is essential for avoiding an outbreak like COVID-19, and it also helps in detecting the effects of viruses and in drug design. Even so, feature selection remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations make the high-dimensionality problem worse. More recently, deep learning (DL) models have made it possible to extract features automatically from the input. In this work, we employ a multilayer perceptron (MLP) with label and k-mer encoding for DNA sequence classification.
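As a rough illustration of label encoding, the sketch below maps each nucleotide to an integer and pads or truncates the result to a common length. The specific integer codes, the padding value, and the function name `label_encode` are assumptions made for this example; the project may use a different mapping.

```python
# Hypothetical mapping: the integers actually used by the project are not stated.
NUCLEOTIDE_TO_LABEL = {"A": 1, "C": 2, "G": 3, "T": 4}
PAD = 0  # assumed value for padding and for unknown symbols such as "N"


def label_encode(sequence, length):
    """Map each nucleotide to an integer, then truncate or pad to `length`."""
    codes = [NUCLEOTIDE_TO_LABEL.get(base, PAD) for base in sequence.upper()]
    codes = codes[:length]
    return codes + [PAD] * (length - len(codes))


print(label_encode("ACGTTG", 8))  # [1, 2, 3, 4, 4, 3, 0, 0]
print(label_encode("ACGTTG", 4))  # [1, 2, 3, 4]
```

Padding is one simple way to give sequences of different lengths the same input size; it is shown here only to make the example self-contained.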

In this project (Bioinformatics Course Project), we classify six viruses with an MLP. The genome of each virus is represented by a nucleotide sequence, and the sequences have different lengths. Adenine (A), cytosine (C), guanine (G), and thymine (T) are the four nucleotides that make up DNA. The DNA of each virus is unique, and the arrangement of the nucleotides determines the characteristics of the virus.
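To make the classifier concrete, here is a minimal sketch of an MLP forward pass with six output classes, written in plain Python. The feature size, the hidden size, the single hidden layer, and all function names are assumptions for illustration, and the weights are random and untrained, so the predicted class is meaningless; the project's actual architecture is not described here.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Compute W x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, hidden, output):
    """One hidden layer with ReLU, followed by a softmax output layer."""
    return softmax(dense(relu(dense(x, hidden)), output))


rng = random.Random(0)
# Six virus classes come from the text; the other two sizes are assumed.
N_FEATURES, N_HIDDEN, N_CLASSES = 16, 8, 6
hidden = init_layer(N_FEATURES, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_CLASSES, rng)

x = [rng.random() for _ in range(N_FEATURES)]  # stand-in for an encoded sequence
probs = mlp_forward(x, hidden, output)
predicted_class = max(range(N_CLASSES), key=probs.__getitem__)
```

The output is a probability for each of the six classes, and the predicted virus is the class with the highest probability.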

First, the k-mer method (splitting a sequence into substrings of length k) was used to reduce the length of each DNA sequence, and then the Word to Vector (word2vec) method was used to convert the result to a fixed-length representation.
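The two steps can be sketched as follows. The helper names `kmers` and `mean_pool`, the `stride` parameter, and the toy embedding table are assumptions for illustration; in particular, the hand-written vectors stand in for embeddings that a trained word2vec model would supply, and averaging is only one possible way to obtain a fixed length.

```python
def kmers(sequence, k, stride=1):
    """Split a sequence into substrings of length k, stepping by `stride`."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def mean_pool(tokens, embeddings, dim):
    """Average the token vectors so any number of tokens yields `dim` values."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    total = [0.0] * dim
    for vec in known:
        for j, value in enumerate(vec):
            total[j] += value
    # If no token is in the table, fall back to the zero vector.
    return [value / len(known) for value in total] if known else total


# Toy 2-dimensional vectors standing in for trained word2vec embeddings.
toy_embeddings = {"ACG": [3.0, 0.0], "CGT": [0.0, 3.0], "GTA": [3.0, 3.0]}

tokens = kmers("ACGTA", k=3)
print(tokens)                                # ['ACG', 'CGT', 'GTA']
print(mean_pool(tokens, toy_embeddings, 2))  # [2.0, 2.0]
print(kmers("ACGTAC", k=3, stride=3))        # ['ACG', 'TAC']
```

With `stride=1` the k-mers overlap; with `stride` equal to `k` they do not, and the token sequence is roughly k times shorter than the original. Whichever variant is used, averaging gives every sequence a vector of the same size regardless of its original length.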

Dataset

The training set contains 1,320 labeled sequences, 220 for each virus.
The development set contains 30 sequences, used to check and evaluate the model.
The test set contains 400 unlabeled sequences, for which the label (virus class) must be predicted.
