
This is a machine learning and audio signal processing project in which a real-time audio signal is classified as speech or music using a deep neural network and a convolutional neural network.


cksajil/MusicRJ


MusicRJ


A Machine Learning-Audio Signal Processing Project (Ongoing)

Project Details

This is a machine learning and audio signal processing project in which a real-time audio signal is classified as speech or music using a deep neural network (DNN) and a convolutional neural network (CNN). The long-term goal is to create an AI personal assistant that listens to audio streams and summarizes their content for the end user.

Block diagram

Dataset

The project uses the GTZAN music/speech collection dataset.

All the wav audio files should be extracted to the Data/Files folder.
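
Before preprocessing, it can help to confirm that the extraction step put the files where the project expects them. The sketch below is not part of the repository; the helper name `list_wav_files` is hypothetical, and it only assumes the `Data/Files` location stated above.

```python
from pathlib import Path


def list_wav_files(data_dir="Data/Files"):
    """Return sorted .wav paths under data_dir (empty list if the folder is missing)."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # Match the extension case-insensitively so .WAV files are also found.
    return sorted(p for p in root.rglob("*") if p.suffix.lower() == ".wav")


files = list_wav_files()
print(f"Found {len(files)} wav files in Data/Files")
```

If the count is zero, the archive was probably extracted to a different folder.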

Python Version

Python 3.9.12

Setting up virtual environment

Installing Virtual Environment

python -m pip install --user virtualenv

Creating New Virtual Environment

python -m venv envname

Activating Virtual Environment

source envname/bin/activate

Upgrade PIP

python -m pip install --upgrade pip

Installing Packages

python -m pip install -r requirements.txt
python -m pip install PyAudio

How to run

# Data preprocessing
python main.py -s p

# Model training
python main.py -s t

# Real-time demonstration
python main.py -s r
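
For readers curious how a single `-s` flag can select between the three stages, here is a minimal sketch using `argparse`. It is an illustration only, not the repository's `main.py`: the function `parse_stage` and the stage labels in `STAGES` are hypothetical, and only the flag values `p`, `t` and `r` come from the commands above.

```python
import argparse

# Hypothetical labels; the README only documents the flag values p, t and r.
STAGES = {"p": "preprocess", "t": "train", "r": "realtime"}


def parse_stage(argv=None):
    """Parse the -s flag and return the label of the selected stage."""
    parser = argparse.ArgumentParser(description="Stage selector (illustrative sketch)")
    parser.add_argument(
        "-s",
        dest="stage",
        choices=sorted(STAGES),
        required=True,
        help="p = data preprocessing, t = model training, r = real-time demonstration",
    )
    args = parser.parse_args(argv)
    return STAGES[args.stage]


# Passing an explicit argument list keeps the example independent of sys.argv.
print(parse_stage(["-s", "t"]))
```

Restricting the flag with `choices` means an unsupported value fails early with a usage message instead of silently doing nothing.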

Model 1 (Simple DNN) Architecture

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                8224                                                              
 dense_1 (Dense)             (None, 64)                2112                                                             
 dense_2 (Dense)             (None, 128)               8320                                                                  
 dense_3 (Dense)             (None, 256)               33024                                                                 
 dense_4 (Dense)             (None, 512)               131584                                                                 
 dense_5 (Dense)             (None, 256)               131328                                                                
 dense_6 (Dense)             (None, 128)               32896                                                                 
 dropout (Dropout)           (None, 128)               0                                                                    
 dense_7 (Dense)             (None, 64)                8256                                                             
 dense_8 (Dense)             (None, 2)                 130                                                                    
=================================================================
Total params: 355,874
Trainable params: 355,874
Non-trainable params: 0
_________________________________________________________________
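
The parameter counts in the summary above can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and dropout adds none. The sketch below redoes that arithmetic in plain Python. The input width of 256 is not stated in this README; it is inferred from the first layer (8224 = 256 * 32 + 32), and the helper name is hypothetical.

```python
def dense_param_counts(input_dim, layer_units):
    """Return the parameter count of each dense layer in a stack."""
    counts = []
    fan_in = input_dim
    for units in layer_units:
        counts.append(fan_in * units + units)  # weights + biases
        fan_in = units
    return counts


# Unit counts read from the summary; the dropout layer is skipped (0 parameters).
units = [32, 64, 128, 256, 512, 256, 128, 64, 2]
counts = dense_param_counts(256, units)  # 256 is an inferred input width
print(counts)
print("Total:", sum(counts))
```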

DNN architecture diagram

Model 1 Train and validation loss graph

Loss graph

Model 2 (CNN) Architecture

Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape              Param #
=================================================================
 conv2d (Conv2D)                 (None, 101, 1290, 32)     320
 max_pooling2d (MaxPooling2D)    (None, 50, 645, 32)       0
 conv2d_1 (Conv2D)               (None, 48, 643, 64)       18496
 max_pooling2d_1 (MaxPooling2D)  (None, 24, 321, 64)       0
 conv2d_2 (Conv2D)               (None, 22, 319, 64)       36928
 flatten (Flatten)               (None, 449152)            0
 dense (Dense)                   (None, 64)                28745792
 dense_1 (Dense)                 (None, 2)                 130
=================================================================
Total params: 28,801,666
Trainable params: 28,801,666
Non-trainable params: 0
_________________________________________________________________
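
The shapes and counts in the CNN summary are consistent with 3x3 convolutions without padding, 2x2 max pooling, and a single-channel input of 103 x 1292. None of these settings is stated in this README; they are inferred from the numbers (for example, 320 = (3 * 3 * 1 + 1) * 32). Under those assumptions, the sketch below retraces the summary in plain Python and shows that almost all parameters sit in the first dense layer after flattening.

```python
def conv2d_valid(h, w, k=3):
    """Output height and width of a k x k convolution, stride 1, no padding."""
    return h - k + 1, w - k + 1


def max_pool(h, w, p=2):
    """Output height and width of p x p max pooling (floor division)."""
    return h // p, w // p


def conv2d_params(k, c_in, c_out):
    """Weights plus one bias per output channel."""
    return (k * k * c_in + 1) * c_out


h, w = 103, 1292  # assumed input size, inferred from the first output shape
total = 0

h, w = conv2d_valid(h, w)          # (101, 1290)
total += conv2d_params(3, 1, 32)   # 320
h, w = max_pool(h, w)              # (50, 645)
h, w = conv2d_valid(h, w)          # (48, 643)
total += conv2d_params(3, 32, 64)  # 18496
h, w = max_pool(h, w)              # (24, 321)
h, w = conv2d_valid(h, w)          # (22, 319)
total += conv2d_params(3, 64, 64)  # 36928

flat = h * w * 64                  # flattened feature count
dense_params = flat * 64 + 64      # first dense layer
total += dense_params
total += 64 * 2 + 2                # output layer

print("Flattened features:", flat)
print("Dense layer share of parameters:", round(dense_params / total, 4))
print("Total:", total)
```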

Testing

python -m pytest --verbose

Results

Model      Accuracy  Precision  Recall  F1-score
DNN Model  0.9812    0.9980     0.9647  0.9810
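
As a quick consistency check, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the table; because the inputs are already rounded to four decimals, the result can differ from the reported figure in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.9980, 0.9647  # DNN model values from the table
f1 = f1_score(precision, recall)
print(f"F1 recomputed from rounded precision and recall: {f1:.4f}")
```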
