ivankunyankin/quartznet-asr

QuartzNet

Lightweight PyTorch implementation of QuartzNet (https://arxiv.org/pdf/1910.10261.pdf).

QuartzNet BxR architecture

Features

  1. Lets you choose among three model sizes: 5x5, 10x5, and 15x5. For details, refer to the paper.
  2. Easily customisable.
  3. Supports training on a CPU, a single GPU, or multiple GPUs.
  4. Suitable for training on Colab and AWS spot instances, since training can resume from a checkpoint after an interruption.
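
To make the size naming concrete, here is a minimal sketch of how the three variants differ. It assumes the layout given in Table 1 of the paper: five block groups with kernel sizes 33, 39, 51, 63, and 75, each block containing R = 5 sub-blocks, and every group repeated 1, 2, or 3 times. The names `BLOCK_GROUPS`, `REPEATS`, and `expand_blocks` are illustrative and are not taken from this repository, whose config may be organised differently.

```python
# Block groups B1..B5 as (kernel_size, out_channels), per Table 1 of the paper.
# Assumption: this repo follows the same layout; check its config to confirm.
BLOCK_GROUPS = [(33, 256), (39, 256), (51, 512), (63, 512), (75, 512)]

# How many times each block group is repeated in each variant.
REPEATS = {"5x5": 1, "10x5": 2, "15x5": 3}

SUB_BLOCKS = 5  # R: sub-blocks per block (the "x5" in the names)


def expand_blocks(size):
    """Return one dict per block for the given variant (hypothetical helper)."""
    repeat = REPEATS[size]
    blocks = []
    for kernel, channels in BLOCK_GROUPS:
        for _ in range(repeat):
            blocks.append(
                {"kernel": kernel, "channels": channels, "sub_blocks": SUB_BLOCKS}
            )
    return blocks


if __name__ == "__main__":
    for size in REPEATS:
        blocks = expand_blocks(size)
        print(size, "->", len(blocks), "blocks,", len(blocks) * SUB_BLOCKS, "sub-blocks")
```

The first number in each name is therefore the total block count B, and the second is the number of sub-blocks R inside each block.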

Table of contents

  1. Installation
  2. Default training
  3. Train custom data 📚
  4. Hyperparameters 📚
  5. Things that are different compared to the article

Installation

  1. Clone the repository
git clone https://github.com/ivankunyankin/quartznet-asr.git
cd quartznet-asr

  2. Create an environment and install the dependencies
python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt


Default training

Training

This guide shows how to train the QuartzNet 5x5 model on a part of the LibriTTS dataset.

  1. Download the data by running the following:
assets/data.sh

If you get a permission error, make the script executable and rerun the command above: chmod +x assets/data.sh

The script downloads and unzips the following subsets of LibriTTS: train-clean-360 (for training), dev-clean (for validation), and test-clean (for testing).

Warning: these subsets require around 30 GB of storage space.

  2. Run the following to start training:
python3 train.py

Add the --from_checkpoint flag to continue training from a checkpoint.
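
On Colab or a spot instance it can be handy to start training through a small launcher that decides on its own whether to resume. The following is a minimal sketch, not part of this repository. It assumes checkpoints are written to a `checkpoints` directory, which is a guess; the real location depends on the repo's config. The helper name `build_train_command` is hypothetical.

```python
from pathlib import Path

# Assumption: checkpoints are saved in a "checkpoints" directory.
# Adjust this to wherever train.py actually writes them.
CHECKPOINT_DIR = Path("checkpoints")


def build_train_command(checkpoint_dir=CHECKPOINT_DIR):
    """Build the train.py command, adding --from_checkpoint if a checkpoint exists."""
    command = ["python3", "train.py"]
    # Resume only if the directory exists and contains at least one entry.
    has_checkpoint = checkpoint_dir.is_dir() and any(checkpoint_dir.iterdir())
    if has_checkpoint:
        command.append("--from_checkpoint")
    return command


if __name__ == "__main__":
    # Print the command that would be run on this machine.
    print(" ".join(build_train_command()))
```

The returned list can be passed to `subprocess.run` from a startup script, so a restarted instance picks up where the previous run stopped.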

Testing

python3 test.py

The script tests the trained model on the test-clean subset of LibriTTS.
It prints the resulting WER (word error rate) and CTC loss values and saves intermediate logs in the logs directory.
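
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. Below is a minimal sketch of that definition in plain Python. It is an illustration only; the implementation in test.py may differ, and the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.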

Tensorboard

tensorboard --logdir logs

Contribution

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgements

I found inspiration for the TextTransform class and the greedy decoder in this post.
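
As a rough illustration of what greedy CTC decoding does, here is a minimal sketch in plain Python: take the best label per frame, collapse consecutive repeats, and drop blanks. The blank index, the alphabet, and the function name `greedy_decode` are assumptions made for the example; the repo's TextTransform class and decoder may use a different label mapping and operate on tensors instead of lists.

```python
# Assumptions: index 0 is the CTC blank, and indices 1..28 map to this alphabet.
BLANK = 0
ALPHABET = " 'abcdefghijklmnopqrstuvwxyz"


def greedy_decode(frame_scores):
    """Decode a list of per-frame score lists into text."""
    # Step 1: pick the highest-scoring label for each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    # Step 2: keep a label only if it differs from the previous frame's label
    # (collapsing repeats) and is not the blank.
    chars = []
    previous = None
    for label in best_path:
        if label != previous and label != BLANK:
            chars.append(ALPHABET[label - 1])
        previous = label
    return "".join(chars)
```

A blank between two identical labels keeps them separate, which is how CTC can emit doubled letters such as the "ll" in "hello".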
