pauses

Quick library to extract pause length features from audio files.

How to get started

I'm assuming you are running this on a Mac computer (this is the only operating system tested).

First, make sure you have installed Python3, FFmpeg, and SoX via Homebrew:

brew install python3 sox ffmpeg

Now, clone the repository and install all required dependencies:

git clone git@github.com:jim-schwoebel/pauses.git
cd pauses 
pip3 install -r requirements.txt

Technique #1 - thresholding

The extract_pauses_1.py script uses the sys.argv convention to pass variables through from the terminal. For more information on this, check out this StackOverflow post.
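As a rough illustration of that convention, here is a minimal sketch of how two positional y/n flags could be read. The argument order (first flag = record a new file, second flag = process every file in ./data) is inferred from the commands shown in this README, and parse_flags is a hypothetical helper name, not necessarily what the script itself uses.

```python
def parse_flags(argv):
    """Map two positional y/n arguments to booleans.

    Assumed order (inferred from the README commands, not confirmed):
    argv[1] = record a new file, argv[2] = process every file in ./data.
    """
    if len(argv) != 3:
        raise SystemExit(
            "usage: python3 extract_pauses_1.py <record: y|n> <process_data: y|n>"
        )
    flags = []
    for value in argv[1:]:
        if value not in ("y", "n"):
            raise SystemExit(f"expected 'y' or 'n', got {value!r}")
        flags.append(value == "y")
    record, process_data = flags
    return record, process_data
```

In a script this would typically be called as parse_flags(sys.argv) after importing sys, since sys.argv[0] holds the script name and the flags follow it.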

assumptions

To simplify things a bit, I recorded a few reference files (in the ./data folder): slow, moderate, moderate-fast, and fast speech (reading the US Constitution).

I then used pydub to segment the audio, using 50 millisecond segments and a silence threshold of -32 dBFS (to allow for detection of fast speaking events). These parameters likely need to be tuned to the dataset, speaker power, etc., and are likely overfitted to my voice. Nonetheless, this gives a proof-of-concept implementation of how to separate speaking segments from non-speaking segments with a threshold. I then calculated pause length as total duration (seconds) divided by the counted number of segments (i.e. the number of pauses), giving seconds per pause.
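The thresholding idea can be sketched without any audio library. The code below is not the pydub-based implementation in the script; it is a standard-library approximation that assumes samples are floats in the range -1.0 to 1.0, measures each 50 millisecond window in dBFS, and treats consecutive windows below -32 dBFS as one pause. The function names are hypothetical.

```python
import math

def window_dbfs(samples, full_scale=1.0):
    """RMS level of one window in dBFS (0 dBFS = full-scale amplitude)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")

def find_pauses(samples, sample_rate, window_ms=50, silence_thresh_db=-32.0):
    """Return (start_s, end_s) tuples for runs of consecutive silent windows."""
    win = max(1, int(sample_rate * window_ms / 1000))
    pauses, start = [], None
    for i in range(0, len(samples), win):
        silent = window_dbfs(samples[i:i + win]) < silence_thresh_db
        if silent and start is None:
            start = i                      # a pause begins here
        elif not silent and start is not None:
            pauses.append((start / sample_rate, i / sample_rate))
            start = None                   # the pause ended at this window
    if start is not None:                  # file ends while still silent
        pauses.append((start / sample_rate, len(samples) / sample_rate))
    return pauses

def seconds_per_pause(samples, sample_rate, **kwargs):
    """Total duration divided by the number of pauses; None if no pauses."""
    pauses = find_pauses(samples, sample_rate, **kwargs)
    if not pauses:
        return None
    return (len(samples) / sample_rate) / len(pauses)
```

Both window_ms and silence_thresh_db are exposed as parameters because, as noted above, they will probably need tuning per speaker and dataset.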

if you want to process all audio files in the ./data folder

Run the script in the terminal with:

python3 extract_pauses_1.py n y

recording voice files and calculating pauses in real-time

If you want to record a file, run:

python3 extract_pauses_1.py y n

After recording, the script displays the pause length and creates a .json file.

process audio files in ./data folder and record an audio file in real time together

If you want to both record a file (10 seconds) and process all the files in the ./data directory, you can run:

python3 extract_pauses_1.py y y

Technique #2 - machine learning classification

Another technique is to train a machine learning model to detect pause lengths. In this case, I trained a quick machine learning model from 5-6 files, separating the files into 200 millisecond windows and labeling each one as a 'pause' or a 'speech' event. I used the train_audioTPOT.py script found in the voicebook repository with the librosa feature embedding (librosa_features.py). The model achieves around 91.2% accuracy with an optimized SVM model.

To run extract_pauses_2.py, you must first put some files in the load_dir folder of the cloned repository (e.g. 'fast.wav').
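To make the windowing step concrete, here is a minimal sketch of cutting a signal into 200 millisecond windows and labeling each one. The classifier is passed in as a function; energy_stand_in below is only a placeholder based on amplitude and is not the trained SVM or the librosa features described above. All function names are hypothetical, and dropping a short trailing remainder is an assumption.

```python
def split_into_windows(samples, sample_rate, window_ms=200):
    """Cut samples into consecutive fixed-length windows.

    A trailing remainder shorter than one window is dropped (an assumption).
    """
    win = max(1, int(sample_rate * window_ms / 1000))
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, win)]

def label_windows(windows, classify):
    """Apply any classifier that returns 'pause' or 'speech' to each window."""
    return [classify(window) for window in windows]

def energy_stand_in(window, threshold=0.01):
    """Placeholder classifier, NOT the trained model: mean absolute amplitude."""
    level = sum(abs(s) for s in window) / len(window)
    return "speech" if level >= threshold else "pause"
```

Keeping the classifier as an argument means a trained model's prediction function could be swapped in without changing the windowing code.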

Next, run the script:

python3 extract_pauses_2.py

The audio files in ./load_dir are then split into 200 millisecond segments and classified as silence or speech events. The result is a file in ./load_dir that corresponds to the speech file (e.g. fast.wav --> fast.json) with the following information:

{"filename": "fast.wav", "total_length": 1.0, "mean": 0.4, "std": 0.20000000000000007, "max_value": 0.6000000000000001, "min_pause": 0.2, "median": 0.4}

As you can see, you get a bit more information here. Note this was a proof-of-concept and likely needs to be augmented with other datasets for it to work robustly across speakers.
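For illustration, summary fields like the ones above could be derived from a list of per-window labels roughly as follows. This is a sketch under assumptions: each pause is taken to be a run of consecutive 'pause' windows, the standard deviation is the population form, and empty input yields zeros. The total_length field is left out because its exact meaning is not stated here. Function names are hypothetical.

```python
import statistics

def pause_lengths(labels, window_s=0.2):
    """Length in seconds of each run of consecutive 'pause' labels."""
    lengths, run = [], 0
    for label in labels:
        if label == "pause":
            run += 1
        elif run:
            lengths.append(run * window_s)
            run = 0
    if run:                                # labels ended during a pause
        lengths.append(run * window_s)
    return lengths

def summarize(filename, labels, window_s=0.2):
    """Build a dict of pause statistics, using the field names shown above."""
    lengths = pause_lengths(labels, window_s)
    if not lengths:                        # assumption: report zeros if no pauses
        lengths = [0.0]
    return {
        "filename": filename,
        "mean": statistics.mean(lengths),
        "std": statistics.pstdev(lengths),  # population std is an assumption
        "max_value": max(lengths),
        "min_pause": min(lengths),
        "median": statistics.median(lengths),
    }
```

The resulting dict can be written out with json.dump to produce a file alongside the audio.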

Limitations

Both scripts are limited to low-noise environments. If there is a lot of background noise in your files, I'd suggest first cleaning them to remove noise (e.g. with SoX) before using these scripts to calculate pause lengths.
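One possible way to do that cleaning is SoX's noiseprof and noisered effects. The sketch below only builds the two command lines; it assumes the first half second of the recording contains background noise only, and the file names, the 0.5 second profile length, and the 0.21 reduction amount are illustrative values rather than settings from this repository.

```python
def sox_denoise_commands(noisy, cleaned, profile="noise.prof",
                         noise_seconds=0.5, amount=0.21):
    """Build SoX commands that profile leading noise and then subtract it.

    Assumes the first `noise_seconds` of `noisy` contain no speech.
    """
    # Step 1: analyse the leading noise-only stretch and save a noise profile.
    build_profile = ["sox", noisy, "-n", "trim", "0", str(noise_seconds),
                     "noiseprof", profile]
    # Step 2: apply noise reduction to the whole file using that profile.
    apply_profile = ["sox", noisy, cleaned, "noisered", profile, str(amount)]
    return build_profile, apply_profile
```

Each returned list can be passed to subprocess.run(cmd, check=True) in order. Higher amounts remove more noise but can also distort speech, so it is worth listening to the result before measuring pauses.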

Feedback

Any feedback on this repository is greatly appreciated.

If you find something that is missing or doesn't work, please consider opening a GitHub issue.
If you want to learn more about voice computing, check out the Voice Computing in Python book.
If you'd like to be mentored by someone on our team, check out the Innovation Fellows Program.
If you want to talk to me directly, please send me an email at js@neurolex.co.

License

This repository is licensed under the Apache 2.0 License.
