Speaker Verification 🗣

ECAPA-Based Speaker Verification of Virtual Assistants: A Transfer Learning Approach

What is speaker verification? 🤔

(Illustration of speaker verification)

Speaker verification is a biometric method to confirm a person's identity based on their unique voice traits. For instance, in secure systems, a user's voiceprint is compared to a preregistered sample for access. It's commonly used in phone-based customer service, voice assistants, and security applications to enhance identity verification.

Overview of this project 😄

Speaker verification uses speech characteristics to validate a speaker's identity. It has become increasingly important in security, where it is employed to authenticate people in applications such as access control, monetary transactions, and secure communication. This project focuses on verifying speakers based on their voices, where the speakers are well-known virtual assistants: Siri, Cortana, Google Assistant, and Alexa. Text-to-speech (TTS) technology is typically used to create these assistants' voices, so they lack the natural variance of human voices. The project applies transfer learning to ECAPA-TDNN (a state-of-the-art model for speaker verification tasks) from the SpeechBrain toolkit to recognize synthetic voices and verify the speakers. Inter- and intra-pair comparisons are performed with both text-dependent and text-independent methods, and results are reported using the evaluation metrics accuracy, precision, recall, and F1 score.
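
As a quick illustration of the core workflow, the pretrained ECAPA-TDNN model can score a pair of recordings in a few lines. This is a minimal sketch rather than the project's full pipeline; the file names are placeholders, and the import path follows SpeechBrain's standard pretrained-model API:

```python
# Minimal sketch: pairwise verification with SpeechBrain's pretrained
# ECAPA-TDNN model. File names below are placeholders.
from speechbrain.pretrained import SpeakerRecognition

verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Returns a cosine-similarity score and a same-speaker decision.
score, prediction = verifier.verify_files("siri_sample.wav", "alexa_sample.wav")
print(f"score={score.item():.3f}, same speaker={bool(prediction)}")
```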

Introduction 📖

  • Speaker verification relies on speech characteristics such as pitch, formants, the spectral envelope, MFCCs, and prosody.
  • A "voice print" represents a speaker's unique vocal qualities.
  • There are two types of speaker verification: text-dependent and text-independent.
  • Transfer learning employs pre-trained models to improve performance when labeled data is scarce (see the sketch after this list).
  • The ECAPA-TDNN model from the SpeechBrain toolkit is used in this study for transfer learning on virtual assistant voices.
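
A minimal sketch of what such a transfer-learning setup can look like, assuming a frozen pretrained ECAPA-TDNN encoder with a small classifier head trained on the virtual-assistant recordings (an illustrative setup, not the project's exact recipe):

```python
# Transfer-learning sketch (illustrative, not the exact training recipe):
# freeze the pretrained ECAPA-TDNN encoder and train a small head on top.
import torch
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
for p in encoder.mods.parameters():
    p.requires_grad = False  # keep the pretrained weights fixed

head = torch.nn.Linear(192, 4)  # 192-dim ECAPA embeddings -> 4 assistants
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def train_step(wavs: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of waveforms (batch, samples)."""
    with torch.no_grad():
        emb = encoder.encode_batch(wavs).squeeze(1)  # (batch, 192)
    loss = torch.nn.functional.cross_entropy(head(emb), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```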

Methodology 💻

Dataset

  • A custom audio dataset was created with a subset selected for analysis.
  • Organized into the following comparison groups (a pairing sketch follows this list):
    • Intra-pair Comparisons:
      • Siri Versions (iOS 9 vs iOS 10 vs iOS 11)
      • Alexa Versions (3rd gen vs 4th gen vs 5th gen)
    • Inter-pair Comparisons:
      • Alexa
      • Siri
      • Google Assistant
      • Cortana
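
One simple way to enumerate these trial pairs is sketched below. The directory layout (data/<assistant>/<version>/*.wav) is hypothetical, chosen only to make the intra/inter distinction concrete:

```python
# Hypothetical layout: data/<assistant>/<version>/*.wav
# e.g. data/siri/ios9/, data/alexa/gen4/, data/cortana/v1/ ...
from itertools import product
from pathlib import Path

def wavs(assistant: str, version: str):
    """List the recordings for one assistant version."""
    return sorted(Path("data", assistant, version).glob("*.wav"))

# Intra-pair: versions of the same assistant, e.g. Siri iOS 9 vs iOS 10.
intra_pairs = list(product(wavs("siri", "ios9"), wavs("siri", "ios10")))

# Inter-pair: two different assistants, e.g. Alexa vs Google Assistant.
inter_pairs = list(product(wavs("alexa", "gen4"), wavs("google", "v1")))
```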

SpeechBrain

  • A state-of-the-art (SoTA) toolkit for speaker verification and related tasks.
  • Ships a pre-trained ECAPA-TDNN model, a state-of-the-art speaker recognition model combining a TDNN design with a multi-layer feature aggregation (MFA) mechanism, Squeeze-and-Excitation (SE), and residual blocks.
  • Hyperparameters are specified in a YAML file.
  • Data loading uses a PyTorch dataset interface.
  • Batching includes extracting speech features such as spectrograms and MFCCs.
  • The Brain class simplifies the neural model training loop (see the skeleton after this list).
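
For orientation, here is a simplified skeleton of a SpeechBrain Brain subclass. The module names (compute_features, embedding_model, classifier) are placeholders for whatever the YAML file wires up, not this project's exact recipe:

```python
# Simplified Brain-class skeleton (placeholder module names; the real
# modules and loss are defined in the experiment's YAML file).
import speechbrain as sb

class VerificationBrain(sb.Brain):
    def compute_forward(self, batch, stage):
        wavs, lens = batch.sig
        feats = self.modules.compute_features(wavs)      # e.g. Fbank/MFCC
        emb = self.modules.embedding_model(feats, lens)  # ECAPA-TDNN
        return self.modules.classifier(emb)              # log-probabilities

    def compute_objectives(self, predictions, batch, stage):
        spk_ids, _ = batch.spk_id_encoded
        return sb.nnet.losses.nll_loss(predictions, spk_ids)

# Training then reduces to:
# brain = VerificationBrain(modules=..., opt_class=..., hparams=...)
# brain.fit(epoch_counter, train_loader, valid_loader)
```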

ECAPA-TDNN model

  • SpeechBrain ships pre-trained models such as ECAPA-TDNN, which can be used directly for embedding extraction.
  • Data preprocessing: extract 80-dimensional filterbank features.
  • Model initialization: 5 TDNN layers, an attention mechanism, and an MLP classifier.
  • Hyperparameter setting: epochs, batch size, learning rate, etc.
  • Pre-training: the released model was trained on the VoxCeleb2 dataset.
  • Validation and testing: evaluated on a held-out validation set (see the feature/embedding sketch after this list).
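
The preprocessing and embedding steps look roughly like this (a sketch using SpeechBrain's standard API; the audio path is a placeholder):

```python
# Sketch of feature extraction and embedding computation.
import torchaudio
from speechbrain.lobes.features import Fbank
from speechbrain.pretrained import EncoderClassifier

signal, fs = torchaudio.load("assistant_utterance.wav")  # placeholder path

# 80-dimensional log Mel filterbank features, as used by ECAPA-TDNN.
fbank = Fbank(n_mels=80)
feats = fbank(signal)  # (batch, frames, 80)

# The pretrained pipeline can also handle features internally and
# return a speaker embedding directly.
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
embedding = encoder.encode_batch(signal)  # (1, 1, 192)
```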

Implementation 💬

The overall pipeline is summarized in the flowchart below 👇

(Implementation flowchart)
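
In code terms, the final scoring step of this pipeline reduces to comparing two embeddings with cosine similarity and thresholding the score. A sketch, with an illustrative (untuned) threshold:

```python
# Scoring sketch: cosine similarity between two speaker embeddings.
import torch
import torch.nn.functional as F

def same_speaker(emb_a: torch.Tensor, emb_b: torch.Tensor,
                 threshold: float = 0.25) -> bool:
    """Decide 'same speaker' if the cosine score clears the threshold."""
    score = F.cosine_similarity(emb_a.flatten(), emb_b.flatten(), dim=0)
    return score.item() >= threshold
```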

Results 👩‍💻

(Overall results)

  • Brief analysis 🧐

    • Intra-pair comparison

      • (Result charts for the Siri and Alexa version pairs)

    • Inter-pair comparison

      • (Result charts for the assistant pairs)
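
The evaluation metrics named above can be computed directly from the per-pair decisions. A minimal sketch, assuming ground-truth labels and model decisions are collected as 0/1 lists over all trial pairs (toy values for illustration only):

```python
# Evaluation sketch: accuracy, precision, recall, and F1 from
# same/different-speaker decisions over all trial pairs.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 0, 0, 1]  # toy ground-truth labels (1 = same speaker)
y_pred = [1, 0, 0, 0, 1]  # toy model decisions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```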

Conclusion 👏

  • Intra-pair TDSV analysis shows strong similarity across all versions of the same assistant, which raises potential security concerns.
  • Inter-pair TDSV analysis found matches between Cortana and both Google Assistant and Alexa.
  • TISV achieves higher accuracy than TDSV, owing to the model's capability to differentiate between different texts.
  • Additional training on a broader dataset of synthetic voices is recommended for better performance.
  • The study highlights the potential of transfer learning and SpeechBrain for speaker verification, while acknowledging the challenges posed by synthetic voices.
