Neural_Voice_Cloning

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person’s voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding.
Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which is then applied to a multi-speaker generative model. While speaker adaptation can achieve slightly better naturalness and similarity, the cloning time and required memory for the speaker encoding approach are significantly lower, making it more favorable for low-resource deployment.
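
The difference between the two approaches can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 3x2 matrix `W` stands in for the frozen multi-speaker generative model, `generate`, `adapt_speaker`, and `encode_speaker` are hypothetical names that do not come from this repository, only the speaker embedding is fine-tuned during adaptation, and the encoder weights are assumed to be already trained. It is not the implementation in `speaker_adaptation.py` or `Encoder.py`.

```python
# Toy sketch only: the real generative model is a neural TTS network, not a matrix.

# Frozen "multi-speaker generative model" (assumption): maps a 2-D speaker
# embedding to 3-D audio features. Its columns are orthonormal on purpose.
W = [[0.6, 0.0],
     [0.8, 0.0],
     [0.0, 1.0]]


def generate(embedding):
    """Produce audio features for a speaker embedding (stand-in for synthesis)."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in W]


def adapt_speaker(samples, steps=200, lr=0.1):
    """Speaker adaptation: fit a new embedding by gradient descent on the samples."""
    emb = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        pred = generate(emb)
        for target in samples:
            for i, row in enumerate(W):
                err = pred[i] - target[i]
                for j in range(len(emb)):
                    # Gradient of the mean squared error with respect to emb[j].
                    grad[j] += 2.0 * err * row[j] / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


def encode_speaker(samples):
    """Speaker encoding: a single forward pass of a separate encoder, no fine-tuning."""
    mean = [sum(col) / len(samples) for col in zip(*samples)]
    # Assumed pre-trained encoder weights: here simply the transpose of W.
    return [sum(W[i][j] * mean[i] for i in range(len(W))) for j in range(2)]


# A few "audio samples" from an unseen speaker (hypothetical data).
true_embedding = [0.5, -1.0]
cloning_samples = [generate(true_embedding) for _ in range(3)]

adapted = adapt_speaker(cloning_samples)   # many optimization steps
encoded = encode_speaker(cloning_samples)  # one forward pass
```

In this toy setting both routes recover the same embedding. The adaptation route needs an optimization loop per new speaker, while the encoding route needs only one pass through the encoder, which mirrors the cloning-time trade-off described above.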

Tested Speaker Audio Link

AudioSamples
Cloning_Audio
Img
Modules
checkpoints
dv3
Encoder.py
README.md
setup.py
speaker_adaptation.py
train_dv3.py
train_encoder.py
train_whole.py
utils.py

VisionBrain/Neural_Voice_Cloning