I'm wondering whether LJSpeech or LibriTTS is the proper candidate for fine-tuning a single-speaker voice.
I've seen that there is a multispeaker boolean field in the configuration, which in my case should presumably be set to false, but I don't know whether this implies I have to use LJSpeech, since LibriTTS is a multi-speaker dataset.
Or would it even be better to train the model from scratch? I'm considering it, but I suspect I have too few samples (126 clean audio files totalling almost 19 minutes).
Thank you in advance.
Sweetapocalyps3 changed the title from "Better LJSpeech or LibriTTS for finetuning a single speaker voice?" to "Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data?" on Apr 2, 2024.
LibriTTS is by far the better choice: the model has seen multiple speakers and can adapt far better to a smaller single-speaker dataset.
You can leave all of the settings in config_ft.yml the same (changing only the dataset paths, then batch size and window size depending on your hardware). Multi-speaker should be kept on true; just make sure that in your dataset metafiles the speaker_id is set to the same id for every file.
Training the model from scratch with 19 minutes of data will most likely yield bad results, although I haven't tried it myself.
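To illustrate the "same speaker_id for every file" step, here is a minimal sketch that rewrites the speaker column of a pipe-delimited metafile. It assumes the common `path|transcript|speaker_id` column layout; the function name and column index are my own, so adjust them to whatever your metafile actually looks like:

```python
def unify_speaker_id(lines, speaker_id="0", sep="|"):
    """Force the last (speaker) column of each metadata line to one id.

    Assumes each line looks like: path|transcript|speaker_id
    (adjust the column index if your metafile differs).
    """
    out = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        parts[-1] = speaker_id  # overwrite the speaker column
        out.append(sep.join(parts))
    return out

# Example: two files originally tagged with different speakers
meta = ["a.wav|hello there|12", "b.wav|good morning|7"]
print(unify_speaker_id(meta))
# ['a.wav|hello there|0', 'b.wav|good morning|0']
```

Run it over your train and validation metafiles and write the result back out before starting fine-tuning.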
The expressions and emphasis in the voices sound really natural, but there is always noise at the beginning and especially at the end. I believe a pad of silence at the start and end was missing during training.
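If the missing-padding guess is right, one possible workaround (my assumption, not a confirmed fix for this repo) would be to pad each training clip with a short stretch of silence before fine-tuning, e.g. with NumPy on the decoded waveform:

```python
import numpy as np

def pad_with_silence(wave, sr, pad_ms=100):
    """Prepend and append pad_ms milliseconds of silence to a mono waveform."""
    pad = np.zeros(int(sr * pad_ms / 1000), dtype=wave.dtype)
    return np.concatenate([pad, wave, pad])

# Example: 1 second of audio at 24 kHz gains 100 ms of silence on each side
sr = 24000
wave = np.random.randn(sr).astype(np.float32)
padded = pad_with_silence(wave, sr)
print(len(padded))  # 28800 = 24000 + 2 * 2400
```

The 100 ms value is a guess; you would need to re-run fine-tuning on the padded clips to see whether the boundary noise actually goes away.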