(Sorry for the second issue, it was unrelated and I assume this is a stupid user moment)
So I have xtts-api-server up and running using your Docker container, all hooked up, running great.
To fine-tune, I set up the xtts-webui Colab and batch-uploaded a bunch of wav files, and the result sounds honestly amazing. 1:1, it sounds perfect; I was shocked at how accurate it was.
I thought copying samples/<<name>>.wav into the API's samples/<<name>>.wav would be enough, but on the self-hosted API server it sounds like a completely different person. There is maybe a hint that they are the same person, but the difference is very large.
What is the proper way to "export" the fine-tuned model from the webui and add it to the API server? If it is just copying the wav file, is there something else I'm missing on my API server? Everything is generic, nothing customized.
Edit: Also, the downloaded wav is just the first wav file, whereas I uploaded a batch of... 15 or so and had it clean them up and do all of the processing. So I assume that's really the problem: is there a "combined" wav or model that I should download instead?
Thanks for building the tools!
Update: I found that I needed to download the entire directory, so the speaker/...wav files are all downloaded after being cleaned up, and I moved those over. However, playing on my API server, it still sounds like a completely different voice. It's cleaner, less tinny and robotic than with just the one file, but it still sounds nothing like the original voice. (The voice I hear is American; the voice I uploaded is British.)
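For reference, the "move everything over" step above can be sketched roughly like this. The directory names are hypothetical placeholders (my webui output folder and the API server's samples folder); this just copies every cleaned-up wav across, which is all I'm doing:

```python
# Sketch of copying all cleaned-up speaker wavs from the webui output
# directory into the API server's samples directory.
# Paths are hypothetical placeholders -- adjust to your own layout.
import shutil
from pathlib import Path


def copy_finetune_wavs(webui_out: Path, api_samples: Path) -> list[Path]:
    """Copy every .wav in webui_out into api_samples; return the copies."""
    api_samples.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in sorted(webui_out.glob("*.wav")):
        dest = api_samples / wav.name
        shutil.copy2(wav, dest)  # copy2 preserves timestamps/metadata
        copied.append(dest)
    return copied
```

If copying the wavs alone isn't enough and the fine-tuned model weights themselves need to go somewhere on the API server, that would explain the voice mismatch, which is exactly what I'm asking about.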