xtts-webui sounds different than xtts-api-server #36

Open
AgentScrubbles opened this issue Dec 31, 2023 · 2 comments

@AgentScrubbles

AgentScrubbles commented Dec 31, 2023

(Sorry for the second issue, it was unrelated and I assume this is a stupid user moment)

I have xtts-api-server up and running using your Docker container, all hooked up and working great.

To fine-tune, I set up the xtts-webui Colab and batch-uploaded a bunch of WAV files, and the result sounds amazing. It is a 1:1 match for the original voice; I was honestly shocked at how accurate it was.

I thought copying samples/<<name>>.wav into the API's samples/<<name>>.wav would be enough, but on the self-hosted API server it sounds like a completely different person. There is maybe a hint that it is the same person, but the difference is very large.

What is the proper way to "export" the fine-tuned model from the webui and add it to the API server? If it is just a matter of copying the WAV file, is there something else I'm missing on my API server? Everything is generic, nothing customized.

Edit: Also, the downloaded WAV is just the first WAV file, whereas I uploaded a batch of... 15 or so and had it clean them up and do all of the processing. So I assume that is really the problem: is there a "combined" WAV or model that I should download instead?

Thanks for building the tools!

@AgentScrubbles
Author

Update: I found that I needed to download the entire directory, so the speaker/...wav files are all downloaded after being cleaned up and moved over. However, when played on my API server, it still sounds like a completely different voice. It's cleaner and less tinny and robotic than with just the one file, but it still sounds nothing like the original voice. (The voice I hear is American; the voice I uploaded is British.)

@daswer123
Owner

Hi, I'll try to figure it out after the holidays; I don't have time right now.
