How do we get fine tuned models to show up in speaker list? #40

Open
GamingDaveUk opened this issue Jan 8, 2024 · 5 comments

Comments

@GamingDaveUk

I suspect that SillyTavern either has not been updated to support fine-tuned models yet or has a bug that stops them from appearing in the speaker list (the ticket for that is here: SillyTavern/SillyTavern#1657).
But just in case I am doing it wrong, here is how I am loading the xtts API server.
First of all, I updated it with pip (before the update it did not recognize the models folder option -mf, but after the update it does).

The full bat file I am using to load it is:

cd xtts
call venv\Scripts\activate
python -m xtts_api_server -mf C:\NewAI\SillyTavern\SillyTavern\xtts\models -sf C:\NewAI\SillyTavern\SillyTavern\xtts\speakers2 --streaming-mode-improve --deepspeed --stream-play-sync

The speakers2 folder was a test with just one wav in it, rather than messing with my current, rather full speakers folder.
The fine-tuned voice is in the models folder, in its own folder:
models, then the NarratorNew folder.
Inside the NarratorNew folder is:
config.json
model.pth (large file... VERY large file)
reference.wav
speakers_xtts.pth
vocab.json
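
As a quick sanity check on the folder layout listed above, here is a minimal Python sketch that reports which of those files are absent from a model folder. The file names come from the listing; treating every one of them as required by the server is an assumption, and `missing_model_files` is a hypothetical helper, not part of xtts_api_server.

```python
from pathlib import Path

# File names copied from the listing above.
# ASSUMPTION: all of them are required; the thread does not confirm this.
EXPECTED_FILES = (
    "config.json",
    "model.pth",
    "vocab.json",
    "speakers_xtts.pth",
    "reference.wav",
)


def missing_model_files(models_dir, model_name):
    """Return the expected file names that are absent from models_dir/model_name."""
    model_dir = Path(models_dir) / model_name
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(
        r"C:\NewAI\SillyTavern\SillyTavern\xtts\models", "NarratorNew"
    )
    print("missing:", missing or "nothing")
```

An empty result only means the files are present; it says nothing about whether the server or SillyTavern will actually load the model.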

When fine-tuning in the webui this model worked fine (great, in fact), but without instructions on how to move it to an API server install I somewhat guessed, so it is possible I got it wrong. Am I missing something, or is SillyTavern the issue? (It only shows the wav file in the speakers2 folder.)

For now I will continue fine-tuning, as I have a fair few I want to do.

Also, how do we format the JSON packet for manually sending requests to the server to use a specific fine-tuned model?

Currently I am using:

@echo off
setlocal enabledelayedexpansion

rem Set the API endpoint and function
set API_ENDPOINT=http://localhost:8020/tts_to_file

rem Set the input values
rem set SPEAKER_WAV="dave2.wav"
set SPEAKER_WAV="stanlyNarrator.wav"
set LANGUAGE="en"
set FILE_NAME_OR_PATH="narrator.wav"

rem Check if a file is dropped onto the batch file
if "%~1" neq "" (
    set "TEXT_FILE=%~1"
) else (
    echo No text file dropped. Exiting.
    exit /b
)

rem Read the contents of the dropped text file into the TEXT variable
set "TEXT="
for /f "delims=" %%i in ('type "%TEXT_FILE%"') do (
    set "LINE=%%i"
    rem Escape special characters in the line
    set "LINE=!LINE:"=\"!"
    set "TEXT=!TEXT!!LINE! "
)

rem Trim trailing whitespace
set "TEXT=!TEXT:~0,-1!"

rem Build the JSON payload
set JSON_PAYLOAD={^
  "text": "!TEXT!",^
  "speaker_wav": %SPEAKER_WAV%,^
  "language": %LANGUAGE%,^
  "file_name_or_path": %FILE_NAME_OR_PATH%^
}

rem Write the JSON payload to a temporary file
echo %JSON_PAYLOAD% > temp.json

rem Make the curl request
curl -v -X POST -H "Content-Type: application/json" -d @temp.json %API_ENDPOINT%

rem Remove the temporary file
del temp.json

But naturally set SPEAKER_WAV="stanlyNarrator.wav" needs to be changed to the fine-tuned model, and I am not sure what format to use there (which could also be the SillyTavern issue, lol).
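
For comparison, the same request can be sketched in Python, where `json.dumps` handles the escaping of quotes, backslashes, and newlines that the batch script does by hand. The endpoint and the four field names are taken from the script above; `build_payload` and `tts_to_file` are hypothetical helper names. The thread does not state what `speaker_wav` should contain for a fine-tuned model, so that value is left as in the original.

```python
import json
import urllib.request

# Endpoint copied from the batch script above.
API_ENDPOINT = "http://localhost:8020/tts_to_file"


def build_payload(text, speaker_wav, language="en", file_name_or_path="narrator.wav"):
    """Serialize the request body as UTF-8 JSON bytes."""
    body = {
        "text": text,
        "speaker_wav": speaker_wav,
        "language": language,
        "file_name_or_path": file_name_or_path,
    }
    return json.dumps(body).encode("utf-8")


def tts_to_file(text, speaker_wav, **kwargs):
    """POST the payload to the server and return the raw response bytes."""
    request = urllib.request.Request(
        API_ENDPOINT,
        data=build_payload(text, speaker_wav, **kwargs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, a call would look like `tts_to_file(open("story.txt", encoding="utf-8").read(), "stanlyNarrator.wav")`, where `story.txt` is a placeholder file name.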

@daswer123
Owner

daswer123 commented Jan 8, 2024

Hi, at the moment you can't switch the xtts model via SillyTavern.
Currently, to use a custom model you need to add the flag -v {MODEL NAME}, where MODEL NAME is the name of the model's folder inside the models folder.

I also prepared some endpoints in a recent update, maybe later they can be added to SillyTavern.
GET http://127.0.0.1:8020/get_models_list - gets a list of all available models
POST http://127.0.0.1:8020/switch_model - switches to the model we pass in
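
A rough Python sketch of calling these two endpoints follows. The URLs are the ones listed above. The request body for switch_model is not given in this thread, so the `model_name` field is an assumption, as is the expectation that both endpoints answer with JSON. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8020"


def models_list_request():
    """Build the GET request for the list of available models."""
    return urllib.request.Request(f"{BASE_URL}/get_models_list", method="GET")


def switch_model_request(model_name):
    """Build the POST request that asks the server to switch models."""
    # ASSUMPTION: the body field is called "model_name"; the thread does not say.
    body = json.dumps({"model_name": model_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/switch_model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(models_list_request())` would fetch the list and `send(switch_model_request("NarratorNew"))` would request the switch.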

@GamingDaveUk
Author

Ah damn, the fine-tuned models came out REALLY well. Hopefully SillyTavern will implement these endpoints.

@GamingDaveUk
Author

GamingDaveUk commented Jan 9, 2024

Is there a way to have it load the fine-tuned model and ignore the speaker_wav field of the JSON packet?

Using my example of the NarratorNew fine-tuned model: I would load an API instance with that as the model, and it would always use the fine-tuned voice no matter which speaker wav is selected. That would be a good workaround for my use case until SillyTavern supports fine-tuned models (especially since it may never support them). I could then create a launch bat file for each model and use them as needed.
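
The per-model launch idea can be sketched in Python as below, building one command line per model folder from the -mf, -sf, and -v flags mentioned earlier in this thread. Whether the server then ignores speaker_wav is not answered here; this only covers the launch side. `launch_command` and `commands_for_all_models` are hypothetical helper names.

```python
from pathlib import Path


def launch_command(models_dir, speakers_dir, model_name):
    """Build the server command line for one model folder."""
    return [
        "python", "-m", "xtts_api_server",
        "-mf", str(models_dir),
        "-sf", str(speakers_dir),
        "-v", model_name,
    ]


def commands_for_all_models(models_dir, speakers_dir):
    """Return one launch command per subfolder of models_dir, keyed by folder name."""
    return {
        path.name: launch_command(models_dir, speakers_dir, path.name)
        for path in sorted(Path(models_dir).iterdir())
        if path.is_dir()
    }
```

Each command could be passed to `subprocess.run` from inside the activated venv, or written out as one bat file per model.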

@Afterswish007

Hmmm, am I confused here? GamingDaveUk talks about wanting to use a trained voice within SillyTavern. It looks like your (@daswer123) reply was talking about TTS models? I'm having this exact problem. Trained voices sound great in the webui. I drop the trained folder (config.json, model.pth, reference.json, reference.wav, speakers_xtts.pth, vocab.json) into the speakers folder within the SillyTavern extension folder (xtts/speakers/) and rename the folder, but within SillyTavern the speaker sounds American and nothing like the original trained voice. To confirm: @daswer123, when you say model, are you referring to the model/v2.0.2? @GamingDaveUk, when you say model, are you referring to the speaker/trained voice?

@GamingDaveUk
Author

GamingDaveUk commented Jan 12, 2024 via email
