VoiceCraftST

VoiceCraftST is a Python API that integrates the advanced text-to-speech capabilities of VoiceCraft with SillyTavern. This API enables the use of VoiceCraft's features within SillyTavern without any need for modifications on the SillyTavern platform.

Updates

2024.4.27 Implemented sliding window and bumped VoiceCraft (VC) to its latest version.

Installation

VoiceCraftST is supported on Linux systems with NVIDIA GPUs. It is deployed with Docker to avoid the complex dependencies required by the VoiceCraft model.

Prerequisites

Docker: Install Docker by following the guide on Docker's official website.
NVIDIA Docker: Required for NVIDIA GPU support. Installation instructions can be found on NVIDIA Docker's GitHub repository.

Installation Steps

Clone the Repository: Clone VoiceCraftST and initialize its submodule:

```
git clone https://github.com/Keeo/VoiceCraftST.git
cd VoiceCraftST
git submodule init
git submodule update
```

Deploy Using Docker-Compose: Navigate to the directory containing docker-compose.yaml and execute:
```
docker-compose up
```
This command builds the necessary Docker images and starts the service, making VCST available at port 5000 on your local machine.
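
To confirm that the service came up, a quick TCP check against port 5000 can help. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, and the check only verifies that something is listening on the port, not that VCST itself is healthy.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 5000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("VCST reachable:", is_port_open())
```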

Configuration for SillyTavern

Configure SillyTavern to use VoiceCraftST by following these steps:

Setup TTS Provider

Choose TTS Provider: In SillyTavern settings, select 'XTTSv2' as the TTS provider.
TTS Features:
- Enable TTS: Ensure the 'Enabled' checkbox is checked.
- Auto Generation: Activate automatic speech generation from text.
- Narration and Text Preferences: Configure preferences for narrating user messages, handling special text formats (like quotes or code), and processing text with special characters.

Advanced TTS Settings

Provider Endpoint Configuration: Set up the XTTS endpoint with specific parameters to optimize performance:
```
http://localhost:5000/{username}/{stop_repetition}/{sample_batch_size}
```
Customize settings such as username, stop_repetition, and sample_batch_size based on your requirements.
- username: Choose any username you want; it is used as a key for the rest of the configuration.
- stop_repetition: If the model generates long silences, reduce stop_repetition to 3, 2, or even 1. (default: 3)
- sample_batch_size: If there are long silences or unnaturally stretched words, increase sample_batch_size to 5 or higher. The model generates sample_batch_size candidates for the same audio and picks the shortest one, so if the generated speech is too fast, lower the value. (default: 4)
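
As an illustration, the endpoint URL above can be assembled from its three parameters. This is a small sketch, not part of VoiceCraftST: the helper name `build_endpoint` is hypothetical, and percent-encoding the username is an assumption about what the server accepts.

```python
from urllib.parse import quote


def build_endpoint(
    username: str,
    stop_repetition: int = 3,
    sample_batch_size: int = 4,
    base_url: str = "http://localhost:5000",
) -> str:
    """Build the provider endpoint in the form {base}/{username}/{stop_repetition}/{sample_batch_size}."""
    # Assumption: the username is percent-encoded so it stays a single path segment.
    user = quote(username, safe="")
    return f"{base_url.rstrip('/')}/{user}/{int(stop_repetition)}/{int(sample_batch_size)}"


if __name__ == "__main__":
    # Defaults from the parameter list: stop_repetition=3, sample_batch_size=4
    print(build_endpoint("alice"))
```

Paste the resulting URL into the provider endpoint field in SillyTavern.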
Parameter Tuning: Adjust TTS parameters like Temperature, Top P and Top K to fine-tune the speech generation characteristics.
- Speed: Does nothing.
- Temperature: Sets the variance in speech generation. (default: 1.0)
- Length Penalty: Does nothing.
- Repetition Penalty: Does nothing.
- Top K: (default: 0)
- Top P: (default: 0.8)
- Stream Chunk Size: Does nothing and streaming has not been implemented.
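
To give a feel for what Temperature, Top K, and Top P control, here is a generic sketch of temperature-scaled top-k/top-p (nucleus) sampling over a list of logits. It is not VoiceCraft's actual sampling code; the function name `sample_token` is hypothetical, and treating Top K = 0 as "disabled" is an assumption based on the default above.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=0.8, rng=random):
    """Sample one index from logits using temperature, top-k and top-p filtering."""
    # Temperature (must be > 0): higher values flatten the distribution, lower values sharpen it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank candidates from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top K: keep only the k most likely candidates (assumed: 0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the surviving candidates, weighted by their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

In this sketch, a lower Top P or a smaller non-zero Top K narrows the set of candidates, while a higher Temperature spreads probability across more of them.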

Optional: Enable Text Splitting

For better handling of large text blocks, enable the 'Text Splitting' feature to ensure smooth and continuous narration. More at VoiceCraft#39. (default: True)

Adding New Voices

To add new voices to the system:

Prepare Voice Sample: Record a mono WAV file at 16,000 Hz sample rate.
Create Transcript: Accurately transcribe the voice sample.
Upload Command: Use the following command to upload the new voice sample and its transcript:
```
make upload SPEAKER_NAME=sample TRANSCRIPT="Your transcript here." FILE_PATH=sample.wav
```
Replace placeholders with actual data. After uploading, reload SillyTavern to access the new voice.
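
Before uploading, it can be worth checking that the sample really is a mono 16,000 Hz WAV file. The sketch below uses Python's standard `wave` module; the helper name `check_voice_sample` is hypothetical and is not part of VoiceCraftST.

```python
import wave


def check_voice_sample(path: str, expected_rate: int = 16000) -> list[str]:
    """Return a list of problems found in the WAV file; an empty list means it looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono audio, found {channels} channels")
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems


if __name__ == "__main__":
    # "sample.wav" matches the FILE_PATH used in the upload command above
    issues = check_voice_sample("sample.wav")
    print("OK" if not issues else "\n".join(issues))
```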

System requirements

In its current state, VoiceCraftST requires 24 GB of GPU VRAM (see "GPU VRAM requirements for training, finetuning, and inference?"), but this is likely to go down. It is slower than XTTSv2, and with Top P below 0.85 it tends to generate long silences, which reduce the speed even further. The sample_batch_size parameter is meant to help with that by brute force: it generates multiple alternatives and picks the shortest.
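
The brute-force idea behind sample_batch_size can be sketched as follows. This is an illustration only: `pick_shortest` and the `generate` callable are hypothetical, and the loop runs candidates one after another, whereas the real model may produce them as a single batch.

```python
from typing import Callable, Sequence


def pick_shortest(generate: Callable[[], Sequence], sample_batch_size: int = 4) -> Sequence:
    """Generate several candidates for the same text and return the shortest one."""
    # Each call to generate() stands in for one sampled audio output.
    candidates = [generate() for _ in range(sample_batch_size)]
    # The shortest candidate is the least likely to contain long silences or stretched words.
    return min(candidates, key=len)
```

This also shows the cost: every extra candidate is another full generation, so larger values trade speed for fewer silences.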
