long-form/streaming support? #53

Open
Blazzycrafter opened this issue Feb 14, 2024 · 5 comments
Labels
feature request New feature or request

Comments

@Blazzycrafter

I want to use it in role plays, and the text for the audio is mostly 500+ chars long, so generation takes a long time.
Is there a stream mode planned?
Like in xtts?

@vatsalaggarwal
Contributor

We're planning to release long-form and streaming soon after we've had some bandwidth to push code with faster inference...

by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

@vatsalaggarwal changed the title from "steam support?" to "streaming support?" Feb 29, 2024
@vatsalaggarwal changed the title from "streaming support?" to "long-form/streaming support?" Feb 29, 2024
@vatsalaggarwal added the "feature request" label Mar 12, 2024
@platform-kit

Hey @vatsalaggarwal, is that release still in the pipeline?

@sidroopdaska
Contributor

@platform-kit, yes, the release is still planned. We just released fine-tuning capabilities (#93). We are now going to start working on long-form & streaming.

Would love insights on the below

by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

@platform-kit

@sidroopdaska The way I did this in my implementation of XTTS (https://github.com/Render-AI/cog-xtts-v2/blob/main/predict.py) was to split the text into chunks (e.g. sentences, but it could be done in other ways), render each chunk as a separate audio output, and then concatenate the audio.

You do lose some context this way but it makes the output very stable (avoiding weird outputs where the voice trails off as the duration increases, for example).
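A minimal sketch of that chunk-and-concatenate idea, not the actual code from the linked predict.py. Everything named here is an assumption for illustration: `synthesize` is a hypothetical callable standing in for whatever TTS call you use (it takes a string and returns a list of float samples), and the 200-char limit, 24 kHz sample rate, and 0.2 s pause are placeholder values. The generator also shows how you could hand chunks to a player as they finish, which gets you a rough form of streaming.

```python
import re
from typing import Callable, Iterator, List

# Hypothetical TTS callable: text in, mono float samples out.
Synthesizer = Callable[[str], List[float]]


def split_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Split on sentence-ending punctuation, then wrap long sentences on whitespace."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    for sentence in sentences:
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                # Adding this word would exceed the limit, so close the chunk.
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


def synthesize_chunks(
    text: str, synthesize: Synthesizer, max_chars: int = 200
) -> Iterator[List[float]]:
    """Yield audio chunk by chunk, so playback can start before the full text is done."""
    for chunk in split_sentences(text, max_chars):
        yield synthesize(chunk)


def synthesize_long(
    text: str,
    synthesize: Synthesizer,
    sample_rate: int = 24000,  # assumed; use your model's output rate
    pause_s: float = 0.2,      # assumed gap between chunks
    max_chars: int = 200,
) -> List[float]:
    """Render every chunk and join them with a short silence in between."""
    silence = [0.0] * int(sample_rate * pause_s)
    audio: List[float] = []
    for i, samples in enumerate(synthesize_chunks(text, synthesize, max_chars)):
        if i > 0:
            audio.extend(silence)
        audio.extend(samples)
    return audio
```

Each chunk is rendered independently, so prosody does not carry across chunk boundaries; that is the context loss mentioned above.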

@MethanJess

Would love insights on the below

by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

@sidroopdaska daswer123 has made a WebUI that accepts an unlimited amount of text input, though API streaming is still coming soon. https://github.com/daswer123/xtts-webui

5 participants