[WIP] PyTorch K-means for discrete audio tokens extraction #2411

Open: wants to merge 8 commits into develop


lucadellalib (Collaborator) commented Feb 14, 2024

Native PyTorch implementation of vanilla mini-batch K-means for extracting semantic tokens (with one or multiple codebooks) from wav2vec 2.0, HuBERT, and WavLM representations on LJSpeech.
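A minimal sketch of the mini-batch K-means step, for illustration only and not the PR's code. It assumes the classic mini-batch update from Sculley (2010): each centroid moves toward the mean of its newly assigned samples with step size (samples assigned in this batch) / (total samples ever assigned). The function `minibatch_kmeans` and its parameters are hypothetical, and random tensors stand in for real wav2vec 2.0 / HuBERT / WavLM frame features.

```python
import torch


def minibatch_kmeans(
    features: torch.Tensor,
    num_clusters: int,
    batch_size: int = 256,
    num_steps: int = 100,
    seed: int = 0,
) -> torch.Tensor:
    """Fit centroids on (N, D) features; returns (num_clusters, D) centroids.

    Hypothetical helper; assumes a Sculley-style mini-batch update,
    not necessarily the PR's exact implementation.
    """
    generator = torch.Generator().manual_seed(seed)
    num_samples = features.shape[0]
    # Initialize centroids from randomly chosen samples
    init_idx = torch.randperm(num_samples, generator=generator)[:num_clusters]
    centroids = features[init_idx].clone()
    # Running count of samples ever assigned to each centroid
    counts = torch.zeros(num_clusters, dtype=features.dtype)
    for _ in range(num_steps):
        batch_idx = torch.randint(num_samples, (batch_size,), generator=generator)
        batch = features[batch_idx]
        # Assign each sample to its nearest centroid (Euclidean distance)
        assignments = torch.cdist(batch, centroids).argmin(dim=1)
        # Per-centroid sum and count of the samples in this batch
        batch_sums = torch.zeros_like(centroids).index_add_(0, assignments, batch)
        batch_counts = torch.bincount(assignments, minlength=num_clusters).to(features.dtype)
        counts += batch_counts
        # Move each touched centroid toward its batch mean with step
        # size batch_count / total_count (keeps it a running mean)
        updated = batch_counts > 0
        eta = batch_counts[updated] / counts[updated]
        batch_means = batch_sums[updated] / batch_counts[updated].unsqueeze(1)
        centroids[updated] += eta.unsqueeze(1) * (batch_means - centroids[updated])
    return centroids


# Usage with random stand-in features (1000 frames, 16 dims)
torch.manual_seed(0)
features = torch.randn(1000, 16)
centroids = minibatch_kmeans(features, num_clusters=8)
# Discrete tokens: index of the nearest centroid for each frame
tokens = torch.cdist(features, centroids).argmin(dim=1)
```

With this step size each centroid ends up as the running mean of every sample ever assigned to it, so no separate learning-rate hyperparameter is needed.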
The idea is to follow the approach described in https://arxiv.org/abs/2312.09747. According to this paper, speaker information can be preserved by mapping the discrete tokens back to continuous representations and training a vocoder on top of them.
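The mapping from discrete tokens back to continuous representations can be illustrated with a simple centroid lookup. This is a sketch under stated assumptions: one K-means codebook per SSL layer (one way to obtain multiple codebooks), hypothetical `quantize`/`dequantize` helpers, random tensors in place of real layer features and fitted centroids, and dequantization by returning each token's centroid. The PR or the paper may use a different scheme, such as a learned embedding table.

```python
import torch


def quantize(layer_features: list[torch.Tensor], codebooks: list[torch.Tensor]) -> torch.Tensor:
    """Map each layer's (T, D) features to token IDs using that layer's codebook.

    Hypothetical helper; returns a (T, num_codebooks) integer tensor.
    """
    tokens = [
        torch.cdist(feats, codebook).argmin(dim=1)  # nearest centroid per frame
        for feats, codebook in zip(layer_features, codebooks)
    ]
    return torch.stack(tokens, dim=1)


def dequantize(tokens: torch.Tensor, codebooks: list[torch.Tensor]) -> torch.Tensor:
    """Map tokens back to continuous vectors by centroid lookup (an assumption).

    Returns a (T, num_codebooks, D) tensor that a vocoder could consume.
    """
    return torch.stack(
        [codebook[tokens[:, k]] for k, codebook in enumerate(codebooks)], dim=1
    )


# Usage with random stand-ins: 3 layers, 50 frames, 16 dims, 8 clusters per layer
torch.manual_seed(0)
num_layers, num_frames, dim, num_clusters = 3, 50, 16, 8
codebooks = [torch.randn(num_clusters, dim) for _ in range(num_layers)]
layer_features = [torch.randn(num_frames, dim) for _ in range(num_layers)]
tokens = quantize(layer_features, codebooks)  # shape (50, 3)
recon = dequantize(tokens, codebooks)  # shape (50, 3, 16)
```

Initializing a trainable `torch.nn.Embedding` from the centroids would be one alternative if the continuous representations should be fine-tuned together with the vocoder.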

lucadellalib changed the title [WIP] PyTorch K-means for discrete tokens extraction to [WIP] PyTorch K-means for discrete audio tokens extraction Feb 14, 2024
mravanelli added the enhancement (New feature or request) label Feb 16, 2024
lucadellalib (Collaborator, Author) commented:

[Image: tsne-ljspeech-batch]

Labels: enhancement (New feature or request)
Projects: None yet
Linked issues: None yet
Participants: 2