[WIP] PyTorch K-means for discrete audio tokens extraction #2411

lucadellalib · 2024-02-14T23:28:56Z

Native PyTorch implementation of vanilla mini-batch K-means for extracting semantic tokens (one or multiple codebooks) using wav2vec 2.0, HuBERT, and WavLM on LJSpeech.
The idea is to follow the approach described in https://arxiv.org/abs/2312.09747. According to that paper, speaker information can be preserved by mapping the discrete representations back to continuous ones and training a vocoder on top of them.
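For readers unfamiliar with the technique, here is a minimal sketch of mini-batch K-means in PyTorch for a single codebook. It is not the code in this PR: the class name `MiniBatchKMeans`, its methods, and the hyperparameters are hypothetical, and random tensors stand in for the wav2vec 2.0 / HuBERT / WavLM features. The centroid update is a batched running mean, where each centroid's step size is the inverse of the number of frames assigned to it so far.

```python
import torch


class MiniBatchKMeans:
    """Hypothetical minimal mini-batch K-means (illustrative, not the PR's class)."""

    def __init__(self, num_clusters, seed=0):
        self.num_clusters = num_clusters
        self.gen = torch.Generator().manual_seed(seed)
        self.centroids = None  # (K, D), created from the first batch
        self.counts = torch.zeros(num_clusters)  # frames seen per centroid

    def assign(self, feats):
        # feats: (N, D) -> index of the nearest centroid for each frame, (N,)
        return torch.cdist(feats, self.centroids).argmin(dim=1)

    def partial_fit(self, feats):
        if self.centroids is None:
            # Initialize centroids with K random frames from the first batch
            # (assumes the batch holds at least K frames).
            idx = torch.randperm(feats.shape[0], generator=self.gen)
            self.centroids = feats[idx[: self.num_clusters]].clone()
        tokens = self.assign(feats)
        one_hot = torch.nn.functional.one_hot(tokens, self.num_clusters)
        one_hot = one_hot.to(feats.dtype)  # (N, K)
        batch_counts = one_hot.sum(dim=0)  # (K,)
        batch_sums = one_hot.T @ feats  # (K, D)
        self.counts += batch_counts
        # Running-mean update; centroids with no assigned frames get a zero step.
        step = batch_sums - batch_counts[:, None] * self.centroids
        self.centroids += step / self.counts.clamp(min=1)[:, None]
        return tokens

    def decode(self, tokens):
        # Map discrete tokens back to continuous vectors (their centroids).
        return self.centroids[tokens]


torch.manual_seed(0)
kmeans = MiniBatchKMeans(num_clusters=4)
for _ in range(10):
    batch = torch.randn(64, 8)  # stand-in for a batch of SSL features
    kmeans.partial_fit(batch)

feats = torch.randn(5, 8)
tokens = kmeans.assign(feats)  # discrete tokens, shape (5,)
recon = kmeans.decode(tokens)  # continuous vectors, shape (5, 8)
```

The `decode` step corresponds to the "map discrete representations back to continuous ones" idea above: a vocoder would then be trained on vectors like `recon` rather than on the raw token indices. Multiple codebooks could be handled by fitting one such model per codebook, though how this PR does it is not shown here.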

lucadellalib · 2024-02-16T18:39:34Z

Add PyTorch K-means

c45dc2c

lucadellalib changed the title ~~[WIP] PyTorch K-means for discrete tokens extraction~~ [WIP] PyTorch K-means for discrete audio tokens extraction Feb 14, 2024

lucadellalib added 3 commits February 14, 2024 18:46

Fix variable name

2299ad4

Update

35ea0dd

Update

a3f131c

mravanelli requested a review from poonehmousavi February 16, 2024 03:26

mravanelli assigned lucadellalib Feb 16, 2024

mravanelli added the enhancement New feature or request label Feb 16, 2024

Update

997fd62

lucadellalib added 3 commits February 16, 2024 19:02

Minor improvements, refactoring

26b887b

Minor fixes, update hyperparameters

96867c1

Update hyperparameters

4a78bbf
