[Help]: FACodec. How to recreate demo examples for voice conversion? #161
Comments
Hi, which checkpoint are you using? You can follow:

```python
import torch
from huggingface_hub import hf_hub_download

from Amphion.models.codec.ns3_codec import FACodecEncoderV2, FACodecDecoderV2

# Same parameters as FACodecEncoder/FACodecDecoder
fa_encoder_v2 = FACodecEncoderV2(...)
fa_decoder_v2 = FACodecDecoderV2(...)

# Download the V2 checkpoints from the Hugging Face Hub
encoder_v2_ckpt = hf_hub_download(repo_id="amphion/naturalspeech3_facodec", filename="ns3_facodec_encoder_v2.bin")
decoder_v2_ckpt = hf_hub_download(repo_id="amphion/naturalspeech3_facodec", filename="ns3_facodec_decoder_v2.bin")
fa_encoder_v2.load_state_dict(torch.load(encoder_v2_ckpt))
fa_decoder_v2.load_state_dict(torch.load(decoder_v2_ckpt))

with torch.no_grad():
    # Encode the source (a) and reference (b) waveforms
    enc_out_a = fa_encoder_v2(wav_a)
    prosody_a = fa_encoder_v2.get_prosody_feature(wav_a)
    enc_out_b = fa_encoder_v2(wav_b)
    prosody_b = fa_encoder_v2.get_prosody_feature(wav_b)

    vq_post_emb_a, vq_id_a, _, quantized_a, spk_embs_a = fa_decoder_v2(
        enc_out_a, prosody_a, eval_vq=False, vq=True
    )
    vq_post_emb_b, vq_id_b, _, quantized_b, spk_embs_b = fa_decoder_v2(
        enc_out_b, prosody_b, eval_vq=False, vq=True
    )

    # Voice conversion: codes from a, speaker embedding from b
    vq_post_emb_a_to_b = fa_decoder_v2.vq2emb(vq_id_a, use_residual=False)
    recon_wav_a_to_b = fa_decoder_v2.inference(vq_post_emb_a_to_b, spk_embs_b)
```
Hi, I tried this code, but the quality of the reconstructed wav seems poor. How should I adjust the parameters to get the best results?
Same here.
Hi, since our model is trained on 16 kHz English data, voice-conversion performance in other languages may not be as good as shown on the demo page.
Is it possible to train it on a new language? And how can I do it?
Hi, you can train the codec on other languages if you have some aligned phonemes and waveforms.
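For illustration only: "aligned phonemes and waveforms" typically means per-utterance phoneme sequences with timing, e.g. produced by a forced aligner such as the Montreal Forced Aligner. The thread does not document Amphion's actual training data format; a purely hypothetical JSON-lines manifest pairing the two might look like:

```json
{"wav": "data/utt0001.wav", "phones": ["HH", "AH", "L", "OW"], "durations_ms": [60, 90, 70, 110]}
{"wav": "data/utt0002.wav", "phones": ["W", "ER", "L", "D"], "durations_ms": [80, 100, 60, 120]}
```

Check the repository's training recipes for the format it actually expects.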
But when I use the English source and prompt audio provided on the demo page, the zero-shot voice quality I get is still worse than the demo page's. May I ask why?
Hi @wosyoo, could you attach your input and generated samples here? |
I would love to do this; how can I? I haven't seen any training code so far. And I have to say: in my target language (Icelandic), the results with the pretrained models are really bad.
Problem Overview
I tried to recreate the results from the demo page for FACodec (Voice Conversion Samples), but my results are worse than the examples provided on the demo page. Why is that, and how can I achieve the same quality as the demo page samples?
Steps Taken
Expected Outcome
The results of voice conversion are worse than in the examples.
Environment Information