❓ Questions and Help
For 16000 Hz audio, window_size_samples can be set to 512 (32 ms), 1024 (64 ms), or 1536 (96 ms). Can window_size_samples be set to 160 (10 ms)? In addition, what do the parameters threshold, min_silence_samples_at_max_speech, min_speech_samples, max_speech_samples, and speech_pad_samples mean, and what impact do they have on the VAD results?
Replies: 1 comment
No, because we do not have data annotation with such granularity.
- threshold: the main activation parameter, the probability threshold above which the VAD activates
- min_speech_samples: the minimal number of audio samples that can constitute speech; this is to suppress spurious activations
- max_speech_samples: the same, but for the maximum speech length
- min_silence_samples_at_max_speech, speech_pad_samples: these are a bit technical, it is better to look them up in the utils code

They mostly correct noise, make outputs more robust, and suppress spurious activations.
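
To make the descriptions above a bit more concrete, here is a rough sketch of how a threshold, a minimum speech length, and padding could act on per-window speech probabilities. This is not the utils code: the function names `window_ms` and `toy_speech_segments` are made up for illustration, the default values are arbitrary, and the logic is deliberately simplified.

```python
def window_ms(window_size_samples, sampling_rate=16000):
    """Duration of one analysis window in milliseconds."""
    return 1000 * window_size_samples / sampling_rate


def toy_speech_segments(probs, window_size_samples=512, threshold=0.5,
                        min_speech_samples=1024, speech_pad_samples=256):
    """Toy post-processing (hypothetical, NOT the library's implementation).

    probs holds one speech probability per window.
    Returns a list of (start_sample, end_sample) tuples.
    """
    total = len(probs) * window_size_samples
    segments = []
    start = None
    for i, p in enumerate(probs):
        pos = i * window_size_samples
        if p >= threshold and start is None:
            start = pos                    # probability crosses the threshold
        elif p < threshold and start is not None:
            segments.append((start, pos))  # probability falls back below it
            start = None
    if start is not None:
        segments.append((start, total))    # speech runs to the end of the audio

    # Drop segments that are too short to count as speech.
    kept = [(s, e) for s, e in segments if e - s >= min_speech_samples]

    # Pad each surviving segment on both sides, clamped to the audio bounds.
    return [(max(0, s - speech_pad_samples), min(total, e + speech_pad_samples))
            for s, e in kept]


print(window_ms(512), window_ms(1024), window_ms(1536))
print(toy_speech_segments([0.1, 0.9, 0.2, 0.8, 0.9, 0.7, 0.1, 0.1]))
```

In this toy run the single high-probability window at the start is discarded as a spurious activation, while the three-window run is kept and padded. The sketch leaves out max_speech_samples and min_silence_samples_at_max_speech entirely; for those, the utils code is the place to look.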