Fix a bug in the VADIterator() which would return negative start #446

Open

@sobomax sobomax commented Apr 18, 2024

VADIterator() might return a negative start position if voice happens to be detected in the very first frame. I don't have a test case to reproduce it, but the logic error should be apparent on inspection. It tripped an assertion in our own code:

(InfernRTPActor pid=141495) Exception in thread Thread-5:
(InfernRTPActor pid=141495) Traceback (most recent call last):
(InfernRTPActor pid=141495)   File "/home/sobomax/miniconda3/envs/tinygrad/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
(InfernRTPActor pid=141495)     self.run()
(InfernRTPActor pid=141495)   File "/home/sobomax/projects/Infernos/Cluster/InfernBatchedWorker.py", line 39, in run
(InfernRTPActor pid=141495)     self.process_batch(wis)
(InfernRTPActor pid=141495)   File "/home/sobomax/miniconda3/envs/tinygrad/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(InfernRTPActor pid=141495)     return func(*args, **kwargs)
(InfernRTPActor pid=141495)            ^^^^^^^^^^^^^^^^^^^^^
(InfernRTPActor pid=141495)   File "/home/sobomax/projects/Infernos/Core/VAD/SileroVAD.py", line 81, in process_batch
(InfernRTPActor pid=141495)     assert poff > 0 and poff < vc.active_buffer.size(0), f'{poff=} {vc.active_buffer.size(0)=} {sd.current_sample=} {vc.active_start=}'
(InfernRTPActor pid=141495)                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(InfernRTPActor pid=141495) AssertionError: poff=1008 vc.active_buffer.size(0)=768 sd.current_sample=768 vc.active_start=-240

The audio is sampled at 8 kHz, so -240 samples corresponds to 30 ms.
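
The arithmetic behind that number can be sketched as follows. This is only an illustration, not the library's code: it assumes the start position is computed as the current sample minus the padding minus one window, which is consistent with the values in the assertion message (current_sample=768 after a single 768-sample window, with 30 ms of padding). The function names `samples_from_ms` and `speech_start` are hypothetical.

```python
def samples_from_ms(ms: int, sampling_rate: int) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return sampling_rate * ms // 1000


def speech_start(current_sample: int, pad_samples: int,
                 window_samples: int, clamp: bool = True) -> int:
    """Assumed start computation: step back one window plus the padding.

    With clamp=True the result is never allowed to go below sample 0.
    """
    start = current_sample - pad_samples - window_samples
    return max(0, start) if clamp else start


# Values taken from the assertion message: 8 kHz audio, first window of 768 samples.
pad = samples_from_ms(30, 8000)                        # 30 ms of padding at 8 kHz
unclamped = speech_start(768, pad, 768, clamp=False)   # goes negative on the first frame
clamped = speech_start(768, pad, 768)                  # stays inside the buffer
print(pad, unclamped, clamped)
```

Clamping to zero is one possible way to keep the reported start inside the buffer when speech begins in the first frame; the padding simply cannot extend before the beginning of the stream.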
