

Batched input not working in pytorch model #351

Open
lokesh1199 opened this issue Jun 16, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@lokesh1199

🐛 Bug

Hi,
I am trying to batch the input to the PyTorch model. The batched input works when the model is on CPU, but when the model is on GPU, inference only succeeds the first time. After that, the error below occurs.

Code:

silero_model(torch.rand((34, 512), device='cuda'), 16000)

If I run the above line twice, this error occurs:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/vad/model/vad_annotator.py", line 43, in forward
      _h = self._h
      _c = self._c
      out0, _6, _7, = (_model).forward(x0, _h, _c, )
                       ~~~~~~~~~~~~~~~ <--- HERE
      self._h = _6
      self._c = _7
  File "code/__torch__/vad/model/vad_annotator/___torch_mangle_28.py", line 26, in forward
    x3 = (encoder).forward(x2, )
    decoder = self.decoder
    x4, h0, c0, = (decoder).forward(x3, h, c, )
                   ~~~~~~~~~~~~~~~~ <--- HERE
    _0 = torch.mean(torch.squeeze(x4, 1), [1])
    out = torch.unsqueeze(_0, 1)
  File "code/__torch__/vad/model/vad_annotator.py", line 156, in forward
    x3 = torch.permute(x0, [0, 2, 1])
    decoder = self.decoder
    x4 = (decoder).forward(x3, )
          ~~~~~~~~~~~~~~~~ <--- HERE
    return (x4, h0, c0)
class VADRNNJIT(Module):
  File "code/__torch__/torch/nn/modules/container/___torch_mangle_26.py", line 15, in forward
    _2 = getattr(self, "2")
    input0 = (_0).forward(input, )
    input1 = (_1).forward(input0, )
              ~~~~~~~~~~~ <--- HERE
    return (_2).forward(input1, )
  def __len__(self: __torch__.torch.nn.modules.container.___torch_mangle_26.Sequential) -> int:
  File "code/__torch__/torch/nn/modules/conv/___torch_mangle_25.py", line 23, in forward
    weight = self.weight
    bias = self.bias
    _0 = (self)._conv_forward(input, weight, bias, )
          ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return _0
  def _conv_forward(self: __torch__.torch.nn.modules.conv.___torch_mangle_25.Conv1d,
  File "code/__torch__/torch/nn/modules/conv/___torch_mangle_25.py", line 29, in _conv_forward
    weight: Tensor,
    bias: Optional[Tensor]) -> Tensor:
    _1 = torch.conv1d(input, weight, bias, [1], [0], [1])
         ~~~~~~~~~~~~ <--- HERE
    return _1

Traceback of TorchScript, original code (most recent call last):
  File "/home/keras/notebook/nvme_raid/adamnsandle/silero-models-research/vad/model/vad_annotator.py", line 377, in forward
    
        if sr == 16000:
            out, self._h, self._c = self._model(x, self._h, self._c)
                                    ~~~~~~~~~~~ <--- HERE
        elif sr == 8000:
            out, self._h, self._c = self._model_8k(x, self._h, self._c)
  File "/home/keras/notebook/nvme_raid/adamnsandle/silero-models-research/vad/model/vad_annotator.py", line 300, in forward
        x = self.encoder(x)
    
        x, h, c = self.decoder(x, h, c)
                  ~~~~~~~~~~~~ <--- HERE
    
        out = x.squeeze(1).mean(dim=1).unsqueeze(1)
  File "/home/keras/notebook/nvme_raid/adamnsandle/silero-models-research/vad/model/vad_annotator.py", line 241, in forward
        x = x.permute(0, 2, 1)
    
        x = self.decoder(x)
            ~~~~~~~~~~~~ <--- HERE
    
        return x, h, c
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 307, in forward
    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
               ~~~~~~~~~~~~~~~~~~ <--- HERE
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
                            weight, bias, self.stride,
                            _single(0), self.dilation, self.groups)
        return F.conv1d(input, weight, bias, self.stride,
               ~~~~~~~~ <--- HERE
                        self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [1, 64, 1], expected input[64, 1, 34] to have 64 channels, but got 1 channels instead
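Reading the serialized forward above, the model is stateful: each call consumes and overwrites self._h / self._c, so the second call runs against whatever state the first call left behind. A minimal plain-Python sketch of that general failure mode (a hypothetical TinyStatefulRNN, not the silero code):

```python
# Minimal illustration (not silero code) of a stateful forward like the one
# in the traceback: each call consumes and overwrites a cached hidden state.
class TinyStatefulRNN:
    def __init__(self):
        self._h = 0.0  # persists across calls, like the model's _h / _c buffers

    def forward(self, xs):
        outs = []
        for x in xs:
            self._h = 0.9 * self._h + 0.1 * x  # recurrence over cached state
            outs.append(self._h)
        return outs

rnn = TinyStatefulRNN()
first = rnn.forward([1.0, 1.0])   # starts from h = 0.0
second = rnn.forward([1.0, 1.0])  # starts from the state the first call left
# Identical inputs, different outputs, because the cached state changed
# between calls (first[0] == 0.1, second[0] == 0.271).
```

If the cached _h / _c tensors end up in a shape or layout the decoder cannot consume on the next call, a crash like the conv1d channel mismatch above is a plausible result; this is an inference from the traceback, not a confirmed diagnosis.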
lokesh1199 added the "bug" label on Jun 16, 2023
@snakers4 (Owner)

Hi,

> But when the model is in gpu

The model is not designed to run on GPU.

> the inference only works for the first time.

What is "the first time"?

@lokesh1199 (Author)

> What is "the first time"?

silero_model, _ = torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
    force_reload=False,
    onnx=False,
)
silero_model.to('cuda')
silero_model(torch.rand((34, 512), device='cuda'), 16000) # no error 
silero_model(torch.rand((34, 512), device='cuda'), 16000) # weights shape mismatch error
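If stale cached state is indeed what breaks the second call, clearing it between independent batches should avoid the crash. Recent silero-vad releases expose a reset_states() method on the loaded model, but treat that as an assumption and verify it on your version; the pattern, sketched with a hypothetical stand-in class so the snippet is self-contained:

```python
class FakeStatefulVAD:
    """Hypothetical stand-in for a stateful VAD model (not silero code)."""
    def __init__(self):
        self._state = None  # analogous to the _h / _c buffers in the traceback

    def reset_states(self):
        self._state = None  # forget everything from previous calls

    def __call__(self, n_chunks):
        if self._state is not None:
            # mimic the reported behaviour: the first call succeeds, the
            # second trips over state left behind by the first
            raise RuntimeError("stale recurrent state from a previous call")
        self._state = f"h/c for batch of {n_chunks}"
        return [0.5] * n_chunks  # dummy speech probabilities

vad = FakeStatefulVAD()
vad(34)             # first call: no error
vad.reset_states()  # clear cached state before the next independent batch
probs = vad(34)     # succeeds only because of the reset
```

With the real model the equivalent would be calling silero_model.reset_states() between the two inference calls above, assuming the method exists in the version you loaded.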

@janvainer

janvainer commented Jul 28, 2023

Hi, I am having the same issue. The reason for batched inference on GPU is to handle large batches in a very low-latency streaming scenario; the CPU version is relatively slow once the batch size exceeds a few tens of items. Are there any plans to make this model usable on GPU?

@amr-lopezjos

I ran a sanity check using a 4-second clip at 16 kHz, split into chunks of 512 samples. Unfortunately, I get very different probabilities in batched mode vs. frame-by-frame inference.
The frame-by-frame probabilities seem correct: when there is speech, the PyTorch model outputs a value close to 1, while the same frame yields a much lower value, e.g. 0.15, when it is part of a batch.
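A plausible explanation (an inference, not confirmed by the maintainers): the recurrent state is meant to evolve chunk by chunk within one audio stream, so stacking consecutive chunks along the batch dimension treats them as independent streams that all start from the same state. A plain-Python sketch of how that changes the outputs:

```python
# Illustration (plain Python, not silero code): a recurrent model produces
# different results when consecutive chunks of ONE stream are fed as a
# "batch" of independent items rather than sequentially.
def step(h, x):
    return 0.9 * h + 0.1 * x  # one recurrent update

chunks = [1.0, 1.0, 1.0, 1.0]  # consecutive chunks of a single stream

# Frame-by-frame: state carries over from chunk to chunk.
h = 0.0
sequential = []
for c in chunks:
    h = step(h, c)
    sequential.append(h)

# Batched: every chunk starts from the same initial state.
batched = [step(0.0, c) for c in chunks]

# sequential ~= [0.1, 0.19, 0.271, 0.3439]; batched ~= [0.1, 0.1, 0.1, 0.1]
```

Under that reading, only the first chunk of a batch sees the "right" state, which would explain later chunks scoring differently from the frame-by-frame run.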


4 participants