change alignment library from whisperx to ctc-forced-aligner #184
wondering where the license info for the universal multilingual model can be found
Hi @transcriptionstream
I'll work on getting a build going and test it out. Intrigued by the performance increase. I've got over 30k, and counting, diarizations done for a recent client utilizing the old model - the increase in speed with this model sounds wild and game changing!
getting the following errors when trying to build this branch
diarize.py 134 <module>
emissions, stride = generate_emissions(
alignment_utils.py 129 generate_emissions
emissions_ = model(input_batch).logits
module.py 1501 _call_impl
return forward_call(*args, **kwargs)
modeling_wav2vec2.py 1969 forward
outputs = self.wav2vec2(
module.py 1501 _call_impl
return forward_call(*args, **kwargs)
modeling_wav2vec2.py 1554 forward
extract_features = self.feature_extractor(input_values)
module.py 1501 _call_impl
return forward_call(*args, **kwargs)
modeling_wav2vec2.py 461 forward
hidden_states = conv_layer(hidden_states)
module.py 1501 _call_impl
return forward_call(*args, **kwargs)
modeling_wav2vec2.py 336 forward
hidden_states = self.conv(hidden_states)
module.py 1501 _call_impl
return forward_call(*args, **kwargs)
conv.py 313 forward
return self._conv_forward(input, self.weight, self.bias)
conv.py 309 _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
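The reply that resolved this error is not preserved in the thread. In general, this message appears when a half-precision (float16) model is run on the CPU, where some PyTorch builds have no float16 convolution kernel. A minimal sketch of one way to guard against it, assuming the device is passed as a string; the helper name `pick_dtype_name` is hypothetical and not part of this repository:

```python
def pick_dtype_name(device: str) -> str:
    """Return the name of a torch dtype that is safe for the given device.

    Hypothetical helper for illustration: float16 only on CUDA devices,
    float32 everywhere else (including CPU, where the error above occurred).
    """
    return "float16" if device.startswith("cuda") else "float32"


if __name__ == "__main__":
    for dev in ("cpu", "cuda", "cuda:0"):
        print(dev, "->", pick_dtype_name(dev))
```

With PyTorch installed, the returned name could be resolved with `getattr(torch, pick_dtype_name(device))` and passed wherever the alignment model's dtype is chosen. Whether this branch exposes such a setting is an assumption.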
you can ignore the first warning, huggingface/transformers#30628
Thanks! Got it built and am running it through its paces. So far so good. Trying to get some good benchmarks on speed improvement. Quick tests show it's definitely faster and output is consistent with whisperx. Would love to try it in a prod env if the license can be modified.
Unfortunately, the license is the decision of the model owners; I just re-uploaded the model to HF. You can work around that by using another English model that has a suitable license. That also works for languages other than English, because the idea is the same: romanize and normalize the text of any language to match the model vocab.
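As a rough illustration of the "normalize to match model vocab" idea, here is a minimal sketch. It only handles Latin-script text by stripping accents; real romanization of other scripts needs a dedicated tool and is not reproduced here. The vocabulary `VOCAB` and the function name `normalize_for_vocab` are assumptions for illustration, not the library's actual API:

```python
import unicodedata

# Hypothetical character vocabulary of an English CTC model.
VOCAB = set("abcdefghijklmnopqrstuvwxyz'")


def normalize_for_vocab(text: str) -> str:
    """Map text onto VOCAB: strip accents, lowercase, drop everything else."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Replace out-of-vocabulary characters with spaces, then collapse whitespace.
    cleaned = "".join(ch if ch in VOCAB else " " for ch in lowered)
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(normalize_for_vocab("Café, déjà vu!"))
```

After this step every remaining character is one the English model can emit, which is why the same alignment model can be reused across languages.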
Any chance you can put me in contact with the model owners? I'd love to ask some questions and see what they'd need to license it for commercial use.
These are all relevant links, I don't have direct contact information unfortunately
Pros:
Cons