
please create true comparisons with other whisper implementations #162

Open
BBC-Esq opened this issue Nov 29, 2023 · 12 comments

Comments

@BBC-Esq

BBC-Esq commented Nov 29, 2023

Rather than re-typing everything, I'm simply providing a link to my issue in the insanely-fast-whisper repository, which asks for the same type of information to be put in the readme:

See Here

Plus, you both work on the same team, so I'm hoping we can work together to get some more accurate/explanatory numbers for people to rely on...

You claim that the JAX implementation is 70x faster...that portion of the readme hasn't been updated in a while, and in the meantime other advancements have been made. Also, it has never included comparisons with other options like faster-whisper, whisper.cpp, WhisperX, etc. If you would be so kind, please include additional true comparisons for people, that way, before they spend hours possibly revising code, they can assess whether it's worth it from a cost-benefit perspective and be clear on whether "batching" is the source of any speed increase...increased VRAM usage...or whatever.

Thanks, still love the work y'all do!

@sanchit-gandhi
Owner

sanchit-gandhi commented Nov 30, 2023

Hey @BBC-Esq! Thanks for reaching out, I appreciate your interest in the Whisper-JAX project! Unfortunately this repo is more or less archived now, since we stopped working on it back in April. It was a fun project to see how fast we could make Whisper in JAX on TPU v4-8s, but the community is simply more interested in running on GPUs, which means we've switched to focusing on optimisations that can be applied uniformly, independent of hardware (e.g. Distil-Whisper: https://github.com/huggingface/distil-whisper).

There are some scripts for reproducing the benchmarks here. The pmap benchmarks should be run on at least a TPU v4-8, or a v4-16 for the best performance at higher batch sizes (where we are > 100x faster than OpenAI Whisper). These are the results we got on an A100 with CUDA 11 and PyTorch 1.9, and a TPU v4-8: https://github.com/sanchit-gandhi/whisper-jax#benchmarks. It would be super interesting to see how it compares to newer implementations, e.g. faster-whisper. Note that Whisper-JAX on TPU v4-8 is much faster than Hugging Face Transformers on GPU (which is what powers insanely-fast-whisper), so you should get the fastest result of them all. You can get a ballpark idea using a v3-8: https://www.kaggle.com/code/sgandhi99/whisper-jax-tpu. Using a v4-8 is about 2x faster than this in my experience.
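
For reference, here is a minimal sketch of running the Whisper-JAX pipeline. The FlaxWhisperPipline class, checkpoint name, and batch size follow the whisper-jax README and should be treated as assumptions rather than verified values:

```python
# Minimal Whisper-JAX sketch, assuming the FlaxWhisperPipline API from the
# whisper-jax README; checkpoint and batch_size are illustrative values.
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# Half-precision weights and a larger batch size are where the TPU speed-ups come from.
pipeline = FlaxWhisperPipline("openai/whisper-large-v2", dtype=jnp.bfloat16, batch_size=16)

# The first call JIT-compiles the model (slow); subsequent calls are fast.
outputs = pipeline("audio.mp3", task="transcribe", return_timestamps=True)
print(outputs["text"])
```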

@BBC-Esq
Author

BBC-Esq commented Nov 30, 2023

Hey, thank you. BTW, tell your colleague over at insanely-fast-whisper to change his readme and not dump on other people's work...

Moving on...Thank you sincerely for the technical discussion. I'm excited that you're working on Distil-Whisper. I still need to test that!

To make sure I understand you...the JAX version is much faster on TPU, but Hugging Face Transformers is much faster on GPUs? I don't have a TPU so...

@BBC-Esq
Author

BBC-Esq commented Nov 30, 2023

I forgot to ask, is it true that Distil-Whisper can't do any language besides English...and that's basically the tradeoff?

Looking forward to your work on distil-whisper. Definitely in the hopper to try out.

@sanchit-gandhi
Owner

We found Whisper-JAX to be faster than Hugging Face Transformers' Whisper (the same implementation that powers insanely-fast-whisper) in our experiments. But those with different CUDA/PyTorch/hardware versions had varying results, and sometimes PyTorch was faster. JAX is a real pain to set up on CUDA, so I would encourage you to use a PyTorch implementation if you're working on GPU. With Flash Attention 2 support, and full torch.compile support coming, I'm pretty sure Whisper in PyTorch will consistently beat Whisper in JAX within the next few weeks (see the sketch below).

If you're using cloud computing, then swapping from GPUs to TPU v3s on GCP is quite reasonably priced, and you can run transcription super fast there: https://cloud.google.com/tpu/pricing. TPU v4s are what we benchmarked to get the fastest results.
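
A rough sketch of the Flash Attention 2 setup mentioned above, in PyTorch. The attn_implementation argument assumes a recent Transformers release and an installed flash-attn package; swap in "sdpa" if either assumption doesn't hold:

```python
# Sketch: Whisper in PyTorch with Flash Attention 2, as discussed above.
# Assumes a recent Transformers version where from_pretrained accepts
# attn_implementation, and that the flash-attn package is installed.
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # use "sdpa" if flash-attn isn't available
).to("cuda")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Transcribe a 16 kHz mono waveform (30 s of silence here as a placeholder).
waveform = np.zeros(30 * 16000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features.to("cuda", torch.float16))
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```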

@sanchit-gandhi
Owner

Yes that's right. Distil-Whisper is English-only since it's the language that has the most usage, but we still want to provide checkpoints that support more languages. Distilling Whisper on all the languages it supports in one go is hard - the decoder is very small, so it's difficult for it to have good knowledge of all languages at once. Instead, we're actively encouraging the community to distill language-specific Whisper checkpoints. We've released all the Distil-Whisper training code, and a comprehensive description of the steps required to perform distillation: https://github.com/huggingface/distil-whisper/tree/main/training. Feel free to ask if you have any questions about Distil-Whisper on the repo! I'd be more than happy to answer.
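
As a quick pointer for trying Distil-Whisper: the distilled English checkpoints are drop-in replacements for Whisper in the Transformers pipeline. The model id below is the distil-large-v2 release; treat it as an assumption and check the Distil-Whisper repo for the current list of checkpoints:

```python
# Sketch: loading a Distil-Whisper checkpoint with the Transformers pipeline.
# "distil-whisper/distil-large-v2" is the English-only distilled checkpoint;
# model id and device are illustrative.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)
print(pipe("audio.mp3")["text"])
```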

@BBC-Esq
Author

BBC-Esq commented Nov 30, 2023

That's awesome, thanks for the info, very helpful. I noticed you said "Hugging Face Transformers' Whisper"...is that the same as BetterTransformer in a way? BetterTransformer is basically a class/library that Hugging Face created...kind of like Pipeline? I'm learning about Pipeline and how it simplifies things...and I'm learning about the parameters you can use...

My question is, which person (or people) actually, physically, created the batching functionality of the Pipeline...upon which insanely-fast-whisper (insert lightning bolt, insert the word "blazingly" a few more times...) relies?

It appears that the developer of insanely-fast-whisper single-handedly created that functionality, thus enabling the world to experience insanely faster whisper for the betterment of mankind.

I'd like to know who's actually responsible and/or if it was a team effort over there with you guys. I'd like to know who to follow to keep abreast of the creative and hard work you guys do...Thanks.

@sanchit-gandhi
Owner

sanchit-gandhi commented Nov 30, 2023

  • Transformers Whisper: provides the underlying code for the Whisper model, with efficient attention code and Flash Attention 2 support. Can be used for short-form audio, and for long-form audio with "sequential" transcription. See the docs for examples.
  • Optimum BetterTransformer: builds on the Transformers implementation by changing the attention implementation to use PyTorch SDPA. Equivalent to Flash Attention 1 on hardware that supports it.
  • Transformers Pipeline: a wrapper around the Transformers Whisper model for an easier API. Also implements the "chunked" long-form transcription algorithm, which is about 10x faster than OpenAI's original "sequential" one; see Section 5 of the Distil-Whisper paper and also Table 7.

=> now this is all the underlying code that you need to get the reported speed-ups

What insanely-fast-whisper does is package up the above three implementations into end-to-end examples and a CLI so that you can get maximum performance as easily as possible (a minimal sketch of the stack follows below).
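
For concreteness, a sketch of how those three pieces fit together (model id, chunk length, and batch size are illustrative; the BetterTransformer conversion requires Optimum to be installed):

```python
# Sketch combining the three layers above: the Transformers Whisper model,
# an optional BetterTransformer (SDPA) conversion via Optimum, and the
# pipeline's "chunked" long-form transcription with batching.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Optional: switch attention to PyTorch SDPA (Flash Attention 1 where supported),
# as in the Optimum BetterTransformer bullet above. Requires `pip install optimum`.
pipe.model = pipe.model.to_bettertransformer()

# Chunked long-form transcription: 30 s chunks decoded in batches, which is
# where the speed-up over sequential decoding comes from.
result = pipe("audio.mp3", chunk_length_s=30, batch_size=16, return_timestamps=True)
print(result["text"])
```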

@BBC-Esq
Author

BBC-Esq commented Nov 30, 2023

I didn't get a direct response to my main question.

Let's try asking again...Did the dude at insanely-fast-whisper actually create any of the code of these underlying technologies? Wink with your left eye if you don't want to answer because y'all work together, or wink with your right eye for no, he didn't actually create any new innovation and it was other people...We'll just keep this between you and me. lol.

@sanchit-gandhi
Owner

You can do all the best open-source work, develop the best open-source models, and have the fastest open-source library. But if no one knows about it, it's useless!

In that regard, making these tools more accessible and visible to the OS community is just as valuable as (if not more valuable than) actually developing them.

I don't think we should credit people any more or any less depending on what they've done here. It's a collaborative effort in which we simply want to work with the community to create the best open-source speech technologies possible.

@BBC-Esq
Author

BBC-Esq commented Nov 30, 2023

Thanks for the platitudes, but they still didn't answer my question. No worries, I understand...you're in a tight spot with him being a co-worker. I am not, however, in such a situation.

If he deserved credit I'll recognize that, but seems like he's actually done nothing new or innovative whatsoever so...

Anyways, I've spent a lot of my personal time on this so I'm going to give it a break for a day...Feel free to test out my program if you want, or to re-create the tests I've done. Thanks.

@flexchar

Based on Vaibhavs10/insanely-fast-whisper#82 I'd suggest closing this.

@BBC-Esq
Author

BBC-Esq commented Dec 29, 2023

lol, Nothing was actually addressed, but go ahead and close if you want, Sanchit.
