Transcription fails with diarization enabled #790

Open
foolishgrunt opened this issue Apr 24, 2024 · 0 comments

foolishgrunt commented Apr 24, 2024

I'm trying to use a derivative project (subsai) with whisperX as its back-end. It works perfectly and generates my subtitle file as desired when run normally, but it fails as soon as I try to enable speaker diarization - IMO the killer feature of whisperX over the other whisper implementations supported by subsai.

I am nearly certain that this is not a bug in subsai, because when I try to upload the same file to victor-upmeet's demo instance, I get the same result: works perfectly when run normally, but returns the following when I enable speaker diarization:

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.2. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1261, in hf_hub_download
    metadata = get_hf_file_metadata(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1674, in get_hf_file_metadata
    r = _request_wrapper(
        ^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 369, in _request_wrapper
    response = _request_wrapper(
               ^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 393, in _request_wrapper
    hf_raise_for_status(response)
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
             ^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 167, in predict
    result = diarize(audio, result, debug, huggingface_access_token, min_speakers, max_speakers)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 291, in diarize
    diarize_model = whisperx.DiarizationPipeline(model_name='pyannote/speaker-diarization@2.1',
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/core/pipeline.py", line 136, in from_pretrained
    pipeline = Klass(**params)
               ^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 130, in __init__
    model: Model = get_model(segmentation, use_auth_token=use_auth_token)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/pipelines/utils/getter.py", line 75, in get_model
    model = Model.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/core/model.py", line 624, in from_pretrained
    path_for_pl = hf_hub_download(
                  ^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1406, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

To my uneducated eyes, it looks like it's failing to connect to HuggingFace: the first error in the chain is a 502 Bad Gateway while downloading pytorch_model.bin from pyannote/segmentation. I have entered my access token correctly and accepted the terms of both the segmentation and the speaker-diarization models. Here is the audio file in question. (The actual audio to be transcribed is nearly 2 hours long, but I trimmed it to ~25-minute segments to see if reducing the file size would help. It made no difference.)
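
If the 502 is a transient Hub outage, one way to check on a local install (such as subsai) might be to download the segmentation checkpoint by hand, retrying with backoff, so that it ends up in the local cache before diarization starts. The following is only a sketch under assumptions: `retry` and `prefetch_segmentation` are hypothetical helper names, not part of whisperX or pyannote; the repo id, revision, and filename are copied from the URL in the traceback; and the `token` keyword assumes a reasonably recent `huggingface_hub` (older releases used `use_auth_token`). It cannot help with the hosted demo, where the cache is not under the user's control.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, doubling the wait after each failure.

    Hypothetical helper. It retries on any exception, so a permission
    error (bad token, terms not accepted) is retried too and only
    surfaces after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the real error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")


def prefetch_segmentation(token: str) -> str:
    """Fetch the file named in the traceback and return its cached path."""
    # Imported lazily so `retry` can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return retry(
        lambda: hf_hub_download(
            repo_id="pyannote/segmentation",  # from the failing URL
            filename="pytorch_model.bin",     # from the failing URL
            revision="2022.07",               # from the failing URL
            token=token,                      # assumption: recent huggingface_hub
        )
    )
```

Calling `prefetch_segmentation("<your access token>")` once should either populate the cache or surface the underlying error directly, which helps tell a Hub outage apart from a token or terms problem.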

Any ideas how to overcome this?
