Transcription fails with diarization enabled #790

foolishgrunt · 2024-04-24T06:08:17Z

I'm trying to use a derivative project (subsai) with whisperX as its backend. It works perfectly and generates my subtitle file as desired when run normally, but it fails as soon as I enable speaker diarization, which IMO is the killer feature of whisperX over the other Whisper implementations supported by subsai.

I am nearly certain that this is not a bug in subsai, because when I try to upload the same file to victor-upmeet's demo instance, I get the same result: works perfectly when run normally, but returns the following when I enable speaker diarization:

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.2. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1261, in hf_hub_download
    metadata = get_hf_file_metadata(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1674, in get_hf_file_metadata
    r = _request_wrapper(
        ^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 369, in _request_wrapper
    response = _request_wrapper(
               ^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 393, in _request_wrapper
    hf_raise_for_status(response)
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/pyannote/segmentation/resolve/2022.07/pytorch_model.bin

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
             ^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 167, in predict
    result = diarize(audio, result, debug, huggingface_access_token, min_speakers, max_speakers)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 291, in diarize
    diarize_model = whisperx.DiarizationPipeline(model_name='pyannote/speaker-diarization@2.1',
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/whisperx/diarize.py", line 19, in __init__
    self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/core/pipeline.py", line 136, in from_pretrained
    pipeline = Klass(**params)
               ^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 130, in __init__
    model: Model = get_model(segmentation, use_auth_token=use_auth_token)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/pipelines/utils/getter.py", line 75, in get_model
    model = Model.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/pyannote/audio/core/model.py", line 624, in from_pretrained
    path_for_pl = hf_hub_download(
                  ^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1406, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

To my uneducated eyes, it looks like it's failing to download the pyannote/segmentation model from Hugging Face: the request comes back with 502 Server Error: Bad Gateway. I have entered my access token correctly and accepted the terms for both the pyannote/segmentation and pyannote/speaker-diarization models. Here is the audio file in question. (The actual audio to be transcribed is nearly 2 hours long, but I trimmed it into ~25-minute segments to see if reducing the file size would help. It made no difference.)

Any ideas how to overcome this?
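The traceback bottoms out in a 502 Bad Gateway from huggingface.co while fetching pyannote/segmentation (revision 2022.07). A 502 is a server-side HTTP status; a rejected token or unaccepted model terms would typically surface as a 401 or 403 instead. If the 502 is transient, one thing worth trying is to retry the pipeline load with exponential backoff. The helper below is a minimal sketch under that assumption: `retry_with_backoff` and its parameters are hypothetical names, not part of whisperX or subsai, and the commented usage assumes `whisperx.DiarizationPipeline` accepts `model_name`, `use_auth_token` and `device` in the way the call in the traceback suggests (`HF_TOKEN` is a placeholder).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once all attempts are used up. Exceptions
    not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")  # loop always returns or raises


# Hypothetical usage, assuming whisperx and huggingface_hub are installed:
#
# import whisperx
# from huggingface_hub.utils import HfHubHTTPError, LocalEntryNotFoundError
#
# diarize_model = retry_with_backoff(
#     lambda: whisperx.DiarizationPipeline(
#         model_name="pyannote/speaker-diarization@2.1",
#         use_auth_token=HF_TOKEN,  # placeholder for your access token
#         device="cuda",
#     ),
#     retry_on=(HfHubHTTPError, LocalEntryNotFoundError),
# )
```

Retrying only helps if the Hub outage is short-lived; if the 502 persists across many attempts, the problem is on the server side and waiting it out (or loading from a local cache populated earlier) is the more realistic option.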
