Trying to stack tensors from different devices in `_pad_to_max_length` in Whisper batched inference #30223

cifkao · 2024-04-12T16:06:06Z

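This issue seems to be caused by the following line, which was added in #29065 to fix #29036. The fix doesn't work with batched inference on GPU/MPS: `torch.tensor([])` is created on the CPU by default, while the other sequences are on the model's device, so they cannot be stacked together: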

transformers/src/transformers/models/whisper/generation_whisper.py

Line 146 in 4f7b434

sequences.append(torch.tensor([]))
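
To illustrate the device mismatch, here is a minimal sketch of a right-padding helper that creates the empty placeholder on the same device as the other sequences. This is not the library's `_pad_to_max_length` and not necessarily how the eventual fix works; the function name `pad_to_max_length_sketch`, the use of `None` to mark an empty segment, and the `device` argument are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def pad_to_max_length_sketch(sequences, pad_token_id, device=None):
    """Hypothetical, simplified right-padding helper (not the transformers code).

    `sequences` is a non-empty list of 1-D token tensors; `None` marks an
    empty segment (an assumption made for this sketch).
    """
    if device is None:
        # Take the device from the first real tensor, falling back to CPU.
        device = next(
            (s.device for s in sequences if s is not None), torch.device("cpu")
        )

    # Create the empty placeholder on the target device instead of the
    # default (CPU), so that torch.stack sees tensors on a single device.
    filled = [
        s if s is not None else torch.tensor([], dtype=torch.long, device=device)
        for s in sequences
    ]

    max_len = max(s.shape[-1] for s in filled)
    padded = [
        F.pad(s, (0, max_len - s.shape[-1]), value=pad_token_id) for s in filled
    ]
    return torch.stack(padded, dim=0)


# Runs on CPU here; on an affected setup the device would be "mps" or "cuda".
device = torch.device("cpu")
seqs = [
    torch.tensor([1, 2, 3], device=device),
    None,
    torch.tensor([4], device=device),
]
batch = pad_to_max_length_sketch(seqs, pad_token_id=0)
print(batch)
```

The key point is only the `device=device` argument on the placeholder tensor; everything else is scaffolding to make the example self-contained.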

System Info

transformers version: 4.40.0.dev0 bf9a7ab
Platform: macOS-14.2.1-arm64-arm-64bit
Python version: 3.10.13
Huggingface_hub version: 0.20.3
Safetensors version: 0.4.2
Accelerate version: 0.28.0
Accelerate config: not found
PyTorch version (GPU?): 2.2.1 (False)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

@ylacombe @sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, AutoProcessor
import torch
import numpy as np

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-tiny", torch_dtype=torch.float16
)
processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model.to("mps")

ds = load_dataset("distil-whisper/meanwhile", "default")["test"]
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
audio = ds[:8]["audio"]
audio = [x["array"] for x in audio]
audio[0][:] = np.random.normal(scale=0.05, size=audio[0].shape)
inputs = processor(
    audio,
    return_tensors="pt",
    truncation=False,
    padding="longest",
    return_attention_mask=True,
    sampling_rate=16_000,
)
inputs = inputs.to(model.device, torch.float16)

result = model.generate(
    **inputs,
    no_speech_threshold=0.2,
    logprob_threshold=0.0,
    temperature=(0.0,),
    task="transcribe",
    language="fr",
)
decoded = processor.batch_decode(
    result, skip_special_tokens=False, decode_with_timestamps=True
)
print(decoded)

Traceback (most recent call last):
  File "/Users/ondra/bordel/test-hf/repr_batch_device_issue.py", line 27, in <module>
    result = model.generate(
  File "/Users/ondra/mambaforge/envs/transformers/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 730, in generate
    sequences = _pad_to_max_length(final_segments, generation_config.pad_token_id, padding="right")
  File "/Users/ondra/mambaforge/envs/transformers/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 153, in _pad_to_max_length
    sequences = torch.stack(sequences, dim=0)
RuntimeError: torch.cat(): all input tensors must be on the same device. Received cpu and mps:0

Expected behavior

No error

amyeroberts · 2024-05-13T08:57:19Z

Gentle ping @sanchit-gandhi @ylacombe

ylacombe · 2024-05-13T16:59:50Z

Thanks for the nice catch @cifkao, I've opened #30787 to address this

amyeroberts added the Audio label Apr 12, 2024

huggingface deleted a comment from github-actions bot May 13, 2024

ylacombe mentioned this issue May 13, 2024

Fix pad_to_max_length Whisper #30787

Merged

kamilakesbi assigned ylacombe May 17, 2024

ylacombe closed this as completed in #30787 May 27, 2024
