I ran the Docker image ghcr.io/collabora/whisperbot-base:latest and started the server, but when I sent a request from the client I got:

client:
[INFO]: * recording
[INFO]: Waiting for server ready ...
[INFO]: Opened connection
Message from Server: TensorRT-LLM not supported on Server yet. Reverting to available backend: 'faster_whisper'
[INFO]: Websocket connection closed: 1000:
server:
[03/04/2024-12:11:15] TensorRT-LLM not supported: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (/root/TensorRT-LLM/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:322)
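For context, a "no kernel image is available for execution on the device" error typically means the TensorRT-LLM binaries in the image were compiled for a different GPU architecture (SM version) than the GPU they are running on. A quick way to check which architecture the local GPU reports, sketched in Python (the `cuda_arch_from_compute_cap` helper and the no-dot value format are illustrative assumptions, not part of the project):

```python
import subprocess

def cuda_arch_from_compute_cap(compute_cap: str) -> str:
    """Turn a compute capability like '8.9' into the SM number '89'.

    Hypothetical helper: the exact value format the TensorRT-LLM build
    expects is an assumption; check the project's build docs.
    """
    major, minor = compute_cap.strip().split(".")
    return f"{major}{minor}"

def local_gpu_arch() -> str:
    """Query the first GPU's compute capability via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return cuda_arch_from_compute_cap(out.splitlines()[0])
```

An RTX 4090 reports compute capability 8.9 (SM 89); a GPU reporting anything else would hit this error with binaries built only for the 4090.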
@Rodenhhh Yeah, the Docker image is only supposed to work on a 4090; unfortunately we missed that part and it isn't mentioned anywhere. Sorry for the trouble.
As for a solution, stay tuned: we will push a docker-compose setup to make the TensorRT-LLM setup straightforward.
Thanks
@Rodenhhh You can test whether the docker compose setup builds and works as expected. Just make sure to pass the right CUDA_ARCH to docker compose build so that tensorrt-llm builds successfully.
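A compose file wiring CUDA_ARCH through as a build arg might look like this sketch (the service name, build context, and value format are assumptions, not the repo's actual file):

```yaml
# docker-compose.yml sketch -- names and arg format are assumptions
services:
  whisperbot:
    build:
      context: .
      args:
        # SM compute capability of the target GPU, e.g. 89 for an
        # RTX 4090 (8.9); must match the card you run on
        CUDA_ARCH: "89"
```

With a file like this, `docker compose build` would pass the architecture into the TensorRT-LLM build stage so the kernels match the local GPU.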