Message from Server: TensorRT-LLM not supported on Server yet. #164
Comments
@Rodenhhh Hey, which GPU are you using?
@makaveli10 A100-40GB
@Rodenhhh Yeah, the docker image is supposed to work only on the 4090; unfortunately we missed that part and it's not mentioned anywhere, sorry for the trouble. As for a solution, stay tuned: we will push a docker-compose to make the TensorRT-LLM setup straightforward.
@Rodenhhh You can test the docker compose setup to see if it builds and works as expected. Just make sure to pass the right
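As a rough illustration only (this is not the project's actual compose file; the service name and the CUDA_ARCH build argument are assumptions), a compose setup targeting an A100 might pass the GPU architecture as a build argument like this:

```yaml
# Hypothetical sketch: service name, build context, and the CUDA_ARCH
# build argument are assumptions, not taken from the project's repo.
services:
  whisperbot:
    build:
      context: .
      args:
        CUDA_ARCH: "80-real"   # A100 is compute capability 8.0 (sm_80)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

The key point is that whatever mechanism the project exposes, the build must target the compute capability of the GPU the container will actually run on.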
@Rodenhhh Did you get a chance to try out the docker compose solution?
Closed by #227
I ran the docker image
ghcr.io/collabora/whisperbot-base:latest
and started the server, but when I send a request from the client I get the following error.
client:
server:
[03/04/2024-12:11:15] TensorRT-LLM not supported: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (/root/TensorRT-LLM/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:322)
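For context on the log above: "no kernel image is available for execution on the device" means the CUDA kernels in the image were compiled for a different GPU architecture than the one present. A minimal sketch of NVIDIA's cubin compatibility rule follows (the GPU table and helper function are illustrative, not part of TensorRT-LLM): a binary built for compute capability X.y only runs on devices with the same major version X and minor version >= y.

```python
# Illustrative sketch of CUDA cubin binary compatibility. A cubin built for
# compute capability X.y runs only on devices whose capability is X.z with
# z >= y. The GPU names and helper below are assumptions for illustration.

COMPUTE_CAPABILITY = {
    "RTX 4090": (8, 9),  # Ada Lovelace, sm_89
    "A100": (8, 0),      # Ampere, sm_80
}

def kernels_compatible(built_for: str, running_on: str) -> bool:
    """True if a cubin built for `built_for` can execute on `running_on`."""
    b_major, b_minor = COMPUTE_CAPABILITY[built_for]
    r_major, r_minor = COMPUTE_CAPABILITY[running_on]
    return b_major == r_major and r_minor >= b_minor

# An image built only for the 4090 (sm_89) cannot run its kernels on an
# A100 (sm_80), which yields the "no kernel image is available" error.
print(kernels_compatible("RTX 4090", "A100"))  # False
```

The reverse direction works, which is why building for the lowest target architecture (or for multiple architectures) avoids this class of failure.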