Issues: vllm-project/vllm
[Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
bug (Something isn't working) · #5186 · opened Jun 1, 2024 by TikZSZ

[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only?
feature request · #5185 · opened Jun 1, 2024 by xxll88
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary asyncio.exceptions.CancelledError
bug · #5182 · opened Jun 1, 2024 by jlcmoore
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
bug · #5170 · opened May 31, 2024 by wushidonguc

[Performance]: What can we learn from OctoAI
performance (Performance-related issues) · #5167 · opened May 31, 2024 by hmellor

[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
bug · #5162 · opened May 31, 2024 by kezouke

[Bug]: WSL2 (including Docker) 2-GPU problem with --tensor-parallel-size 2
bug · #5161 · opened May 31, 2024 by goodmaney

[Feature]: Linear adapter support for Mixtral
feature request · #5155 · opened May 31, 2024 by DhruvaBansal00

[Bug]: The OpenAI deployment model takes twice as long to deploy as FastAPI's approach to offline inference.
bug · #5154 · opened May 31, 2024 by LIUKAI0815

[Bug]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
bug · #5152 · opened May 31, 2024 by khluu

[Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when handling inference requests
bug · #5147 · opened May 31, 2024 by zhaotyer

[Usage]: How should I do data parallelism using vLLM?
usage (How to use vllm) · #5143 · opened May 30, 2024 by YuWang916

[Bug]: nsys cannot track the CUDA kernels launched by processes other than rank 0
bug · #5132 · opened May 30, 2024 by crazy-JiangDongHua

[Feature]: How to enable vLLM to work with PreTrainedModel objects in my MoE-LoRA?
feature request · #5128 · opened May 30, 2024 by zhaofangtao

[Usage]: Extractive question answering using vLLM
usage · #5126 · opened May 30, 2024 by suryavan11

[New Model]: LLaVA-NeXT-Video support
new model (Requests to new models) · #5124 · opened May 30, 2024 by AmazDeng
[Usage]: Multiple sampling params with the OpenAI library
usage · #5117 · opened May 30, 2024 by JH-lee95
[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters
bug · #5113 · opened May 29, 2024 by flexorRegev

[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM
misc · #5107 · opened May 29, 2024 by AkshataDM