Loading the API demo with a merged adapter model not working #3222
Unanswered
cosminroger asked this question in Q&A
Replies: 1 comment
-
Hi,
I am having trouble loading the API demo with a model that has the adapter merged in. It gets stuck at:
INFO 04-10 14:34:46 selector.py:51] Cannot use FlashAttention because the package is not found. Please install it for better performance.
INFO 04-10 14:34:46 selector.py:25] Using XFormers backend.
The base model I merged the adapter into is mistralai/Mistral-7B-Instruct-v0.2.
This is the command I am using:
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python src/api_demo.py \
    --model_name_or_path ./model \
    --template mistral \
    --infer_backend vllm \
    --vllm_enforce_eager
I have also checked GPU memory usage: some memory does get allocated, but the model shards are never actually loaded. The plain Mistral model works fine; only the adapter-merged model hangs.
Thanks in advance for the help.
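For reference, a minimal sketch of how such a LoRA merge is typically done with PEFT; this is not necessarily the exact steps used here, and ./adapter is a placeholder path:

# Minimal sketch of merging a LoRA adapter into its base model with PEFT.
# "./adapter" is a hypothetical adapter directory; "./model" matches the
# --model_name_or_path used in the command above.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base, "./adapter")
merged = model.merge_and_unload()  # bake the LoRA weights into the base model
merged.save_pretrained("./model")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").save_pretrained("./model")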
-
I found a solution for this:
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python src/api_demo.py
This now works: it loads the adapter, and the completion endpoint works.
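Assuming the demo exposes an OpenAI-compatible chat completion endpoint on the configured port (an assumption about the server, not confirmed above), a quick smoke test could look like:

# Hedged smoke test: assumes an OpenAI-style /v1/chat/completions route
# is served on localhost:8000 (the API_PORT set above).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistral",  # illustrative name; the server may ignore it
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())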