Loading the API demo with a merged adapter model not working #3222
Unanswered
cosminroger asked this question in Q&A
Replies: 1 comment
-
Hi,
I am having trouble loading the API demo with a model that has the adapter merged in. It gets stuck at:
INFO 04-10 14:34:46 selector.py:51] Cannot use FlashAttention because the package is not found. Please install it for better performance.
INFO 04-10 14:34:46 selector.py:25] Using XFormers backend.
The base model I merged the adapter into is mistralai/Mistral-7B-Instruct-v0.2.
This is the command I am using:
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python src/api_demo.py \
    --model_name_or_path ./model \
    --template mistral \
    --infer_backend vllm \
    --vllm_enforce_eager
I have also checked GPU memory usage: some memory does get allocated, but the model shards are never actually loaded. The plain Mistral model works fine; only the adapter-merged model hangs.
Thanks in advance for the help.
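For reference, a minimal sketch of how such a LoRA merge is typically done with PEFT; this is not necessarily the exact steps used here, and ./adapter is a placeholder path:

# Minimal sketch of merging a LoRA adapter into its base model with PEFT.
# "./adapter" is a hypothetical adapter directory; "./model" matches the
# --model_name_or_path used in the command above.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base, "./adapter")
merged = model.merge_and_unload()  # bake the LoRA weights into the base model
merged.save_pretrained("./model")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").save_pretrained("./model")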
-
I found a solution for this:
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python src/api_demo.py
This now works: it loads the adapter, and the completion endpoint works.
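Assuming the demo exposes an OpenAI-compatible chat completion endpoint on the configured port (an assumption about the server, not confirmed above), a quick smoke test could look like:

# Hedged smoke test: assumes an OpenAI-style /v1/chat/completions route
# is served on localhost:8000 (the API_PORT set above).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistral",  # illustrative name; the server may ignore it
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())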