Add example of serving LLM with Ray Serve and vLLM #45325
base: master
Conversation
Force-pushed from 27032bd to c7f7dce.
Sweet!
Force-pushed from c7f7dce to 68b4c8a.
LGTM!
Force-pushed from c7f7dce to 65667d9.
ci/docker/serve.build.Dockerfile (outdated):
@@ -28,6 +28,9 @@ pip install -U -c python/requirements_compiled.txt \
     tensorflow tensorflow-probability torch torchvision \
     transformers aioboto3
+
+# Add vllm for llm test
+pip install -U vllm
pin a version please.
and why does it require the -U flag?
Testing if this works
done @aslonnie can you approve?
Force-pushed from bdacd67 to 279673c.
Signed-off-by: Akshay Malik <akshay@anyscale.com>
Force-pushed from 279673c to 1f72df5.
Ready to merge, pending @aslonnie's approval.
lint is failing; i just added lint to microcheck yesterday ;), sorry for not catching this sooner
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
pushed a commit to fix linter
@@ -8,7 +8,7 @@ xgboost==1.7.6
 lightgbm==3.3.5

 # Huggingface
-transformers==4.36.2
+transformers==4.40.0
this won't work as it is unfortunately, you'll need to recompile the requirements_compiled.txt (step 4 in https://www.notion.so/anyscale-hq/OSS-Python-dependency-management-f32633b0018c423f927727807ea9da08)
so painful
does CI still do it and output the compiled version?
oh nice it seems it does
it does, yes; it should be the artifact of this job https://buildkite.com/ray-project/premerge/builds/26290#018f8782-65e9-40ba-ad54-655f6c33f0bc
cool trying that
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
dl-gpu-requirements.txt:

@@ -14,3 +14,6 @@ torch-scatter==2.1.1+pt20cu118
 torch-sparse==0.6.17+pt20cu118
 torch-cluster==1.6.1+pt20cu118
 torch-spline-conv==1.2.2+pt20cu118
+
+# Install vLLM for documentation example tests (doc/source/serve/doc_code).
+vllm==0.4.2
the serve build doesn't install this requirement though, so the test will likely fail; you might need to add -r python/requirements/dl-cpu-requirements.txt here https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.Dockerfile#L26 and add python/requirements/dl-cpu-requirements.txt to the source file in https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.py39.wanda.yaml#L13. Also, perhaps add vllm==0.4.2 to dl-cpu-requirements.txt, because dl-gpu-requirements.txt doesn't install on a cpu image (the serve.build.Dockerfile is based on ubuntu, not cuda).
sorry about all of these complexities
we only need it for the GPU build, do I still need to add it?
ah yess, your docs run on a gpu build, ignore what i say :D
looks like the build failed w/ the new compiled requirements: https://buildkite.com/ray-project/premerge/builds/26291#018f879a-16e5-4a7b-b63a-06f2e79a2c9a/119-1330
it requires the latest version of torch as well, this vllm package is pretty picky ;)
yeah :(
Am I right in assuming that the prometheus metrics that vllm exports will just automatically get propagated to the ray serve metrics endpoint? (provided i use the …)
Adds a documentation example using vLLM to serve LLM models on Ray Serve. This is a copy of #45325 + add a build environment for ray serve + vllm.

Test:
- CI

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Signed-off-by: Cuong Nguyen <128072568+can-anyscale@users.noreply.github.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Why are these changes needed?
Adds a documentation example using vLLM to serve LLM models on Ray Serve.
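For context, a rough sketch of what such an example looks like: a Ray Serve deployment with a FastAPI ingress wrapping vLLM's AsyncLLMEngine. This is not the exact doc_code added by the PR; the class name VLLMDeployment, the /generate route, and the facebook/opt-125m model are illustrative placeholders, and it assumes the vllm==0.4.2 engine API pinned in the requirements above plus a GPU-enabled Ray cluster.

```python
# Hypothetical sketch of a Ray Serve + vLLM deployment (not the exact doc_code
# from this PR). Assumes vllm==0.4.2 and one GPU available per replica.
from fastapi import FastAPI
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid

app = FastAPI()


@serve.deployment(ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMDeployment:  # illustrative name
    def __init__(self, model: str):
        # Build the async engine inside the replica so each replica owns its GPU.
        self.engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model))

    @app.post("/generate")
    async def generate(self, prompt: str, max_tokens: int = 128) -> str:
        # vLLM streams incremental RequestOutputs; keep only the final one here.
        params = SamplingParams(max_tokens=max_tokens)
        final_output = None
        async for output in self.engine.generate(prompt, params, random_uuid()):
            final_output = output
        return final_output.outputs[0].text


# Bind with a small model for local testing.
app_bundle = VLLMDeployment.bind(model="facebook/opt-125m")
```

With this running via `serve run module:app_bundle`, a request such as `curl -X POST 'http://localhost:8000/generate?prompt=Hello'` should return the generated text.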
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.