
Add example of serving LLM with Ray Serve and vLLM #45325

Open
akshay-anyscale wants to merge 9 commits into master from vllm_example

Conversation

@akshay-anyscale (Contributor) commented May 14, 2024

Why are these changes needed?

Adds a documentation example using vLLM to serve LLM models on Ray Serve.
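
For context, here is a minimal sketch of the pattern the example documents: wrapping vLLM's AsyncLLMEngine in a Ray Serve deployment behind a FastAPI route. This is an illustrative outline only, not the code added in this PR (the example file, vllm_openai_example.py, targets an OpenAI-compatible API); the model name, route, and sampling parameters below are placeholders.

# Illustrative sketch only -- not the code added by this PR.
from fastapi import FastAPI
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid

app = FastAPI()

@serve.deployment(ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self):
        # Placeholder model; a real deployment would take engine args as config.
        engine_args = AsyncEngineArgs(model="facebook/opt-125m")
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    @app.post("/generate")
    async def generate(self, prompt: str) -> str:
        # Consume the async generator and return the final generated text.
        request_id = random_uuid()
        results = self.engine.generate(
            prompt, SamplingParams(max_tokens=128), request_id
        )
        final_output = None
        async for output in results:
            final_output = output
        return final_output.outputs[0].text

deployment = VLLMDeployment.bind()
# Start locally with: serve run <your_module>:deployment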

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes (Contributor) left a comment:

Sweet!

(Inline review comments on doc/source/serve/doc_code/vllm_openai_example.py and doc/source/serve/tutorials/vllm-example.md, since resolved.)

@GeneDer (Contributor) left a comment:

LGTM!

@akshay-anyscale akshay-anyscale force-pushed the vllm_example branch 2 times, most recently from c7f7dce to 65667d9 on May 15, 2024 23:20
@akshay-anyscale akshay-anyscale requested a review from a team as a code owner May 16, 2024 18:29
@@ -28,6 +28,9 @@ pip install -U -c python/requirements_compiled.txt \
 tensorflow tensorflow-probability torch torchvision \
 transformers aioboto3

+# Add vllm for llm test
+pip install -U vllm

Collaborator:

pin a version please.

Collaborator:

and why does it require the -U flag?

Contributor Author:

Testing if this works

Contributor Author:

done @aslonnie can you approve?

Signed-off-by: Akshay Malik <akshay@anyscale.com>

@akshay-anyscale (Contributor Author):

Ready to merge, pending @aslonnie's approval.

@aslonnie (Collaborator) left a comment:

[GIF via GIPHY]

@edoakes edoakes added the go Trigger full test run on premerge label May 17, 2024
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 13:03
@can-anyscale (Collaborator):

lint is failing; I just added lint to microcheck yesterday ;), sorry for not catching this sooner

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 13:30
@edoakes (Contributor) commented May 17, 2024:

pushed a commit to fix the linter

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 13:32
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 13:54
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 14:22
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@@ -8,7 +8,7 @@ xgboost==1.7.6
 lightgbm==3.3.5

 # Huggingface
-transformers==4.36.2
+transformers==4.40.0

Collaborator:

this won't work as-is, unfortunately; you'll need to recompile the requirements_compiled.txt (step 4 in https://www.notion.so/anyscale-hq/OSS-Python-dependency-management-f32633b0018c423f927727807ea9da08)

Contributor:

so painful

Contributor:

does CI still do it and output the compiled version?

Contributor:

oh nice, it seems it does

Contributor:

cool, trying that

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@@ -14,3 +14,6 @@ torch-scatter==2.1.1+pt20cu118
 torch-sparse==0.6.17+pt20cu118
 torch-cluster==1.6.1+pt20cu118
 torch-spline-conv==1.2.2+pt20cu118
+
+# Install vLLM for documentation example tests (doc/source/serve/doc_code).
+vllm==0.4.2

Collaborator:

the serve build doesn't install this requirement though, so the test will likely fail; you might need to add

-r python/requirements/dl-cpu-requirements.txt

here: https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.Dockerfile#L26

and add python/requirements/dl-cpu-requirements.txt to the source file in https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.py39.wanda.yaml#L13

also, perhaps add vllm==0.4.2 to dl-cpu-requirements.txt, because dl-gpu-requirements.txt doesn't install on a CPU image (the serve.build.Dockerfile is based on ubuntu, not cuda)

Collaborator:

sorry about all of these complexities

Contributor:

we only need it for the GPU build; do I still need to add it?

Collaborator:

ah yes, your docs run on a GPU build, ignore what I said :D

Collaborator:

it requires the latest version of torch as well; this vllm package is pretty picky ;)

Contributor:

yeah :(

@kousun12:

Am I right in assuming that the Prometheus metrics that vLLM exports will just automatically get propagated to the Ray Serve metrics endpoint? (provided I use the log_stats option when starting my vLLM engine)
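
The log_stats option mentioned above presumably refers to vLLM's stat logging, controlled by the disable_log_stats engine argument. A minimal illustration, assuming vLLM's AsyncEngineArgs API and a placeholder model; this is not code from the PR:

# Illustrative only: keep vLLM's stat logging enabled when building the engine
# so it records its internal metrics (disable_log_stats defaults to False).
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine_args = AsyncEngineArgs(model="facebook/opt-125m", disable_log_stats=False)
engine = AsyncLLMEngine.from_engine_args(engine_args)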

can-anyscale added a commit that referenced this pull request May 23, 2024
Adds a documentation example using vLLM to serve LLM models on Ray Serve.

This is a copy of #45325, plus a build environment for Ray Serve + vLLM.

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Signed-off-by: Cuong Nguyen <128072568+can-anyscale@users.noreply.github.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>

Labels
go (Trigger full test run on premerge)

7 participants