
Add example of serving LLM with Ray Serve and vLLM #45325

Open
akshay-anyscale wants to merge 9 commits into master from vllm_example

Conversation

@akshay-anyscale (Contributor) commented May 14, 2024

Why are these changes needed?

Adds a documentation example using vLLM to serve LLM models on Ray Serve.
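
For context, here is a minimal sketch of the pattern the example documents: wrapping vLLM's AsyncLLMEngine in a Ray Serve deployment behind a FastAPI route. This is an illustrative outline only, not the code added in this PR (the example file, vllm_openai_example.py, targets an OpenAI-compatible API); the model name, route, and sampling parameters below are placeholders.

# Illustrative sketch only -- not the code added by this PR.
from fastapi import FastAPI
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid

app = FastAPI()

@serve.deployment(ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self):
        # Placeholder model; a real deployment would take engine args as config.
        engine_args = AsyncEngineArgs(model="facebook/opt-125m")
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    @app.post("/generate")
    async def generate(self, prompt: str) -> str:
        # Consume the async generator and return the final generated text.
        request_id = random_uuid()
        results = self.engine.generate(
            prompt, SamplingParams(max_tokens=128), request_id
        )
        final_output = None
        async for output in results:
            final_output = output
        return final_output.outputs[0].text

deployment = VLLMDeployment.bind()
# Start locally with: serve run <your_module>:deployment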

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes (Contributor) left a comment:

Sweet!

(Inline review comments on doc/source/serve/doc_code/vllm_openai_example.py and doc/source/serve/tutorials/vllm-example.md, since resolved.)

@GeneDer (Contributor) left a comment:

LGTM!

@akshay-anyscale akshay-anyscale force-pushed the vllm_example branch 2 times, most recently from c7f7dce to 65667d9 on May 15, 2024 23:20
@akshay-anyscale akshay-anyscale requested a review from a team as a code owner May 16, 2024 18:29
@@ -28,6 +28,9 @@ pip install -U -c python/requirements_compiled.txt \
 tensorflow tensorflow-probability torch torchvision \
 transformers aioboto3

+# Add vllm for llm test
+pip install -U vllm

Collaborator:

pin a version please.

Collaborator:

and why does it require the -U flag?

Contributor Author:

Testing if this works

Contributor Author:

done @aslonnie can you approve?

Signed-off-by: Akshay Malik <akshay@anyscale.com>

@akshay-anyscale (Contributor Author):

Ready to merge, pending @aslonnie's approval.

@aslonnie (Collaborator) left a comment:

[GIF via GIPHY]

@edoakes edoakes added the go Trigger full test run on premerge label May 17, 2024
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 13:03
@can-anyscale (Collaborator):

lint is failing; I just added lint to microcheck yesterday ;), sorry for not catching this sooner

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 13:30
@edoakes (Contributor) commented May 17, 2024:

pushed a commit to fix the linter

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 13:32
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 13:54
@edoakes edoakes enabled auto-merge (squash) May 17, 2024 14:22
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@@ -8,7 +8,7 @@ xgboost==1.7.6
 lightgbm==3.3.5

 # Huggingface
-transformers==4.36.2
+transformers==4.40.0

Collaborator:

this won't work as-is, unfortunately; you'll need to recompile the requirements_compiled.txt (step 4 in https://www.notion.so/anyscale-hq/OSS-Python-dependency-management-f32633b0018c423f927727807ea9da08)

Contributor:

so painful

Contributor:

does CI still do it and output the compiled version?

Contributor:

oh nice, it seems it does

Contributor:

cool, trying that

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@@ -14,3 +14,6 @@ torch-scatter==2.1.1+pt20cu118
 torch-sparse==0.6.17+pt20cu118
 torch-cluster==1.6.1+pt20cu118
 torch-spline-conv==1.2.2+pt20cu118
+
+# Install vLLM for documentation example tests (doc/source/serve/doc_code).
+vllm==0.4.2

Collaborator:

the serve build doesn't install this requirement though, so the test will likely fail; you might need to add

-r python/requirements/dl-cpu-requirements.txt

here: https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.Dockerfile#L26

and add python/requirements/dl-cpu-requirements.txt to the source file in https://github.com/ray-project/ray/blob/master/ci/docker/serve.build.py39.wanda.yaml#L13

also, perhaps add vllm==0.4.2 to dl-cpu-requirements.txt, because dl-gpu-requirements.txt doesn't install on a CPU image (the serve.build.Dockerfile is based on ubuntu, not cuda)

Collaborator:

sorry about all of these complexities

Contributor:

we only need it for the GPU build; do I still need to add it?

Collaborator:

ah yes, your docs run on a GPU build, ignore what I said :D

Collaborator:

it requires the latest version of torch as well; this vllm package is pretty picky ;)

Contributor:

yeah :(

@kousun12:

Am I right in assuming that the Prometheus metrics that vLLM exports will just automatically get propagated to the Ray Serve metrics endpoint? (provided I use the log_stats option when starting my vLLM engine)
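
The log_stats option mentioned above presumably refers to vLLM's stat logging, controlled by the disable_log_stats engine argument. A minimal illustration, assuming vLLM's AsyncEngineArgs API and a placeholder model; this is not code from the PR:

# Illustrative only: keep vLLM's stat logging enabled when building the engine
# so it records its internal metrics (disable_log_stats defaults to False).
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine_args = AsyncEngineArgs(model="facebook/opt-125m", disable_log_stats=False)
engine = AsyncLLMEngine.from_engine_args(engine_args)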

can-anyscale added a commit that referenced this pull request May 23, 2024
Adds a documentation example using vLLM to serve LLM models on Ray Serve.

This is a copy of #45325, plus a build environment for Ray Serve + vLLM.

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Signed-off-by: Cuong Nguyen <128072568+can-anyscale@users.noreply.github.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: akshay-anyscale <122416226+akshay-anyscale@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>

Labels
go (Trigger full test run on premerge)

7 participants