Skip to content

aws-samples/sagemaker-genai-hosting-examples

Generative AI Inference Examples on Amazon SageMaker

This repository contains a compilation of examples of optimized deployment of popular Large Language Models (LLMs) utilizing SageMaker Inference. Hosting LLMs comes with a variety of challenges due to the size of the model, inefficient usage of hardware, and scaling LLMs into a production like environment with multiple concurrent users.

SageMaker Inference is a highly performant and versatile hosting option that comes with a variety of options that you can utilize to efficiently host your LLMs. In this repository we showcase how you can take different SageMaker Inference options such as Real-Time Inference (low latency, high throughput use-cases) and Asynchronous Inference (near real-time/batch use-cases) and integrate with Model Servers such as DJL Serving and Text Generation Inference. We showcase how you can tune for performance via optimizing these different Model Serving stacks and also exploring hardware options such as Inferentia2 integration with Amazon SageMaker.

Content

If you are contributing, please add a link to your model below:

Additional Resources

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages