Generative AI Inference Examples on Amazon SageMaker

This repository collects examples of optimized deployments of popular Large Language Models (LLMs) using SageMaker Inference. Hosting LLMs poses several challenges: the models are large, hardware is easily used inefficiently, and serving must scale to a production-like environment with multiple concurrent users.

SageMaker Inference is a performant and versatile hosting service with several options for hosting LLMs efficiently. The examples in this repository show how to combine SageMaker Inference options, such as Real-Time Inference (low-latency, high-throughput use cases) and Asynchronous Inference (near-real-time and batch use cases), with model servers such as DJL Serving and Text Generation Inference. They also show how to tune performance by optimizing these model-serving stacks and by exploring hardware options such as Inferentia2 on Amazon SageMaker.
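
As a rough illustration of how the two inference options differ at the configuration level, the sketch below builds the keyword arguments that could be passed to the boto3 SageMaker client's `create_endpoint_config` call, plus a text-generation request body in the `inputs`/`parameters` shape that Text Generation Inference accepts. It is not taken from the notebooks in this repository: the helper names (`build_endpoint_config`, `build_payload`), the model and config names, the instance type, and the S3 path are all hypothetical placeholders. The code only constructs dictionaries and makes no AWS calls.

```python
import json


def build_endpoint_config(config_name, model_name, instance_type, async_output_s3=None):
    """Return kwargs for a SageMaker endpoint config (hypothetical helper).

    Without async_output_s3 the config describes a Real-Time endpoint.
    With it, an AsyncInferenceConfig is added so that results are written
    to the given S3 location instead of being returned in the response.
    """
    config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ],
    }
    if async_output_s3 is not None:
        config["AsyncInferenceConfig"] = {
            "OutputConfig": {"S3OutputPath": async_output_s3}
        }
    return config


def build_payload(prompt, max_new_tokens=64):
    """Serialize a text-generation request body (hypothetical helper)."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    )


if __name__ == "__main__":
    # All names, the instance type, and the S3 path are placeholders.
    realtime = build_endpoint_config("llm-rt-config", "my-llm-model", "ml.g5.2xlarge")
    asynchronous = build_endpoint_config(
        "llm-async-config",
        "my-llm-model",
        "ml.g5.2xlarge",
        async_output_s3="s3://my-bucket/async-output/",
    )
    print(json.dumps(realtime, indent=2))
    print(json.dumps(asynchronous, indent=2))
    print(build_payload("What is Amazon SageMaker?"))
    # With AWS credentials configured, the dictionaries could be used as:
    #   boto3.client("sagemaker").create_endpoint_config(**realtime)
```

Once an endpoint exists, a Real-Time endpoint would typically be called with `invoke_endpoint` on the `sagemaker-runtime` client, passing the payload as `Body`. An Asynchronous endpoint is called with `invoke_endpoint_async`, which takes an `InputLocation` in S3 rather than an inline body.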

Content

If you are contributing, please add a link to your model below:

Additional Resources

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Folders and files

CodeLlama
Codegen25
Falcon
FlanT5
Llama2
LoRA-Adapters-IC
Mistral
Mixtral/Mixtral-8x7b/LMI
Open-Llama/LMI
Zephyr/Zephyr-7B
.gitignore
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
README.md

aws-samples/sagemaker-genai-hosting-examples
