A high-throughput and memory-efficient inference and serving engine for LLMs
TensorRT C++ API Tutorial
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Template designed to kickstart your machine learning training projects in Python
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Substrate Python SDK
Substrate TypeScript SDK
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
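To illustrate the "single line of code" claim: Xinference exposes an OpenAI-compatible HTTP API, so an app keeps the same request format and only swaps the endpoint URL. A minimal stdlib sketch (the local URL, port, and model name are assumptions for illustration; nothing is actually sent):

```python
import json
import urllib.request

# Swapping this one line is the change: point the OpenAI-format request at a
# local Xinference server instead of api.openai.com (URL/port are assumptions).
BASE_URL = "http://localhost:9997/v1"

# A chat-completions request body in the OpenAI wire format.
body = json.dumps({
    "model": "llama-2-chat",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

# Build (but do not send) the request, to show the endpoint shape.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
```

Because the wire format is unchanged, the rest of the application code — message construction, response parsing — stays exactly as it was with OpenAI.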
Large Language Model Text Generation Inference
A universal scalable machine learning model deployment solution
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
Python library for structure and parameter learning, probabilistic and causal inference, and simulation in Bayesian networks.
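The probabilistic inference such a library automates can be sketched in plain Python — a toy two-node network (Rain → WetGrass) queried by brute-force enumeration. All probabilities are illustrative, and this is not the library's own API:

```python
# Toy Bayesian network: Rain -> WetGrass, with made-up probabilities.
P_RAIN = {True: 0.2, False: 0.8}
P_WET_GIVEN_RAIN = {
    True:  {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}

def posterior_rain_given_wet() -> float:
    """P(Rain | WetGrass=True) via enumeration (Bayes' rule)."""
    # Joint probability of each Rain value with WetGrass=True observed.
    joint = {r: P_RAIN[r] * P_WET_GIVEN_RAIN[r][True] for r in (True, False)}
    # Normalize over the evidence.
    return joint[True] / sum(joint.values())

print(round(posterior_rain_given_wet(), 3))  # → 0.692
```

Real libraries replace this exhaustive enumeration with algorithms such as variable elimination, which scale to networks with many variables.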
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support planned; PRs welcome).
Search, Knowledge, Uncertainty, Optimization, Learning, Neural Networks and Language.
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.