serving

Here are 104 public repositories matching this topic...

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Jun 1, 2024
Python

deepjavalibrary / djl-serving

Star

A universal scalable machine learning model deployment solution

deep-learning deployment inference pytorch serving djl

Updated May 31, 2024
Java

Lightning-AI / LitServe

Star

Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.

api ai serving

Updated May 31, 2024
Python

pytorch / serve

Star

Serve, optimize and scale PyTorch models in production

docker kubernetes machine-learning cpu deep-learning metrics gpu optimization pytorch serving mlops

Updated Jun 1, 2024
Java

vespa-engine / vespa

Star

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated May 31, 2024
Java

tensorflow / serving

Star

A flexible, high-performance serving system for machine learning models

python machine-learning deep-neural-networks deep-learning neural-network cpp tensorflow ml serving

Updated May 31, 2024
C++

SeldonIO / seldon-core

Star

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

kubernetes machine-learning deployment serving aiops production-machine-learning mlops machine-learning-operations

Updated May 31, 2024
HTML

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

structured-data serving unstructured-data unified-sql vector-database mysql-compatibility embedding-search embedding-store key-value-distributed-store vector-ocean real-time-semantic-search

Updated May 31, 2024
Java

openvinotoolkit / model_server

Star

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated May 31, 2024
C++

vectorch-ai / ScaleLLM

Star

A high-performance inference system for large language models, designed for production environments.

performance gpu model production cuda efficiency inference transformer llama speculative serving llm llm-inference llama3

Updated May 31, 2024
C++

ray-project / ray-llm

Star

RayLLM - LLMs on Ray

distributed-systems transformers ray serving large-language-models llm llmops llm-serving llm-inference

Updated May 28, 2024
Python

polyaxon / haupt

Star

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Updated May 28, 2024
Python

friendliai / friendli-client

Star

Friendli: the fastest serving engine for generative AI

ai ml inference gpt inference-server mistral inference-engine serving mlops gpt3 llm stable-diffusion llms generative-ai llmops llm-serving llm-inference llama2 llm-ops

Updated May 25, 2024
Python

torchpipe / torchpipe

Star

An Alternative for Triton Inference Server. Boosting DL Service Throughput 1.5-4x by Ensemble Pipeline Serving with Concurrent CUDA Streams for PyTorch/LibTorch Frontend and TensorRT/CVCUDA, etc., Backends