Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
My implementation of BiSeNet, with BiSeNetV2 added
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
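For reference, a minimal sketch of querying such a YOLOv4 TensorRT deployment with the official tritonclient Python package; the model name ("yolov4"), tensor names ("input", "detections"), and the 608x608 input shape are assumptions that must match the model's config.pbtxt.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy preprocessed image batch in NCHW float32 layout (assumed).
image = np.random.rand(1, 3, 608, 608).astype(np.float32)

infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
requested = httpclient.InferRequestedOutput("detections")

# Run inference against the (hypothetical) "yolov4" model.
result = client.infer(model_name="yolov4", inputs=[infer_input], outputs=[requested])
print(result.as_numpy("detections").shape)
```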
An alternative to Triton Inference Server: boosts DL service throughput 1.5-4x via ensemble pipeline serving with concurrent CUDA streams, with PyTorch/LibTorch frontends and TensorRT/CVCUDA (and other) backends
ClearML - Model-Serving Orchestration and Repository Solution
Deploy a stable diffusion model with ONNX/TensorRT + Triton Inference Server
The Triton backend for the ONNX Runtime.
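As an illustration, a minimal model-repository layout and config.pbtxt for the ONNX Runtime backend; the model name, tensor names, and shapes below are placeholders, not values from this repository.

```
# Layout (placeholder names): model_repository/densenet_onnx/{config.pbtxt, 1/model.onnx}
name: "densenet_onnx"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```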
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
OpenAI-compatible API for the TensorRT-LLM Triton backend
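Since the frontend speaks the OpenAI protocol, it can typically be exercised with the standard openai Python client; the base_url, port, and model name below are deployment-specific assumptions.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local proxy in front of Triton/TensorRT-LLM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="ensemble",  # hypothetical model name exposed by the deployment
    messages=[{"role": "user", "content": "Summarize Triton Inference Server in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```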
Deploy DL/ML inference pipelines with minimal extra code.
Set up CI for DL with CUDA, cuDNN, TensorRT, onnx2trt, onnxruntime, onnxsim, PyTorch, Triton Inference Server, Bazel, Tesseract, PaddleOCR, NVIDIA Docker, MinIO, and Supervisord on AGX or PC from scratch.
Compare multiple optimization methods on Triton to improve model service performance
Tiny configuration for Triton Inference Server
Build Recommender System with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask. Vector Recall, DeepFM Ranking and Web Application.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX
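A rough sketch of the PyTorch -> ONNX step for a CRAFT-style detector, followed by a TensorRT engine build with trtexec; the craft module import, checkpoint path, tensor names, and 768x768 resolution are placeholders.

```python
import torch
from craft import CRAFT  # hypothetical module exposing the CRAFT network

model = CRAFT()
model.load_state_dict(torch.load("craft_mlt_25k.pth", map_location="cpu"))
model.eval()

# Export with a dynamic batch dimension; resolution and tensor names are assumptions.
dummy = torch.randn(1, 3, 768, 768)
torch.onnx.export(
    model, dummy, "craft.onnx",
    input_names=["input"], output_names=["scores"],
    dynamic_axes={"input": {0: "batch"}, "scores": {0: "batch"}},
    opset_version=17,
)

# Then build the TensorRT engine for Triton, e.g.:
#   trtexec --onnx=craft.onnx --saveEngine=model.plan --fp16
```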
Diffusion Model for Voice Conversion
Provides an ensemble model to deploy a YoloV8 ONNX model to Triton
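For context, a minimal ensemble config.pbtxt that chains a preprocessing model into a YOLOv8 ONNX model; every model and tensor name here is a placeholder and should be replaced with the names used in your repository.

```
name: "yolov8_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "raw_image"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "detections"
    data_type: TYPE_FP32
    dims: [ -1, 6 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_image" value: "raw_image" }
      output_map { key: "preprocessed" value: "preprocessed_image" }
    },
    {
      model_name: "yolov8_onnx"
      model_version: -1
      input_map { key: "images" value: "preprocessed_image" }
      output_map { key: "output0" value: "detections" }
    }
  ]
}
```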
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
MagFace Triton Inference Server using TensorRT