tensorrt-llm
Here are 15 public repositories matching this topic...
Getting started with TensorRT-LLM using BLOOM as a case study
Updated Mar 7, 2024 - Jupyter Notebook
Add-in for the new Outlook that adds new LLM features (composition, summarizing, Q&A). It uses a local LLM via Nvidia TensorRT-LLM.
Updated Feb 24, 2024 - Python
Accelerated inference framework for large models: make LLMs fly.
Updated May 10, 2024 - Python
Whisper in TensorRT-LLM
Updated Sep 21, 2023 - C++
Nitro is a C++ inference server built on top of TensorRT-LLM with an OpenAI-compatible API. Run blazing-fast inference on Nvidia GPUs. Used in Jan.
Updated May 29, 2024 - C++
This repository is AI Bootcamp material consisting of a workflow for LLMs.
Updated May 21, 2024 - Jupyter Notebook
Chat With RTX Python API
Updated May 19, 2024 - Python
OpenAI-compatible API for the TensorRT-LLM Triton backend.
Updated Apr 26, 2024 - Rust
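Several repositories in this list (Nitro, the Triton-backend proxy above) expose an OpenAI-compatible API in front of TensorRT-LLM. A minimal sketch of what a client request to such an endpoint looks like; the host, port, and model name below are hypothetical placeholders, not values from any of these projects:

```python
import json

# Hypothetical endpoint; adjust to wherever your server is running.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "llama-2-7b") -> dict:
    """Build a request body in the OpenAI chat-completions format,
    which OpenAI-compatible servers accept regardless of the backend
    (TensorRT-LLM, llama.cpp, etc.)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize TensorRT-LLM in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       API_URL,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```

Because the request shape follows the OpenAI spec, existing OpenAI client libraries can usually be pointed at such a server just by overriding the base URL.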
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Updated Apr 5, 2024 - Jupyter Notebook
A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers with full support for Optimum's hardware optimizations & quantization schemes.
Updated May 29, 2024 - Python
A nearly-live implementation of OpenAI's Whisper.
Updated May 29, 2024 - Python
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Updated May 27, 2024
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
Updated May 29, 2024 - C++