tensorrt-llm
Here are 15 public repositories matching this topic...
Getting started with TensorRT-LLM using BLOOM as a case study
Updated Mar 7, 2024 - Jupyter Notebook
Add-in for the new Outlook that adds new LLM features (composition, summarizing, Q&A). It uses a local LLM via Nvidia TensorRT-LLM.
Updated Feb 24, 2024 - Python
Accelerated inference framework for large models: make LLMs fly.
Updated May 10, 2024 - Python
Whisper in TensorRT-LLM
Updated Sep 21, 2023 - C++
Nitro is a C++ inference server built on top of TensorRT-LLM with an OpenAI-compatible API. Run blazing-fast inference on Nvidia GPUs. Used in Jan.
Updated May 29, 2024 - C++
This repository is AI Bootcamp material consisting of a workflow for LLMs.
Updated May 21, 2024 - Jupyter Notebook
Chat With RTX Python API
Updated May 19, 2024 - Python
OpenAI-compatible API for the TensorRT-LLM Triton backend.
Updated Apr 26, 2024 - Rust
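Several repositories in this list (Nitro, the Triton-backend proxy above) expose an OpenAI-compatible API in front of TensorRT-LLM. A minimal sketch of what a client request to such an endpoint looks like; the host, port, and model name below are hypothetical placeholders, not values from any of these projects:

```python
import json

# Hypothetical endpoint; adjust to wherever your server is running.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "llama-2-7b") -> dict:
    """Build a request body in the OpenAI chat-completions format,
    which OpenAI-compatible servers accept regardless of the backend
    (TensorRT-LLM, llama.cpp, etc.)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }

payload = build_chat_request("Summarize TensorRT-LLM in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       API_URL,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```

Because the request shape follows the OpenAI spec, existing OpenAI client libraries can usually be pointed at such a server just by overriding the base URL.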
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Updated Apr 5, 2024 - Jupyter Notebook
A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers with full support for Optimum's hardware optimizations & quantization schemes.
Updated May 29, 2024 - Python
A nearly-live implementation of OpenAI's Whisper.
Updated May 29, 2024 - Python
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Updated May 27, 2024
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
Updated May 29, 2024 - C++