SOTA Weight-only Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
A Python package that extends the official PyTorch with optimizations for extra performance on Intel platforms
Unify Efficient Fine-Tuning of 100+ LLMs
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Dataflow compiler for QNN inference on FPGAs
Neural Network Compression Framework for enhanced OpenVINO™ inference
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for efficient deployment on constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
Official implementation of Half-Quadratic Quantization (HQQ)
Fast inference engine for Transformer models
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
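The simplest of the listed schemes, symmetric per-tensor INT8, can be sketched in pure Python; the function names below are illustrative, and real toolkits add per-channel scales, calibration, and packed storage:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (minimal sketch).

    The scale maps the largest-magnitude weight to 127; each weight is
    then rounded to the nearest integer and clamped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from INT8 codes."""
    return [qi * scale for qi in q]
```

Lower-bit formats (INT4/NF4) follow the same quantize/dequantize pattern with fewer levels and, for NF4, a non-uniform codebook.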
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
TinyChatEngine: On-Device LLM Inference Library
List of papers related to neural network quantization in recent AI conferences and journals.
Brevitas: neural network quantization in PyTorch
Faster Whisper transcription with CTranslate2
Lightweight Python interface to PIL, libimagequant, and pngquant, with automatic library look-up.
Extremely fast color quantization. Reduce color information of a 24-bit RGB bitmap down to 8-bit.
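Reducing 24-bit RGB to 8-bit can be sketched with a fixed 3-3-2 bit split (3 bits red, 3 green, 2 blue); this is a minimal illustration only, whereas the libraries above compute adaptive palettes that fit the image far better:

```python
def quantize_332(r, g, b):
    """Map a 24-bit RGB color onto one byte: 3 bits red, 3 green, 2 blue."""
    return (r >> 5) << 5 | (g >> 5) << 2 | (b >> 6)

def dequantize_332(c):
    """Expand an 8-bit 3-3-2 color back to approximate 24-bit RGB."""
    r = (c >> 5) & 0b111
    g = (c >> 2) & 0b111
    b = c & 0b11
    # rescale each field to the full 0..255 range
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)
```

Black and white survive the round trip exactly; intermediate colors land on the nearest of the 256 fixed palette entries.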