A General-purpose Parallel and Heterogeneous Task Programming System
Updated May 19, 2024 - C++
Sample codes for my CUDA programming book
CUDA C++ Core Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Safe Rust wrapper around the CUDA toolkit
TinyChatEngine: On-Device LLM Inference Library
A simple GPU hash table implemented in CUDA using lock free techniques
An implementation of HIP that works on CPUs, across OSes.
A self-learning tutorial for CUDA high-performance programming.
SPPU BE COMP Codes of LP1 - HPC, AIR, and DA
HTML/JS port of CUDA Occupancy Calculator
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
CUDA kernel author's tools
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.