
Enhancing VRAM Usage and Inference Speed with Diffusers Optimizations #38

Open · rickstaa opened this issue Mar 14, 2024 · 4 comments

@rickstaa (Contributor) commented Mar 14, 2024

We're exploring various optimizations available in the Diffusers library to improve VRAM usage and inference speed. @Titan-Node is currently benchmarking these optimizations across his GPU pool and the Livepeer network using his ai-benchmark wrapper to evaluate their effectiveness; preliminary results are documented in this community spreadsheet.

Objective

The goal is to identify and implement the optimizations with the greatest impact on inference speed and VRAM efficiency, while keeping an eye on output quality.

Current Optimizations

The following optimizations are already integrated into our codebase:

  • Half Precision: Half-precision (float16) weights are used to speed up inference and reduce memory consumption; implemented in the ai-worker/image_to_video pipeline. See the sketch after this list.
  • SFAST (xformers & Triton): Adopted from the stable-fast project; it currently speeds up inference and may also reduce memory usage in the future. See the implementation in the ai-worker/sfast pipeline.
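A minimal sketch of how these two optimizations combine on a Diffusers image-to-video pipeline. The model ID is illustrative, and the stable-fast import path varies between sfast versions:

```python
import torch
from diffusers import StableVideoDiffusionPipeline

# stable-fast compiler; older sfast releases name this module
# stable_diffusion_pipeline_compiler instead.
from sfast.compilers.diffusion_pipeline_compiler import (
    compile, CompilationConfig,
)

# Half precision: load fp16 weights to roughly halve VRAM use and speed
# up inference on GPUs with fast fp16 math.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # illustrative model ID
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# SFAST: turn on the xformers and Triton kernels, then compile the pipeline.
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True
pipe = compile(pipe, config)
```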

Future Explorations

Links and Resources

@rickstaa (Contributor, Author) commented Mar 14, 2024

@Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?

@Titan-Node commented:

> @Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?

Yes, I tried model offloading with no effect on RAM or speed.

@rickstaa (Contributor, Author) commented Mar 14, 2024

> > @Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?
>
> Yes, I tried model offloading with no effect on RAM or speed.

Thanks for the update. According to the docs, the effects should be minimal.
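For reference, model offloading in Diffusers is a single call on the pipeline. A minimal sketch, reusing the illustrative model ID from the earlier example:

```python
import torch
from diffusers import StableVideoDiffusionPipeline

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # illustrative model ID
    torch_dtype=torch.float16,
    variant="fp16",
)

# Model offloading: each sub-model (image encoder, UNet, VAE) is moved to
# the GPU only while it runs, trading a small latency cost for lower peak
# VRAM. Skip pipe.to("cuda") when this is enabled.
pipe.enable_model_cpu_offload()

# Sequential offloading saves more memory but is much slower, since it
# moves individual layers rather than whole sub-models:
# pipe.enable_sequential_cpu_offload()
```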

@rickstaa (Contributor, Author) commented May 8, 2024

Tracked internally at https://linear.app/livepeer-ai-spe/issue/LIV-321.
