
Enhancing VRAM Usage and Inference Speed with Diffusers Optimizations #38

Open · rickstaa opened this issue Mar 14, 2024 · 4 comments

@rickstaa (Contributor) commented Mar 14, 2024

We're exploring various optimizations available in the Diffusers library to improve VRAM usage and inference speed. @Titan-Node is currently benchmarking these optimizations across his GPU pool and the Livepeer network using his ai-benchmark wrapper to evaluate their effectiveness; preliminary results are documented in this community spreadsheet.

Objective

The goal is to identify and implement the optimizations with the greatest impact on inference speed and VRAM efficiency, while keeping an eye on output quality.

Current Optimizations

The following optimizations are already integrated into our codebase:

  • Half Precision: Half-precision (float16) weights are used to speed up inference and reduce memory consumption; implemented in the ai-worker/image_to_video pipeline. See the sketch after this list.
  • SFAST (xformers & Triton): Adopted from the stable-fast project; it currently speeds up inference and may also reduce memory usage in the future. See the implementation in the ai-worker/sfast pipeline.
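A minimal sketch of how these two optimizations combine on a Diffusers image-to-video pipeline. The model ID is illustrative, and the stable-fast import path varies between sfast versions:

```python
import torch
from diffusers import StableVideoDiffusionPipeline

# stable-fast compiler; older sfast releases name this module
# stable_diffusion_pipeline_compiler instead.
from sfast.compilers.diffusion_pipeline_compiler import (
    compile, CompilationConfig,
)

# Half precision: load fp16 weights to roughly halve VRAM use and speed
# up inference on GPUs with fast fp16 math.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # illustrative model ID
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# SFAST: turn on the xformers and Triton kernels, then compile the pipeline.
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = True
config.enable_cuda_graph = True
pipe = compile(pipe, config)
```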

Future Explorations

Links and Resources

@rickstaa (Contributor, Author) commented Mar 14, 2024

@Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?

@Titan-Node commented:

> @Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?

Yes, I tried model offloading with no effect on RAM or speed.

@rickstaa (Contributor, Author) commented Mar 14, 2024

> > @Titan-Node did you also experiment with model offloading, which leads to less memory reduction but higher performance?
>
> Yes, I tried model offloading with no effect on RAM or speed.

Thanks for the update. According to the docs, the effects should be minimal.
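For reference, model offloading in Diffusers is a single call on the pipeline. A minimal sketch, reusing the illustrative model ID from the earlier example:

```python
import torch
from diffusers import StableVideoDiffusionPipeline

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # illustrative model ID
    torch_dtype=torch.float16,
    variant="fp16",
)

# Model offloading: each sub-model (image encoder, UNet, VAE) is moved to
# the GPU only while it runs, trading a small latency cost for lower peak
# VRAM. Skip pipe.to("cuda") when this is enabled.
pipe.enable_model_cpu_offload()

# Sequential offloading saves more memory but is much slower, since it
# moves individual layers rather than whole sub-models:
# pipe.enable_sequential_cpu_offload()
```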

@rickstaa (Contributor, Author) commented May 8, 2024

Tracked internally at https://linear.app/livepeer-ai-spe/issue/LIV-321.
