
Add queue per GPU to ensure sequential inference requests #18

Open
yondonfu opened this issue Jan 29, 2024 · 2 comments

Comments

@yondonfu (Member)

No description provided.

@ad-astra-video

I like this! Most easily accessible GPUs can likely only handle one job at a time, so perhaps a one-job queue per GPU? Could this tie into the worker callback reporting which step has completed, so the O would know it's, say, 75% done with the current job? The O could then allocate the next request to the GPU closest to finishing; see the sketch below.
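A minimal sketch of the idea, assuming an asyncio-based Python worker. The `GPUWorker` class, `dispatch` helper, and `progress` field are all hypothetical, not names from the actual codebase:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GPUWorker:
    """Hypothetical per-GPU worker: a 1-slot queue serializes inference
    requests, and step-level progress is exposed so an orchestrator could
    pick the GPU closest to finishing its current job."""
    gpu_id: int
    queue: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=1))
    progress: float = 0.0  # fraction of current job completed (0.0-1.0)

    async def run(self):
        while True:
            job = await self.queue.get()
            total_steps = job["steps"]
            for step in range(total_steps):
                await asyncio.sleep(0.01)  # stand-in for one inference step
                self.progress = (step + 1) / total_steps  # e.g. 0.75 = 75% done
            self.progress = 0.0  # reset once the job finishes
            self.queue.task_done()


async def dispatch(workers: list[GPUWorker], job: dict):
    # Prefer an idle GPU; otherwise pick the one closest to finishing.
    idle = [w for w in workers if w.queue.empty() and w.progress == 0.0]
    target = idle[0] if idle else max(workers, key=lambda w: w.progress)
    await target.queue.put(job)  # blocks while the 1-slot queue is full


async def main():
    workers = [GPUWorker(gpu_id=i) for i in range(2)]
    tasks = [asyncio.create_task(w.run()) for w in workers]
    for _ in range(4):
        await dispatch(workers, {"steps": 50})
    await asyncio.gather(*(w.queue.join() for w in workers))
    for t in tasks:
        t.cancel()


asyncio.run(main())
```

In practice the worker would presumably report progress back to the O over its existing callback mechanism rather than shared state, but the single-slot queue per GPU captures the serialization described above.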

@rickstaa (Contributor) commented May 8, 2024

Tracked internally in https://linear.app/livepeer-ai-spe/issue/LIV-318/.


3 participants