
Process text-to-image requested image count sequentially #66

Open · wants to merge 1 commit into main
Conversation

ad-astra-video

Updated text-to-image to process images sequentially when more than one is requested.

This provides more stable GPU memory usage, and I believe inference time is roughly linear in the number of images for text-to-image models. If a user wants faster inference, they can split the request into separate requests that would go to separate orchestrators.

This is a quick fix for issue #49.
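The sequential approach described above can be sketched as follows. This is an illustrative sketch, not the actual ai-worker code: `pipeline` stands in for a diffusers-style text-to-image callable, and the parameter names are assumptions.

```python
def generate_sequentially(pipeline, prompt, num_images, seeds):
    """Run the pipeline once per image instead of one batched call.

    Peak GPU memory stays at the single-image level, and total time
    grows roughly linearly with num_images.
    """
    images = []
    for i in range(num_images):
        # Each call produces exactly one image, keyed to its own seed.
        result = pipeline(prompt, num_images_per_prompt=1, seed=seeds[i])
        images.extend(result)
    return images
```

The trade-off is latency: a batched call can amortize per-call overhead, while the loop keeps memory bounded regardless of the requested count.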

Some questions:

  1. Should we set a sensible limit per request? The timeout appears to be 30s, so 15 seems like a good limit that would let most graphics cards respond within the timeout.
  2. Is there a check for when a set of images is requested but a non-matching number of seeds is provided? For example, a batch of 10 requested but only 5 seeds provided. I did not see where the seeds would be filled out to match the requested image count.
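Question 2 could be handled by padding the seed list to the requested image count. This is a hypothetical helper, not existing ai-worker code; the function name and seed range are illustrative only.

```python
import random

def fill_seeds(requested_images, seeds):
    """Pad (or truncate) the provided seed list to match the image count.

    Extra images get freshly drawn random seeds so every image is
    reproducible if the seeds are reported back to the caller.
    """
    seeds = list(seeds or [])[:requested_images]
    while len(seeds) < requested_images:
        seeds.append(random.randint(0, 2**32 - 1))
    return seeds
```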

@ad-astra-video
Author

Did some testing: once I get past 12 images, the first ones processed return ErrNotFound from the broadcaster. 15 images return 3 ErrNotFound, 13 return 1, and 20 return 8.

Is there a limit of 12 somewhere, or is this just my setup?

@ad-astra-video
Author

ad-astra-video commented Apr 21, 2024

> Did some testing: once I get past 12 images, the first ones processed return ErrNotFound from the broadcaster. 15 images return 3 ErrNotFound, 13 return 1, and 20 return 8.
>
> Is there a limit of 12 somewhere, or is this just my setup?

I think I found it: the MemoryDriver in go-tools has a cache length of 12.

The node uses a session-based cache setup here: https://github.com/livepeer/go-livepeer/blob/930533388e62d21b339cccd8d6f4348dc14c5e75/cmd/livepeer/starter/starter.go#L1170.

The memory driver is in the livepeer go-tools repo: https://github.com/livepeer/go-tools/blob/ac33a3c30a694a743f0077630aaf62423b5fd1f9/drivers/local.go#L17
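A fixed-length cache would explain the exact error counts observed above. The toy model below (Python for illustration; the real MemoryDriver is Go, and the eviction details here are assumptions) shows how storing more results than the cache holds drops the oldest entries, so fetching them later fails.

```python
from collections import OrderedDict

class FixedCache:
    """Toy model of a fixed-length result cache (the go-tools
    MemoryDriver keeps 12 entries); eviction policy is illustrative."""

    def __init__(self, max_len=12):
        self.max_len = max_len
        self.items = OrderedDict()

    def put(self, key, value):
        self.items[key] = value
        while len(self.items) > self.max_len:
            # Oldest entry is evicted; a later fetch of it fails.
            self.items.popitem(last=False)

    def get(self, key):
        if key not in self.items:
            raise KeyError("ErrNotFound")  # mirrors the broadcaster error
        return self.items[key]
```

Storing 15 results evicts the first 3, which matches "15 images returns 3 ErrNotFound" (and 13 → 1, 20 → 8).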

@eliteprox
Contributor

> 1. Should we set a sensible limit per request? The timeout appears to be 30s, so 15 seems like a good limit that would let most graphics cards respond within the timeout.

I think we should set a reasonable limit on the number of images sent to a given orchestrator and send the remainder to the next orchestrator in the pool, if one is available.

> 2. Is there a check for when a set of images is requested but a non-matching number of seeds is provided? For example, a batch of 10 requested but only 5 seeds provided. I did not see where the seeds would be filled out to match the requested image count.

I'll have to debug/learn the multi-file workflow first, but this line should generate a seed when one isn't provided: https://github.com/livepeer/ai-worker/pull/66/files#diff-2952f66b536acb78e9bb5ee0337a00485762b802438a3209508b3b0ee088212dR62
