
Design communication protocol for long-running requests (e.g. image-to-video) #20

Open · yondonfu opened this issue Jan 29, 2024 · 5 comments

@yondonfu

How will a caller (i.e. a B) know whether to switch Os for long-running requests (e.g. > 30s)? Perhaps we can break the job down into smaller pieces to make switching/failover easier.

@ad-astra-video

I believe the diffusers pipeline has a per-step callback that can be used to send a notification back to the B. It may also be good to include a timestamp so that the time per step can be used to gauge the hardware being used.

This could potentially be faked by an O, but that seems like a lot of work and would only net a short-term gain before the B realizes that the O is faking.
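For reference, a minimal sketch of what that per-step hook could look like, assuming a recent diffusers version with the `callback_on_step_end` API; the reporting function (and wherever O would actually send the update) is hypothetical:

```python
import time
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def report_progress(pipe, step, timestep, callback_kwargs):
    # Hypothetical hook: O would forward this (step, timestamp) pair to B,
    # letting B compute steps/sec for the O while the request is running.
    print({"step": step, "timestamp": time.time()})
    return callback_kwargs

image = pipe(
    "an astronaut riding a horse",
    callback_on_step_end=report_progress,
).images[0]
```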

@yondonfu commented Feb 16, 2024

@ad-astra-video Good suggestion!

It might make sense for B to expect an update from O every N (e.g. 1, 10, etc.) steps. B could then track the steps/sec of Os while requests are being executed as a metric for evaluating O performance. A few related possibilities:

  • O could expect to receive a payment for every N steps. I think (this should be tested) the computational cost of each step is proportional to the output resolution, so given a specific price per pixel for a model, the total fee for a request could be calculated as something like price per pixel * output height * output width * # steps, and the payment amount for every N steps would be price per pixel * output height * output width * N (see the fee sketch after this list).
  • The callback per step could be used to support interrupting an in-progress request if O knows that the result is no longer needed.
  • The callback per step could be used to send intermediate images to B (before all steps are complete) as part of an update, which could be useful for presenting intermediate images to end users. This would require O to use a per-step callback to get intermediate latents from the pipeline and then use the pipeline's VAE to decode the latents into images for each callback ([1], [2]); a sketch follows this list. I also wonder whether an intermediate image could be used as the input to a new diffusion request with a new O as a way to "resume" diffusion on another O.
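A quick sketch of the fee math from the first bullet; the function names and the price-per-pixel value are illustrative only, not an agreed-upon API:

```python
def request_fee(price_per_pixel: float, height: int, width: int, num_steps: int) -> float:
    # Total fee: price per pixel * output height * output width * # steps.
    return price_per_pixel * height * width * num_steps

def payment_per_n_steps(price_per_pixel: float, height: int, width: int, n: int) -> float:
    # Amount owed for each batch of N completed steps.
    return price_per_pixel * height * width * n

# e.g. a 1024x576 output over 25 steps, paying every 5 steps:
total = request_fee(1e-12, 576, 1024, 25)           # hypothetical price per pixel
per_batch = payment_per_n_steps(1e-12, 576, 1024, 5)
assert abs(total - per_batch * 5) < 1e-9            # 25 steps = 5 batches of N=5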
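And a rough sketch of the intermediate-image idea from the last bullet, again assuming the `callback_on_step_end` API and the VAE decode path referenced in [2]; saving to disk stands in for however O would actually stream previews to B:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def send_preview(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
    with torch.no_grad():
        # Same decode path the pipeline runs at the end of inference ([2]).
        image = pipe.vae.decode(
            latents / pipe.vae.config.scaling_factor, return_dict=False
        )[0]
    preview = pipe.image_processor.postprocess(image, output_type="pil")[0]
    preview.save(f"step_{step:04d}.png")  # placeholder for sending the preview to B
    return callback_kwargs

image = pipe(
    "an astronaut riding a horse",
    callback_on_step_end=send_preview,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
```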

A more general question I have is how well this step-based framework generalizes to non-diffusion models. For example, does the above make sense for inference with models for upscaling, frame interpolation, etc., or is it diffusion specific?

[1] https://discuss.huggingface.co/t/how-to-get-intermeidate-output-images/29144
[2] https://github.com/huggingface/diffusers/blob/777063e1bfda024e7dfc3a9ba2acb20552aec6bc/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L1090

@papabear99

> O could expect to receive a payment for every N steps. I think (this should be tested) the computational cost of each step is proportional to the output resolution, so given a specific price per pixel for a model, the total fee for a request could be calculated as something like price per pixel * output height * output width * # steps, and the payment amount for every N steps would be price per pixel * output height * output width * N

So each model would have a different ppp? Will each model have a fixed workflow? From my limited experience using ComfyUI, I've observed significantly different processing times as workflows get more complex even when the number of steps remains the same.

If the workflow per model is static, I think the proposed payment formula will work; if workflows can vary using the same model, then I think we need to think about another approach that takes into account the actual compute used to perform a task.

@yondonfu

FWIW, just to clarify: the notes in my previous post were not proposals per se, but rather ideas for iteration and discussion.

> So each model would have a different ppp?

In the scenario described, yes, that is how it could work. Not ideal from a UX POV, but it could be a start.

> Will each model have a fixed workflow?

There will need to be parameters that can be adjusted by a user, e.g. resolution, seed, and possibly things like the scheduler.

> I think we need to think about another approach that takes into account the actual compute used to perform a task.

I generally agree that pricing should align with the compute used. That said, I think it could be reasonable to start with an imperfect, rough approximation that at least captures the bulk of the compute costs incurred, even if certain parts are not perfectly metered, and to restrict adjustment of certain parameters if needed.

Moving this topic into its own thread here: #28.

@rickstaa commented May 8, 2024

Tracked internally in https://linear.app/livepeer-ai-spe/issue/LIV-13.
