New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
image api #961
base: main
Are you sure you want to change the base?
image api #961
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #961 +/- ##
===================================================
+ Coverage 52.08280% 52.27358% +0.19078%
===================================================
Files 70 71 +1
Lines 7826 7983 +157
===================================================
+ Hits 4076 4173 +97
- Misses 3421 3464 +43
- Partials 329 346 +17
Continue to review full report in Codecov by Sentry.
|
ffb1cac
to
fa7e07d
Compare
@thomshutt there are a couple more TODOs but what do you think? |
2596867
to
4e5f3e9
Compare
api/http.go
Outdated
@@ -78,6 +78,12 @@ func NewCatalystAPIRouter(cli config.Cli, vodEngine *pipeline.Coordinator, bal b | |||
for _, path := range [...]string{"/asset/hls/:playbackID/*file", "/webrtc/:playbackID"} { | |||
router.OPTIONS(path, playback) | |||
} | |||
image := middleware.LogAndMetrics(metrics.Metrics.ImageRequestDurationSec)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to be used directly from the public API or via Studio? Isn't it a risk that someone will access this endpoint directly without any authentication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intended to be direct access yes, for the private bucket i would add in access control but since this is just the public bucket at the moment then you can access the thumbnails in the same way you can access the videos without any auth. @thomshutt what do you think?
handlers/image.go
Outdated
return strconv.ParseInt(val, 10, 32) | ||
} | ||
|
||
func (p *ImageHandler) handle(w http.ResponseWriter, playbackID string, time float64, width int64, height int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you run it as a sync call. How much time does the execution of this take? I see quite some logic here and running external command ffmpeg
, which make me think that maybe we should consider async execution here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen it take more than a second or so, I think we should most likely use a CDN in front of it to speed up repeated accesses but initially i think we should just monitor the request rate we are seeing and what the response times are like. I'll add some metrics for different stages of processing so we can see where the time is being taken.
I wouldn't want to make this async as the idea is to provide customers a simple way to dynamically request a thumbnail for any point in a video they want.
Also need to add some rate limiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, there are two tricky parts here:
- We're making external calls and executing external processes in the sync request, which is unreliable.
- We're exposing the API directly from Catalyst API for the use of customers. I think that any API exposed to the customers should begin in studio (with proper doc, protected by API Token).
Looping also @thomshutt and @victorges since it opens a broader discussion of the approach we take to our APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I worry about the no-cache full granularity expensive query as well, especially if this will be used by frontend applications to display the thumbnails of dozens of videos at the same time on a home page or something.
Some ideas from the top of my mind that could improve a bit:
- add some simple caching layer on disk or even memory for a couple thousand thumbnails
- I guess longer term we want to save them on OS? is that even that more expensive than the caching logic? if not maybe worth saving them thumbs in the OS way from the start (with some automatic expiration time later)
- if not already done, we can force some maximum granularity on the time point. maybe mod it on
1s
or10s
so users can't set it too arbitrarily and miss the cache all the time - add some simple rate limiting to this API. this can be a basic
atom
memory counter of the number of concurrent thumbnails we generate at once per node, just to make sure it doesn't consume node resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, we should really have some intra-cluster caching on NGINX and we could avoid re-implementing HTTP caches everywhere D:
A new API to allow you to extract an image from any point in a video. Currently just for the public bucket so I don't need to worry about auth, I'll save that for another PR.