Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image api #961

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

image api #961

wants to merge 3 commits into from

Conversation

mjh1
Copy link
Contributor

@mjh1 mjh1 commented Nov 8, 2023

A new API to allow you to extract an image from any point in a video. Currently just for the public bucket so I don't need to worry about auth, I'll save that for another PR.

Copy link

codecov bot commented Nov 8, 2023

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (a07a67b) 52.08280% compared to head (273e311) 52.27358%.

Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##                main        #961         +/-   ##
===================================================
+ Coverage   52.08280%   52.27358%   +0.19078%     
===================================================
  Files             70          71          +1     
  Lines           7826        7983        +157     
===================================================
+ Hits            4076        4173         +97     
- Misses          3421        3464         +43     
- Partials         329         346         +17     
Files Coverage Δ
config/cli.go 57.05521% <ø> (ø)
metrics/metrics.go 100.00000% <100.00000%> (ø)
handlers/image.go 59.45946% <59.45946%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a07a67b...273e311. Read the comment docs.

Files Coverage Δ
config/cli.go 57.05521% <ø> (ø)
metrics/metrics.go 100.00000% <100.00000%> (ø)
handlers/image.go 59.45946% <59.45946%> (ø)

@mjh1
Copy link
Contributor Author

mjh1 commented Feb 14, 2024

@thomshutt there are a couple more TODOs but what do you think?

@mjh1 mjh1 force-pushed the mh/image-api branch 2 times, most recently from 2596867 to 4e5f3e9 Compare February 15, 2024 18:14
@mjh1 mjh1 marked this pull request as ready for review February 15, 2024 18:15
api/http.go Outdated
@@ -78,6 +78,12 @@ func NewCatalystAPIRouter(cli config.Cli, vodEngine *pipeline.Coordinator, bal b
for _, path := range [...]string{"/asset/hls/:playbackID/*file", "/webrtc/:playbackID"} {
router.OPTIONS(path, playback)
}
image := middleware.LogAndMetrics(metrics.Metrics.ImageRequestDurationSec)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to be used directly from the public API or via Studio? Isn't it a risk that someone will access this endpoint directly without any authentication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intended to be direct access yes, for the private bucket i would add in access control but since this is just the public bucket at the moment then you can access the thumbnails in the same way you can access the videos without any auth. @thomshutt what do you think?

return strconv.ParseInt(val, 10, 32)
}

func (p *ImageHandler) handle(w http.ResponseWriter, playbackID string, time float64, width int64, height int64) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you run it as a sync call. How much time does the execution of this take? I see quite some logic here and running external command ffmpeg, which make me think that maybe we should consider async execution here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen it take more than a second or so, I think we should most likely use a CDN in front of it to speed up repeated accesses but initially i think we should just monitor the request rate we are seeing and what the response times are like. I'll add some metrics for different stages of processing so we can see where the time is being taken.

I wouldn't want to make this async as the idea is to provide customers a simple way to dynamically request a thumbnail for any point in a video they want.
Also need to add some rate limiting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there are two tricky parts here:

  1. We're making external calls and executing external processes in the sync request, which is unreliable.
  2. We're exposing the API directly from Catalyst API for the use of customers. I think that any API exposed to the customers should begin in studio (with proper doc, protected by API Token).

Looping also @thomshutt and @victorges since it opens a broader discussion of the approach we take to our APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I worry about the no-cache full granularity expensive query as well, especially if this will be used by frontend applications to display the thumbnails of dozens of videos at the same time on a home page or something.

Some ideas from the top of my mind that could improve a bit:

  • add some simple caching layer on disk or even memory for a couple thousand thumbnails
    • I guess longer term we want to save them on OS? is that even that more expensive than the caching logic? if not maybe worth saving them thumbs in the OS way from the start (with some automatic expiration time later)
  • if not already done, we can force some maximum granularity on the time point. maybe mod it on 1s or 10s so users can't set it too arbitrarily and miss the cache all the time
  • add some simple rate limiting to this API. this can be a basic atom memory counter of the number of concurrent thumbnails we generate at once per node, just to make sure it doesn't consume node resources

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, we should really have some intra-cluster caching on NGINX and we could avoid re-implementing HTTP caches everywhere D:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants