Add crude rate limiter #3283

Draft · wants to merge 3 commits into base: main
Conversation

dureuill (Contributor)

Pull Request

Related issue

Related to #3126 (rate limiting)

What does this PR do?

  • Add 3 rate limiters to the search route:
    • Global rate limiter that limits the total number of search requests regardless of their origin
    • IP rate limiter that limits the number of search requests per IP address of origin
    • API key rate limiter that limits the number of search requests per API key used in the request
  • Add a new set of --rate-limiting-* options to configure the rate limiters (enabled? pool size? cooldown duration?)
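The pool/cooldown semantics above can be sketched as a simple token bucket: a pool of N requests, replenished by one request per cooldown period. This is a minimal, self-contained illustration of the intended behavior, not the actix-governor implementation; `CrudeRateLimiter` is a hypothetical name:

```rust
use std::time::{Duration, Instant};

/// A minimal token-bucket limiter: `capacity` is the burst size and
/// `cooldown` is the time needed to replenish one request.
struct CrudeRateLimiter {
    capacity: u64,
    cooldown: Duration,
    available: f64,
    last_check: Instant,
}

impl CrudeRateLimiter {
    fn new(pool: u64, cooldown: Duration) -> Self {
        Self { capacity: pool, cooldown, available: pool as f64, last_check: Instant::now() }
    }

    /// Returns true if the request is allowed, consuming one token.
    fn check(&mut self) -> bool {
        let elapsed = self.last_check.elapsed();
        self.last_check = Instant::now();
        // Replenish one token per elapsed `cooldown`, capped at the pool size.
        self.available = (self.available + elapsed.as_secs_f64() / self.cooldown.as_secs_f64())
            .min(self.capacity as f64);
        if self.available >= 1.0 {
            self.available -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = CrudeRateLimiter::new(2, Duration::from_millis(50));
    assert!(limiter.check()); // first request: allowed
    assert!(limiter.check()); // second request: allowed (pool of 2)
    assert!(!limiter.check()); // third immediate request: rejected
}
```

In practice one limiter instance would be kept globally, one per IP, and one per API key, matching the three limiters above.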

Notes on the implementation

  • Actix Governor does not behave as expected: for example with cargo run --release -- --rate-limiting-api-key-pool 1 --rate-limiting-api-key-cooldown-ns 50000000000 I would expect that any request past the first one would be blocked for 50s, but this is not what happens: even at startup I'm allowed to send up to 8 requests, and then I have to wait for about 45s... Not sure where the fault lies here.
  • For API keys, all requests without an API key are put in the same rate-limiting bucket. It would be better to disable the rate limiter when there is no API key, but actix-governor does not give this level of control. We could maybe imply --rate-limiting-disable-api-key when no Master Key was provided 🤔.
  • The cooldown is in ns, which is not very practical but might be necessary to allow maximum control. The alternative is somehow parsing a "human-readable" duration, to allow for a --rate-limiting-cooldown 4s, which the standard library doesn't provide.
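Since the std has no duration parser, a hand-rolled one is not much code for the simple single-unit case. A sketch (crates like humantime would also cover compound forms such as 5h30m20s):

```rust
use std::time::Duration;

/// Parse a single-unit duration such as "4s", "50ms", or "50000000000ns".
/// A hand-rolled sketch, not the PR's code.
fn parse_duration(s: &str) -> Option<Duration> {
    // Split at the first non-digit character: value on the left, unit on the right.
    let split = s.find(|c: char| !c.is_ascii_digit())?;
    let (value, unit) = s.split_at(split);
    let value: u64 = value.parse().ok()?;
    match unit {
        "ns" => Some(Duration::from_nanos(value)),
        "ms" => Some(Duration::from_millis(value)),
        "s" => Some(Duration::from_secs(value)),
        "m" => Some(Duration::from_secs(value * 60)),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_duration("4s"), Some(Duration::from_secs(4)));
    assert_eq!(parse_duration("50ms"), Some(Duration::from_millis(50)));
    assert_eq!(parse_duration("oops"), None);
}
```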

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@irevoire (Member) commented Jan 2, 2023

API key rate limiter that limits the number of search requests per API key used in the request

I'm not sure I see the point of this limitation?
In which case should we use it?


  1. Global rate limiter that limits the total number of search requests regardless of their origin
  2. IP rate limiter that limits the number of search requests per IP address of origin

Personally, I think 1️⃣ is a bad idea: this limit should be defined by Meilisearch itself considering your number of cores / RAM and the number of searches currently running.
I don't see the point of limiting your Meilisearch instance globally, and also, if the global rate limit <= IP rate limit, then one user can completely DoS your Meilisearch instance by using all the available spots.

On the other hand, I think 2️⃣ is great, but if set too low it could bring a horrendous experience, like getting results while you type and then getting locked at the end or before the end of your word, and thus getting zero results once you've given the most complete query.
That's terrible for a search-as-you-type engine 😔
I think it's a good idea to have it, but we should be clear that its value should be quite high, because 50ms could be dramatic with the current engine.
Once we implement the shortcut to return faster results, if we can ensure that we're always going to answer in less than 50ms then it'll become even more useful I think.

@dureuill (Contributor, Author) commented Jan 3, 2023

[about API key limitation]
I'm not sure I see the point of this limitation?
In which case should we use it?

Limiting API key usage is useful in a scheme where a different API key is distributed to each user. This allows authenticating a user with more precision than the IP (I'll develop this point further below).

[About global rate limiter]
Personally, I think 1️⃣ is a bad idea and should be defined by meilisearch itself considering your number of cores / RAM and the number of searches currently running.
I don't see the point of limiting your meilisearch instance globally, and also, if the global rate limiter <= IP rate limiter, then one user can completely DoS your meilisearch instance by using all the available spots.

I think it is possible that we actually agree here.

I don't see the point of limiting your meilisearch instance globally

The point is to control what happens when the instance is overloaded in a more graceful manner. Without global rate limiting, an instance overload will result in an OOM kill, or in being CPU-starved and answering requests very slowly. In legitimate bursts of usage, I think outright rejecting a small number of requests leads to a lower maintenance load than having the instance crash and start rejecting all requests.

should be defined by meilisearch itself considering your number of cores / RAM and the number of searches currently running.

Ideally, it should be the case, and maybe we can set the defaults according to these metrics, but in general this only works when the computing resources are dedicated to Meilisearch, which is not necessarily the case, so a manual level of control can be a welcome escape hatch. Also, I don't personally know how to convert CPU/RAM resources to the correct maximum number of "in flight" requests.

if the global rate limiter <= IP rate limiter

Absolutely; maybe we should error or warn in this case?

On the other hand, I think [IP rate limiting] is great

IP rate limiting has very serious limitations, some of which I think I stated in my rate limiting survey. To restate in a more condensed form:

  1. The main issue is when the instance is behind a reverse proxy of some form: it will only see the IP of the proxy and so become a de facto global limiter. There are workarounds (not implemented here) based on the Forwarded / X-Forwarded-For headers, but they require the reverse proxy to rewrite the header, as the end user might "lie" about it in their request.
  2. In a DoS attack using Tor, even a single attacker will appear as many IPs (one per Tor exit node used in the attack), defeating the purpose of IP rate limiting. Zulip, for instance, provides a mitigation by downloading a list of Tor exit nodes and handling them as a single IP. We might consider this mitigation if need be.
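The workaround in point 1 boils down to trusting only the left-most entry of a proxy-controlled header. A minimal sketch (assuming the reverse proxy rewrites or sanitizes X-Forwarded-For; not the PR's code):

```rust
use std::net::IpAddr;

/// Extract the client IP from an `X-Forwarded-For` header value.
/// The header has the shape "client, proxy1, proxy2"; the left-most entry
/// is the original client, but it is only trustworthy when the reverse
/// proxy overwrites the header instead of forwarding it as received.
fn client_ip(forwarded_for: &str) -> Option<IpAddr> {
    forwarded_for.split(',').next()?.trim().parse().ok()
}

fn main() {
    assert_eq!(client_ip("203.0.113.7, 10.0.0.1"), "203.0.113.7".parse().ok());
    assert_eq!(client_ip("not-an-ip"), None);
}
```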

Hence the motivation to provide an alternative mechanism such as the API key rate limiter. If we can think of other, more convenient mechanisms, I think it would be good too.

I think it's a good idea to have it, but we should be clear that its value should be quite high because 50ms could be dramatic with the current engine.

I'm not sure I understand that sentence, sorry. To give some context, with the rate limiter as implemented, each IP starts with a pool of 200 requests, and that pool is replenished by 1 request every 50ms. This appears sufficient to me for the "interactive usage" use case (maybe not if the user just types "aaaaaaaaaaaaaaaaa" in the form, but I guess in that case we probably don't want to send that many requests anyway?)
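As a sanity check on those numbers: a 50ms cooldown sustains 20 requests per second, and an exhausted 200-request pool takes 10 seconds to refill completely.

```rust
use std::time::Duration;

fn main() {
    let pool: u32 = 200;
    let cooldown = Duration::from_millis(50);

    // Sustained rate: one request replenished per cooldown.
    let sustained_per_sec = 1.0 / cooldown.as_secs_f64();
    assert_eq!(sustained_per_sec, 20.0);

    // Time to refill an exhausted pool from empty.
    let full_refill = cooldown * pool;
    assert_eq!(full_refill, Duration::from_secs(10));
}
```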

@Kerollmops (Member) previously approved these changes Jan 3, 2023 and left a comment:
I like what you've done in this PR.

About the parameters that can't be higher than others: why can't we clamp them to the lowest number and always log the retained values, instead of warning the user in the log? Not sure if we prefer to have a silent behavior 🤔

About supporting human-readable durations, isn't there an equivalent to byte-unit on crates.io? Something able to parse 5h30m20s or 5M4d30h. Also, I don't know if being able to specify nanosecond-precise durations is important; milliseconds or even simple seconds would be enough.

Comment on lines +1575 to +1589
name = "governor"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "19775995ee20209163239355bc3ad2f33f83da35d9ef72dea26e5af753552c87"
dependencies = [
"dashmap",
"futures",
"futures-timer",
"no-std-compat",
"nonzero_ext",
"parking_lot",
"quanta",
"rand",
"smallvec",
]
Member:

Can we reduce the number of dependencies that governor brings, here?

dureuill (Author):

I couldn't find a way to disable the features of governor via actix-governor (that uses governor's default features). It is also unclear to me if we can disable them at all: the guide for governor mentions that it uses dashmap by default in the "std mode" but it is unclear what to do to change this and the impact it might have.

@@ -384,6 +390,105 @@ pub fn configure_data(
);
}

#[derive(Clone, Copy)]
pub struct ApiKeyExtractor;
Member:
Would it be possible to document the new structures with a simple sentence, please? What is it used for?

dureuill (Author):

Done, tell me what you think.

@@ -36,7 +37,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
)
.service(web::resource("/stats").route(web::get().to(SeqHandler(get_index_stats))))
.service(web::scope("/documents").configure(documents::configure))
.service(web::scope("/search").configure(search::configure))
.service(web::scope("/search").configure(|cfg| search::configure(cfg, rate_limiters)))
Member:
Nice, only enabled on the search route. I like that ❤️

cfg.service(web::scope("/tasks").configure(tasks::configure))
.service(web::resource("/health").route(web::get().to(get_health)))
.service(web::scope("/keys").configure(api_key::configure))
.service(web::scope("/dumps").configure(dump::configure))
.service(web::resource("/stats").route(web::get().to(get_stats)))
.service(web::resource("/version").route(web::get().to(get_version)))
.service(web::scope("/indexes").configure(indexes::configure))
.service(web::scope("/indexes").configure(|cfg| indexes::configure(cfg, rate_limiters)))
Member:
Oh, and the indexes route? Why? Is it costly to ask for indexes info? Not sure.

dureuill (Author):

Not enabled in the indexes route: the configure function of the indexes is what calls the configure function for the search, which requires the rate limiters, so we need to pass them through.

The rate limiters are only in effect on the search route.

@dureuill (Contributor, Author) commented Jan 3, 2023

@Kerollmops

I like what you've done in this PR.

Thanks! I must emphasize that the rate-limiter does not behave as I expected it to, though... as described in the "Notes on the implementation" in the description above

About the parameters that can't be higher than others, why can't we clamp them to the lowest number and always log the retained values instead of warning the user in the log? Not sure if we prefer to have a silent behavior 🤔

I think we can do both: [WARNING] Using the global rate limiting pool value (:globalValue) for the per-IP rate limiting pool value (:ipValue) because it doesn't make sense to globally restrict more than per IP, or something more elegant.
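The clamp-and-warn behavior could look something like this (a sketch of the idea only; the option names in the message are hypothetical):

```rust
/// Clamp the per-IP pool to the global pool, warning when clamping occurs,
/// so the global limiter can never be stricter than the per-IP one.
fn effective_ip_pool(global_pool: u64, ip_pool: u64) -> u64 {
    if ip_pool > global_pool {
        eprintln!(
            "[WARNING] Using the global rate limiting pool value ({global_pool}) \
             for the per-IP rate limiting pool value ({ip_pool}) because it \
             doesn't make sense to globally restrict more than per IP"
        );
        global_pool
    } else {
        ip_pool
    }
}

fn main() {
    assert_eq!(effective_ip_pool(100, 500), 100); // clamped, with a warning
    assert_eq!(effective_ip_pool(100, 50), 50); // kept as-is, silent
}
```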

About supporting human readable durations, isn't there an equivalent to byte-unit on crates.io? Something able to parse: 5h30m20s, 5M4d30h.

I found https://lib.rs/crates/duration-str and https://lib.rs/crates/parse_duration but didn't bother to vet them for a first iteration.

Also, I don't know if being able to specify nanosecond-precise duration is important. Milliseconds or even simple seconds would be enough.

I must stress that this duration is the time needed to replenish a single request from the pool (I wish actix-governor had used a more classic "window" rate limiting algorithm, but this is not up to me). Having it specified in seconds means that users would be unable to replenish, e.g., the 200-request pool in less than 3 minutes and 20 seconds, which seems too much to me.

@Kerollmops (Member) commented:
Thanks! I must emphasize that the rate-limiter does not behave as I expected it to, though... as described in the "Notes on the implementation" in the description above

Yup, I noticed, and this is the reason why we will not merge this PR in v1 😂

I found https://lib.rs/crates/duration-str and https://lib.rs/crates/parse_duration but didn't bother to vet them for a first iteration.

We found https://lib.rs/crates/humantime-serde that was looking pretty good too!

Having it specified in seconds means that users would be unable to replenish, e.g., the 200-request pool in less than 3 minutes and 20 seconds, which seems too much to me.

Indeed, but having a millisecond precision would be enough.

@dureuill dureuill removed this from the v1.0.0 milestone Jan 9, 2023
@curquiza curquiza marked this pull request as draft April 26, 2023 14:48
3 participants