
feat: implement dynamic pruning strategy #1295

Merged

merged 2 commits into ethereum:master from pruning_strategy on May 22, 2024

Conversation

Collaborator

@morph-dev morph-dev commented May 10, 2024

What was wrong?

The current pruning logic for IdIndexedV1Store deletes up to 5% of storage, but no more than 100 elements at a time. However, even 100 elements is too slow on big databases (e.g. the 35GB databases currently in use): a single pass can take up to a second, sometimes longer. This wouldn't be such a big problem if it didn't block other reads/writes.

How was it fixed?

The idea is to make the number of items deleted per pruning pass dynamic. The PruningConfig allows us to set the parameters for this dynamic strategy:

  • the optimal pruning duration (default: 100-300 milliseconds)
  • by how much to increase/decrease the number of items to prune when pruning is faster/slower than optimal (default: 20%)

Also added logic for handling zero storage capacity: prune the db (if non-empty) and set the radius to zero.
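Roughly, the adjustment works like this (a simplified sketch of the idea, not the exact code in this PR; the struct layout and defaults here are only illustrative):

```rust
use std::ops::Range;
use std::time::Duration;

/// Illustrative parameters for the dynamic strategy (defaults from the description above).
struct PruningConfig {
    /// Pruning passes faster/slower than this range trigger an adjustment.
    optimal_pruning_duration_range: Range<Duration>,
    /// How much to change the number of pruned items per adjustment (20% by default).
    max_pruning_count_change_fraction: f64,
}

impl Default for PruningConfig {
    fn default() -> Self {
        Self {
            optimal_pruning_duration_range: Duration::from_millis(100)..Duration::from_millis(300),
            max_pruning_count_change_fraction: 0.2,
        }
    }
}

struct PruningStrategy {
    config: PruningConfig,
    /// Maximum number of items deleted in a single pruning pass.
    max_pruning_count: u64,
}

impl PruningStrategy {
    /// After each pruning pass, grow or shrink the per-pass item count depending on
    /// how the observed duration compares to the optimal range.
    fn observe_pruning_duration(&mut self, duration: Duration) {
        let range = &self.config.optimal_pruning_duration_range;
        let fraction = self.config.max_pruning_count_change_fraction;
        let factor = if duration < range.start {
            // Pruning was fast: delete more items next time.
            1.0 + fraction
        } else if duration >= range.end {
            // Pruning was too slow: delete fewer items next time.
            1.0 - fraction
        } else {
            return;
        };
        self.max_pruning_count = (self.max_pruning_count as f64 * factor).round() as u64;
    }
}
```

With the defaults, a pass that finishes in under 100 ms bumps the per-pass count by 20%, and one that takes over 300 ms cuts it by 20%.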

To-Do

@morph-dev morph-dev self-assigned this May 10, 2024
@njgheorghita
Collaborator

It seems like this might be relevant: taking into account #1228, and the fact that (IMO) we will sooner rather than later want to support it along with a "max db" size (i.e. if a user has 100 GB of disk space they want to contribute, we'll actually spin up 10 nodes of 10 GB under the hood). Once that's implemented, we'll have a consistent max db size, which means we really wouldn't have to worry about improving the pruning mechanism to support large dbs and could instead optimize db performance against a fixed target (e.g. 10 GB of storage). Assuming my description is the direction we're heading, does this PR still make sense?

@pipermerriam
Member

> we'll actually spin up 10 nodes of 10 GB under the hood

I don't think this is how the architecture will actually work out. I think that under the hood it will be a single 100gb database that is represented in the network across 10 different node-ids.

See: ethereum/portal-network-specs#283 (comment)

@morph-dev
Collaborator Author

Together with what Piper said, I would add that there is a benefit for smaller dbs as well (as we can purge more there).

In general, I would say it's better to monitor and adjust in any scenario. For example:

  • a target of 10GB will perform differently on different devices (e.g. SSD vs HDD)
  • even if you have 1 node of 10GB (i.e. our fixed target), you might also be running execution and consensus clients, so your disk usage is already high

Collaborator

@carver carver left a comment

LGTM! Mostly just nit comments that you're welcome to ignore.

I apparently have a strong opinion about the ideal way to calculate the percentage update of the pruning number (more below). But in the end, it's not a blocker because even if the number isn't ideal, I don't foresee any catastrophe. So don't let me stop you from merging it if you disagree about the percentage update 😆

Comment on lines +127 to +128
self.radius = Distance::ZERO;
self.metrics.report_radius(self.radius);
Collaborator

@carver carver May 20, 2024

Tiniest of nits: maybe it's worth skipping reporting the radius if it was already 0.

Collaborator Author

I'm confused by the "if it was already 0" part. This function is called only once, during initialization, so the radius isn't anything useful at that point (it is MAX by default).

I also believe that it's still useful to report it at least once, so we have at least some data in Grafana. Or do you want to deliberately not report it to Grafana?

Comment on lines +442 to +444
if !self.pruning_strategy.should_prune(&self.usage_stats) {
    warn!(Db = %self.config.content_type,
-       "Pruning requested but we are below target capacity. Skipping");
+       "Pruning requested but not needed. Skipping");
Collaborator

Does this ever show up? It seems strange that the call to prune only happens if should_prune() is true, and then prune immediately checks it again inside.

Collaborator Author

It shouldn't show up, and with the current code it can't (as you said, should_prune is always called right before prune).
I put it in as a safeguard in case of future refactoring or other changes to the code, in which case we would see it in the logs (hence the warn level, indicating that something went wrong).

Comment on lines 476 to +478
self.metrics.stop_process_timer(delete_timer);
self.pruning_strategy
    .observe_pruning_duration(pruning_start_time.elapsed());
Collaborator

Seems like a bummer to have prune run its own Instant timer when we're already calculating the time in the metrics. I suppose if stop_process_timer returned the duration it observed (it looks like it's an f64 in seconds, but we could convert it to a Duration), then observe_pruning_duration could reuse it. Do you agree that would be preferable? If you don't feel like doing that, I might do it after this merges, just so it's ready the next time we want to reuse the timing.
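For reference, the shape of that refactor might look something like this (hypothetical types and method names, just to show the measured duration being returned and reused):

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the real metrics types.
struct ProcessTimer {
    start: Instant,
}

struct Metrics;

impl Metrics {
    fn start_process_timer(&self) -> ProcessTimer {
        ProcessTimer { start: Instant::now() }
    }

    /// Stop the timer and record the elapsed time (the real metric takes f64 seconds),
    /// but also return it as a Duration so callers can reuse the same measurement.
    fn stop_process_timer(&self, timer: ProcessTimer) -> Duration {
        let elapsed = timer.start.elapsed();
        let _seconds = elapsed.as_secs_f64(); // would be observed into the histogram here
        elapsed
    }
}

// At the call site, one measurement would then serve both consumers, with no separate Instant:
//     let elapsed = self.metrics.stop_process_timer(delete_timer);
//     self.pruning_strategy.observe_pruning_duration(elapsed);
```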

Collaborator Author

I was thinking about the same thing, but I didn't want to complicate this PR any further. I'm fine with doing it myself in a followup PR.

max_pruning_count_change_fraction: f64,
optimal_pruning_duration_range: Range<Duration>,
) -> Self {
if !(0.0..1.0).contains(&target_capacity_fraction) {
Collaborator

Nit: I suppose 1.0 is a valid choice here, which the Range seems to exclude. Due to database compression, there was even a time when values >1.0 might have been valid (I haven't observed how SQL performs since the switchover). Anyway, it all seems mostly moot since it's hard-coded for now, but I just wanted to note that I think it would be fine to allow input up to 2.0 or something, since this would eventually just be about letting people play with configuration values. I don't think it's a massive footgun that we need a panic to stop them from playing with larger values.

Collaborator Author

While coding, I definitely had a mental model that target_capacity should be lower than storage_capacity.

Maybe the code would work even if that's not the case, but I would say it doesn't make much sense, because both storage capacity and target capacity represent the raw content size (content_id.len() + content_key.len() + content_value.len()), or at least an approximation of it (there is some extra storage used for indexes and other smaller columns). Irrelevant here, but I don't think there is any compression happening at the moment.

With that being said, I will allow the value 1.0. Once we have a real use case for values higher than 1, we can come back to it (in which case we would probably have to rewrite some of the logic).
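Concretely, allowing 1.0 just means switching the check to an inclusive range, something like the following sketch (the panic message is illustrative):

```rust
// `..=` makes the upper bound inclusive, so target_capacity_fraction == 1.0 is accepted.
if !(0.0..=1.0).contains(&target_capacity_fraction) {
    panic!("target_capacity_fraction must be in [0, 1], got {target_capacity_fraction}");
}
```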

target_capacity_fraction
)
}
if !(0.0..1.0).contains(&max_pruning_count_change_fraction) {
Collaborator

Similarly, it's not totally obvious to me that 1 is a hard upper limit here. If someone wanted to play with 2x the pruning count change, my intuition isn't to panic to prevent them from playing with it.

Edit: I see where the number came from now, but I stand by the above comment when paired with the suggestion below.

Collaborator Author

I actually wanted increasing max_pruning_count to be harder than decreasing it, because the downside of it being too low is not so bad (we prune fewer items more frequently), while the downside of it being too high is that the db is blocked for longer.

I added a better explanation to the max_pruning_count_change_fraction field.

However, if the behavior you expected is the more common way of doing it, I'm willing to change it in a followup PR.

Db = %self.config.content_type,
"Pruning was too slow. Decreasing max_pruning_count",
);
1. - pruning_config.max_pruning_count_change_fraction
Collaborator

@carver carver May 20, 2024

Ah, I see now where the upper limit of 1 came from. My intuition before getting to this code was that a change fraction of 1.0 would double on the upside and cut in half on the downside. Those seem like the more symmetric options to me (i.e. one increase and one decrease take you back to where you started).

In this approach, choosing a value of 0.9 nearly doubles on the upside but drops to 1/10th on the downside, which is aggressively dropping but not that aggressively climbing (it takes more than 3 upside adjustments to make up for 1 downside adjustment). So if we want one configuration number to govern both directions, my intuition says that this line should be:

Suggested change
- 1. - pruning_config.max_pruning_count_change_fraction
+ 1. / (1. + pruning_config.max_pruning_count_change_fraction)
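To make the asymmetry concrete, here is the arithmetic for both variants with the default change fraction of 0.2 (a standalone sketch, not code from the PR):

```rust
fn main() {
    let fraction: f64 = 0.2; // default max_pruning_count_change_fraction
    let count: f64 = 100.0; // some current max_pruning_count

    // The increase step is the same in both variants: 100 -> 120.
    let increased = count * (1.0 + fraction);

    // As written in the PR, the decrease multiplies by (1 - fraction): 120 -> 96,
    // so one increase followed by one decrease drifts below the starting point.
    let decreased_pr = increased * (1.0 - fraction);

    // With the suggested change, the decrease multiplies by 1 / (1 + fraction): 120 -> 100,
    // so one increase and one decrease cancel out exactly.
    let decreased_suggested = increased * (1.0 / (1.0 + fraction));

    println!("{increased} {decreased_pr} {decreased_suggested}");
}
```

With the 0.9 example above, the formula as written multiplies by 1.9 on the upside and 0.1 on the downside, while the suggested one would multiply by roughly 0.53 on the downside.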

Member

@ogenev ogenev left a comment

Collaborator Author

@morph-dev morph-dev left a comment

@carver I do disagree with the proposed changes to the pruning count update, but not strongly :). Willing to discuss it more before merging.

@morph-dev
Collaborator Author

morph-dev commented May 22, 2024

@carver I do disagree with the proposed changes to the pruning count update, but not strongly :). Willing to discuss it more before merging.

I'm going to merge this and do a followup PR if needed.

@morph-dev morph-dev merged commit 4d603d4 into ethereum:master May 22, 2024
8 checks passed
@morph-dev morph-dev deleted the pruning_strategy branch May 22, 2024 05:34