Capacity for serialization is suboptimal #579

Open
colin-grapl opened this issue Oct 15, 2022 · 3 comments · May be fixed by #582
Labels: API-breaking (This might introduce incompatible API changes), enhancement (New feature or request), performance (Improves performance of existing features)

Comments

@colin-grapl (Contributor)

Currently the driver preallocates buffers for serialization using a much lower bound than it could. In most cases we know exactly how much to allocate, or at least a more accurate lower bound.

In particular, right now for types like &[i32] of length N, only N bytes are preallocated. Instead it should be size_of::<i32>() * N for the bytes, plus size_of::<i32>() for the tag.
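
For concreteness, here is a minimal sketch of that arithmetic (the function name is hypothetical, not part of the driver):

```rust
use std::mem::size_of;

// Hypothetical helper, for illustration only: N elements of 4 bytes each,
// plus a 4-byte (i32) length tag, rather than just N bytes.
fn capacity_for_i32_slice(values: &[i32]) -> usize {
    size_of::<i32>() * values.len() + size_of::<i32>()
}

fn main() {
    let values = [1i32, 2, 3, 4];
    assert_eq!(capacity_for_i32_slice(&values), 20); // 16 bytes of data + 4-byte tag
}
```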

This is at least true for all integer types. Where it gets trickier is in a generic context, such as with Vec<T>, &[T], or tuples of T, when T contains data on the heap.

For example, size_of::<String>() is 24, but the string itself may be smaller than 24 bytes (or larger, but that's less of an issue).

I would propose two things:

  1. All integer types have their capacity calculation fixed. This is a trivial change and will reduce allocations considerably for integer types (I have tested this).
  2. I would suggest adding a new method to Value: size_hint. size_hint would return the lower bound for the size of the serialized value (including its tag).

size_hint would return usize, with a default implementation returning size_of::<Self>(), as that is the lower bound for all values. The default implementation is purely to preserve compatibility with any external implementations of Value.

This also lets types whose size is relatively expensive to calculate return a lower bound. For example, HashMap<String, T> could just return (2 * String::size_hint()) + (2 * T::size_hint()).
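
To make the proposal concrete, here is a rough sketch (this trimmed-down Value trait merely stands in for the driver's serialization trait; the real signatures differ):

```rust
use std::mem::size_of;

// Simplified stand-in for the driver's Value trait, for illustration only.
trait Value {
    fn serialize(&self, buf: &mut Vec<u8>);

    // Lower bound on the serialized size, including the length tag.
    // The default exists only to keep external implementations compiling.
    fn size_hint() -> usize
    where
        Self: Sized,
    {
        size_of::<Self>()
    }
}

impl Value for i32 {
    fn serialize(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&4i32.to_be_bytes()); // i32 length tag
        buf.extend_from_slice(&self.to_be_bytes()); // payload
    }

    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<i32>() // tag + payload: exact, not just a bound
    }
}
```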

I've already implemented this in a fork.

Here are the results of benchmarking:

serialize_lz4_for_iai
  Instructions:               10051 (-3.983569%)
  L1 Accesses:                12901 (-4.238420%)
  L2 Accesses:                   32 (No change)
  RAM Accesses:                 156 (-1.886792%)
  Estimated Cycles:           18521 (-3.521384%)

serialize_none_for_iai
  Instructions:                2415 (-14.93484%)
  L1 Accesses:                 3323 (-14.92576%)
  L2 Accesses:                   14 (No change)
  RAM Accesses:                  73 (-2.666667%)
  Estimated Cycles:            5948 (-9.892441%)

If this is of interest I can cut a PR.

Here's the code. Very minimal changes and, again, nothing breaking.

https://github.com/scylladb/scylla-rust-driver/compare/main...colin-grapl:fix-serialize-capacity?expand=1#diff-3472ebb06a92a0acc29c561335d5af1e24b27ea23f6835fccf4f24518089a0f2

@colin-grapl (Contributor, Author)

BTW, I'd also recommend preallocating more space for the data buffer used in SerializedRequest::make:

serialize_lz4_for_iai
  Instructions:                9854 (-1.960004%)
  L1 Accesses:                12644 (-1.992094%)
  L2 Accesses:                   32 (No change)
  RAM Accesses:                 155 (-0.641026%)
  Estimated Cycles:           18229 (-1.576589%)

serialize_none_for_iai
  Instructions:                2234 (-7.494824%)
  L1 Accesses:                 3090 (-7.011736%)
  L2 Accesses:                   18 (+28.57143%)
  RAM Accesses:                  71 (-2.739726%)
  Estimated Cycles:            5665 (-4.757902%)

Here's the diff for that:

-        let mut data = vec![0; HEADER_SIZE];
+        let mut data = Vec::with_capacity(32);
+        data.resize(HEADER_SIZE, 0);

At minimum I'd suggest preallocating to 16, but I think 32 is probably a sweet spot.
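
For context, here is the pattern in isolation (the HEADER_SIZE value below is an assumption for illustration, not necessarily the driver's constant):

```rust
// Assumed value for illustration; the driver defines its own HEADER_SIZE.
const HEADER_SIZE: usize = 9;

fn make_buffer() -> Vec<u8> {
    // One allocation sized for the common case, instead of an exact-fit
    // HEADER_SIZE allocation that appending the body immediately outgrows.
    let mut data = Vec::with_capacity(32);
    data.resize(HEADER_SIZE, 0);
    data
}
```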

@Jasperav (Contributor)

Possibly related? #385

@colin-grapl (Contributor, Author)

Definitely related; my fix is more of a stopgap. The reality is that the entire write path could be optimized to write into a single buffer and reuse it, such that it generally allocates once for a given path and never again.
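
As a rough sketch of that buffer-reuse idea (the names here are hypothetical, not the driver's API): Vec::clear drops the contents but keeps the allocation, so a long-lived buffer stops allocating once it has grown to the working-set size.

```rust
// Hypothetical long-lived serializer owning a reusable buffer.
struct RequestSerializer {
    buf: Vec<u8>,
}

impl RequestSerializer {
    fn new() -> Self {
        RequestSerializer { buf: Vec::new() }
    }

    fn serialize(&mut self, payload: &[u8]) -> &[u8] {
        self.buf.clear(); // keeps capacity from previous requests
        self.buf.extend_from_slice(payload);
        &self.buf
    }
}
```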

@colin-grapl colin-grapl linked a pull request Oct 21, 2022 that will close this issue
@mykaul mykaul added the performance Improves performance of existing features label Oct 30, 2022
@mykaul mykaul added the enhancement New feature or request label Jan 31, 2023
@piodul piodul added the API-breaking This might introduce incompatible API changes label Mar 28, 2023
@Lorak-mmk Lorak-mmk self-assigned this Nov 15, 2023