Deserialization refactor: introduce new deserialization traits #970

wprzytula · 2024-03-24T14:16:12Z

The new traits

This PR introduces the new deserialization traits, DeserializeValue and DeserializeRow, which are going to replace FromCqlVal and FromRow, respectively.

If a type implements DeserializeValue<'f>, this means that it can take a buffer of lifetime 'f and deserialize itself, treating the contents of the buffer as a CQL value. Analogously, DeserializeRow<'f> allows for deserialization of types that are supposed to represent a whole row returned in the results.

Deserialization is now split into two phases: type checking and actual deserialization. Both traits have two methods: type_check and deserialize. The idea behind this is to validate the column types of the response only once, after the response is received. The deserialize method is then called for each row/value to perform the actual parsing: that method is allowed to assume that type_check was called and may skip some type checking for performance (although it cannot use that assumption to perform unsafe operations, it is not an unsafe method).

The previous framework only supported deserialization of types that exclusively own their data, for example String or Vec<u8>. However, the new framework allows for types that borrow or co-own the frame buffer:
- Types that borrow the buffer are &'f str and &'f [u8]. They just point to the raw data from the serialized response. Rust's lifetime system makes sure that the the user doesn't deallocate the serialized response until using the deserialized types that point to the buffer are dropped. Apart from the aforementioned types, a bunch of iterator types have been introduced that allow consuming collections without allocations.
- Serialized frame is represented with bytes::Bytes. It is possible to create a subslice (also of type Bytes) that keep the original alive. The type being deserialized can obtain access to such subslice. Like in the case of e.g. &str vs. String, keeping Bytes allows to avoid an allocation, but it is also easier to handle as deserialized type doesn't have to be bound by any lifetime. It important to be careful when handling such types as they keep whole serialized frame alive, even if they only point to a subslice. This can lead to a space leak and more memory being used than necessary.

Implementation of the new traits for all supported types

All the previously supported types (i.e. those that could be deserialized using the old framework) are supported now as well.
There is only one exception: CQL List is no longer deserializable into a set (HashSet or BTreeSet), because such conversion could be lossy, which is an unwanted property for deserialization.

Errors

The error framework is analogous to one used in the serialization framework: DeserializationError is added that resembles SerializationError and is type-erased as well.

As the new deserialization framework has separate type_check() and deserialize() function (contrary to the serialization framework, which has only one serialize() method), it makes sense for them to return distinct errors.
That's why a new TypeCheckError is added, too, which is type-erased as well.

Testing

All new types are unit tested.
Additionally, a decent test suite is added to verify compatibility of the old and the new framework.
New errors are tested, too, although not exhaustively. I'd prefer not to bog down too much in testing them thoroughly.

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

github-actions · 2024-03-24T14:20:52Z

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: 42c69c7

See the following report for details:

cargo semver-checks output

./scripts/semver-checks.sh --baseline-rev 5dbb10b800ef90a2f5ec08ca9b39c2c3e78c5e86
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 5dbb10b800ef90a2f5ec08ca9b39c2c3e78c5e86
     Cloning 5dbb10b800ef90a2f5ec08ca9b39c2c3e78c5e86
     Parsing scylla v0.13.0 (current)
      Parsed [  21.692s] (current)
     Parsing scylla v0.13.0 (baseline)
      Parsed [  18.171s] (baseline)
    Checking scylla v0.13.0 -> v0.13.0 (no change)
     Checked [   0.525s] 72 checks; 72 passed, 0 unnecessary
    Finished [  40.440s] scylla
     Parsing scylla-cql v0.2.0 (current)
      Parsed [   9.845s] (current)
     Parsing scylla-cql v0.2.0 (baseline)
      Parsed [   9.941s] (baseline)
    Checking scylla-cql v0.2.0 -> v0.2.0 (no change)
     Checked [   0.302s] 72 checks; 71 passed, 1 failed, 0 unnecessary

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.31.0/src/lints/enum_variant_added.ron

Failed in:
  variant ParseError:DeserializationError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/frame_errors.rs:43
     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  20.130s] scylla-cql
make: *** [Makefile:61: semver-rev] Error 1

scylla-cql/src/types/deserialize/value.rs

scylla-cql/src/types/deserialize/mod.rs

scylla-cql/src/types/deserialize/row.rs

scylla-cql/src/types/deserialize/value.rs

Lorak-mmk · 2024-04-01T13:50:00Z

scylla-cql/src/types/deserialize/value.rs

+            compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num1);
+            compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num2);
+            compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num3);
+        }


Why some types are only tested in this function and some are both tested here and in a separate function? I see there are more occurrences of this, but I won't comment each one, just the comment here and previous one about bool

No idea.
@piodul ?

Why some types are only tested in this function and some are both tested here and in a separate function?

I'm not entirely sure because it was a long time ago, so what I'll write here might be wrong (with more probability than usual, at least).

There are two kinds of tests in this file:

Tests which verify that the DeserializeCql works properly (this is new code which needs to be tested)

Tests which verify that the DeserializeCql implementation agrees with the legacy FromCqlVal-based one

They both serve a different purpose.

My main question still remains: why are some types only tested in the second way and not the first way?

I don't remember, sorry. I might have just forgot to write the first kind of test for some types.

Ok, unless there is a reason for some types to be absent from one type of tests please implenent both @wprzytula

scylla-cql/src/types/deserialize/value.rs

wprzytula · 2024-05-10T08:35:53Z

v2:

addressed most review comments (though DeserializeCql -> DeserializeValue rename has not yet been done),
set up and used an error framework similar to that used in serialization.

While reviewing, please pay attention to the error types used, especially to the use of GenericParseError variant (which I hate, but before errors refactor I can do little about this).

muzarski

Additional question: do we want to expose all of the iterator types to the users? I'm aware that UdtIterator might be useful to implement deserialization for custom user types. What about other iterators such as ListLikeIterator or MapIterator? Do you see any reason for exposing these?

scylla-cql/src/types/deserialize/value.rs

wprzytula · 2024-05-14T11:50:23Z

Additional question: do we want to expose all of the iterator types to the users? I'm aware that UdtIterator might be useful to implement deserialization for custom user types. What about other iterators such as ListLikeIterator or MapIterator? Do you see any reason for exposing these?

exposing them might be useful for cpp-rust-driver (but in this aspect it's better to start with private visilibity and make it public only if really needed there),
it's not scylla-cql crate that should control visibility of the exports; it's scylla crate's responsibility to organise exports, just like you've recently done wrt some other exports from scylla-cql. Ideally, only leaf items (e.g. structs, consts, enums, etc.) are re-exported by scylla from scylla-cql, no node items (i.e. modules) - this gives the finest control over visibility.

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

There is a purposeful change from the previous framework: CQL List can no longer be deserialized straight to a HashSet or a BTreeSet. Such deserialization would be lossy, which is a property we don't want in our framework. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

wprzytula · 2024-05-28T13:58:11Z

v5.1:

renamed got field in Builtin*Errors to cql_type(s), as suggested by @Lorak-mmk.

wprzytula · 2024-05-29T10:03:58Z

v5.0:

* refactored `deser_cql_value` and `deser_rows` to use the new deserialization framework. Wins:
  
  1. less code duplication,
  2. better error messages in the old framework (coming from the new framework).

Keep in mind that now, with those changes introduced, compatibility tests between the new and the old framework and no longer relevant, because the old framework's backbone (deser_cql_value with all the impls) is deleted. Then:

Should we even keep those compatibility tests? They no longer prove anything. After all, they proved what we wanted them to prove, in earlier commits, and now are useless.
Should we perhaps postpone introducing those changes until we delete the whole old framework altogether? Then the tests are still valid.

@Lorak-mmk @muzarski

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

This implementation is important for two reasons: 1. It enables using the upper layers of the old framework over the new one, which makes the transition smoother. 2. Some users (perhaps ORM users?) are going to need the dynamic capabilities that the previous framework offered: receiving rows consisting of arbitrary number of columns of arbitrary types. This is a perfect use case for Row. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

This is an iterator over rows, allowing lazy and flexible deserialization. Returns ColumnIterator for each row. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

This iterator wraps over RowIterator and for each row consumes the ColumnIterator and deserializes the given type. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

In the future, we will probably deprecate and remove `deser_cql_value` altogether. For now, let's make it at least less bloaty. To reduce code duplication, `deser_cql_value()` now uses DeserializeValue impls for nearly all of the deserialized types. Two notable exceptions are: 1. CQL Map - because it is represented as Vec<(CqlValue, CqlValue)> in CqlValue, and Vec<T> is only deserializable from CQL Set|Map. Therefore, MapIterator is deserialized using its DeserializeValue impl, and then collected into Vec. 2. CQL Tuple - because it is represented in CqlValue much differently than in DeserializeValue impls: Vec<CqlValue> vs (T1, T2, ..., Tn). Therefore, it's similarly to how it was before, just style is changed from imperative to iterator-based, and DeserializeValue impl is called instead of `deser_cql_value` there. As a bonus, we get more descriptive error messages (as compared to old `ParseError::BadIncomingData` ones).

In a manner similar to the previous commit, old imperative logic in `deser_row` is replaced with new iterator-based one, which uses the new deserialization framework. As a bonus, we get more descriptive error messages (as compared to old `ParseError::BadIncomingData` ones).

wprzytula · 2024-05-29T10:48:32Z

v5.2: impl_tuple_multiple() macro related changes:

documented the macro;
added a special entry rule to the macro, so now its syntax is cleaner (no longer requires <, > characters to be passed) and the impl for () comes first, not last;
deduplicated the macro by importing it to row.rs from value.rs.

wprzytula force-pushed the value_tests branch from e8e2189 to 9fc957a Compare March 24, 2024 14:52

wprzytula marked this pull request as ready for review March 24, 2024 14:52

wprzytula force-pushed the value_tests branch 2 times, most recently from bed4295 to a1cf055 Compare March 24, 2024 15:14

wprzytula requested review from Lorak-mmk and muzarski March 24, 2024 15:15

wprzytula self-assigned this Mar 26, 2024

wprzytula added the performance Improves performance of existing features label Mar 26, 2024

wprzytula force-pushed the value_tests branch from a1cf055 to 5a40d0a Compare March 28, 2024 09:05

muzarski suggested changes Mar 29, 2024

View reviewed changes

scylla-cql/src/types/deserialize/value.rs Show resolved Hide resolved

Lorak-mmk requested changes Apr 1, 2024

View reviewed changes

muzarski mentioned this pull request Apr 19, 2024

prepared: docs for PreparedStatement #986

Merged

8 tasks

wprzytula added the waiting-on-author Waiting for a response from issue/PR author label May 7, 2024

wprzytula force-pushed the value_tests branch from 5a40d0a to e1a5536 Compare May 8, 2024 10:06

wprzytula marked this pull request as draft May 8, 2024 10:07

github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label May 8, 2024

wprzytula force-pushed the value_tests branch 3 times, most recently from faa92aa to 8f09a90 Compare May 10, 2024 08:25

wprzytula removed the waiting-on-author Waiting for a response from issue/PR author label May 10, 2024

wprzytula marked this pull request as ready for review May 10, 2024 08:26

wprzytula requested review from Lorak-mmk and muzarski May 10, 2024 08:26

wprzytula added this to the 0.14.0 milestone May 10, 2024

wprzytula requested a review from piodul May 10, 2024 08:36

muzarski suggested changes May 14, 2024

View reviewed changes

scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved

scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved

scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved

wprzytula force-pushed the value_tests branch from 8f09a90 to aab3df0 Compare May 15, 2024 14:28

wprzytula and others added 11 commits May 28, 2024 15:56

value: impl DeserializeValue for duration types

2c20899

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for date types

02ae796

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for time types

10e8097

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for timestamp types

8f997e4

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for inet

ce8e508

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for Uuid and Timeuuid

5d40530

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for Secrecy

f659966

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: Option/null tests

59d2c27

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: test null and empty

1fa1b16

value: impl DeserializeValue for Map

08983a3

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

wprzytula force-pushed the value_tests branch from 8c992d8 to 42c69c7 Compare May 28, 2024 13:57

wprzytula and others added 11 commits May 29, 2024 12:43

value: impl DeserializeValue for tuples

c58ac83

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

value: impl DeserializeValue for UDTs

782cdf8

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

deser/row: test deser as ColumnIterator

e28650a

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

deser/row: impl DeserializeRow for tuples

da4f694

Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

deser/result: introduce RowIterator

d4b827c

This is an iterator over rows, allowing lazy and flexible deserialization. Returns ColumnIterator for each row. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

deser/result: introduce TypedRowIterator

7a00119

This iterator wraps over RowIterator and for each row consumes the ColumnIterator and deserializes the given type. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>

deser/value: errors tests

32c9463

deser/row: errors tests

5be8264

wprzytula force-pushed the value_tests branch from 42c69c7 to 3a4b563 Compare May 29, 2024 10:44

wprzytula requested review from muzarski and Lorak-mmk May 29, 2024 10:53

Lorak-mmk self-assigned this Jun 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deserialization refactor: introduce new deserialization traits #970

Deserialization refactor: introduce new deserialization traits #970

wprzytula commented Mar 24, 2024 •

edited

github-actions bot commented Mar 24, 2024 •

edited

Lorak-mmk Apr 1, 2024

wprzytula Apr 3, 2024 •

edited

piodul May 10, 2024

Lorak-mmk May 10, 2024

piodul May 10, 2024

Lorak-mmk May 27, 2024

wprzytula commented May 10, 2024

muzarski left a comment

wprzytula commented May 14, 2024

wprzytula commented May 28, 2024

wprzytula commented May 29, 2024 •

edited

wprzytula commented May 29, 2024

Deserialization refactor: introduce new deserialization traits #970

Are you sure you want to change the base?

Deserialization refactor: introduce new deserialization traits #970

Conversation

wprzytula commented Mar 24, 2024 • edited

The new traits

Implementation of the new traits for all supported types

Errors

Testing

Pre-review checklist

github-actions bot commented Mar 24, 2024 • edited

Lorak-mmk Apr 1, 2024

Choose a reason for hiding this comment

wprzytula Apr 3, 2024 • edited

Choose a reason for hiding this comment

piodul May 10, 2024

Choose a reason for hiding this comment

Lorak-mmk May 10, 2024

Choose a reason for hiding this comment

piodul May 10, 2024

Choose a reason for hiding this comment

Lorak-mmk May 27, 2024

Choose a reason for hiding this comment

wprzytula commented May 10, 2024

muzarski left a comment

Choose a reason for hiding this comment

wprzytula commented May 14, 2024

wprzytula commented May 28, 2024

wprzytula commented May 29, 2024 • edited

wprzytula commented May 29, 2024

wprzytula commented Mar 24, 2024 •

edited

github-actions bot commented Mar 24, 2024 •

edited

wprzytula Apr 3, 2024 •

edited

wprzytula commented May 29, 2024 •

edited