-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deserialization refactor: introduce new deserialization traits #970
base: main
Are you sure you want to change the base?
Conversation
See the following report for details: cargo semver-checks output
|
bed4295
to
a1cf055
Compare
compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num1); | ||
compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num2); | ||
compat_check_serialized::<BigDecimal>(&ColumnType::Decimal, &num3); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why some types are only tested in this function and some are both tested here and in a separate function? I see there are more occurrences of this, but I won't comment each one, just the comment here and previous one about bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea.
@piodul ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why some types are only tested in this function and some are both tested here and in a separate function?
I'm not entirely sure because it was a long time ago, so what I'll write here might be wrong (with more probability than usual, at least).
There are two kinds of tests in this file:
- Tests which verify that the
DeserializeCql
works properly (this is new code which needs to be tested) - Tests which verify that the
DeserializeCql
implementation agrees with the legacyFromCqlVal
-based one
They both serve a different purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main question still remains: why are some types only tested in the second way and not the first way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember, sorry. I might have just forgot to write the first kind of test for some types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, unless there is a reason for some types to be absent from one type of tests please implenent both @wprzytula
faa92aa
to
8f09a90
Compare
v2:
While reviewing, please pay attention to the error types used, especially to the use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional question: do we want to expose all of the iterator types to the users? I'm aware that UdtIterator
might be useful to implement deserialization for custom user types. What about other iterators such as ListLikeIterator
or MapIterator
? Do you see any reason for exposing these?
|
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
There is a purposeful change from the previous framework: CQL List can no longer be deserialized straight to a HashSet or a BTreeSet. Such deserialization would be lossy, which is a property we don't want in our framework. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
v5.1:
|
Keep in mind that now, with those changes introduced, compatibility tests between the new and the old framework and no longer relevant, because the old framework's backbone (
|
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
This implementation is important for two reasons: 1. It enables using the upper layers of the old framework over the new one, which makes the transition smoother. 2. Some users (perhaps ORM users?) are going to need the dynamic capabilities that the previous framework offered: receiving rows consisting of arbitrary number of columns of arbitrary types. This is a perfect use case for Row. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
This is an iterator over rows, allowing lazy and flexible deserialization. Returns ColumnIterator for each row. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
This iterator wraps over RowIterator and for each row consumes the ColumnIterator and deserializes the given type. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
In the future, we will probably deprecate and remove `deser_cql_value` altogether. For now, let's make it at least less bloaty. To reduce code duplication, `deser_cql_value()` now uses DeserializeValue impls for nearly all of the deserialized types. Two notable exceptions are: 1. CQL Map - because it is represented as Vec<(CqlValue, CqlValue)> in CqlValue, and Vec<T> is only deserializable from CQL Set|Map. Therefore, MapIterator is deserialized using its DeserializeValue impl, and then collected into Vec. 2. CQL Tuple - because it is represented in CqlValue much differently than in DeserializeValue impls: Vec<CqlValue> vs (T1, T2, ..., Tn). Therefore, it's similarly to how it was before, just style is changed from imperative to iterator-based, and DeserializeValue impl is called instead of `deser_cql_value` there. As a bonus, we get more descriptive error messages (as compared to old `ParseError::BadIncomingData` ones).
In a manner similar to the previous commit, old imperative logic in `deser_row` is replaced with new iterator-based one, which uses the new deserialization framework. As a bonus, we get more descriptive error messages (as compared to old `ParseError::BadIncomingData` ones).
v5.2:
|
Refs #961
The new traits
This PR introduces the new deserialization traits,
DeserializeValue
andDeserializeRow
, which are going to replaceFromCqlVal
andFromRow
, respectively.If a type implements
DeserializeValue<'f>
, this means that it can take a buffer of lifetime'f
and deserialize itself, treating the contents of the buffer as a CQL value. Analogously,DeserializeRow<'f>
allows for deserialization of types that are supposed to represent a whole row returned in the results.Deserialization is now split into two phases: type checking and actual deserialization. Both traits have two methods:
type_check
anddeserialize
. The idea behind this is to validate the column types of the response only once, after the response is received. Thedeserialize
method is then called for each row/value to perform the actual parsing: that method is allowed to assume thattype_check
was called and may skip some type checking for performance (although it cannot use that assumption to perform unsafe operations, it is not an unsafe method).The previous framework only supported deserialization of types that exclusively own their data, for example
String
orVec<u8>
. However, the new framework allows for types that borrow or co-own the frame buffer:- Types that borrow the buffer are
&'f str
and&'f [u8]
. They just point to the raw data from the serialized response. Rust's lifetime system makes sure that the the user doesn't deallocate the serialized response until using the deserialized types that point to the buffer are dropped. Apart from the aforementioned types, a bunch of iterator types have been introduced that allow consuming collections without allocations.- Serialized frame is represented with
bytes::Bytes
. It is possible to create a subslice (also of typeBytes
) that keep the original alive. The type being deserialized can obtain access to such subslice. Like in the case of e.g.&str
vs.String
, keepingBytes
allows to avoid an allocation, but it is also easier to handle as deserialized type doesn't have to be bound by any lifetime. It important to be careful when handling such types as they keep whole serialized frame alive, even if they only point to a subslice. This can lead to a space leak and more memory being used than necessary.Implementation of the new traits for all supported types
All the previously supported types (i.e. those that could be deserialized using the old framework) are supported now as well.
There is only one exception: CQL List is no longer deserializable into a set (
HashSet
orBTreeSet
), because such conversion could be lossy, which is an unwanted property for deserialization.Errors
The error framework is analogous to one used in the serialization framework:
DeserializationError
is added that resemblesSerializationError
and is type-erased as well.As the new deserialization framework has separate
type_check()
anddeserialize()
function (contrary to the serialization framework, which has only oneserialize()
method), it makes sense for them to return distinct errors.That's why a new
TypeCheckError
is added, too, which is type-erased as well.Testing
All new types are unit tested.
Additionally, a decent test suite is added to verify compatibility of the old and the new framework.
New errors are tested, too, although not exhaustively. I'd prefer not to bog down too much in testing them thoroughly.
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.