Decouple table and column specs #956

wprzytula · 2024-03-13T10:23:07Z

Motivation

Every column in a Response::Result comes from the same table (and it follows that from the same keyspace as well). It's thus redundant to store a copy of TableSpec (table and keyspace names) in each ColumnSpec (name and type of a column), which was done before.

What's done

This PR moves TableSpec out of ColumnSpec and only allocates TableSpec once per each query response, effectively saving 2 * (C-1) string allocations, where C denotes the number of columns returned in the response.
TableSpec is now stored in ResultMetadata and PreparedMetadata.

As table spec is no longer available in column specs, a public field in QueryResult is added for users to still
be able to retrieve this information from QueryResult. Keep in mind that this is a temporary measure, because QueryResult in the current form will be deprecated soon as part of the upcoming deserialization refactor (#462).

Notes to reviewers

Please pay special attention to how user's experience changes after this API change. Don't they lose access to some information?

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
~~[ ] I added relevant tests for new features and bug fixes.~~
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
~~[ ] I have adjusted the documentation in ./docs/source/.~~
~~<[ ] I added appropriate Fixes: annotations to PR description.~~

github-actions · 2024-03-13T10:39:45Z

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: c37eb55

See the following report for details:

cargo semver-checks output

./scripts/semver-checks.sh --baseline-rev edfb28a04c609a0ebf1813797966b39bbbb3029c
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev edfb28a04c609a0ebf1813797966b39bbbb3029c
     Cloning edfb28a04c609a0ebf1813797966b39bbbb3029c
     Parsing scylla v0.12.0 (current)
      Parsed [  18.389s] (current)
     Parsing scylla v0.12.0 (baseline)
      Parsed [  17.517s] (baseline)
    Checking scylla v0.12.0 -> v0.12.0 (no change)
     Checked [   0.472s] 67 checks; 67 passed, 0 unnecessary
    Finished [  36.422s] scylla
     Parsing scylla-cql v0.1.0 (current)
      Parsed [   9.626s] (current)
     Parsing scylla-cql v0.1.0 (baseline)
      Parsed [   9.651s] (baseline)
    Checking scylla-cql v0.1.0 -> v0.1.0 (no change)
     Checked [   0.251s] 67 checks; 65 passed, 2 failed, 0 unnecessary

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.30.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field PreparedMetadata.table_spec in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/response/result.rs:411

--- failure struct_pub_field_missing: pub struct's pub field removed or renamed ---

Description:
A publicly-visible struct has at least one public field that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.30.0/src/lints/struct_pub_field_missing.ron

Failed in:
  field table_spec of struct ColumnSpec, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-edfb28a04c609a0ebf1813797966b39bbbb3029c/f589045e3f86a4a0964789e2de12a7e48f112b11/scylla-cql/src/frame/response/result.rs:384
     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  19.565s] scylla-cql
make: *** [Makefile:53: semver-rev] Error 1

Lorak-mmk · 2024-03-13T11:13:20Z

scylla-cql/src/frame/response/result.rs

+// Only allocates a new `TableSpec` if one is not yet given.
+// TODO: consider equality check between known and deserialized spec.
+fn deser_table_spec(
+    buf: &mut &[u8],
+    known_spec: Option<TableSpec>,
+) -> StdResult<TableSpec, ParseError> {
+    let ks_name = types::read_string(buf)?;
+    let table_name = types::read_string(buf)?;
+
+    Ok(known_spec.unwrap_or_else(|| TableSpec {
+        ks_name: ks_name.to_owned(),
+        table_name: table_name.to_owned(),
+    }))


I'd like those equality checks to be a part of this PR so that we have more chances of catching any issues caused by this change.

As you know CQL protocol allows table spec to be either global or per column.
Do you know why that is? I assume there is some reason for the server to not always send the global one, and right now this PR assumes there is no reason and we can just use table spec of last column everywhere.

If there really is no reason for those per-column specs to exist, then that should be explained in the comment, and there should still be panicking checks in case this assumption turns out to be false.

Right, let's mind the Chesterton’s Fence: don’t ever take down a fence until you know why it was put up.
I'll investigate those per-column table specs.

Python driver makes the same assumption, i.e. uses the table spec of the first column for all columns.

This may be a good question for #engineering channel on Slack. I'd like to know when will Scylla / Cassandra send global spec and when per-column spec, and if it's posibble for columns specs to differ. If it's not possible, then learning a bit of history would be beneficial here imo: was it possible in the past? What was the case for it?

@nyh This is what I've talked to you about.

@Lorak-mmk I'm willing to merge this. This seems to be a worthy optimisation.

To sum up, at least the Python driver uses the first column table and keyspace names as the names for all columns, and no one ever complained about that.
Based on our research and scylladb/scylladb#17788 (comment), all columns are going to have the same keyspace and table names, so we can represent them only once.

As discussed, I'm going change the code so that it checks that those name are indeed the same and returns ParseError in case this assumption is violated, this way avoiding quiet passes in such case.

One more thing to consider: how likely is it that in some future version Cassandra / Scylla adds some form of joins / multi table queries?
AFAIK Cassandra already added ACID transactions (using their ACCORD algorithm), it doesn't seem so improbable for them to add something that queries more than 1 table in the future.
As those structs you modify are public, supporting this would require a breaking change.

Do you think we could use Cow / Arc or something else to make this future-proof? That way we could have global table spec in ResultMetadata, but also have per-column spec if necessary.

Adding joins to CQL would be such a giant change in ScyllaDB - wide-column DBs aren't made for joins - that I strongly doubt it will ever happen.
Arcs and Cows have considerable overhead that I don't deem worth incurring here just due to an extremely unprobable scenario.

Every column in a Response::Result comes from the same table (and it follows that from the same keyspace as well). It's thus redundant to store a copy of TableSpec (table and keyspace names) in each ColumnSpec (name and type of a column), which was done in the code before. Correctness of the above assumption is checked: if any two columns of the same Result differ in keyspace or table names, ParseError is returned. This commit moves TableSpec out of ColumnSpec and only allocates it once per query response, effectively saving `2 * (C-1)` string allocations, where `C` denotes the number of columns returned in the response. TableSpec is now stored in ResultMetadata and PreparedMetadata. A getter for table_spec in RowIterator is provided for users and for existing tests.

Before the previous commit, table spec was available in column specs. As it's no longer held there, a public field is added for users to still be able to retrieve this information from QueryResult.

wprzytula · 2024-03-26T10:43:11Z

v2:

the assumption of the same table specs for every code is checked and else ParseError is returned.

wprzytula self-assigned this Mar 13, 2024

wprzytula requested review from Lorak-mmk and piodul March 13, 2024 10:23

github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Mar 13, 2024

Lorak-mmk requested changes Mar 13, 2024

View reviewed changes

wprzytula added the performance Improves performance of existing features label Mar 13, 2024

wprzytula mentioned this pull request Mar 13, 2024

Don't repeat keyspace and table name for every column in the results scylladb/scylladb#17788

Open

wprzytula added 2 commits March 26, 2024 11:32

query_result: store table_spec

c37eb55

Before the previous commit, table spec was available in column specs. As it's no longer held there, a public field is added for users to still be able to retrieve this information from QueryResult.

wprzytula force-pushed the decouple-table-and-col-specs branch from 5dfc972 to c37eb55 Compare March 26, 2024 10:41

wprzytula requested a review from Lorak-mmk March 26, 2024 13:52

muzarski mentioned this pull request Apr 16, 2024

Introduce support for Tablets #937

Merged

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decouple table and column specs #956

Decouple table and column specs #956

wprzytula commented Mar 13, 2024

github-actions bot commented Mar 13, 2024 •

edited

Lorak-mmk Mar 13, 2024

wprzytula Mar 13, 2024

wprzytula Mar 13, 2024

Lorak-mmk Mar 13, 2024

wprzytula Mar 13, 2024

wprzytula Mar 26, 2024 •

edited

Lorak-mmk Mar 26, 2024

wprzytula Mar 26, 2024

wprzytula commented Mar 26, 2024

Decouple table and column specs #956

Are you sure you want to change the base?

Decouple table and column specs #956

Conversation

wprzytula commented Mar 13, 2024

Motivation

What's done

Notes to reviewers

Pre-review checklist

github-actions bot commented Mar 13, 2024 • edited

Lorak-mmk Mar 13, 2024

Choose a reason for hiding this comment

wprzytula Mar 13, 2024

Choose a reason for hiding this comment

wprzytula Mar 13, 2024

Choose a reason for hiding this comment

Lorak-mmk Mar 13, 2024

Choose a reason for hiding this comment

wprzytula Mar 13, 2024

Choose a reason for hiding this comment

wprzytula Mar 26, 2024 • edited

Choose a reason for hiding this comment

Lorak-mmk Mar 26, 2024

Choose a reason for hiding this comment

wprzytula Mar 26, 2024

Choose a reason for hiding this comment

wprzytula commented Mar 26, 2024

github-actions bot commented Mar 13, 2024 •

edited

wprzytula Mar 26, 2024 •

edited