Decouple table and column specs #956
See the following report for details: cargo semver-checks output
// Only allocates a new `TableSpec` if one is not yet given.
// TODO: consider equality check between known and deserialized spec.
fn deser_table_spec(
    buf: &mut &[u8],
    known_spec: Option<TableSpec>,
) -> StdResult<TableSpec, ParseError> {
    let ks_name = types::read_string(buf)?;
    let table_name = types::read_string(buf)?;

    Ok(known_spec.unwrap_or_else(|| TableSpec {
        ks_name: ks_name.to_owned(),
        table_name: table_name.to_owned(),
    }))
I'd like those equality checks to be a part of this PR so that we have more chances of catching any issues caused by this change.
As you know CQL protocol allows table spec to be either global or per column.
Do you know why that is? I assume there is some reason for the server to not always send the global one, and right now this PR assumes there is no reason and we can just use table spec of last column everywhere.
If there really is no reason for those per-column specs to exist, then that should be explained in the comment, and there should still be panicking checks in case this assumption turns out to be false.
Right, let's mind Chesterton’s Fence: don’t ever take down a fence until you know why it was put up.
I'll investigate those per-column table specs.
Python driver makes the same assumption, i.e. uses the table spec of the first column for all columns.
This may be a good question for the #engineering channel on Slack. I'd like to know when Scylla / Cassandra sends the global spec and when the per-column spec, and whether it's possible for column specs to differ. If it's not possible, then learning a bit of history would be beneficial here imo: was it possible in the past? What was the case for it?
@nyh This is what I've talked to you about.
@Lorak-mmk I'm willing to merge this. This seems to be a worthy optimisation.
To sum up, at least the Python driver uses the first column table and keyspace names as the names for all columns, and no one ever complained about that.
Based on our research and scylladb/scylladb#17788 (comment), all columns are going to have the same keyspace and table names, so we can represent them only once.
As discussed, I'm going to change the code so that it checks that those names are indeed the same and returns ParseError in case this assumption is violated, thus avoiding silent passes in such a case.
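For illustration, the check described above could look roughly like the sketch below. It is not the code from this PR: `TableSpec`, `ParseError`, `read_string` and `encode` are simplified, hypothetical stand-ins so that the snippet compiles on its own, and the only wire-format detail assumed is that a CQL [string] is a big-endian u16 length followed by that many UTF-8 bytes.

```rust
// Hypothetical, simplified stand-ins for the driver's types.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TableSpec {
    ks_name: String,
    table_name: String,
}

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    BadIncomingData(String),
}

// Stand-in for `types::read_string`: reads a u16 length, then that many UTF-8 bytes,
// and advances the buffer past what was consumed.
fn read_string<'a>(buf: &mut &'a [u8]) -> Result<&'a str, ParseError> {
    let data: &'a [u8] = *buf;
    if data.len() < 2 {
        return Err(ParseError::BadIncomingData("missing string length".into()));
    }
    let len = u16::from_be_bytes([data[0], data[1]]) as usize;
    let rest = &data[2..];
    if rest.len() < len {
        return Err(ParseError::BadIncomingData("string is truncated".into()));
    }
    let (raw, tail) = rest.split_at(len);
    *buf = tail;
    std::str::from_utf8(raw).map_err(|e| ParseError::BadIncomingData(e.to_string()))
}

// Test helper (hypothetical): encodes a string in the format read above.
fn encode(s: &str) -> Vec<u8> {
    let mut out = (s.len() as u16).to_be_bytes().to_vec();
    out.extend_from_slice(s.as_bytes());
    out
}

// Allocates a new `TableSpec` only if none is known yet; otherwise verifies
// that the deserialized names match the known spec and errors out if not.
fn deser_table_spec(
    buf: &mut &[u8],
    known_spec: Option<TableSpec>,
) -> Result<TableSpec, ParseError> {
    let ks_name = read_string(buf)?;
    let table_name = read_string(buf)?;

    match known_spec {
        // Same names as before: reuse the existing allocation.
        Some(known) if known.ks_name == ks_name && known.table_name == table_name => Ok(known),
        // The "all columns share one table" assumption is violated: fail loudly.
        Some(known) => Err(ParseError::BadIncomingData(format!(
            "table spec mismatch: expected {}.{}, got {}.{}",
            known.ks_name, known.table_name, ks_name, table_name
        ))),
        // First column seen: this is the only place that allocates.
        None => Ok(TableSpec {
            ks_name: ks_name.to_owned(),
            table_name: table_name.to_owned(),
        }),
    }
}
```

With this shape, a response whose columns all name the same table costs two string allocations in total, and a response that violates the assumption yields an error instead of silently reporting the wrong table.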
One more thing to consider: how likely is it that in some future version Cassandra / Scylla adds some form of joins / multi table queries?
AFAIK Cassandra already added ACID transactions (using their ACCORD algorithm), it doesn't seem so improbable for them to add something that queries more than 1 table in the future.
As those structs you modify are public, supporting this would require a breaking change.
Do you think we could use Cow / Arc or something else to make this future-proof? That way we could have global table spec in ResultMetadata, but also have per-column spec if necessary.
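To make the suggestion concrete, here is a minimal sketch of what an `Arc`-based layout might look like. This is not what the PR implements; the struct shapes and the `build_column_specs` helper are hypothetical and only show how one table spec allocation could be shared by every column while still leaving room for a column to point at a different spec.

```rust
use std::sync::Arc;

// Hypothetical, simplified shapes; the real driver structs carry more fields.
#[derive(Debug, PartialEq, Eq)]
struct TableSpec {
    ks_name: String,
    table_name: String,
}

#[derive(Debug)]
struct ColumnSpec {
    // Shared handle: cloning it bumps a reference count instead of copying strings.
    table_spec: Arc<TableSpec>,
    name: String,
}

// Builds column specs that all point at the same, single `TableSpec` allocation.
fn build_column_specs(global: TableSpec, column_names: &[&str]) -> Vec<ColumnSpec> {
    let shared = Arc::new(global);
    column_names
        .iter()
        .map(|name| ColumnSpec {
            table_spec: Arc::clone(&shared),
            name: (*name).to_owned(),
        })
        .collect()
}
```

The trade-off is an atomic reference-count update per column plus an extra pointer in each `ColumnSpec`.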
Adding joins to CQL would be such a giant change in ScyllaDB - wide-column DBs aren't made for joins - that I strongly doubt it will ever happen.
`Arc`s and `Cow`s have considerable overhead that I don't deem worth incurring here just due to an extremely improbable scenario.
Every column in a Response::Result comes from the same table (and it follows that from the same keyspace as well). It's thus redundant to store a copy of TableSpec (table and keyspace names) in each ColumnSpec (name and type of a column), which was done in the code before. Correctness of the above assumption is checked: if any two columns of the same Result differ in keyspace or table names, ParseError is returned. This commit moves TableSpec out of ColumnSpec and only allocates it once per query response, effectively saving `2 * (C-1)` string allocations, where `C` denotes the number of columns returned in the response. TableSpec is now stored in ResultMetadata and PreparedMetadata. A getter for table_spec in RowIterator is provided for users and for existing tests.
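The `2 * (C-1)` figure can be checked with a small sketch. The struct names below (`ColumnSpecBefore`, `ColumnSpecAfter`, `ResultMetadataAfter`) are hypothetical simplifications of the layouts before and after this commit, and the counting functions only consider the keyspace and table name strings, assuming a response with at least one column.

```rust
// Hypothetical, simplified layouts; the real structs carry more fields.
struct TableSpec {
    ks_name: String,
    table_name: String,
}

// Before: every column owns its own copy of the keyspace and table names.
struct ColumnSpecBefore {
    table_spec: TableSpec,
    name: String,
}

// After: columns keep only their own data ...
struct ColumnSpecAfter {
    name: String,
}

// ... and the table spec is stored once, next to the column specs.
struct ResultMetadataAfter {
    table_spec: TableSpec,
    col_specs: Vec<ColumnSpecAfter>,
}

// Keyspace/table name strings allocated for a response with `c` columns.
fn table_string_allocs_before(c: usize) -> usize {
    2 * c
}

fn table_string_allocs_after(c: usize) -> usize {
    if c == 0 { 0 } else { 2 }
}

// Allocations saved by storing the table spec once; equals 2 * (c - 1) for c >= 1.
fn saved_allocs(c: usize) -> usize {
    table_string_allocs_before(c) - table_string_allocs_after(c)
}
```

For a single-column response nothing is saved, and the saving grows linearly with the number of columns.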
Before the previous commit, table spec was available in column specs. As it's no longer held there, a public field is added for users to still be able to retrieve this information from QueryResult.
5dfc972 to c37eb55
v2:
Motivation
Every column in a `Response::Result` comes from the same table (and it follows that from the same keyspace as well). It's thus redundant to store a copy of `TableSpec` (table and keyspace names) in each `ColumnSpec` (name and type of a column), which was done before.
What's done
This PR moves `TableSpec` out of `ColumnSpec` and only allocates `TableSpec` once per each query response, effectively saving `2 * (C-1)` string allocations, where `C` denotes the number of columns returned in the response. `TableSpec` is now stored in `ResultMetadata` and `PreparedMetadata`.
As table spec is no longer available in column specs, a public field in `QueryResult` is added for users to still be able to retrieve this information from `QueryResult`. Keep in mind that this is a temporary measure, because `QueryResult` in the current form will be deprecated soon as part of the upcoming deserialization refactor (#462).
Notes to reviewers
Please pay special attention to how user's experience changes after this API change. Don't they lose access to some information?
Pre-review checklist
[ ] I added relevant tests for new features and bug fixes.
[ ] I have adjusted the documentation in `./docs/source/`.
[ ] I added appropriate `Fixes:` annotations to PR description.