UDT schema change causes incorrect CQL queries in CQL-first GraphQL API #2723

Open
marksurnin opened this issue Aug 10, 2023 · 0 comments

Summary

We've run into an edge case with UDTs in Stargate: after a schema change, Stargate started sending CQL queries with extra columns to Cassandra. This caused elevated latency and a spike in IOPS in Cassandra, which we're investigating separately. Ideally, the CQL query generated by Stargate would have stayed the same.

Here are steps to reproduce this and a potential solution. The gist is that QueryFetcher.java may need to use selectedField.getQualifiedName() instead of selectedField.getName() when building CQL query columns.

Steps to reproduce

Create a CQL schema:

cqlsh> create type udt_test.my_udt (field_a int, field_b int);
cqlsh> create table udt_test.my_table (id int PRIMARY KEY, metadata my_udt);

Enable tracing for all requests by running nodetool settraceprobability 1.0

Then execute this GraphQL query:

query {
  my_table {
    values {
      id,
      metadata {
        field_a
        field_b
      }
    }
  }
}

As expected, we see SELECT id, metadata FROM udt_test.my_table in the system_traces.sessions table.

We then changed the schema to "flatten" the UDT by adding table columns with the same names as the UDT's fields:

cqlsh> ALTER TABLE udt_test.my_table ADD field_a int, field_b int;
cqlsh> desc udt_test.my_table;
CREATE TABLE udt_test.my_table (
    id int PRIMARY KEY,
    field_a int,
    field_b int,
    metadata my_udt
);

After applying the schema change and executing the same GraphQL query, we see SELECT id, metadata, field_a, field_b FROM udt_test.my_table in the system_traces.sessions table. This seems incorrect because the GraphQL query was unchanged. We don't yet know why the extra columns affected query latency.

Root cause

Debugging this led us to this snippet in GraphQL API's QueryFetcher.java:

String column = dbColumnGetter.getDBColumnName(table, selectedField.getName());
if (column != null) {
    queryColumns.add(Column.reference(column));
}

If selectedField is a regular column such as id, selectedField.getName() returns id. The call dbColumnGetter.getDBColumnName(table, selectedField.getName()) then returns the column name id, which is added to the list of query columns. This is fine.

If selectedField is a UDT field like field_a that belongs to the metadata UDT, selectedField.getName() returns just field_a. This is the problem: the subsequent call to dbColumnGetter.getDBColumnName(table, selectedField.getName()) returns the newly added table-level column field_a, which is unrelated to the UDT field in the query.

A potential solution is to use selectedField.getQualifiedName() instead as that returns metadata/field_a. In turn, dbColumnGetter.getDBColumnName(table, "metadata/field_a") returns null and it's not added to the query columns.
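
To make the difference concrete, here is a minimal, self-contained sketch of the lookup described above. It does not use Stargate or graphql-java: FieldStub and the getDBColumnName helper below are hypothetical stand-ins that only mimic the behavior described in this issue (a name, a qualified name, and a lookup that returns null when the table has no column with that name). The qualified names follow the metadata/field_a form mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class QualifiedNameSketch {

    // Hypothetical stand-in for a selected GraphQL field: just the two names.
    static final class FieldStub {
        final String name;
        final String qualifiedName;

        FieldStub(String name, String qualifiedName) {
            this.name = name;
            this.qualifiedName = qualifiedName;
        }
    }

    // Hypothetical stand-in for dbColumnGetter.getDBColumnName(table, fieldName):
    // returns the column name if the table has such a column, otherwise null.
    static String getDBColumnName(Set<String> tableColumns, String fieldName) {
        return tableColumns.contains(fieldName) ? fieldName : null;
    }

    // Mirrors the loop from the issue; the flag switches between the two lookups.
    static List<String> queryColumns(
            Set<String> tableColumns, List<FieldStub> selected, boolean useQualifiedName) {
        List<String> columns = new ArrayList<>();
        for (FieldStub field : selected) {
            String lookup = useQualifiedName ? field.qualifiedName : field.name;
            String column = getDBColumnName(tableColumns, lookup);
            if (column != null) {
                columns.add(column);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Table columns after the ALTER TABLE from the reproduction steps.
        Set<String> tableColumns = Set.of("id", "metadata", "field_a", "field_b");

        // Fields selected by the unchanged GraphQL query.
        List<FieldStub> selected = List.of(
                new FieldStub("id", "id"),
                new FieldStub("metadata", "metadata"),
                new FieldStub("field_a", "metadata/field_a"),
                new FieldStub("field_b", "metadata/field_b"));

        // Lookup by plain name: UDT fields collide with the new table columns.
        System.out.println(queryColumns(tableColumns, selected, false));
        // prints [id, metadata, field_a, field_b]

        // Lookup by qualified name: UDT fields resolve to null and are skipped.
        System.out.println(queryColumns(tableColumns, selected, true));
        // prints [id, metadata]
    }
}
```

Under these assumptions, the plain-name lookup reproduces the unexpected column list, while the qualified-name lookup yields the original id, metadata selection. Whether the real dbColumnGetter behaves the same way for every qualified name would need to be checked against the Stargate code.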

@sync-by-unito sync-by-unito bot closed this as completed Feb 16, 2024
@sync-by-unito sync-by-unito bot reopened this Feb 17, 2024