
Fix the write timestamp in atomic GraphQL mutations #2893

Status: Open · wants to merge 3 commits into base: main
Conversation

@marksurnin (Contributor)
What this PR does:
Set timestamp and now_in_seconds in batch parameters only if they have been passed via query parameters. Unset com.google.protobuf.Int64Value fields in query.proto serialize to 0, which causes the bug described below.

Which issue(s) this PR fixes:
Fixes #2875

As described in this comment, Stargate currently sets the write timestamp of all @atomic batch mutations to Unix time 0.

Consequently, the expiration time is calculated as (Unix time 0 + TTL). Even with the maximum TTL of 20 years, Cassandra interprets these rows as having already expired, so they cannot be retrieved.
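The failure mode described above can be illustrated with a toy model. The class and method names below are hypothetical simplifications, not the actual Stargate types; the real fix applies this presence check when building batch parameters from the gRPC query parameters.

```java
import java.util.OptionalLong;

// Toy stand-in for the gRPC query parameters: proto3 wrapper fields read
// without a presence check yield their default (0), so an "unset"
// timestamp silently becomes Unix epoch 0.
class QueryParams {
  private final OptionalLong timestamp; // stands in for protobuf's Int64Value
  QueryParams(OptionalLong timestamp) { this.timestamp = timestamp; }
  boolean hasTimestamp() { return timestamp.isPresent(); }
  long getTimestamp() { return timestamp.orElse(0L); } // proto3 default value
}

// Toy stand-in for the batch parameters handed to the persistence layer.
class BatchParams {
  Long timestamp; // null means "let the persistence layer pick the write time"

  // Buggy: always copies, so an unset timestamp becomes epoch 0 and
  // (epoch 0 + TTL) is already in the past for any realistic TTL.
  void copyUnconditionally(QueryParams q) { timestamp = q.getTimestamp(); }

  // Fixed: copy only when the caller actually supplied a timestamp.
  void copyIfSet(QueryParams q) {
    if (q.hasTimestamp()) { timestamp = q.getTimestamp(); }
  }
}

public class TimestampFixDemo {
  public static void main(String[] args) {
    QueryParams unset = new QueryParams(OptionalLong.empty());

    BatchParams buggy = new BatchParams();
    buggy.copyUnconditionally(unset);
    System.out.println(buggy.timestamp); // prints 0: the bug

    BatchParams fixed = new BatchParams();
    fixed.copyIfSet(unset);
    System.out.println(fixed.timestamp); // prints null: timestamp left unset
  }
}
```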

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated – N/A?
  • CLA Signed: DataStax CLA

@marksurnin marksurnin requested a review from a team as a code owner February 12, 2024 11:44

// Batch timestamp and nowInSeconds parameters should be unset to avoid using them in
// BatchHandler's `makeParameters`
assertFalse(batch.getParameters().hasTimestamp());
Might be more idiomatic to use assertThat(batch.getParameters().hasTimestamp()).isFalse() here, given that we're already using AssertJ.

@kathirsvn (Contributor)

Thanks for the PR @marksurnin.
The changes LGTM!
Can we have this PR created against the main branch, please? This fix should be applied on both v2 (main) and v2.1; we can pull the change into v2.1 once it's merged into main.

@tatu-at-datastax (Collaborator)

+1 for PR against main (2.x) so that we can get this deployed to all of our production systems (there's a split)

@marksurnin marksurnin changed the base branch from v2.1 to main February 12, 2024 19:41
@kathirsvn (Contributor)

@marksurnin maybe you will have to rebase your fork or create a new PR? Right now it's showing a very large diff (around 644 files).

@marksurnin marksurnin force-pushed the u/marksurnin/fix_atomic_batch_write_timestamp branch from a0c81df to ccf73c0 Compare February 13, 2024 17:24
@marksurnin (Contributor, Author)

> @marksurnin may be you will have to rebase your fork or create new PR? Right now, its showing a lot of diff (around 644 files as diff)

Yep, changing the base created a large diff. There was a large number of merge conflicts, so I ended up resetting my branch with git reset --hard upstream/main and cherry-picking the 2 commits.

@kathirsvn kathirsvn (Contributor) left a review comment:

LGTM

@marksurnin (Contributor, Author)

@kathirsvn the coordinator build is failing to download dependencies from DataStax's internal Maven repo. I don't build persistence-dse-next or persistence-dse-6.8 locally, so I'm unsure how to proceed.

Error: Failed to execute goal on project persistence-dse-6.8: Could not resolve dependencies for project io.stargate.db.dse:persistence-dse-6.8:jar:2.0.25-SNAPSHOT: The following artifacts could not be resolved: io.netty:netty-codec:jar:4.1.100.1.dse, io.netty:netty-common:jar:4.1.100.1.dse, io.netty:netty-buffer:jar:4.1.100.1.dse, io.netty:netty-transport:jar:4.1.100.1.dse, io.netty:netty-resolver:jar:4.1.100.1.dse: Could not transfer artifact io.netty:netty-codec:jar:4.1.100.1.dse from/to artifactory-snapshots (https://repo.datastax.com/datastax-snapshots-local): status code: 401, reason phrase: (401) -> [Help 1]

@tatu-at-datastax (Collaborator)

@marksurnin This is likely due to access problems: the DSE jar and some of its dependencies (like netty with backported fixes) are only available from the DataStax internal Maven repo. I was hoping we could trigger a re-run and that it'd pass, but that does not seem to be the case (I tried and it still appears to fail).

@kathirsvn (Contributor)

Yes, I just triggered another re-run and it failed. Let me follow up on this.

@kathirsvn (Contributor)

As per a suggestion from @jeffreyscarpenter, I tried recreating this PR here, and the build step goes through fine now. So it looks like an access issue (i.e. my account seems to have read access to the DataStax internal repo, but @marksurnin's account doesn't).

Now we are getting the below IT error:
https://github.com/stargate/stargate/actions/runs/7907256239/job/21584210196?pr=2895

Results:
[INFO] 
Error:  Failures: 
Error:    AtomicBatchIntegrationTest.successfulBatchWithTTL:318 
expected: null
 but was: [{"extensions"={"classification"="DataFetchingException"}, "locations"=[{"column"=3, "line"=2}], "message"="Exception while fetching data (/foo) : Column 'v_writetime' does not exist", "path"=["foo"]}]

Probably we will have to read the writetime and TTL as in the getWriteTime method in io.stargate.sgv2.graphql.integration.graphqlfirst.AtomicIntegrationTest.

@marksurnin (Contributor, Author)

> As per suggestion from @jeffreyscarpenter, I tried recreating this PR here and the build step goes through fine now. So, it looks like some access issue (i.e. my account seems to have read access to the datastax internal repo but @marksurnin account doesn't).

@kathirsvn could you please try pushing an empty commit to my branch? Perhaps the Maven repo authentication would work that way.

I see that the credentials are read from secrets here but I don't know where the account's organization is used to check access.

> Now, we are getting below IT error https://github.com/stargate/stargate/actions/runs/7907256239/job/21584210196?pr=2895
>
> Results:
> [INFO]
> Error:  Failures:
> Error:    AtomicBatchIntegrationTest.successfulBatchWithTTL:318
> expected: null
>  but was: [{"extensions"={"classification"="DataFetchingException"}, "locations"=[{"column"=3, "line"=2}], "message"="Exception while fetching data (/foo) : Column 'v_writetime' does not exist", "path"=["foo"]}]
>
> Probably, we will have to read the writetime and ttl as in getWriteTime method in io.stargate.sgv2.graphql.integration.graphqlfirst.AtomicIntegrationTest

Secondly, I can update the tests to read writetime and TTL via CQL, but it's concerning that it's failing with Column 'v_writetime' does not exist. These ITs pass locally. The field that AtomicBatchIntegrationTest retrieves is

v_writetime: _bigint_function(name:"writetime", args: ["v"])

which is a standard use case of an aggregation function. v_writetime is a GraphQL alias, not a column. Do you know if the integration test image lacks aggregation function support, or if something could be off with aliases?

@tatu-at-datastax (Collaborator)

@marksurnin I think the issue is that forks by developers without push rights to the original repo cannot access secrets; this prevents potential security problems.
But it should be easy enough to check whether a commit to the PR by someone with access lets the CI run access secrets.

@kathirsvn (Contributor)

@marksurnin I accepted the invitation to your fork and just pushed an empty commit to see if coordinator build CI job works fine now, let's see.

@kathirsvn (Contributor)

Regarding the alias v_writetime: _bigint_function(name:"writetime", args: ["v"]), I'll give it a try locally and see where the gap is.

@kathirsvn (Contributor)

> @marksurnin I accepted the invitation to your fork and just pushed an empty commit to see if coordinator build CI job works fine now, let's see.

Cool, it looks like the coordinator build job is good now. Let's check the aggregate function problem (i.e. v_writetime) next.

@kathirsvn (Contributor)

> @marksurnin I accepted the invitation to your fork and just pushed an empty commit to see if coordinator build CI job works fine now, let's see.
>
> Cool, it looks like the coordinator build job is good now. Let's check the aggregate function problem (i.e. v_writetime) next.

When I run the test AtomicBatchIntegrationTest.successfulBatchWithTTL locally, it fails with the below error:

Expected :null
Actual   :[{"extensions"={"classification"="DataFetchingException"}, "locations"=[{"column"=3, "line"=2}], "message"="Exception while fetching data (/foo) : Column 'v_writetime' does not exist", "path"=["foo"]}]

@kathirsvn (Contributor)

kathirsvn commented Feb 20, 2024

@marksurnin I get this error when I use any function other than those in the supported aggregation functions list:

apis/sgv2-graphqlapi/src/main/java/io/stargate/sgv2/graphql/schema/cqlfirst/dml/fetchers/aggregations/SupportedAggregationFunction.java

For example, the below query works:

query getAll {
  foo {
    values {
      k
      cc
      v
      v_count: _bigint_function(name:"count", args: ["v"])
    }
  }
}

Note: I don't remember trying the writetime or ttl functions this way in the past either. Probably we will have to change the way we fetch the writetime and TTL in the tests, as done in the getWriteTime method in io.stargate.sgv2.graphql.integration.graphqlfirst.AtomicIntegrationTest.
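The failure mode can be sketched with a toy lookup in the spirit of SupportedAggregationFunction (the member list below is illustrative only, not the actual enum contents): when the requested function name is not in the supported set, the fetcher falls through and the alias later surfaces as a missing column.

```java
import java.util.Locale;
import java.util.Optional;

public class AggregationLookupDemo {
  // Illustrative stand-in for Stargate's SupportedAggregationFunction enum;
  // the real list lives in the cql-first GraphQL module and may differ.
  enum SupportedAggregationFunction {
    COUNT, MIN, MAX, SUM, AVG;

    // Case-insensitive lookup: unknown names yield empty, which in the real
    // fetcher ends up being treated as a plain (nonexistent) column.
    static Optional<SupportedAggregationFunction> fromName(String name) {
      try {
        return Optional.of(valueOf(name.toUpperCase(Locale.ROOT)));
      } catch (IllegalArgumentException e) {
        return Optional.empty();
      }
    }
  }

  public static void main(String[] args) {
    // "count" resolves; "writetime" does not -- matching the IT failure above.
    System.out.println(SupportedAggregationFunction.fromName("count").isPresent());
    System.out.println(SupportedAggregationFunction.fromName("writetime").isPresent());
  }
}
```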


Successfully merging this pull request may close these issues.

GraphQL mutations with @atomic directive and TTL are not persisted