Add GetUpdateSince() to crash test #12646
base: main
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
  last_seqno = res.sequence;
  iter->Next();
}
Could check the status after the loop, once !iter->Valid().
Fixed
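The reviewer's suggestion (inspect the status once the iterator is no longer valid) can be sketched with a small Python model. `FakeLogIterator` and `scan_updates` are hypothetical stand-ins that only mimic the shape of RocksDB's C++ `TransactionLogIterator` (`Valid()`, `Next()`, `status()`, `GetBatch()`); they are not RocksDB APIs. The point is that a loop driven by `Valid()` exits both on a clean end and on an error, so only a status check after the loop tells the two apart.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeLogIterator:
    """Hypothetical model of a WAL iterator yielding (sequence, count) batches.

    If `fail_at` is set, the iterator becomes invalid at that position and
    reports `error` from status(), mimicking an iterator that stops early
    because of a corruption or I/O problem.
    """
    batches: List[Tuple[int, int]]
    fail_at: Optional[int] = None
    error: Optional[str] = None
    pos: int = 0

    def valid(self) -> bool:
        if self.fail_at is not None and self.pos >= self.fail_at:
            return False
        return self.pos < len(self.batches)

    def next(self) -> None:
        self.pos += 1

    def status(self) -> Optional[str]:
        # None plays the role of Status::OK() in this model.
        if self.fail_at is not None and self.pos >= self.fail_at:
            return self.error
        return None

    def get_batch(self) -> Tuple[int, int]:
        return self.batches[self.pos]


def scan_updates(it: FakeLogIterator) -> Tuple[int, Optional[str]]:
    """Walk all batches, then check the status once the iterator is invalid."""
    last_seqno = 0
    while it.valid():
        seq, _count = it.get_batch()
        last_seqno = seq
        it.next()
    # valid() == False means either "reached the end" or "hit an error";
    # only status() distinguishes the two cases.
    return last_seqno, it.status()


print(scan_updates(FakeLogIterator([(1, 2), (3, 1)])))  # (3, None)
print(scan_updates(FakeLogIterator([(1, 2), (3, 1)], fail_at=1, error="Corruption")))  # (1, 'Corruption')
```

Checking once after the loop keeps the loop body simple; the trade-off is that an error is only noticed after iteration stops, which is fine here because the iterator cannot become valid again after an error.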
iter->Next();

while (iter->Valid()) {
  if (!iter->status().ok()) {
GetUpdateSince() seems to expect the WAL to contain updates with consecutive sequence numbers, so it can fail the stress test here when a file ingestion runs concurrently (see #10007). There may be other incompatibilities too; for example, writes with disable_wal can also leave a hole in the WAL's sequence numbers (https://groups.google.com/g/rocksdb/c/W1Axka8CVf8).
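To make the concern concrete: if each WAL batch is described by its first sequence number and the number of sequence numbers it occupies, a reader that expects consecutive sequence numbers requires every batch to start exactly where the previous one ended. The Python sketch below is a simplified, hypothetical model (`find_seqno_gaps` is not a RocksDB function); it shows how an operation that advances the sequence number without writing a WAL record, which is how this sketch models the ingestion and disable_wal cases from the comment, appears as a gap.

```python
from typing import List, Tuple

# Each batch is (first_sequence, count): the batch occupies sequence numbers
# first_sequence .. first_sequence + count - 1.
Batch = Tuple[int, int]


def find_seqno_gaps(batches: List[Batch]) -> List[Tuple[int, int]]:
    """Return (expected_seq, actual_seq) for every batch that does not start
    right after the previous batch ends (hypothetical continuity check)."""
    gaps = []
    for (prev_seq, prev_count), (seq, _count) in zip(batches, batches[1:]):
        expected = prev_seq + prev_count
        if seq != expected:
            gaps.append((expected, seq))
    return gaps


# Contiguous WAL: batches cover seqnos 1-2, 3, and 4-6.
print(find_seqno_gaps([(1, 2), (3, 1), (4, 3)]))  # []

# Sequence number 3 was consumed by something that never reached the WAL,
# so the next WAL batch starts at 4 instead of 3.
print(find_seqno_gaps([(1, 2), (4, 1)]))  # [(3, 4)]
```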
Thanks, I need to look into this more.
Fixed, and launching more tests to verify.
Found more incompatibilities in test runs; need to fix them.
tools/db_crashtest.py
if dest_params.get("get_update_since_one_in") != 0:
    # Set very high values to avoid WAL cleanup during `GetUpdatesSince()`
    dest_params["WAL_ttl_seconds"] = 0xFFFFFFFFFFFFFFFF
    dest_params["WAL_size_limit_MB"] = 0xFFFFFFFFFFFFFFFF
Should we still set some limit to avoid running out of space?
Tuning down these values makes GetUpdateSinceOneIn fail in more ways. This is more difficult than I anticipated; will take a look.
Fixed. It turns out we don't need such high values: if the WAL was deleted, the returned iterator is simply ok() but not valid.
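Taking that observation at face value, a stress check can classify a freshly returned iterator three ways: a non-OK status is a real failure, OK-but-not-valid means the requested updates were already purged (so there is nothing to verify), and a valid iterator can be walked and checked. A minimal Python sketch of that decision follows; `classify_get_updates_since` and the `Outcome` labels are hypothetical names, not db_stress code.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FAILURE = "failure"        # non-OK status: report an error
    WAL_PURGED = "wal_purged"  # OK but not valid: nothing left to verify
    VERIFY = "verify"          # valid: walk the batches and check them


def classify_get_updates_since(status_error: Optional[str], valid: bool) -> Outcome:
    """Decide what a stress check should do with a freshly returned iterator.

    `status_error` is None when the status is OK (hypothetical encoding).
    """
    if status_error is not None:
        return Outcome.FAILURE
    if not valid:
        # Per the review discussion: if the WAL was already deleted, the
        # iterator is OK but not valid, which is not a correctness bug.
        return Outcome.WAL_PURGED
    return Outcome.VERIFY


print(classify_get_updates_since(None, False))  # Outcome.WAL_PURGED
```

Treating a purged WAL as a skip rather than a failure is what allows the WAL retention settings to stay modest, which addresses the disk-space concern raised above.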
tools/db_crashtest.py
@@ -118,6 +118,7 @@
    "optimize_filters_for_memory": lambda: random.randint(0, 1),
    "partition_filters": lambda: random.randint(0, 1),
    "partition_pinning": lambda: random.randint(0, 3),
    "get_update_since_one_in": lambda: random.choice([0, 0, 0, 1000]),
May need to sanity check that disable_wal is 0.
Fixed
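One way to implement the disable_wal sanity check is in the crash-test parameter sanitization. With `random.choice([0, 0, 0, 1000])`, `get_update_since_one_in` is nonzero with probability 1/4; when that coincides with `disable_wal=1`, the sanitizer can either turn the check off or force WAL writes back on. The Python sketch below assumes a plain dict of parameters and does the former; `sanitize_get_update_since_params` is a hypothetical helper, and this is not necessarily the approach the PR took.

```python
from typing import Any, Dict


def sanitize_get_update_since_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sanitizer: GetUpdatesSince() reads the WAL, so it cannot be
    exercised meaningfully when writes skip the WAL (disable_wal=1).

    This sketch disables the GetUpdatesSince() check in that case; forcing
    disable_wal=0 instead would be the other option.
    """
    dest = dict(params)  # do not mutate the caller's dict
    if dest.get("get_update_since_one_in", 0) != 0 and dest.get("disable_wal", 0) == 1:
        dest["get_update_since_one_in"] = 0
    return dest


print(sanitize_get_update_since_params({"disable_wal": 1, "get_update_since_one_in": 1000}))
# {'disable_wal': 1, 'get_update_since_one_in': 0}
```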
~~Might need to temporarily pause on this, as the space vs. functionality trade-off is harder to solve than expected.~~ Got some new ideas from @ajkr, so I can continue now.
@hx235 has updated the pull request. You must reimport the pull request before landing.
Context/Summary: as titled
Test: CI [ongoing]