Add GetUpdateSince() to crash test #12646

Open
wants to merge 1 commit into
base: main

Conversation

hx235
Contributor

@hx235 hx235 commented May 11, 2024

Context/Summary: as titled

Test: CI [ongoing]

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 requested review from ajkr, jowlyzhang and cbi42 May 13, 2024 19:48

last_seqno = res.sequence;
iter->Next();
}
Member

The status can be checked once, after !iter->Valid().

Contributor Author

Fixed
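
For readers following this thread, the suggestion is to loop while the iterator is valid and inspect its status once after the loop, instead of on every step. The sketch below illustrates that pattern in Python with a hypothetical `FakeWalIterator` class; it is a stand-in written for this example, not the RocksDB iterator API, and the assumption that a failed iterator reports itself as not valid is part of the mock.

```python
from dataclasses import dataclass


@dataclass
class FakeWalIterator:
    """Hypothetical stand-in for a WAL iterator; not the RocksDB API."""

    seqnos: list
    fail_at: int = -1  # position at which an error occurs, -1 for never
    pos: int = 0

    def valid(self) -> bool:
        # Assumption for this mock: a failed iterator is not valid.
        return self.pos < len(self.seqnos) and self.pos != self.fail_at

    def ok(self) -> bool:
        return self.pos != self.fail_at

    def next(self) -> None:
        self.pos += 1

    def seqno(self) -> int:
        return self.seqnos[self.pos]


def drain(it):
    """Read until the iterator stops being valid, then check status once."""
    last_seqno = None
    while it.valid():
        last_seqno = it.seqno()
        it.next()
    # The loop ends either at the end of the log or on an error;
    # one status check here distinguishes the two cases.
    return last_seqno, it.ok()
```

Calling `drain(FakeWalIterator([1, 2, 3]))` reads all three entries and reports a good status, while passing `fail_at=1` stops after the first entry and reports the failure.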

iter->Next();

while (iter->Valid()) {
if (!iter->status().ok()) {
Member

@cbi42 cbi42 May 14, 2024

GetUpdatesSince() seems to expect the WAL to contain updates with consecutive sequence numbers. It can fail the stress test here when there is a concurrent file ingestion: #10007. There may be other incompatibilities as well; for example, writes with disable_wal can also leave a hole in the sequence numbers in the WAL (https://groups.google.com/g/rocksdb/c/W1Axka8CVf8).

Contributor Author

Thanks, I need to look into this more.

Contributor Author

Fixed and launching more tests to check

Contributor Author

Found more incompatibilities in the test runs; need to fix them.
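
The sequence-number concern raised in this thread can be illustrated with a small sketch. It assumes, purely for the example, that each WAL batch is described by a `(first_seqno, num_entries)` tuple; the function names are hypothetical and nothing here reflects how the stress test actually verifies the log.

```python
def find_sequence_gaps(batches):
    """Return (expected, actual) pairs where batches are not contiguous.

    `batches` is a list of (first_seqno, num_entries) tuples in log order;
    this shape is an assumption made for the example.
    """
    gaps = []
    expected = None
    for first_seqno, count in batches:
        if expected is not None and first_seqno != expected:
            gaps.append((expected, first_seqno))
        expected = first_seqno + count
    return gaps


def is_strictly_increasing(batches):
    """Weaker check: starting sequence numbers only need to grow."""
    starts = [first_seqno for first_seqno, _ in batches]
    return all(a < b for a, b in zip(starts, starts[1:]))
```

A verifier built on `find_sequence_gaps` would flag a log such as `[(1, 2), (5, 1)]`, which is what a hole left by file ingestion or a write with disable_wal would look like, whereas `is_strictly_increasing` would accept it.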

if dest_params.get("get_update_since_one_in") != 0:
    # Set very high values to avoid WAL cleanup during `GetUpdatesSince()`
    dest_params["WAL_ttl_seconds"] = 0xFFFFFFFFFFFFFFFF
    dest_params["WAL_size_limit_MB"] = 0xFFFFFFFFFFFFFFFF
Member

Should we still set some limit to avoid running out of space?

Contributor Author

Tuning down the parameter will make GetUpdateSinceOneIn fail in more ways. This is more difficult than I anticipated; I will take a look.

Contributor Author

@hx235 hx235 May 23, 2024

Fixed. It turns out that we don't have to keep such a high value: if the WAL was deleted, the returned iterator will simply be ok() but not valid.
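
The behavior described in this reply suggests three outcomes a caller has to tell apart. A minimal sketch, assuming the caller already holds two booleans for the iterator's status and validity; the enum and function names are hypothetical and exist only for this illustration.

```python
from enum import Enum


class WalReadOutcome(Enum):
    READABLE = "readable"
    WAL_UNAVAILABLE = "wal_unavailable"
    ERROR = "error"


def classify(status_ok: bool, valid: bool) -> WalReadOutcome:
    """Map an iterator's (status, validity) pair to an outcome."""
    if not status_ok:
        # A bad status is a real failure regardless of validity.
        return WalReadOutcome.ERROR
    if not valid:
        # ok() but not valid: per the comment above, this is what a
        # deleted WAL looks like, so it need not fail the test.
        return WalReadOutcome.WAL_UNAVAILABLE
    return WalReadOutcome.READABLE
```

Under this reading, a test would skip verification on `WAL_UNAVAILABLE` and fail only on `ERROR`.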

@@ -118,6 +118,7 @@
"optimize_filters_for_memory": lambda: random.randint(0, 1),
"partition_filters": lambda: random.randint(0, 1),
"partition_pinning": lambda: random.randint(0, 3),
"get_update_since_one_in": lambda: random.choice([0, 0, 0, 1000]),
Member

May need to sanity check that disable_wal is 0.

Contributor Author

Fixed
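
One way the sanity check requested in this thread could look is sketched below. The parameter names `disable_wal` and `get_update_since_one_in` come from the discussion; the function name is hypothetical, and the choice to turn the new check off (rather than force `disable_wal` to 0) is an assumption, since the actual fix is not shown here.

```python
def sanitize_get_update_since(params):
    """Return a copy of params with the check off when the WAL is disabled."""
    out = dict(params)  # leave the caller's dict untouched
    if out.get("disable_wal", 0) != 0:
        # Without a WAL there is nothing for the check to read.
        out["get_update_since_one_in"] = 0
    return out
```

For example, `sanitize_get_update_since({"disable_wal": 1, "get_update_since_one_in": 1000})` returns a dict with `get_update_since_one_in` set to 0, while a configuration with `disable_wal` equal to 0 passes through unchanged.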

@hx235
Contributor Author

hx235 commented May 17, 2024

~~Might need to temporarily pause on this as the space vs functionality trade-off is harder to solve than expected.~~ Got some new ideas from @ajkr so can continue now.

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

3 participants