-
Notifications
You must be signed in to change notification settings - Fork 19.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/rawdb: freezer index repair #29792
base: master
Are you sure you want to change the base?
core/rawdb: freezer index repair #29792
Conversation
a6ed90b
to
1b10ff8
Compare
9be55a0
to
660e460
Compare
} | ||
// ensure two consecutive index items are in order | ||
if err := t.checkIndexItems(prev, entry); err != nil { | ||
return truncate(offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should log the error here. Maybe pass the error into truncate
to log it as part of the warning it prints.
Performance wise, it takes ~0.5 second to verify ~20m index items, namely it will introduce ~2.5s delay in geth launching, but i think it's acceptable.
|
63ffdbb
to
8381d61
Compare
} | ||
log.Warn("Truncated index file", "offset", offset, "truncated", size-offset) | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions could just be methods on freezerTable
This pull request removes the
fsync
of index files in freezer.ModifyAncients function forperformance gain.
Originally, fsync is added after each freezer write operation to ensure the written data is truly
transferred into disk. Unfortunately, it turns out
fsync
could be relatively slow, especially onmacOS. see #28754 for more information.
In this pull request, fsync for index file is removed as it turns out index file can be recovered
even after a unclean shutdown. But fsync for data file is still kept, as we have no meaningful
way to validate the data correctness after unclean shutdown.
But why do we need the
fsync
in the first place?As it's necessary for freezer to survive/recover after the machine crash (e.g. power failure).
In linux, whenever the file write is performed, the file metadata update and data update are
not necessarily performed at the same time. Typically, the metadata will be flushed/journalled
ahead of the file data. Therefore, we make the pessimistic assumption that the file is first
extended with invalid "garbage" data (normally zero bytes) and that afterwards the correct
data replaces the garbage.
We have observed that the index file of the freezer often contain garbage entry with zero value
(filenumber = 0, offset = 0) after a machine power failure. It proves that the index file is extended
without the data being flushed. And this corruption can destroy the whole freezer data eventually.
Performing fsync after each write operation can reduce the time window for data to be transferred
to the disk and ensure the correctness of the data in the disk to the greatest extent.
How can we maintain this guarantee without relying on fsync?
Because the items in the index file are strictly in order, we can leverage this characteristic to
detect the corruption and truncate them when freezer is opened. Specifically these validation
rules are performed for each index file:
For two consecutive index items:
And also, for the first non-head item, it must refer to the earliest data file, or the next file if the
earliest file is not sufficient to place the first item(very special case, only theoretical possible
in tests)
With these validation rules, we can detect the invalid item in index file with greatest possibility.
But unfortunately, these scenarios are not covered and could still lead to a freezer corruption if it occurs:
All items in index file are in zero value
It's impossible to distinguish if they are truly zero (e.g. all the data entries maintained in freezer
are zero size) or just the garbage left by OS. In this case, these index items will be kept by truncating
the entire data file, namely the freezer is corrupted.
However, we can consider that the probability of this situation occurring is quite low, and even
if it occurs, the freezer can be considered to be close to an empty state. Rerun the state sync
should be acceptable.
Index file is integral while relative data file is corrupted
It might be possible the data file is corrupted whose file size is extended correctly with garbage
filled (e.g. zero bytes). In this case, it's impossible to detect the corruption by index validation.
We can either choose to
fsync
the data file, or blindly believe that if index file is integral thenthe data file could be integral with very high chance. In this pull request, the first option is taken.