core/rawdb: freezer index repair #29792

rjl493456442 · 2024-05-16T06:46:47Z

This pull request removes the fsync of index files in the freezer.ModifyAncients function to
improve performance.

Originally, fsync was added after each freezer write operation to ensure the written data is
actually persisted to disk. Unfortunately, fsync turns out to be relatively slow, especially on
macOS. See #28754 for more information.

In this pull request, the fsync for the index file is removed, because the index file can be
recovered even after an unclean shutdown. The fsync for the data file is kept, as we have no
meaningful way to validate the data after an unclean shutdown.

But why do we need the fsync in the first place?

It is necessary for the freezer to survive and recover from a machine crash (e.g. power failure).
On Linux, when a file write is performed, the file metadata update and the data update are
not necessarily performed at the same time. Typically, the metadata is flushed/journalled
ahead of the file data. Therefore, we make the pessimistic assumption that the file is first
extended with invalid "garbage" data (normally zero bytes) and that the correct data replaces
the garbage only afterwards.

We have observed that the index file of the freezer often contains garbage entries with zero value
(filenumber = 0, offset = 0) after a machine power failure. This shows that the index file was extended
without the data being flushed. This corruption can eventually destroy the whole freezer.
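
To make the zero-value entry concrete, here is a minimal sketch of decoding one index item. The 6-byte layout (2-byte file number, 4-byte offset, big-endian) and the names `indexEntry` and `decodeEntry` are assumptions for illustration, not taken from this pull request.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indexEntrySize is the assumed on-disk size of one index item:
// 2 bytes of file number followed by 4 bytes of offset.
const indexEntrySize = 6

// indexEntry is a hypothetical stand-in for the freezer's index item.
type indexEntry struct {
	filenum uint32
	offset  uint32
}

// decodeEntry parses one fixed-size index item (big-endian is an assumption).
func decodeEntry(b []byte) indexEntry {
	return indexEntry{
		filenum: uint32(binary.BigEndian.Uint16(b[:2])),
		offset:  binary.BigEndian.Uint32(b[2:6]),
	}
}

func main() {
	// An index file that was extended by one item whose bytes never reached
	// the disk: the region reads back as zeros.
	garbage := make([]byte, indexEntrySize)
	fmt.Printf("%+v\n", decodeEntry(garbage)) // {filenum:0 offset:0}
}
```

A zero-filled region therefore decodes to a syntactically well-formed item, which is why the garbage cannot be spotted by looking at a single entry in isolation.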

Performing fsync after each write operation narrows the time window in which written data has not
yet reached the disk, and so keeps the on-disk data correct to the greatest extent possible.

How can we maintain this guarantee without relying on fsync?

Because the items in the index file are strictly ordered, we can use this property to detect
corrupted items and truncate them when the freezer is opened. Specifically, the following
validation rules are applied to each index file:

For two consecutive index items:

If their file numbers are the same, then the offset of the latter one MUST not be less than that of the former.
If the file number of the latter one is equal to that of the former plus one, then the offset of the latter one MUST not be 0.
If their file numbers are not equal, and the latter's file number is not equal to the former's plus one, the latter one is invalid.

In addition, the first non-head item must refer to the earliest data file, or to the next file if
the earliest file does not have enough room for the first item (a very special case that is only
theoretically possible in tests).

With these validation rules, invalid items in the index file can be detected with high probability.
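
The rules for two consecutive items can be sketched as follows. This is an illustrative sketch, not the code from this pull request: `indexEntry`, `checkPair` and `firstInvalid` are hypothetical names, and the special rule for the first non-head item is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// indexEntry is a hypothetical stand-in for one index item.
type indexEntry struct {
	filenum uint32
	offset  uint32
}

// checkPair applies the three ordering rules to two consecutive index items.
func checkPair(prev, cur indexEntry) error {
	switch {
	case cur.filenum == prev.filenum:
		// Same data file: the offset must not go backwards.
		if cur.offset < prev.offset {
			return errors.New("offset decreases within the same data file")
		}
	case cur.filenum == prev.filenum+1:
		// Next data file: the offset must not be zero.
		if cur.offset == 0 {
			return errors.New("zero offset in the next data file")
		}
	default:
		// Any other change of file number is invalid.
		return errors.New("file number neither unchanged nor incremented by one")
	}
	return nil
}

// firstInvalid returns the position of the first item that breaks a rule,
// or len(items) if every consecutive pair passes. Everything from that
// position onwards would be truncated.
func firstInvalid(items []indexEntry) int {
	for i := 1; i < len(items); i++ {
		if checkPair(items[i-1], items[i]) != nil {
			return i
		}
	}
	return len(items)
}

func main() {
	items := []indexEntry{
		{0, 0}, {0, 100}, {0, 250}, {1, 80},
		{0, 0}, {0, 0}, // zero-filled tail left behind by a crash
	}
	n := firstInvalid(items)
	fmt.Printf("keep %d of %d items\n", n, len(items)) // keep 4 of 6 items
}
```

In this example the zero-filled tail is caught by the third rule, because the file number drops from 1 back to 0.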

Unfortunately, the following scenarios are not covered and could still lead to freezer corruption if they occur:

All items in the index file have zero value

It's impossible to tell whether they are truly zero (e.g. all the data entries maintained in the
freezer have zero size) or just garbage left by the OS. In this case the index items are kept and
the entire data file is truncated to match them, which means the freezer is corrupted.

However, the probability of this situation occurring is quite low, and even if it occurs, the
freezer can be considered to be close to an empty state. Rerunning the state sync should be
acceptable.

Index file is intact while the related data file is corrupted

The data file may be corrupted even though its size was extended correctly, with the new region
filled with garbage (e.g. zero bytes). In this case, it's impossible to detect the corruption by
index validation.

We can either fsync the data file, or blindly trust that if the index file is intact then the
data file is very likely intact as well. In this pull request, the first option is taken.

fjl · 2024-05-21T12:07:13Z

core/rawdb/freezer_table.go

+		}
+		// ensure two consecutive index items are in order
+		if err := t.checkIndexItems(prev, entry); err != nil {
+			return truncate(offset)


I think we should log the error here. Maybe pass the error into truncate to log it as part of the warning it prints.

rjl493456442 · 2024-05-22T06:36:22Z

Performance-wise, it takes ~0.5 seconds to verify ~20m index items. With five index files verified, as in the log below, this adds a ~2.5s delay to geth startup, but I think that's acceptable.

INFO [05-22|06:35:09.596] Verified index file                      items=19,803,925 elapsed=529.361ms
INFO [05-22|06:35:10.173] Verified index file                      items=19,803,925 elapsed=576.231ms
INFO [05-22|06:35:10.653] Verified index file                      items=19,803,925 elapsed=480.599ms
INFO [05-22|06:35:11.172] Verified index file                      items=19,803,925 elapsed=516.957ms
INFO [05-22|06:35:11.621] Verified index file                      items=19,803,925 elapsed=447.692ms

rjl493456442 · 2024-05-24T02:01:25Z

After benchmarking the pull request for a while (a full sync running for more than 24 hours), this branch is slightly faster than master. Specifically:

The average time spent on state history construction is reduced from 2.32ms to 1.57ms by getting rid of the index file fsync.

This performance difference is negligible on the benchmark machine, which has a powerful SSD. For normal users, however, it can make a big difference, as fsync on an ordinary SSD does take time. For instance, the data file fsync takes ~18ms per block, so we can save about 18ms per block by getting rid of the index file fsync.

fjl · 2024-05-28T13:06:07Z

core/rawdb/freezer_table.go

+			}
+			log.Warn("Truncated index file", "offset", offset, "truncated", size-offset)
+			return nil
+		}


These functions could just be methods on freezerTable

rjl493456442 force-pushed the freezer-index-validation branch 4 times, most recently from a6ed90b to 1b10ff8 Compare May 20, 2024 07:52

holiman reviewed May 20, 2024

rjl493456442 force-pushed the freezer-index-validation branch 3 times, most recently from 9be55a0 to 660e460 Compare May 21, 2024 05:48

rjl493456442 marked this pull request as ready for review May 21, 2024 06:43

rjl493456442 requested a review from karalabe as a code owner May 21, 2024 06:43

fjl reviewed May 21, 2024

rjl493456442 added 5 commits May 28, 2024 20:59

core/rawdb: introduce freezer index repair mechanism

b92db0b

core/rawdb, triedb/pathdb: various fixes

c8b8876

core/rawdb: fsync data file

1bc011c

core/rawdb: fix read error

92ea3f7

core/rawdb: log the error message

8381d61

rjl493456442 force-pushed the freezer-index-validation branch from 63ffdbb to 8381d61 Compare May 28, 2024 12:59

fjl reviewed May 28, 2024
