Pebble compaction causes intermittent, but significant performance impacts #29575
Comments
Could you share some logs from Geth when this happens?
I usually run with
Seems to have magically stopped on its own; I haven't seen the characteristic 5-6 block pattern for a while now. Will keep an eye out and reopen if it returns.
Did some more digging; the issue seems to be caused by pebble's database compaction. I'm not sure how or why it gets triggered, but the result is ~1 min of slow block processing. Some recent examples where I've timed
I found the culprit by profiling during the "bad" times and, by adding logging at go-ethereum/ethdb/pebble/pebble.go (line 97 in 44a50c9), confirmed that compaction is indeed being triggered.
Saw some earlier optimizations around compaction (#20130); not sure whether anything more can be done to smooth these out as well.
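For anyone who wants to time compactions in a similar way, here is a minimal sketch of the bookkeeping involved. It is not the logging that was actually added to pebble.go; `compactionTimer`, `stepClock`, and the job IDs are hypothetical names made up for illustration. As far as I understand pebble's API, the `begin`/`end` calls would be driven from the `CompactionBegin` and `CompactionEnd` callbacks of `pebble.EventListener`, keyed by the `JobID` of the `pebble.CompactionInfo` they receive, but treat that wiring as an assumption and check it against the pebble version in use.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// compactionTimer remembers when each compaction job started so that the
// duration can be reported when the job ends. Hypothetical helper, not part
// of go-ethereum or pebble.
type compactionTimer struct {
	mu     sync.Mutex
	now    func() time.Time  // injectable clock; time.Now in production
	starts map[int]time.Time // start time per job ID
}

func newCompactionTimer(now func() time.Time) *compactionTimer {
	return &compactionTimer{now: now, starts: make(map[int]time.Time)}
}

// begin records the start time of a job.
func (c *compactionTimer) begin(jobID int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.starts[jobID] = c.now()
}

// end returns how long the job ran. ok is false if begin was never
// called for this job ID.
func (c *compactionTimer) end(jobID int) (d time.Duration, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	start, found := c.starts[jobID]
	if !found {
		return 0, false
	}
	delete(c.starts, jobID)
	return c.now().Sub(start), true
}

// stepClock returns a fake clock that advances by step on every call,
// which keeps the demo below deterministic.
func stepClock(step time.Duration) func() time.Time {
	cur := time.Unix(0, 0)
	return func() time.Time {
		t := cur
		cur = cur.Add(step)
		return t
	}
}

func main() {
	// Simulate a compaction that takes one minute, the order of magnitude
	// reported above. A real node would pass time.Now instead.
	ct := newCompactionTimer(stepClock(time.Minute))
	ct.begin(1)
	if d, ok := ct.end(1); ok {
		fmt.Printf("compaction job %d took %s\n", 1, d)
	}
}
```

Logging the duration at the end of each job, next to the block-processing logs, makes it straightforward to check whether the slow periods line up with compactions.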
System information
Geth version:
v1.13.14
CL client & version:
teku@24.1.3
OS & Version: Linux
Expected behaviour
Geth receives/processes blocks in a timely manner.
Actual behaviour
I run a number of geth/teku nodes and have recently noticed an infrequent pattern (roughly daily) occurring across all of them: geth receives a burst of about 6 blocks at around the same time, the oldest of which is, of course, 72s stale (6 blocks at 12s per slot).
Of course it could just be a temporary network issue, but I keep seeing this same number of blocks across multiple machines in multiple locations.
Could also be a teku issue, but seems a bit unlikely given the logs below.
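To make the pattern described above concrete, here is a small sketch of how such a pileup could be detected from a list of block arrivals. Everything in it is an assumption for illustration: the `arrival` type, `findBursts`, the 2s window, and the sample timestamps are invented and are not taken from the actual logs. The only fixed input is the 12s mainnet slot time.

```go
package main

import "fmt"

// arrival pairs a block's header timestamp with the local time at which the
// node received it. Both are unix seconds. Hypothetical type.
type arrival struct {
	number     uint64
	blockTime  uint64
	receivedAt uint64
}

// staleness is how many seconds after its own timestamp a block arrived.
func (a arrival) staleness() uint64 {
	if a.receivedAt < a.blockTime {
		return 0 // guard against clock skew
	}
	return a.receivedAt - a.blockTime
}

// findBursts returns every run of at least minBlocks arrivals that were all
// received within window seconds of the first arrival in the run.
// arrivals must be sorted by receivedAt.
func findBursts(arrivals []arrival, window uint64, minBlocks int) [][]arrival {
	var bursts [][]arrival
	start := 0
	for i := 1; i <= len(arrivals); i++ {
		if i < len(arrivals) && arrivals[i].receivedAt-arrivals[start].receivedAt <= window {
			continue // still inside the current run
		}
		if i-start >= minBlocks {
			bursts = append(bursts, arrivals[start:i])
		}
		start = i
	}
	return bursts
}

// sampleArrivals builds made-up data: one normal block, six blocks from
// consecutive 12s slots that all arrive at once, then a normal block.
func sampleArrivals() []arrival {
	out := []arrival{{number: 1, blockTime: 988, receivedAt: 990}}
	for i := uint64(0); i < 6; i++ {
		out = append(out, arrival{number: 2 + i, blockTime: 1000 + 12*i, receivedAt: 1072})
	}
	return append(out, arrival{number: 8, blockTime: 1072, receivedAt: 1075})
}

func main() {
	for _, b := range findBursts(sampleArrivals(), 2, 3) {
		fmt.Printf("burst of %d blocks, oldest is %ds stale\n", len(b), b[0].staleness())
	}
}
```

In the sample data the six delayed blocks span slots 12s apart and arrive together, so the oldest one comes out 72s stale, matching the figure above.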
Steps to reproduce the behaviour
I've added some custom logging in forkchoiceUpdated(): go-ethereum/eth/catalyst/api.go, line 313 in 823719b
With this code in place, yesterday I got the output:
and corresponding Teku logs:
I interpret this as something infrequently hanging geth for ~1 min.
Some other recent incidents occurred at blocks 19678501 and 19675309. The signature is always a pileup of about 6 blocks on geth and a late block message on Teku. These late block messages were a bit different from the above, though:
and
I was running an older (and less verbose) version of Teku at the time; Lucas Saldanha from the Teku Discord told me that both of these blocks were late because of blob data unavailability.