
Client: Debug Beacon Sync Forward-Filling Execution Performance #3290

Open · holgerd77 opened this issue Feb 21, 2024 · 2 comments

@holgerd77 (Member)

So I continued the sync on Holesky with my EthereumJS/Lighthouse setup from #3289; apart from this one failure things are going on reasonably stable.

The sync has now been running for roughly 19 hours.

[screenshot]

Backfilling was done after ~4 hours. The rest of the time has been spent on forward execution, which is now at block 619,786 (out of a total of ~1 million). That seems substantially better than what @jochem-brouwer was reporting; I assume that's likely a mixture of Lighthouse being faster at serving the data on the CL side and my laptop being faster at execution with the Apple M1 chip.

Nevertheless: compared to our forward-sync execution on mainnet, things are going extremely slowly. Here is an extract of the forward-execution logging:

[screenshot]

This is roughly 100x (!!!) or so slower than block execution on mainnet in the range of block 1 million (+/- 200,000 blocks), where blocks are already actively used and some state has built up.

To be fair: to some extent this is also comparing apples and oranges. I have no real data at hand for a state size comparison. Also, the Holesky blocks in this range are only periodically filled (every 5-10 blocks or so), e.g. this one https://holesky.etherscan.io/block/619912, likely by a fuzzer.

Nevertheless: I think we never really took a broader look at how post-Merge execution behaves and whether things are generally set up somewhat optimally. My assumption is that there is rather large room for improvement and that it's very much worth spending some dedicated time here and taking a deeper look.

That can start with some basic checks of whether the systems are generally working properly in this post-Merge setup. Are e.g. the StateManager caches working properly in this context? Ad hoc, I am actually not so sure. Things like that.
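
As a first data point, one could ad hoc wrap a few StateManager methods with a counting/timing shim around a block run and see how many account/storage/code reads actually happen per block. The sketch below assumes local debugging access to the vm instance; the method names (getAccount, getContractStorage, getContractCode) come from the public state manager interface, the shim itself is not client code, and whether it catches every read depends on how the EVM accesses the state manager internally:

```ts
// Ad hoc instrumentation sketch (not client code): count and time calls to a few
// state manager methods during a block run to get a feeling for cache effectiveness.
import { performance } from 'node:perf_hooks'

interface CallStats {
  calls: number
  totalMs: number
}

// Wraps the named async methods of `target` in place and returns a stats map.
function instrument<T extends object>(target: T, methods: (keyof T)[]): Map<string, CallStats> {
  const stats = new Map<string, CallStats>()
  for (const name of methods) {
    const original = (target as any)[name].bind(target)
    stats.set(String(name), { calls: 0, totalMs: 0 })
    ;(target as any)[name] = async (...args: any[]) => {
      const entry = stats.get(String(name))!
      const start = performance.now()
      const result = await original(...args)
      entry.calls += 1
      entry.totalMs += performance.now() - start
      return result
    }
  }
  return stats
}

// Usage sketch in a local debugging loop (vm/block assumed to be at hand):
// const stats = instrument(vm.stateManager, ['getAccount', 'getContractStorage', 'getContractCode'])
// await vm.runBlock({ block })
// for (const [name, s] of stats) console.log(`${name}: ${s.calls} calls, ${s.totalMs.toFixed(1)}ms`)
```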

It is also worth adding some deeper logging of which parts take how much time, to get a better picture of whether the time is really mostly spent in EVM execution or whether additional parts (the EVM always waiting too long, ...) play a substantial role where things might be improved.
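
A minimal sketch of such a per-block breakdown, assuming local access to the vm and the blocks in the execution loop (skipBlockValidation is the documented RunBlockOpts flag; the harness is only an illustration, not a proposed client change):

```ts
// Rough per-block timing harness sketch: separate the validation share from the
// pure execution share. Assumes the state is already set up for this block, as in
// the client's forward-execution loop.
import { performance } from 'node:perf_hooks'
import type { Block } from '@ethereumjs/block'
import type { VM } from '@ethereumjs/vm'

async function timedRun(vm: VM, block: Block) {
  const t0 = performance.now()
  await block.validateData() // validation share (tx signatures, txTrie, withdrawals trie, ...)
  const t1 = performance.now()
  await vm.runBlock({ block, skipBlockValidation: true }) // execution share
  const t2 = performance.now()
  console.log(
    `block ${block.header.number}: validateData ${(t1 - t0).toFixed(1)}ms, ` +
      `execution ${(t2 - t1).toFixed(1)}ms, ${block.transactions.length} txs`
  )
}
```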

@holgerd77 (Member Author)

Ok, noting some observations here.

I am looking mainly at the "full" blocks with something around 800-1300 txs.

Times for these all look fairly similar to this one:

[screenshot]

Time distributes roughly as follows:

Whole block: 1s
Preparatory stuff (New state root, DAO HF, checkpoints, block validation): 300ms
Tx execution: 500-800ms with most in Initialization/Overhead
Tear down stuff (Withdrawals, Rewards, EVM journal commit): 150ms

Generally, all three parts show somewhat high numbers, and it is likely worth having a dedicated look into each one and then seeing where the "time is lost".

@holgerd77 (Member Author) commented Feb 22, 2024

So for the preparatory phase:

Time here is lost (aka consumed) more or less completely in the await block.validateData() call in VM.runBlock() -> applyBlock() (e.g. 396.08ms out of 396.08ms for the entire preparatory phase).


Update: here is some more data tearing apart the times in Block.validateData():

txs: 308.374ms
isSigned: 0.166ms
txTrie: 93.523ms
uncleH: 0.012ms
withdrawalsTrie: 0.552ms

So tx validation and tx trie validation take up most of the time (I am also scratching my head about the - while lower - still pretty remarkable tx trie validation time in particular 🤔).
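
To check how much of the tx validation share is pure signature work, a small ad hoc loop over the block's transactions can be timed (verifySignature() and getSenderAddress() are the public @ethereumjs/tx calls that involve ECDSA recovery; the loop itself is just a sketch):

```ts
// Ad hoc sketch: measure the signature/sender-recovery share of tx validation.
import { performance } from 'node:perf_hooks'
import type { Block } from '@ethereumjs/block'

function timeTxSignatureChecks(block: Block) {
  let sigMs = 0
  for (const tx of block.transactions) {
    const start = performance.now()
    tx.verifySignature() // ECDSA signature check
    tx.getSenderAddress() // sender recovery (may reuse an internal cache, depending on version)
    sigMs += performance.now() - start
  }
  console.log(
    `${block.transactions.length} txs, ${sigMs.toFixed(1)}ms in signature checks / sender recovery`
  )
}
```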


Ok, this "mystery" is solved: this is more or less all signature verification, taking about 200ms (without WASM: 1.5s) for each tx:

[screenshot]

So every speedup that we have here would have a huge impact on block processing times.
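
As a rough back-of-envelope (all inputs are approximations pulled from the numbers above, so treat this purely as an order-of-magnitude sketch): with roughly 300ms of signature-related validation per full block and a full block every 5-10 blocks, signature checks alone add up to hours over the remaining range to block ~1 million.

```ts
// Back-of-envelope sketch; all numbers are rough approximations from the logs above.
const sigMsPerFullBlock = 300 // "txs" share of Block.validateData() for a ~1000-tx block
const fullBlockInterval = 7 // a "full" block roughly every 5-10 blocks
const remainingBlocks = 1_000_000 - 620_000

const sigHours = ((remainingBlocks / fullBlockInterval) * sigMsPerFullBlock) / 1000 / 3600
console.log(`~${sigHours.toFixed(1)}h of the remaining sync spent purely on signature checks`)
// => ~4-5h under these assumptions, so e.g. a 2x speedup here saves hours of sync time
```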
