About The HertzFix

Background

BSC-Erigon nodes failed to sync on 2023/11/27. They all got stuck at block 33851236 and kept printing the logs below (see issue-254):

[WARN] [11-27|06:27:04.533] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236
[WARN] [11-27|06:27:04.597] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236
[WARN] [11-27|06:27:04.847] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236
[WARN] [11-27|06:27:04.899] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236
[WARN] [11-27|06:27:04.926] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236
[WARN] [11-27|06:27:05.358] [downloader] Rejected header marked as bad hash=0x022296e50021d7225b75f3873e7bc5a2bf6376a08079b4368f9dee81946d623b height=33851236

After some investigation, it turned out to be a bug in the BSC-Geth client; BSC-Erigon's execution result was in fact correct.

The Root Cause

The bug was introduced by a feature called SharedOriginStorage, which was merged on 2022/03/28 and released with v1.1.9 on 2022/04/12. The BSC-Geth client ran with this latent bug for more than one and a half years before it was finally triggered.
SharedOriginStorage was introduced to improve performance by prefetching the IO state and storing it in StateDB, so that IO access could be more efficient. The StatePrefetcher module tries to load the IO slots into the sharedOriginStorage structure, and in most cases this works. But there is a corner case. If a contract goes through the following sequence within one block: 1. Prefetch IO -> 2. Suicide -> 3. Revive (Create2) -> 4. Access prefetched IO, then the issue occurs: the revived contract loads the prefetched state, which is no longer valid because the contract has already been suicided.
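The corner case can be sketched with a small toy model. This is only an illustration under simplifying assumptions, not the BSC-Geth code: the `state` type, its fields, and the `prefetch`, `selfDestructAndRecreate`, `read`, and `run` functions are all hypothetical names, and a single contract with one storage slot stands in for the whole StateDB.

```go
package main

import "fmt"

// state is a hypothetical, heavily simplified model of one contract's
// storage within a single block. It is not the BSC-Geth StateDB.
type state struct {
	disk           map[string]string // committed storage at the start of the block
	prefetched     map[string]string // shared cache filled by the prefetcher
	dirty          map[string]string // writes made during this block
	destructed     bool              // contract was self-destructed in this block
	useSharedCache bool              // models SharedOriginStorage being on or off
}

// prefetch copies a committed slot into the shared cache (step 1).
func (s *state) prefetch(key string) {
	if s.useSharedCache {
		s.prefetched[key] = s.disk[key]
	}
}

// selfDestructAndRecreate models steps 2 and 3: the contract is destroyed
// and revived in the same block, so its storage should now be empty.
// Note that the shared cache is left untouched, which is the modeled bug.
func (s *state) selfDestructAndRecreate() {
	s.destructed = true
	s.dirty = map[string]string{}
}

// read models step 4: the revived contract accesses a slot.
func (s *state) read(key string) string {
	if v, ok := s.dirty[key]; ok {
		return v
	}
	if s.useSharedCache {
		if v, ok := s.prefetched[key]; ok {
			return v // stale: prefetched before the self-destruct
		}
	}
	if s.destructed {
		return "" // a revived contract starts with empty storage
	}
	return s.disk[key]
}

// run replays the four-step sequence and returns what step 4 reads.
func run(useSharedCache bool) string {
	s := &state{
		disk:           map[string]string{"slot0": "0x2a"},
		prefetched:     map[string]string{},
		dirty:          map[string]string{},
		useSharedCache: useSharedCache,
	}
	s.prefetch("slot0")
	s.selfDestructAndRecreate()
	return s.read("slot0")
}

func main() {
	fmt.Printf("shared cache on:  %q\n", run(true))  // "0x2a" (stale value)
	fmt.Printf("shared cache off: %q\n", run(false)) // "" (empty storage)
}
```

In this model the two runs disagree on the value read in step 4, which is the kind of divergence that would make two clients compute different state for the same block.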

Impact To BSC Users

It only impacts users of one contract, Arb Bot: 0x00000000001f8b68515efb546542397d3293ccfd. If you did not interact with this contract, there is no impact on you. Because the contract keeps running a suicide/create2 loop, its storage is reset from time to time. It suicided again at block 33851765, only 529 blocks after the issue occurred. So most Arb Bot users are likely fine; only users who interacted with it between Nov-27-2023 06:17:41 AM +UTC and Nov-27-2023 06:44:08 AM +UTC could be affected, in particular by these 2 transactions:
https://bscscan.com/tx/0x7eba4edc7c1806d6ee1691d43513838931de5c94f9da56ec865721b402f775b0
https://bscscan.com/tx/0x5217324f0711af744fe8e12d73f13fdb11805c8e29c0c095ac747b7e4563e935

How Is It Fixed

We introduced a hard fork called HertzFix in release v1.3.4. The fix is simple: it just disables SharedOriginStorage. The PR is https://github.com/bnb-chain/bsc/pull/2025.
The HertzFix hard fork has been enabled on mainnet since block 34140700, which was produced at Dec-07-2023 08:11:33 AM +UTC.
For BSC-Erigon, there is a hot fix patch. It has no choice but to hard code the bad state into the database in order to stay consistent with BSC Mainnet, see: https://github.com/node-real/bsc-erigon/pull/261
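A hard fork of this kind is typically gated on block height. The sketch below shows one way such a gate could look; it is not taken from the PR. The names `hertzfixBlock`, `isHertzfix`, and `sharedStorageEnabled` are hypothetical, and it assumes the old behaviour is kept below the activation height so that historical blocks still replay to the same result.

```go
package main

import "fmt"

// hertzfixBlock is the mainnet activation height quoted in the post.
const hertzfixBlock uint64 = 34140700

// isHertzfix reports whether the fork is active at the given height.
// Hypothetical helper, written only for this example.
func isHertzfix(blockNumber uint64) bool {
	return blockNumber >= hertzfixBlock
}

// sharedStorageEnabled models the fix: the shared prefetch cache is only
// used before the fork (assumption: pre-fork behaviour stays unchanged).
func sharedStorageEnabled(blockNumber uint64) bool {
	return !isHertzfix(blockNumber)
}

func main() {
	for _, n := range []uint64{33851236, 34140699, 34140700} {
		fmt.Printf("block %d: shared storage enabled = %v\n", n, sharedStorageEnabled(n))
	}
}
```

Gating on height rather than removing the feature outright would let a node that syncs from genesis reproduce the pre-fork blocks exactly as mainnet accepted them.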

Conclusion

Client diversity helps increase a chain's security, but it requires a lot of extra resources, and at least 3 clients are needed to make sure that none of them exceeds a 50% share.
Most blockchain projects may not have enough resources to maintain several clients, so keeping the protocol and the code simple is also very important.
