This is a follow-up PR to #29792 to get rid of the data file sync.
**This is a non-backward compatible change, which increments the
database version from 8 to 9**.
We introduce a flushOffset for each freezer table, which tracks the position
of the most recently fsync’d item in the index file. When this offset moves
forward, it indicates that all index entries below it, along with their corresponding
data items, have been properly persisted to disk. The offset can also be moved
backward when truncating from either the head or tail of the file.
Previously, the data file required an explicit fsync after every mutation, which
was highly inefficient. With the introduction of the flush offset, the synchronization
strategy becomes more flexible, allowing the freezer to sync every 30 seconds
instead.
The data items above the flush offset are regarded volatile and callers must ensure
they are recoverable after the unclean shutdown, or explicitly sync the freezer
before any proceeding operations.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
The total difficulty is the sum of all block difficulties from genesis
to a certain block. This value was used in PoW for deciding which chain
is heavier, and thus which chain to select. Since PoS has a different
fork selection algorithm, all blocks since the merge have a difficulty
of 0, and all total difficulties are the same for the past 2 years.
Whilst the TDs are mostly useless nowadays, there was never really a
reason to mess around removing them since they are so tiny. This
reasoning changes when we go down the path of pruned chain history. In
order to reconstruct any TD, we **must** retrieve all the headers from
chain head to genesis and then iterate all the difficulties to compute
the TD.
In a world where we completely prune past chain segments (bodies,
receipts, headers), it is not possible to reconstruct the TD at all. In
a world where we still keep chain headers and prune only the rest,
reconstructing it possible as long as we process (or download) the chain
forward from genesis, but trying to snap sync the head first and
backfill later hits the same issue, the TD becomes impossible to
calculate until genesis is backfilled.
All in all, the TD is a messy out-of-state, out-of-consensus computed
field that is overall useless nowadays, but code relying on it forces
the client into certain modes of operation and prevents other modes or
other optimizations. This PR completely nukes out the TD from the node.
It doesn't compute it, it doesn't operate on it, it's as if it didn't
even exist.
Caveats:
- Whenever we have APIs that return TD (devp2p handshake, tracer, etc.)
we return a TD of 0.
- For era files, we recompute the TD during export time (fairly quick)
to retain the format content.
- It is not possible to "verify" the merge point (i.e. with TD gone, TTD
is useless). Since we're not verifying PoW any more, just blindly trust
it, not verifying but blindly trusting the many year old merge point
seems just the same trust model.
- Our tests still need to be able to generate pre and post merge blocks,
so they need a new way to split the merge without TTD. The PR introduces
a settable ttdBlock field on the consensus object which is used by tests
as the block where originally the TTD happened. This is not needed for
live nodes, we never want to generate old blocks.
- One merge transition consensus test was disabled. With a
non-operational TD, testing how the client reacts to TTD is useless, it
cannot react.
Questions:
- Should we also drop total terminal difficulty from the genesis json?
It's a number we cannot react on any more, so maybe it would be cleaner
to get rid of even more concepts.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR fixes some issues with benchmarks
- [x] Removes log output from a log-test
- [x] Avoids a `nil`-defer in `triedb/pathdb`
- [x] Fixes some crashes re tracers
- [x] Refactors a very resource-expensive benchmark for blobpol.
**NOTE**: this rewrite touches live production code (a little bit), as
it makes the validator-function used by the blobpool configurable.
- [x] Switch some benches over to use pebble over leveldb
- [x] reduce mem overhead in the setup-phase of some tests
- [x] Marks some tests with a long setup-phase to be skipped if `-short`
is specified (where long is on the order of tens of seconds). Ideally,
in my opinion, one should be able to run with `-benchtime 10ms -short`
and sanity-check all tests very quickly.
- [x] Drops some metrics-bechmark which times the speed of `copy`.
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
Changelog: https://golangci-lint.run/product/changelog/#1610
Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.
Also adds:
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
* all: refactor so NewBlock(..) and WithBody(..) take a types.Body
* core: fixup comments, remove txs != receipts panic
* core/types: add empty withdrawls to body if len == 0
This change removes a chainconfig parameter passed into rawdb.ReadLogs, which is not used nor needed.
It also modifies the filter loop slightly, avoiding a labeled break and instead using a method.
This change does not modify any behaviour.
* all: implement path-based state scheme
* all: edits from review
* core/rawdb, trie/triedb/pathdb: review changes
* core, light, trie, eth, tests: reimplement pbss history
* core, trie/triedb/pathdb: track block number in state history
* trie/triedb/pathdb: add history documentation
* core, trie/triedb/pathdb: address comments from Peter's review
Important changes to list:
- Cache trie nodes by path in clean cache
- Remove root->id mappings when history is truncated
* trie/triedb/pathdb: fallback to disk if unexpect node in clean cache
* core/rawdb: fix tests
* trie/triedb/pathdb: rename metrics, change clean cache key
* trie/triedb: manage the clean cache inside of disk layer
* trie/triedb/pathdb: move journal function
* trie/triedb/path: fix tests
* trie/triedb/pathdb: fix journal
* trie/triedb/pathdb: fix history
* trie/triedb/pathdb: try to fix tests on windows
* core, trie: address comments
* trie/triedb/pathdb: fix test issues
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
The EmptyRootHash and EmptyCodeHash are defined everywhere in the codebase, this PR replaces all of them with unified one defined in core/types package, and also defines constants for TxRoot, WithdrawalsRoot and UncleRoot
Logs stored on disk have minimal information. Contextual information such as block
number, index of log in block, index of transaction in block are filled in upon request.
We can fill in all these fields only having the block header and list of receipts.
But determining the transaction hash of a log requires the block body.
The goal of this PR is postponing this retrieval until we are sure we the transaction hash.
It happens often that the header bloom filter signals there might be matches in a block,
but after actually checking them reveals the logs do not match. We want to avoid fetching
the body in this case.
Note that this changes the semantics of Backend.GetLogs. Downstream callers of
GetLogs now assume log context fields have not been derived, and need to call
DeriveFields on the logs if necessary.
This commit replaces ioutil.TempDir with t.TempDir in tests. The
directory created by t.TempDir is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directory created using ioutil.TempDir
had to be removed manually by calling os.RemoveAll, which is omitted in
some tests. The error handling boilerplate e.g.
defer func() {
if err := os.RemoveAll(dir); err != nil {
t.Fatal(err)
}
}
is also tedious, but t.TempDir handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This PR reduces the amount of work we do when answering header queries, e.g. when a peer
is syncing from us.
For some items, e.g block bodies, when we read the rlp-data from database, we plug it
directly into the response package. We didn't do that for headers, but instead read
headers-rlp, decode to types.Header, and re-encode to rlp. This PR changes that to keep it
in RLP-form as much as possible. When a node is syncing from us, it typically requests 192
contiguous headers. On master it has the following effect:
- For headers not in ancient: 2 db lookups. One for translating hash->number (even though
the request is by number), and another for reading by hash (this latter one is sometimes
cached).
- For headers in ancient: 1 file lookup/syscall for translating hash->number (even though
the request is by number), and another for reading the header itself. After this, it
also performes a hashing of the header, to ensure that the hash is what it expected. In
this PR, I instead move the logic for "give me a sequence of blocks" into the lower
layers, where the database can determine how and what to read from leveldb and/or
ancients.
There are basically four types of requests; three of them are improved this way. The
fourth, by hash going backwards, is more tricky to optimize. However, since we know that
the gap is 0, we can look up by the parentHash, and stlil shave off all the number->hash
lookups.
The gapped collection can be optimized similarly, as a follow-up, at least in three out of
four cases.
Co-authored-by: Felix Lange <fjl@twurst.com>
* core/types: rm extranous check in test
* core/rawdb: add lightweight types for block logs
* core/rawdb,eth: use lightweight accessor for log filtering
* core/rawdb: add bench for decoding into rlpLogs
This change is a rewrite of the freezer code.
When writing ancient chain data to the freezer, the previous version first encoded each
individual item to a temporary buffer, then wrote the buffer. For small item sizes (for
example, in the block hash freezer table), this strategy causes a lot of system calls for
writing tiny chunks of data. It also allocated a lot of temporary []byte buffers.
In the new version, we instead encode multiple items into a re-useable batch buffer, which
is then written to the file all at once. This avoids performing a system call for every
inserted item.
To make the internal batching work, the ancient database API had to be changed. While
integrating this new API in BlockChain.InsertReceiptChain, additional optimizations were
also added there.
Co-authored-by: Felix Lange <fjl@twurst.com>
This change introduces garbage collection for the light client. Historical
chain data is deleted periodically. If you want to disable the GC, use
the --light.nopruning flag.