docs: add page on sync modes (#25634)
* add sync-modes page
* update sync-modes page
* replace frontmatter
* update les link and line lengths
* Apply suggestions from code review
* update post-merge syncing description
* clarify snap sync sequence
* fix typo
* Apply suggestions from code review

Co-authored-by: Martin Holst Swende <martin@swende.se>

parent d1b4c115c8
commit fe26ea58fe
@@ -0,0 +1,133 @@
---
title: Sync-modes
sort-key: L
---

Syncing is the process by which Geth catches up to the latest Ethereum block and current global state.
There are several ways to sync a Geth node that differ in their speed, storage requirements and trust
assumptions. This page outlines three sync configurations for full nodes and one for light nodes.

## Full nodes

There are two types of full node that use different mechanisms to sync up to the head of the chain:

### Snap (default)

A snap-synced node holds the most recent 128 block states in memory, so transactions in that range are always
quickly accessible. However, snap sync only starts processing from a relatively recent block (as opposed to
genesis for a full sync). Between the initial sync block and the 128 most recent blocks, the node stores
occasional checkpoints that can be used to rebuild the state on the fly. This means transactions can be traced
back as far as the block that was used for the initial sync. Tracing a single transaction requires re-executing
all preceding transactions in the same block **and** all preceding blocks back to the previous stored checkpoint.
Snap-synced nodes are therefore full nodes; the only difference is that the initial synchronization requires a
checkpoint block to sync from instead of independently verifying the chain all the way from genesis.
Snap sync then only verifies the proof-of-work and ancestor-child block progression and assumes that the
state transitions are correct rather than re-executing the transactions in each block to verify the state
changes. Snap sync is much faster than block-by-block sync. To start a node with snap sync, pass
`--syncmode snap` at startup.

Snap sync starts by downloading the headers for a chunk of blocks. Once the headers have been verified, the block
bodies and receipts for those blocks are downloaded. In parallel, Geth also begins state-sync. In state-sync, Geth
first downloads the leaves of the state trie for each block, without the intermediate nodes, along with a range
proof. The state trie is then regenerated locally. The state download is the part of snap sync that takes the
most time to complete, and its progress can be monitored using the ETA values in the log messages. However, the
blockchain is also progressing at the same time and invalidating some of the regenerated state data. This means it
is also necessary to have a 'healing' phase where errors in the state are fixed. It is not possible to monitor the
progress of the state heal because the extent of the errors cannot be known until the current state has already
been regenerated. The healing has to outpace the growth of the blockchain, otherwise the node will never catch up
to the current state. The speed of the state healing depends on hardware factors (disk read/write speed and
internet connection) and also on the total gas used in each block (more gas means more changes to the state that
have to be handled).
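
The requirement that healing outpaces chain growth can be illustrated with a toy model. The sketch below is not
how Geth measures or schedules healing; the function name, the parameters and the example rates are all
hypothetical, and it assumes constant rates purely to show why a net-negative heal rate never finishes.

```python
import math


def heal_time_seconds(backlog_nodes, heal_rate, invalidation_rate):
    """Toy estimate of how long the healing phase takes (illustrative only).

    backlog_nodes:     stale or missing trie nodes when healing starts (hypothetical)
    heal_rate:         trie nodes repaired per second (bounded by disk and network speed)
    invalidation_rate: trie nodes made stale per second by new blocks (grows with gas used)
    """
    net_rate = heal_rate - invalidation_rate
    if net_rate <= 0:
        # New blocks invalidate state at least as fast as it is repaired.
        return math.inf
    return backlog_nodes / net_rate


# Made-up numbers: a fast machine catches up, a slow one never does.
fast = heal_time_seconds(1_000_000, heal_rate=5_000, invalidation_rate=1_000)
slow = heal_time_seconds(1_000_000, heal_rate=900, invalidation_rate=1_000)
print(fast, slow)
```

In this model a faster disk or connection raises `heal_rate`, while fuller blocks raise `invalidation_rate`,
which matches the two factors named above.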

To summarize, snap sync progresses in the following sequence:

- download and verify headers
- download block bodies and receipts. In parallel, download raw state data and build the state trie
- heal the state trie to account for newly arriving data

**Note** Snap sync is the default behaviour, so if the `--syncmode` value is not passed to Geth at startup,
Geth will use snap sync. A node that is started using `snap` will switch to block-by-block sync once it has
caught up to the head of the chain.

### Full

A full sync generates the current state by executing every block starting from the genesis block. A full sync
independently verifies proof-of-work and block provenance as well as all state transitions by re-executing the
transactions in the entire historical sequence of blocks. Only the most recent 128 block states are stored in a
full node - older block states are pruned periodically and represented as a series of checkpoints from which any
previous state can be regenerated on request. 128 blocks is about 25.6 minutes of history with a block time of
12 seconds. To create a full node, pass `--syncmode full` at startup.
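
The 25.6 minute figure follows directly from the two numbers quoted above. A minimal check, with constant and
function names chosen here for illustration:

```python
PRUNING_WINDOW_BLOCKS = 128  # recent block states kept by a full node (from the text above)
BLOCK_TIME_SECONDS = 12      # block time assumed in the text above


def retained_history_minutes(blocks=PRUNING_WINDOW_BLOCKS, block_time=BLOCK_TIME_SECONDS):
    """Wall-clock span, in minutes, covered by the retained block states."""
    return blocks * block_time / 60


minutes = retained_history_minutes()
print(f"{minutes:.1f} minutes")  # 128 * 12 s = 1536 s
```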

## Archive nodes

An archive node is a node that retains all historical data right back to genesis. There is no need to regenerate
any data from checkpoints because all data is directly available in the node's own storage. Archive nodes are
therefore ideal for making fast queries about historical states. At the time of writing (September 2022) a full
archive node that stores all data since genesis occupies nearly 12 TB of disk space (keep up with the current
size on [Etherscan](https://etherscan.io/chartsync/chainarchive)). Archive nodes are created by configuring Geth's
garbage collection so that old data is never deleted: `geth --syncmode full --gcmode archive`.

It is also possible to create a partial/recent archive node, where the node is synced using `snap` but the state
is never pruned. This creates an archive node that saves all state data from the point that the node first syncs.
This is configured by starting Geth with `--syncmode snap --gcmode archive`.

## Light nodes

A light node syncs very quickly and stores the bare minimum of blockchain data. Light nodes only process block
headers, not entire blocks. This greatly reduces the computation time, storage and bandwidth required relative to a
full node. This means light nodes are suitable for resource-constrained devices and can catch up to the head of the
chain much faster when they are new or have been offline for a while. The trade-off is that light nodes rely heavily
on data served by altruistic full nodes. A light client can be used to query data from Ethereum and submit
transactions, acting as a locally-hosted Ethereum wallet. However, because they don't keep local copies of the
Ethereum state, light nodes can't validate blocks in the same way as full nodes - they receive a proof from the
full node and verify it against their local header chain. To start a node in light mode, pass `--syncmode light`.
Be aware that full nodes serving light data are relatively scarce, so light nodes can struggle to find peers.
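
The idea of checking a served value against a locally held root can be sketched with a simplified binary Merkle
tree. This is a deliberate simplification: Ethereum uses Merkle-Patricia tries and Keccak-256, not the SHA-256
binary tree below, and every function name and data value here is hypothetical. The trusted root stands in for
a root taken from a header the light node already holds.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Hash adjacent pairs; the caller guarantees an even number of entries."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root of a binary Merkle tree over a non-empty list of byte strings."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = _next_level(level)
    return level[0]


def make_proof(leaves, index):
    """What a serving full node would send: sibling hashes from leaf to root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, proof, trusted_root):
    """What the light node does: recompute the root and compare with its own copy."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == trusted_root


leaves = [b"alice:10", b"bob:5", b"carol:7"]   # made-up state entries
root = merkle_root(leaves)                      # stands in for a root in a local header
proof = make_proof(leaves, 1)
print(verify_proof(b"bob:5", proof, root))      # honest answer
print(verify_proof(b"bob:500", proof, root))    # tampered answer
```

The light node never needs the full data set: it only needs the value, the short proof and a root it already
trusts.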

Read more about light nodes on our [LES page](/docs/interface/les.md).

## Consensus layer syncing

Now that Ethereum has switched to proof-of-stake, all consensus logic and block propagation are handled by
consensus clients. This means that syncing the blockchain is a process shared between the consensus and execution
clients. Blocks are downloaded by the consensus client and verified by the execution client. In order for Geth to
sync, it requires a header from its connected consensus client. Geth does not import any data until it is
instructed to by the consensus client.

Once a header is available to use as a syncing target, Geth retrieves all headers between that target header and
the local header chain in reverse chronological order. These headers show that the sequence of blocks is correct
because the parent hashes link one block to the next right up to the target block. Eventually, the sync will reach
a block held in the local database, at which point the local data and the target data are considered 'linked' and
there is a very high chance the node is syncing the correct chain. The block bodies are then downloaded, followed
by the state data. The consensus client can update the target header - as long as the syncing outpaces the growth
of the blockchain, the node will eventually get in sync.
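
The backwards walk along parent hashes can be sketched as follows. This is an illustration of the linking idea
only, not Geth's implementation: the `Header` type, the SHA-256 based `block_hash`, and the helper names are all
hypothetical, and real headers carry many more fields.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: str

    @property
    def block_hash(self) -> str:
        # Toy hash over the two fields; real headers are hashed differently.
        return hashlib.sha256(f"{self.number}:{self.parent_hash}".encode()).hexdigest()


def build_chain(length):
    """Build a toy chain of `length` headers, each pointing at its parent."""
    headers = [Header(0, "0" * 64)]
    for number in range(1, length):
        headers.append(Header(number, headers[-1].block_hash))
    return headers


def link_to_local(target, fetch_header, local_hashes):
    """Walk parent hashes back from `target` until a locally known block is reached.

    Returns the downloaded headers, newest first. Raises if a fetched header
    does not hash to the parent hash that referenced it.
    """
    downloaded = [target]
    current = target
    while current.parent_hash not in local_hashes:
        parent = fetch_header(current.parent_hash)
        if parent is None or parent.block_hash != current.parent_hash:
            raise ValueError("broken parent-hash link")
        downloaded.append(parent)
        current = parent
    return downloaded


chain = build_chain(10)
remote = {hdr.block_hash: hdr for hdr in chain}       # what peers can serve
local = {hdr.block_hash for hdr in chain[:4]}         # blocks 0-3 already in the database
skeleton = link_to_local(chain[-1], remote.get, local)
numbers = [hdr.number for hdr in skeleton]
print(numbers)
```

Once the walk stops, the oldest downloaded header points at a block the node already has, which is the 'linked'
condition described above.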

There are two ways for the consensus client to find a block header that Geth can use as a sync target,
optimistic syncing and checkpoint syncing:

### Optimistic sync

Optimistic sync downloads blocks before the execution client has validated them. In optimistic sync the node assumes
the data it receives from its peers is correct during the downloading phase but then retroactively verifies each
downloaded block. Nodes are not allowed to attest or propose blocks while they are still 'optimistic' because they
can't yet guarantee their view of the head of the chain is correct.

Read more in the [optimistic sync specs](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md).

### Checkpoint sync

Alternatively, the consensus client can grab a checkpoint from a trusted source which provides a target state to sync
up to, before switching to full sync and verifying each block in turn. In this mode, the node trusts that the checkpoint
is correct. There are many possible sources for this checkpoint - the gold standard would be to get it out-of-band
from a trusted friend, but it could also come from block explorers or public APIs/web apps.

**Note** it is not currently possible to use a Geth light node as an execution client on proof-of-stake Ethereum.

## Summary

There are several ways to sync a Geth node. The default is to use snap sync to create a full node. This verifies all
blocks using some recent block that is old enough to be safe from re-orgs as a sync target. A trust-minimized
alternative is full sync, which verifies every block since genesis. These modes drop state data older than 128 blocks,
keeping only checkpoints that enable on-request regeneration of historical states. For rapid queries of historical
data an archive node is required. Archive nodes keep local copies of all historical data right back to genesis -
currently about 12 TB and growing. The opposite extreme is a light node that stores the bare minimum of blockchain
data - it requests everything else from full nodes. These configurations are controlled by passing `full`, `snap` or
`light` to `--syncmode` at startup. For an archive node, `--syncmode` should be `full` and `--gcmode` should be set to
`archive`. Currently, due to the transition to proof-of-stake, light sync does not work (new light client protocols
are being developed).
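
The flag combinations described on this page can be collected in a small lookup. This is a convenience sketch
only: the node-type labels and the `geth_args` helper are hypothetical, and the only Geth flags used are the
`--syncmode` and `--gcmode` values quoted above.

```python
# Labels on the left are illustrative names, not Geth terminology.
SYNC_FLAGS = {
    "snap": ["--syncmode", "snap"],                                  # default full node
    "full": ["--syncmode", "full"],                                  # verify from genesis
    "archive": ["--syncmode", "full", "--gcmode", "archive"],        # keep all state
    "recent-archive": ["--syncmode", "snap", "--gcmode", "archive"], # keep state from first sync
    "light": ["--syncmode", "light"],                                # see the note on proof-of-stake
}


def geth_args(node_type):
    """Return the argument list for the requested node type."""
    if node_type not in SYNC_FLAGS:
        raise ValueError(f"unknown node type: {node_type!r}")
    return ["geth", *SYNC_FLAGS[node_type]]


command = " ".join(geth_args("archive"))
print(command)
```

The resulting list can be passed to a process launcher as-is; no other flags are implied by this sketch.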