---
title: EVM Tracing
sort_key: A
---

There are two different types of [transactions][transactions]
in Ethereum: simple value transfers and contract executions. A value transfer just
moves Ether from one account to another. If, however, the recipient of a transaction is
a contract account with associated [EVM][evm] (Ethereum Virtual Machine) bytecode, then,
besides transferring any Ether, the code will also be executed as part of the transaction.

Having code associated with Ethereum accounts permits transactions to do arbitrarily
complex data storage and enables them to act on the previously stored data by further
transacting internally with outside accounts and contracts. This creates an interlinked
ecosystem of contracts, where a single transaction can interact with tens or hundreds of
accounts.

The downside of contract execution is that it is very hard to say what a transaction
actually did. A transaction receipt does contain a status code to check whether execution
succeeded or not, but there is no way to see what data was modified, nor what external
contracts were invoked. Geth resolves this by re-running transactions locally and collecting
data about precisely what was executed by the EVM. This is known as "tracing" the transaction.

* TOC
{:toc}

## Tracing prerequisites

In its simplest form, tracing a transaction entails requesting the Ethereum node to
reexecute the desired transaction with varying degrees of data collection and have it
return the aggregated summary for post-processing. Reexecuting a transaction, however, has a
few prerequisites.

In order for an Ethereum node to reexecute a transaction, all historical state accessed
by the transaction must be available. This includes:

* Balance, nonce, bytecode and storage of both the recipient and all internally invoked contracts.
* Block metadata referenced during execution of both the outer and all internally created transactions.
* Intermediate state generated by all preceding transactions contained in the same block as the one being traced.

The synchronization and pruning configuration of a node therefore limits which transactions
it can trace.

* An **archive** node retains **all historical data** back to genesis. It can therefore
  trace arbitrary transactions at any point in the history of the chain. Tracing a single
  transaction requires reexecuting all preceding transactions in the same block.

* A **full synced** node retains the state of the most recent 128 blocks in memory, so transactions in
  that range are always accessible. Full nodes also store occasional checkpoints back to genesis
  that can be used to rebuild the state at any point on the fly. This means older transactions
  can be traced, but if there is a large distance between the requested transaction and the most
  recent checkpoint, rebuilding the state can take a long time. Tracing a single
  transaction requires reexecuting all preceding transactions in the same block
  **and** all preceding blocks back to the most recent stored checkpoint.

* A **snap synced** node holds the state of the most recent 128 blocks in memory, so transactions in that
  range are always accessible. However, snap sync only starts processing from a relatively recent
  block (as opposed to genesis for a full node). Between the initial sync block and the 128 most
  recent blocks, the node stores occasional checkpoints that can be used to rebuild the state on the fly.
  This means transactions can be traced back as far as the block that was used for the initial sync.
  Tracing a single transaction requires reexecuting all preceding transactions in the same block
  **and** all preceding blocks back to the most recent stored checkpoint.

* A **light synced** node retrieving data **on demand** can in theory trace transactions
  for which all required historical state is readily available in the network. This is because the data
  required to generate the trace is requested from an les-serving full node. In practice, data
  availability **cannot** be reasonably assumed.

*There are exceptions to the above rules when running batch traces of entire blocks or
chain segments. Those will be detailed later.*

## Basic traces

The simplest type of transaction trace that Geth can generate is a raw EVM opcode
trace. For every VM instruction the transaction executes, a structured log entry is
emitted, containing all contextual metadata deemed useful. This includes the *program
counter*, *opcode name*, *opcode cost*, *remaining gas*, *execution depth* and any
*error that occurred*. The structured logs can optionally also contain the content of the
*execution stack*, *execution memory* and *contract storage*.

The entire output of a raw EVM opcode trace is a JSON object with a few metadata
fields (*consumed gas*, *failure status*, *return value*) and a list of *opcode entries*:

```json
{
  "gas": 25523,
  "failed": false,
  "returnValue": "",
  "structLogs": []
}
```

An example log for a single opcode entry has the following format:

```json
{
  "pc": 48,
  "op": "DIV",
  "gasCost": 5,
  "gas": 64532,
  "depth": 1,
  "error": null,
  "stack": [
    "00000000000000000000000000000000000000000000000000000000ffffffff",
    "0000000100000000000000000000000000000000000000000000000000000000",
    "2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d"
  ],
  "memory": [
    "0000000000000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000060"
  ],
  "storage": {}
}
```

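Because the trace is plain JSON, it can be post-processed with a few lines of code. The sketch below counts how often each opcode appears in `structLogs` and records the deepest call depth. The function name `summarizeTrace` and the sample data are illustrative assumptions written by hand in the format shown above; they are not output from a real node.

```javascript
// Summarise a raw opcode trace: count how often each opcode ran and
// record the deepest execution depth seen in the struct logs.
// `summarizeTrace` is a hypothetical helper, not part of Geth.
function summarizeTrace(trace) {
  const opcodeCounts = {};
  let maxDepth = 0;
  for (const entry of trace.structLogs) {
    opcodeCounts[entry.op] = (opcodeCounts[entry.op] || 0) + 1;
    if (entry.depth > maxDepth) {
      maxDepth = entry.depth;
    }
  }
  return {
    failed: trace.failed,
    steps: trace.structLogs.length,
    opcodeCounts: opcodeCounts,
    maxDepth: maxDepth,
  };
}

// Hand-written sample in the documented format (not real trace output).
const sampleTrace = {
  gas: 25523,
  failed: false,
  returnValue: "",
  structLogs: [
    { pc: 44, op: "PUSH1", gasCost: 3, gas: 64538, depth: 1 },
    { pc: 46, op: "PUSH1", gasCost: 3, gas: 64535, depth: 1 },
    { pc: 48, op: "DIV", gasCost: 5, gas: 64532, depth: 1 },
  ],
};

console.log(summarizeTrace(sampleTrace));
```

The same function could be applied to the object returned by a trace call, provided the response follows the structure described in this section.
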
### Generating basic traces

To generate a raw EVM opcode trace, Geth provides a few [RPC API endpoints](/docs/rpc/ns-debug).
The most commonly used is [`debug_traceTransaction`](/docs/rpc/ns-debug#debug_tracetransaction).

In its simplest form, `debug_traceTransaction` accepts a transaction hash as its only argument. It then
traces the transaction, aggregates all the generated data and returns it as a **large**
JSON object. A sample invocation from the Geth console would be:

```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f")
```

The same call can also be invoked from outside the node via HTTP RPC (e.g. using Curl). In this
case, the HTTP endpoint must be enabled in Geth using the `--http` flag and the `debug` API
namespace must be exposed using `--http.api=debug`.

```
$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f"]}' localhost:8545
```

To follow along with this tutorial, transaction hashes can be found from a local Geth node (e.g. by
attaching a [JavaScript console](/docs/interface/javascript-console) and running `eth.getBlock('latest')`,
then passing a transaction hash from the returned block to `debug.traceTransaction()`) or from a block
explorer (for [Mainnet](https://etherscan.io/) or a [testnet](https://goerli.etherscan.io/)).

It is also possible to configure the trace by passing Boolean (true/false) values for four parameters
that tweak the verbosity of the trace. By default, the *EVM memory* and *Return data* are not reported
but the *EVM stack* and *EVM storage* are. To report the maximum amount of data:

```shell
enableMemory: true
disableStack: false
disableStorage: false
enableReturnData: true
```

An example call, made in the Geth JavaScript console and configured to report the maximum amount of data,
looks as follows:

```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f",{enableMemory: true, disableStack: false, disableStorage: false, enableReturnData: true})
```

Running the above operation on the Rinkeby network (with a node retaining enough history)
will result in this [trace dump](https://gist.github.com/karalabe/c91f95ac57f5e57f8b950ec65ecc697f).

Alternatively, disabling *EVM Stack*, *EVM Memory*, *Storage* and *Return data* results in the following,
much shorter, [trace dump](https://gist.github.com/karalabe/d74a7cb33a70f2af75e7824fc772c5b4).
Memory and return data are already off by default, so the Curl request below only needs to disable
the stack and storage explicitly.

```
$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableStorage": true}]}' localhost:8545
```

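When the request is issued from a script rather than from Curl, the JSON body can be assembled programmatically. The following is a minimal sketch: `buildTraceRequest` is a hypothetical helper name, and the `jsonrpc` version field is included here because the JSON-RPC 2.0 specification defines it. The option names are the ones listed in this section.

```javascript
// Build the JSON-RPC request body for debug_traceTransaction.
// `buildTraceRequest` is a hypothetical helper, not part of Geth.
function buildTraceRequest(txHash, options = {}, id = 1) {
  // Only send the options object when at least one option is set.
  const params = Object.keys(options).length > 0 ? [txHash, options] : [txHash];
  return {
    jsonrpc: "2.0",
    id: id,
    method: "debug_traceTransaction",
    params: params,
  };
}

const body = buildTraceRequest(
  "0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f",
  { disableStack: true, disableStorage: true }
);

// The serialized body could then be POSTed to the node's HTTP endpoint
// (localhost:8545 in the Curl examples).
console.log(JSON.stringify(body));
```
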
### Limits of basic traces

Although the raw opcode traces generated above are useful, having an individual log entry for every single
opcode is too low-level for most use cases and requires developers to create additional tools to
post-process the traces. Additionally, a full opcode trace can easily run to hundreds of
megabytes, making it very resource-intensive to get out of the node and process externally.

To avoid those issues, Geth supports running custom JavaScript tracers *within* the Ethereum node,
which have full access to the EVM stack, memory and contract storage. This means developers only have to
gather the data they actually need, and can do any processing at the source.

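As a rough illustration of the idea, the sketch below defines a tracer object that counts opcodes instead of logging every step. It assumes the shape used by Geth's JavaScript tracers, where `step` is called for each opcode, `fault` on errors and `result` produces the returned value. The `opcodeCounter` name and the `mockLog` helper are hypothetical and exist only so the example can run outside a node; the tracer reference documentation is the authority on the exact interface.

```javascript
// A tracer object in the shape assumed for Geth's JS tracers.
// It keeps only per-opcode counts rather than a full log entry per step.
var opcodeCounter = {
  counts: {},
  // Called once per executed opcode.
  step: function (log, db) {
    var name = log.op.toString();
    this.counts[name] = (this.counts[name] || 0) + 1;
  },
  // Called when execution hits an error; ignored in this sketch.
  fault: function (log, db) {},
  // Called at the end; its return value is what the trace call returns.
  result: function (ctx, db) {
    return this.counts;
  },
};

// Hypothetical stand-in for the log object a node would pass to step().
function mockLog(name) {
  return { op: { toString: function () { return name; } } };
}

["PUSH1", "PUSH1", "DIV"].forEach(function (name) {
  opcodeCounter.step(mockLog(name), null);
});

console.log(opcodeCounter.result({}, null));
```

Inside a node, such an object would be supplied to the trace call through its `tracer` option as a string of JavaScript source, so the aggregation happens at the source and only the small summary leaves the node.
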
## Pruning

Geth does in-memory state pruning by default, discarding state entries that it deems
no longer necessary to maintain. This is configured via the `--gcmode` flag. When tracing on
anything other than an archive node, it is common to see an error message reporting that
the necessary state is not available:

```sh
Error: required historical state unavailable (reexec=128)
  at web3.js:6365:37(47)
  at send (web3.js:5099:62(35))
  at <eval>:1:23(13)
```

The pruning behaviour, and consequently the state availability and tracing capability of
a node, depends on its sync and pruning configuration. The 'oldest' block after which
state is immediately available, and before which state is not immediately available,
is known as the "pivot block". There are then several possible cases for a trace request
on a Geth node.

When tracing a transaction in block `B` on a node whose pivot block is `P`:

1. A fast-sync'd node can regenerate the desired state by replaying blocks from the most recent
   checkpoint between `P` and `B` as long as `P` < `B`. If `P` > `B` there is no available checkpoint
   and the state cannot be regenerated without replaying the chain from genesis.

2. A fully sync'd node can regenerate the desired state by replaying blocks from the last available
   full state before `B`. A fully sync'd node re-executes all blocks from genesis, so checkpoints are available
   across the entire history of the chain. However, database pruning discards older data, moving `P` to a more
   recent position in the chain. If `P` > `B` there is no available checkpoint and the state cannot be
   regenerated without replaying the chain from genesis.

3. A fully sync'd node without pruning (i.e. an archive node configured with `--gcmode=archive`)
   does not need to replay anything; it can immediately load up any state and serve the request for any `B`.

The time taken to regenerate a specific state increases with the distance between `P` and `B`. If the distance
between `P` and `B` is large, the regeneration time can be substantial.

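The three cases can be condensed into a small decision function. This is a toy model of the reasoning above, not Geth's actual logic: `replayPlan` and its fields are hypothetical names, `checkpoint` stands for the block number of the most recent stored state at or before `B`, and the case `P` = `B` is treated as needing no unavailable history.

```javascript
// Toy model of the cases above. All names are illustrative assumptions.
//   archive    - true for a node running without pruning
//   pivot      - block number P, the oldest block with available state
//   target     - block number B containing the transaction to trace
//   checkpoint - most recent stored state at or before B (ignored if archive)
function replayPlan({ archive, pivot, target, checkpoint }) {
  if (archive) {
    // Case 3: any state can be loaded directly.
    return { possible: true, blocksToReplay: 0 };
  }
  if (pivot > target) {
    // Cases 1 and 2 with P > B: no checkpoint is available.
    return { possible: false, reason: "state before the pivot block is unavailable" };
  }
  if (checkpoint < pivot || checkpoint > target) {
    throw new RangeError("checkpoint must lie between pivot and target");
  }
  // Cases 1 and 2 with P <= B: replay forward from the checkpoint.
  return { possible: true, blocksToReplay: target - checkpoint };
}

console.log(replayPlan({ archive: false, pivot: 100, target: 1000, checkpoint: 900 }));
console.log(replayPlan({ archive: false, pivot: 500, target: 100, checkpoint: 0 }));
console.log(replayPlan({ archive: true, pivot: 0, target: 100, checkpoint: 0 }));
```

The `blocksToReplay` value makes the cost explicit: the further the target block is from the nearest stored state, the more blocks must be reexecuted before the trace can start.
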
## Summary

This page covered the concept of EVM tracing and how to generate traces with the default opcode-based tracer using RPC.
More advanced usage is possible, including using other built-in tracers as well as writing [custom tracing](/docs/dapp/custom-tracer) code in JavaScript
and Go. The API as well as the JS tracing hooks are defined in [the reference](/docs/rpc/ns-debug#debug_traceTransaction).

[transactions]: https://ethereum.org/en/developers/docs/transactions
[evm]: https://ethereum.org/en/developers/docs/evm