go-ethereum/docs/_dapp/tracing.md

217 lines
9.9 KiB
Markdown

---
title: EVM Tracing
---
There are two different types of transactions in Ethereum: plain value transfers and
contract executions. A plain value transfer just moves Ether from one account to another
and as such is uninteresting from this guide's perspective. If however the recipient of a
transaction is a contract account with associated EVM (Ethereum Virtual Machine)
bytecode - beside transferring any Ether - the code will also be executed as part of the
transaction.
Having code associated with Ethereum accounts permits transactions to do arbitrarilly
complex data storage and enables them to act on the previously stored data by further
transacting internally with outside accounts and contracts. This creates an intertwined
ecosystem of contracts, where a single transaction can interact with tens or hunderds of
accounts.
The downside of contract execution is that it is very hard to say what a transaction
actually did. A transaction receipt does contain a status code to check whether execution
succeeded or not, but there's no way to see what data was modified, nor what external
contracts where invoked. In order to introspect a transaction, we need to trace its
execution.
## Tracing prerequisites
In its simplest form, tracing a transaction entails requesting the Ethereum node to
reexecute the desired transaction with varying degrees of data collection and have it
return the aggregated summary for post processing. Reexecuting a transaction however has a
few prerequisites to be met.
In order for an Ethereum node to reexecute a transaction, it needs to have available all
historical state accessed by the transaction:
* Balance, nonce, bytecode and storage of both the recipient as well as all internally invoked contracts.
* Block metadata referenced during execution of both the outer as well as all internally created transactions.
* Intermediate state generated by all preceding transactions contained in the same block as the one being traced.
Depending on your node's mode of synchronization and pruning, different configurations
result in different capabilities:
* An **archive** node retaining **all historical data** can trace arbitrary transactions
at any point in time. Tracing a single transaction also entails reexecuting all
preceding transactions in the same block.
* A **fast synced** node retaining **all historical data** after initial sync can only
trace transactions from blocks following the initial sync point. Tracing a single
transaction also entails reexecuting all preceding transactions in the same block.
* A **fast synced** node retaining only **periodic state data** after initial sync can
only trace transactions from blocks following the initial sync point. Tracing a single
transaction entails reexecuting all preceding transactions **both** in the same block,
as well as all preceding blocks until the previous stored snapshot.
* A **light synced** node retrieving data **on demand** can in theory trace transactions
for which all required historical state is readily available in the network. In
practice, data availability is **not** a feasible assumption.
*There are exceptions to the above rules when running batch traces of entire blocks or
chain segments. Those will be detailed later.*
## Basic traces
The simplest type of transaction trace that `go-ethereum` can generate are raw EVM opcode
traces. For every VM instruction the transaction executes, a structured log entry is
emitted, containing all contextual metadata deemed useful. This includes the *program
counter*, *opcode name*, *opcode cost*, *remaining gas*, *execution depth* and any
*occurred error*. The structured logs can optionally also contain the content of the
*execution stack*, *execution memory* and *contract storage*.
An example log entry for a single opcode looks like:
```json
{
"pc": 48,
"op": "DIV",
"gasCost": 5,
"gas": 64532,
"depth": 1,
"error": null,
"stack": [
"00000000000000000000000000000000000000000000000000000000ffffffff",
"0000000100000000000000000000000000000000000000000000000000000000",
"2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d"
],
"memory": [
"0000000000000000000000000000000000000000000000000000000000000000",
"0000000000000000000000000000000000000000000000000000000000000000",
"0000000000000000000000000000000000000000000000000000000000000060"
],
"storage": {
}
}
```
The entire output of an raw EVM opcode trace is a JSON object having a few metadata
fields: *consumed gas*, *failure status*, *return value*; and a list of *opcode entries*
that take the above form:
```json
{
"gas": 25523,
"failed": false,
"returnValue": "",
"structLogs": []
}
```
### Generating basic traces
To generate a raw EVM opcode trace, `go-ethereum` provides a few [RPC API
endpoints](debug-api), out of which the most commonly used is
[`debug_traceTransaction`](trace-tx).
In its simplest form, `traceTransaction` accepts a transaction hash as its sole argument,
traces the transaction, aggregates all the generated data and returns it as a **large**
JSON object. A sample invocation from the Geth console would be:
```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f")
```
The same call can of course be invoked from outside the node too via HTTP RPC. In this
case, please make sure the HTTP endpoint is enabled via `--rpc` and the `debug` API
namespace exposed via `--rpcapi=debug`.
```
$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f"]}' localhost:8545
```
Running the above operation on the Rinkeby network (with a node retaining enough history)
will result in this [trace dump](rinkeby-example-trace-big).
### Tuning basic traces
By default the raw opcode tracer emits all relevant events that occur within the EVM while
processing a transaction, such as *EVM stack*, *EVM memory* and *updated storage slots*.
Certain use cases however may not need some of these data fields reported. To cater for
those use cases, these massive fields may be omitted using a second *options* parameter
for the tracer:
```json
{
"disableStack": true,
"disableMemory": true,
"disableStorage": true
}
```
Running the previous tracer invocation from the Geth console with the data fields
disabled:
```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {disableStack: true, disableMemory: true, disableStorage: true})
```
Analogously running the filtered tracer from outside the node too via HTTP RPC:
```
$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableMemory": true, "disableStorage": true}]}' localhost:8545
```
Running the above operation on the Rinkeby network will result in this significantly
shorter [trace dump](rinkeby-example-trace).
### Limits of basic traces
Although the raw opcode traces we've generated above have their use, this basic way of
tracing is problematic in the real world. Having an individual log entry for every single
opcode is too low level for most use cases, and will require developers to create
additional tools to post-process the traces. Additionally, a full opcode trace can easily
go into the hundreds of megabytes, making them very resource intensive to get out of the
node and process externally.
To avoid all of the previously mentioned issues, `go-ethereum` supports running custom
JavaScript tracers *within* the Ethereum node, which have full access to the EVM stack,
memory and contract storage. This permits developers to only gather the data they need,
and do any processing **at** the data. Please see the next section for our *custom in-node
tracers*.
### Pruning
Geth by default does in-memory pruning of state, discarding state entries that it deems is
no longer necessary to maintain. This is configured via the `--gcmode` option. Often,
people run into the error that state is not available.
Say you want to do a trace on block `B`. Now there are a couple of cases:
1. You have done a fast-sync, pivot block `P` where `P <= B`.
2. You have done a fast-sync, pivot block `P` where `P > B`.
3. You have done a full-sync, with pruning
4. You have done a full-sync, without pruning (`--gcmode=archive`)
Here's what happens in each respective case:
1. Geth will regenerate the desired state by replaying blocks from the closest point ina
time before `B` where it has full state. This defaults to `128` blocks max, but you can
specify more in the actual call `... "reexec":1000 .. }` to the tracer.
2. Sorry, can't be done without replaying from genesis.
3. Same as 1)
4. Does not need to replay anything, can immediately load up the state and serve the request.
There is one other option available to you, which may or may not suit your needs. That is
to use [Evmlab](evmlab).
docker pull holiman/evmlab && docker run -it holiman/evmlab
There you can use the reproducer. The reproducer will incrementally fetch data from infura
until it has all the information required to create the trace locally on an evm which is
bundled with the image. It will create a custom genesis containing the state that the
transaction touches (balances, code, nonce etc). It should be mentioned that the evmlab
reproducer is strictly guaranteed to be totally exact with regards to gascosts incurred by
the outer transaction, as evmlab does not fully calculate the gascosts for nonzero data
etc, but is usually sufficient to analyze contracts and events.
[evmlab]: https://github.com/holiman/evmlab
[rinkeby-example-trace]: https://gist.github.com/karalabe/d74a7cb33a70f2af75e7824fc772c5b4
[rinkeby-example-trace-big]: https://gist.github.com/karalabe/c91f95ac57f5e57f8b950ec65ecc697f
[debug-api]: ../rpc/ns-debug
[trace-tx]: ../rpc/ns-debug#debug_tracetransaction