---
title: EVM Tracing
---

There are two different types of transactions in Ethereum: plain value transfers and
contract executions. A plain value transfer just moves Ether from one account to another
and as such is uninteresting from this guide's perspective. If however the recipient of a
transaction is a contract account with associated EVM (Ethereum Virtual Machine)
bytecode, then besides transferring any Ether, the code will also be executed as part of
the transaction.

Having code associated with Ethereum accounts permits transactions to do arbitrarily
complex data storage and enables them to act on the previously stored data by further
transacting internally with outside accounts and contracts. This creates an intertwined
ecosystem of contracts, where a single transaction can interact with tens or hundreds of
accounts.

The downside of contract execution is that it is very hard to say what a transaction
actually did. A transaction receipt does contain a status code to check whether execution
succeeded or not, but there's no way to see what data was modified, nor what external
contracts were invoked. In order to introspect a transaction, we need to trace its
execution.

## Tracing prerequisites

In its simplest form, tracing a transaction entails requesting the Ethereum node to
reexecute the desired transaction with varying degrees of data collection and have it
return the aggregated summary for post processing. Reexecuting a transaction however has a
few prerequisites to be met.

In order for an Ethereum node to reexecute a transaction, it needs to have available all
historical state accessed by the transaction:

* Balance, nonce, bytecode and storage of both the recipient as well as all internally invoked contracts.
* Block metadata referenced during execution of both the outer as well as all internally created transactions.
* Intermediate state generated by all preceding transactions contained in the same block as the one being traced.

Depending on your node's mode of synchronization and pruning, different configurations
result in different capabilities:

* An **archive** node retaining **all historical data** can trace arbitrary transactions
  at any point in time. Tracing a single transaction also entails reexecuting all
  preceding transactions in the same block.
* A **full synced** node retaining **all historical data** after initial sync can only
  trace transactions from blocks following the initial sync point. Tracing a single
  transaction also entails reexecuting all preceding transactions in the same block.
* A **fast synced** node retaining only **periodic state data** after initial sync can
  only trace transactions from blocks following the initial sync point. Tracing a single
  transaction entails reexecuting all preceding transactions **both** in the same block,
  as well as all preceding blocks until the previous stored snapshot.
* A **light synced** node retrieving data **on demand** can in theory trace transactions
  for which all required historical state is readily available in the network. In
  practice, data availability is **not** a feasible assumption.

*There are exceptions to the above rules when running batch traces of entire blocks or
chain segments. Those will be detailed later.*

## Basic traces

The simplest type of transaction trace that `go-ethereum` can generate is the raw EVM
opcode trace. For every VM instruction the transaction executes, a structured log entry is
emitted, containing all contextual metadata deemed useful. This includes the *program
counter*, *opcode name*, *opcode cost*, *remaining gas*, *execution depth* and any
*occurred error*. The structured logs can optionally also contain the content of the
*execution stack*, *execution memory* and *contract storage*.

An example log entry for a single opcode looks like:

```json
{
  "pc": 48,
  "op": "DIV",
  "gasCost": 5,
  "gas": 64532,
  "depth": 1,
  "error": null,
  "stack": [
    "00000000000000000000000000000000000000000000000000000000ffffffff",
    "0000000100000000000000000000000000000000000000000000000000000000",
    "2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d"
  ],
  "memory": [
    "0000000000000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000060"
  ],
  "storage": {}
}
```
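
As a rough illustration of how such an entry can be read, the sketch below recomputes the value that the `DIV` step above would push onto the stack. It assumes that the `stack` array is ordered bottom to top (so the last element is the top of the stack) and that `DIV` divides the top item by the one beneath it. The helper name `evalDiv` is hypothetical and not part of `go-ethereum`.

```javascript
// Hypothetical helper: recompute the result of a DIV step from a structured log entry.
// Assumption: entry.stack lists 32-byte words as hex strings, bottom of the stack first.
function evalDiv(entry) {
  const stack = entry.stack;
  const numerator = BigInt("0x" + stack[stack.length - 1]);   // top of the stack
  const denominator = BigInt("0x" + stack[stack.length - 2]); // item below the top
  // The EVM defines division by zero as yielding zero.
  const quotient = denominator === 0n ? 0n : numerator / denominator;
  return quotient.toString(16);
}

// The stack from the example log entry above.
const entry = {
  op: "DIV",
  stack: [
    "00000000000000000000000000000000000000000000000000000000ffffffff",
    "0000000100000000000000000000000000000000000000000000000000000000",
    "2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d",
  ],
};

console.log(evalDiv(entry)); // prints "2df07fba"
```

The divisor here is 2^224, so the division keeps only the top four bytes of the 32-byte word on the stack.
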

The entire output of a raw EVM opcode trace is a JSON object having a few metadata
fields: *consumed gas*, *failure status*, *return value*; and a list of *opcode entries*
that take the above form:

```json
{
  "gas": 25523,
  "failed": false,
  "returnValue": "",
  "structLogs": []
}
```
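
To make the shape of this object more concrete, here is a minimal sketch that aggregates a trace into per-opcode statistics. The function name `summarizeTrace` and the `sampleTrace` data are made up for illustration; only the field names (`gas`, `failed`, `structLogs`, `op`, `gasCost`) come from the output format shown above.

```javascript
// Hypothetical helper: aggregate a raw opcode trace into per-opcode statistics.
function summarizeTrace(trace) {
  const perOpcode = {};
  for (const log of trace.structLogs) {
    // Create the bucket for this opcode on first sight, then update it.
    const stats = perOpcode[log.op] || (perOpcode[log.op] = { count: 0, gasCost: 0 });
    stats.count += 1;
    stats.gasCost += log.gasCost;
  }
  return {
    gas: trace.gas,
    failed: trace.failed,
    steps: trace.structLogs.length,
    perOpcode,
  };
}

// Made-up sample data in the shape shown above, not a real trace.
const sampleTrace = {
  gas: 21018,
  failed: false,
  returnValue: "",
  structLogs: [
    { pc: 0, op: "PUSH1", gasCost: 3, gas: 100, depth: 1 },
    { pc: 2, op: "PUSH1", gasCost: 3, gas: 97, depth: 1 },
    { pc: 4, op: "MSTORE", gasCost: 12, gas: 94, depth: 1 },
  ],
};

console.log(summarizeTrace(sampleTrace));
```

The summed `gasCost` values are only a rough indicator and should not be expected to add up to the top-level `gas` field, which also covers costs that are not tied to a single opcode.
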

### Generating basic traces

To generate a raw EVM opcode trace, `go-ethereum` provides a few [RPC API
endpoints](../rpc/ns-debug), out of which the most commonly used is
[`debug_traceTransaction`](../rpc/ns-debug#debug_tracetransaction).

In its simplest form, `traceTransaction` accepts a transaction hash as its sole argument,
traces the transaction, aggregates all the generated data and returns it as a **large**
JSON object. A sample invocation from the Geth console would be:

```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f")
```

The same call can of course be invoked from outside the node too via HTTP RPC. In this
case, please make sure the HTTP endpoint is enabled via `--http` and the `debug` API
namespace is exposed via `--http.api=debug`.

```
$ curl -H "Content-Type: application/json" -d '{"jsonrpc": "2.0", "id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f"]}' localhost:8545
```

Running the above operation on the Rinkeby network (with a node retaining enough history)
will result in this [trace dump](https://gist.github.com/karalabe/c91f95ac57f5e57f8b950ec65ecc697f).

### Tuning basic traces

By default the raw opcode tracer emits all relevant events that occur within the EVM while
processing a transaction, such as *EVM stack*, *EVM memory* and *updated storage slots*.
Certain use cases however may not need some of these data fields reported. To cater for
those use cases, these massive fields may be omitted using a second *options* parameter
for the tracer:

```json
{
  "disableStack": true,
  "disableMemory": true,
  "disableStorage": true
}
```

Running the previous tracer invocation from the Geth console with the data fields
disabled:

```js
debug.traceTransaction("0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {disableStack: true, disableMemory: true, disableStorage: true})
```

Analogously, the filtered tracer can be run from outside the node via HTTP RPC:

```
$ curl -H "Content-Type: application/json" -d '{"jsonrpc": "2.0", "id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableMemory": true, "disableStorage": true}]}' localhost:8545
```
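
When the request is assembled by a script rather than typed by hand, the payload can be built programmatically. The following is a minimal sketch; the helper name `buildTraceRequest` is hypothetical, and it includes the `jsonrpc` version member that JSON-RPC 2.0 requests carry. The options object is simply passed through as the second entry of `params`.

```javascript
// Hypothetical helper: build the JSON-RPC payload for debug_traceTransaction.
// The options argument is optional and is forwarded untouched as the second parameter.
function buildTraceRequest(txHash, options, id = 1) {
  const params = options === undefined ? [txHash] : [txHash, options];
  return { jsonrpc: "2.0", id, method: "debug_traceTransaction", params };
}

const payload = buildTraceRequest(
  "0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f",
  { disableStack: true, disableMemory: true, disableStorage: true }
);

// The serialized payload is what would be sent as the HTTP request body.
console.log(JSON.stringify(payload));
```

The serialized payload could then be POSTed to the node's HTTP endpoint (`localhost:8545` in the examples above) with whichever HTTP client is at hand.
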

Running the above operation on the Rinkeby network will result in this significantly
shorter [trace dump](https://gist.github.com/karalabe/d74a7cb33a70f2af75e7824fc772c5b4).

### Limits of basic traces

Although the raw opcode traces we've generated above have their use, this basic way of
tracing is problematic in the real world. Having an individual log entry for every single
opcode is too low level for most use cases, and will require developers to create
additional tools to post-process the traces. Additionally, a full opcode trace can easily
go into the hundreds of megabytes, making it very resource intensive to get out of the
node and process externally.

To avoid all of the previously mentioned issues, `go-ethereum` supports running custom
JavaScript tracers *within* the Ethereum node, which have full access to the EVM stack,
memory and contract storage. This permits developers to only gather the data they need,
and do any processing **at** the source of the data. Please see the next section for our
*custom in-node tracers*.
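
To give a flavour of the idea, here is a minimal sketch of a tracer object that only counts executed opcodes instead of logging every step. It assumes a tracer interface made of `step`, `fault` and `result` methods and a `log.op.toString()` accessor; treat those names as assumptions and check them against the custom tracer documentation for your Geth version. The `mockLog` helper is hypothetical and exists only to exercise the object outside a node.

```javascript
// Sketch of a custom tracer that counts how often each opcode is executed.
// Assumption: the node calls step() once per opcode, fault() on errors
// and result() once at the end to collect the output.
const opcodeCounter = {
  counts: {},
  step: function (log, db) {
    var name = log.op.toString();
    this.counts[name] = (this.counts[name] || 0) + 1;
  },
  fault: function (log, db) {},
  result: function (ctx, db) {
    return this.counts;
  },
};

// Hypothetical stand-in for the log object a node would pass to step().
function mockLog(name) {
  return { op: { toString: () => name } };
}

["PUSH1", "PUSH1", "DIV"].forEach((name) => opcodeCounter.step(mockLog(name), null));
console.log(opcodeCounter.result({}, null)); // prints { PUSH1: 2, DIV: 1 }
```

The returned object has one entry per distinct opcode, which is far smaller than a full opcode trace of the same transaction.
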

### Pruning

Geth by default does in-memory pruning of state, discarding state entries that it deems
no longer necessary to maintain. This is configured via the `--gcmode` option. Often,
people run into the error that state is not available.

Say you want to do a trace on block `B`. Now there are a couple of cases:

1. You have done a fast-sync, pivot block `P` where `P <= B`.
2. You have done a fast-sync, pivot block `P` where `P > B`.
3. You have done a full-sync, with pruning.
4. You have done a full-sync, without pruning (`--gcmode=archive`).

Here's what happens in each respective case:

1. Geth will regenerate the desired state by replaying blocks from the closest point in
   time before `B` where it has full state. This defaults to `128` blocks max, but you can
   specify more in the actual call `... "reexec":1000 .. }` to the tracer.
2. Sorry, can't be done without replaying from genesis.
3. Same as case 1.
4. Does not need to replay anything, can immediately load up the state and serve the request.
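
The four cases above can be summarized as a small decision function. This is only an illustrative sketch: the function `traceOutlook`, its parameter names and its return labels are hypothetical and do not correspond to anything in Geth.

```javascript
// Hypothetical helper mirroring the four cases above; not part of Geth.
// node.syncMode is "fast" or "full", node.archive marks --gcmode=archive,
// node.pivot is the fast-sync pivot block P, and block is the block B to trace.
function traceOutlook(node, block) {
  if (node.syncMode === "full" && node.archive) {
    return "immediate"; // case 4: state can be loaded directly
  }
  if (node.syncMode === "fast" && node.pivot > block) {
    return "unavailable"; // case 2: would need a replay from genesis
  }
  return "replay"; // cases 1 and 3: regenerate state from the closest earlier point
}

console.log(traceOutlook({ syncMode: "fast", pivot: 100 }, 150));    // prints "replay"
console.log(traceOutlook({ syncMode: "fast", pivot: 200 }, 150));    // prints "unavailable"
console.log(traceOutlook({ syncMode: "full", archive: false }, 150)); // prints "replay"
console.log(traceOutlook({ syncMode: "full", archive: true }, 150));  // prints "immediate"
```

For the `replay` outcome, the `reexec` option mentioned above controls how many blocks the node is willing to replay.
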

There is one other option available to you, which may or may not suit your needs. That is
to use [Go-evmlab](https://github.com/holiman/goevmlab).

    docker pull holiman/evmlab && docker run -it holiman/evmlab

There you can use the reproducer. The reproducer will incrementally fetch data from Infura
until it has all the information required to create the trace locally on an EVM which is
bundled with the image. It will create a custom genesis containing the state that the
transaction touches (balances, code, nonce etc). It should be mentioned that the evmlab
reproducer is not strictly guaranteed to be totally exact with regards to gas costs
incurred by the outer transaction, as evmlab does not fully calculate the gas costs for
nonzero data etc, but it is usually sufficient to analyze contracts and events.