---
title: Metrics
description: Introduction to reporting metrics from Geth
---
Geth includes a variety of optional metrics that can be reported to the user. Metrics are disabled by default to spare the average user the computational overhead. Users who want more detailed metrics can enable them with the `--metrics` flag when starting Geth. Some metrics are classed as especially expensive and are only enabled when the `--metrics.expensive` flag is also supplied. For example, per-packet network traffic data is considered expensive.

The goal of the Geth metrics system is that, similar to logs, arbitrary metric collections can be added to any part of the code without requiring fancy constructs to analyze them (counter variables, public interfaces, crossing over the APIs, console hooks, etc.). Instead, metrics should be "updated" whenever and wherever needed and be automatically collected, surfaced through the APIs, queryable and visualizable for analysis.
## Metric types {#metric-types}
Geth's metrics can be classified into four types: meters, timers, counters and gauges.
### Meters {#meters}
Analogous to physical meters (electricity, water, etc.), Geth's meters measure the _amount_ of "things" that pass through and the _rate_ at which they do. A meter doesn't have a specific unit of measure (byte, block, malloc, etc.); it just counts arbitrary _events_. At any point in time a meter can report:
- _Total number of events_ that passed through the meter
- _Mean throughput rate_ of the meter since startup (events / second)
- _Weighted throughput rate_ in the last _1_, _5_ and _15_ minutes (events / second)
  ("weighted" means that recent seconds count more than older ones)
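
To make the idea of a weighted rate concrete, here is a minimal sketch of an exponentially weighted moving average. The `ewma` type, its methods and the 5 second tick interval are assumptions made for illustration; they are not taken from the Geth source.

```go
package main

import (
	"fmt"
	"math"
)

// ewma is a hypothetical exponentially weighted moving average of an event
// rate. It is an illustration only, not Geth's implementation.
type ewma struct {
	alpha  float64 // weight given to the newest interval
	rate   float64 // smoothed rate in events per second
	primed bool    // false until the first interval has been seen
}

// newEWMA builds an average over a window of `minutes`, assuming tick is
// called once every `interval` seconds.
func newEWMA(minutes, interval float64) *ewma {
	return &ewma{alpha: 1 - math.Exp(-interval/(60*minutes))}
}

// tick folds the number of events seen in the last interval into the rate.
func (e *ewma) tick(events int64, interval float64) {
	instant := float64(events) / interval
	if !e.primed {
		e.rate, e.primed = instant, true
		return
	}
	e.rate += e.alpha * (instant - e.rate)
}

func main() {
	const interval = 5.0 // seconds between ticks (assumed)
	oneMin, fifteenMin := newEWMA(1, interval), newEWMA(15, interval)

	// One minute of steady traffic, then a single burst.
	for i := 0; i < 12; i++ {
		oneMin.tick(100, interval)
		fifteenMin.tick(100, interval)
	}
	oneMin.tick(1000, interval)
	fifteenMin.tick(1000, interval)

	fmt.Printf("1-minute rate:  %.2f events/s\n", oneMin.rate)
	fmt.Printf("15-minute rate: %.2f events/s\n", fifteenMin.rate)
}
```

The shorter window gives more weight to the latest interval, so the 1-minute rate moves further towards the burst than the 15-minute rate does.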
### Timers {#timers}
Timers are extensions of _meters_: the _duration_ of an event is collected alongside a log of its occurrence. Like meters, timers can measure arbitrary events, but each event must be assigned a duration individually. In addition to generating all of the meter report types, a timer also reports:
- _Percentiles (5, 20, 50, 80, 95)_, reporting that some percentage of the events took less than the reported time to execute (_e.g. Percentile 20 = 1.5s would mean that 20% of the measured events took less than 1.5 seconds to execute; inherently 80% (= 100% - 20%) took more than 1.5s_)
  - Percentile 5: minimum durations (this is as fast as it gets)
  - Percentile 50: well behaved samples (boring, just to give an idea)
  - Percentile 80: general performance (these should be optimised)
  - Percentile 95: worst case outliers (rare, just handle gracefully)
### Counters {#counters}
A counter is a single int64 value that can be incremented and decremented. The current value of the counter can be queried.
### Gauges {#gauges}
A gauge is a single int64 value. Its value can increment and decrement - as with a counter - but can also be set arbitrarily.
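
The difference between the two can be sketched with a pair of hypothetical types. The names `counter` and `gauge` and their methods are illustrative assumptions, not Geth's own definitions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical int64 metric that only moves in relative steps.
type counter struct{ value int64 }

func (c *counter) Inc(n int64)     { atomic.AddInt64(&c.value, n) }
func (c *counter) Dec(n int64)     { atomic.AddInt64(&c.value, -n) }
func (c *counter) Snapshot() int64 { return atomic.LoadInt64(&c.value) }

// gauge is a hypothetical int64 metric that can also be set to any value.
type gauge struct{ value int64 }

func (g *gauge) Inc(n int64)     { atomic.AddInt64(&g.value, n) }
func (g *gauge) Dec(n int64)     { atomic.AddInt64(&g.value, -n) }
func (g *gauge) Update(v int64)  { atomic.StoreInt64(&g.value, v) }
func (g *gauge) Snapshot() int64 { return atomic.LoadInt64(&g.value) }

func main() {
	var c counter
	c.Inc(5)
	c.Dec(2)
	fmt.Println("counter:", c.Snapshot())

	var g gauge
	g.Update(42) // set directly, something a counter cannot do
	g.Inc(1)
	fmt.Println("gauge:", g.Snapshot())
}
```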
## Querying metrics {#querying-metrics}
Geth collects metrics if the `--metrics` flag is provided at startup. Those metrics are available via an HTTP server if the `--metrics.addr` flag is also provided. By default the metrics are served at `127.0.0.1:6060/debug/metrics`, but a custom IP address can be passed to `--metrics.addr` and a custom port to `--metrics.port`. More computationally expensive metrics are toggled on by providing the `--metrics.expensive` flag. For example, to serve all metrics at the default address and port:
```sh
geth <other commands> --metrics --metrics.addr 127.0.0.1 --metrics.expensive
```
Navigating the browser to the given metrics address displays all the available metrics in the form
of JSON data that looks similar to:
```sh
chain/account/commits.50-percentile: 374072
chain/account/commits.75-percentile: 830356
chain/account/commits.95-percentile: 1783005.3999976
chain/account/commits.99-percentile: 3991806
chain/account/commits.99.999-percentile: 3991806
chain/account/commits.count: 43
chain/account/commits.fifteen-minute: 0.029134344092314267
chain/account/commits.five-minute: 0.029134344092314267
...
```
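
The same data can be read programmatically. The sketch below assumes the endpoint returns a single JSON object keyed by metric name, as described above, and keeps only the numeric entries. The helper names `fetchMetrics` and `decodeMetrics` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// decodeMetrics parses a JSON object and keeps only entries whose value is a
// number. Non-numeric entries, if any, are skipped.
func decodeMetrics(body []byte) (map[string]float64, error) {
	raw := map[string]interface{}{}
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(raw))
	for name, value := range raw {
		if f, ok := value.(float64); ok {
			out[name] = f
		}
	}
	return out, nil
}

// fetchMetrics downloads and decodes the metrics served at url.
func fetchMetrics(url string) (map[string]float64, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeMetrics(body)
}

func main() {
	// Default address from this page; adjust if --metrics.addr or
	// --metrics.port were changed.
	metrics, err := fetchMetrics("http://127.0.0.1:6060/debug/metrics")
	if err != nil {
		fmt.Println("could not read metrics (is Geth running with --metrics and --metrics.addr?):", err)
		return
	}
	fmt.Println("chain/account/commits.count =", metrics["chain/account/commits.count"])
}
```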
Any developer is free to add, remove or modify the available metrics as they see fit. The precise list of available metrics can always be found by opening the metrics server in the browser.

Geth also supports dumping metrics directly into an InfluxDB database. To activate this, the `--metrics.influxdb` flag must be provided at startup. The API endpoint, username, password and other InfluxDB tags can also be provided. The available flags are:
```sh
--metrics.influxdb.endpoint value      InfluxDB API endpoint to report metrics to (default: "http://localhost:8086")
--metrics.influxdb.database value      InfluxDB database name to push reported metrics to (default: "geth")
--metrics.influxdb.username value      Username to authorize access to the database (default: "test")
--metrics.influxdb.password value      Password to authorize access to the database (default: "test")
--metrics.influxdb.tags value          Comma-separated InfluxDB tags (key/values) attached to all measurements (default: "host=localhost")
--metrics.influxdbv2                   Enable metrics export/push to an external InfluxDB v2 database
--metrics.influxdb.token value         Token to authorize access to the database (v2 only) (default: "test")
--metrics.influxdb.bucket value        InfluxDB bucket name to push reported metrics to (v2 only) (default: "geth")
--metrics.influxdb.organization value  InfluxDB organization name (v2 only) (default: "geth")
```
Geth also provides Prometheus-formatted metrics data, which can be obtained from the `http://127.0.0.1:6060/debug/metrics/prometheus` URL, e.g.:
```sh
# TYPE chain_account_commits_count counter
chain_account_commits_count 6506
# TYPE chain_account_commits summary
chain_account_commits {quantile="0.5"} 8.194577e+06
chain_account_commits {quantile="0.75"} 1.016841725e+07
chain_account_commits {quantile="0.95"} 1.4334824899999999e+07
chain_account_commits {quantile="0.99"} 1.923948246000001e+07
chain_account_commits {quantile="0.999"} 5.038267952400009e+07
chain_account_commits {quantile="0.9999"} 5.108694e+07
# TYPE chain_account_hashes_count counter
chain_account_hashes_count 6506
# TYPE chain_account_hashes summary
chain_account_hashes {quantile="0.5"} 1.565746e+06
chain_account_hashes {quantile="0.75"} 1.87953975e+06
chain_account_hashes {quantile="0.95"} 4.6262716e+06
chain_account_hashes {quantile="0.99"} 8.655076970000029e+06
chain_account_hashes {quantile="0.999"} 4.823811956800011e+07
chain_account_hashes {quantile="0.9999"} 4.9055682e+07
...
```
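
For quick inspection without a Prometheus server, output of this shape can be split into names, quantiles and values by hand. The following is a minimal sketch that only handles lines laid out like the sample above; it is not a general parser for the Prometheus text format, and the `sample` type and `parseSamples` function are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is one parsed line of output.
type sample struct {
	Name     string
	Quantile string // empty when the line carries no quantile label
	Value    float64
}

// parseSamples reads `name value` and `name {quantile="q"} value` lines,
// skipping blank lines and `#` comment lines.
func parseSamples(text string) ([]sample, error) {
	const label = `quantile="`
	var out []sample
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		value, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		s := sample{Name: fields[0], Value: value}
		// Tolerate a label written directly after the name, without a space.
		if i := strings.IndexByte(s.Name, '{'); i >= 0 {
			s.Name = s.Name[:i]
		}
		if i := strings.Index(line, label); i >= 0 {
			rest := line[i+len(label):]
			if j := strings.IndexByte(rest, '"'); j >= 0 {
				s.Quantile = rest[:j]
			}
		}
		out = append(out, s)
	}
	return out, sc.Err()
}

func main() {
	text := `# TYPE chain_account_commits_count counter
chain_account_commits_count 6506
# TYPE chain_account_commits summary
chain_account_commits {quantile="0.5"} 8.194577e+06
chain_account_commits {quantile="0.75"} 1.016841725e+07`

	samples, err := parseSamples(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range samples {
		fmt.Printf("%s quantile=%q value=%v\n", s.Name, s.Quantile, s.Value)
	}
}
```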
## Creating and updating metrics {#creating-and-updating-metrics}
Metrics can be added easily in the Geth source code:
```go
// Passing nil as the registry selects the default registry.
meter := metrics.NewRegisteredMeter("system/memory/allocs", nil)
timer := metrics.NewRegisteredTimer("chain/inserts", nil)
```
In order to use the same meter from two different packages without creating dependency cycles, the metrics can be created using the `GetOrRegisterX()` functions. These create a new metric if none with the given name is registered, or return the existing one.
```go
meter := metrics.GetOrRegisterMeter("system/memory/allocs", nil)
timer := metrics.GetOrRegisterTimer("chain/inserts", nil)
```
The name given to the metric can be any arbitrary string. However, since Geth assumes it to be some meaningful sub-system hierarchy, it should be named accordingly (for example, `chain/inserts`).

Metrics can then be updated:
```go
meter.Mark(n)           // Record the occurrence of `n` events
timer.Update(duration)  // Record an event that took `duration`
timer.UpdateSince(time) // Record an event that started at `time`
timer.Time(function)    // Measure and record the execution of `function`
```
## Summary {#summary}
Geth can be configured to serve metrics over HTTP or push them to a database. These features are disabled by default but can be enabled by passing the appropriate flags on startup. Users can create custom metrics by adding them to the Geth source code, following the instructions on this page.