Commit Graph

693 Commits

Author SHA1 Message Date
Lucas 804d45cc2e
p2p: DNS resolution for static nodes (#30822)
Closes #23210 

# Context 
When deploying Geth in Kubernetes with ReplicaSets, we encountered two
DNS-related issues affecting node connectivity. First, during startup,
Geth tries to resolve DNS names for static nodes too early in the config
unmarshaling phase. If peer nodes aren't ready yet (which is common in
Kubernetes rolling deployments), this causes an immediate failure:


```
INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host
``` 

The second issue comes up when pods get rescheduled to different nodes -
their IPs change but peers keep using the initially resolved IP, never
updating the DNS mapping.

This PR adds proper DNS support for enode:// URLs by deferring resolution
to connection time. It also handles DNS failures gracefully instead of failing
fatally during startup, making it work better in container environments where
IPs are dynamic and peers come and go during rollouts.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 12:46:12 +01:00
lorenzo c1c2507148
p2p: fix DiscReason encoding/decoding (#30855)
This fixes an issue where the disconnect message was not wrapped in a list.
The specification requires it to be a list like any other message.

In order to remain compatible with legacy geth versions, we now accept both
encodings when parsing a disconnect message.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-12 12:33:42 +01:00
Martin HS 9045b79bc2
metrics, cmd/geth: change init-process of metrics (#30814)
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.

This has several drawbacks: 
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.

- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.

This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.

The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now checks the enable-flag to prevent resource leak.

The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-10 13:27:29 +01:00
tianyeyouyou ae83912841
p2p/netutil: unittests for addrutil (#30439)
add unit tests for `p2p/addrutil`

---------

Co-authored-by: Martin HS <martin@swende.se>
2024-11-11 11:43:22 +01:00
Martin HS 5adc314817
build: update to golangci-lint 1.61.0 (#30587)
Changelog: https://golangci-lint.run/product/changelog/#1610 

Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.

Also adds: 
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`

---------

Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
2024-10-14 19:25:22 +02:00
Felix Lange 6b61b54dc7
p2p/discover: add config option for disabling FINDNODE liveness check (#30512)
This is for fixing Prysm integration tests.
2024-09-30 10:56:14 +02:00
Martin HS b5a88dafae
p2p/discover: fix flaky tests writing to test.log after completion (#30506)
This PR fixes two tests, which had a tendency to sometimes write to the `*testing.T` `log` facility after the test function had completed, which is not allowed. This PR fixes it by using waitgroups to ensure that the handler/logwriter terminates before the test exits.

closes #30505
2024-09-26 08:12:12 +02:00
Guillaume Michel f544fc3b46
p2p/enode: add quic ENR entry (#30283)
Add `quic` entry to the ENR as proposed in
https://github.com/ethereum/consensus-specs/pull/3644

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2024-09-13 23:47:18 +02:00
Nicolas Gotchac 87377c58bc
p2p/discover: fix Write method in metered connection (#30355)
`WriteToUDP` was never called, since `meteredUdpConn` exposed directly
all the methods from the underlying `UDPConn` interface.

This fixes the `discover/egress` metric never being updated.
2024-08-27 14:10:32 +02:00
lightclient 00294e9d28
cmd/utils,p2p: enable discv5 by default (#30327) 2024-08-20 16:02:54 +02:00
lightclient 33a13b6f21
p2p/simulations: remove packages (#30250)
Looking at the history of these packages over the past several years, there
haven't been any meaningful contributions or usages:
https://github.com/ethereum/go-ethereum/commits/master/p2p/simulations?before=de6d5976794a9ed3b626d4eba57bf7f0806fb970+35

Almost all of the commits are part of larger refactors or low-hanging-fruit contributions.
Seems like it's not providing much value and taking up team + contributor time.
2024-08-12 10:36:48 +02:00
Daniel Knopik de6d597679
p2p/discover: schedule revalidation also when all nodes are excluded (#30239)
## Issue

If `nextTime` has passed, but all nodes are excluded, `get` would return
`nil` and `run` would therefore not invoke `schedule`. Then, we schedule
a timer for the past, as neither `nextTime` value has been updated. This
creates a busy loop, as the timer immediately returns.

## Fix

With this PR, revalidation will be also rescheduled when all nodes are
excluded.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2024-07-31 21:38:23 +02:00
Marius G 6e33dbf96a
p2p: fix flaky test TestServerPortMapping (#30241)
The test specifies `ListenAddr: ":0"`, which means a random ephemeral
port will be chosen for the TCP listener by the OS. Additionally, since
no `DiscAddr` was specified, the same port that is chosen automatically
by the OS will also be used for the UDP listener in the discovery UDP
setup. This sometimes leads to test failures if the TCP listener picks a
free TCP port that is already taken for UDP. By specifying `DiscAddr:
":0"`, the UDP port will be chosen independently from the TCP port,
fixing the random failure.

See issue #29830.

Verified using
```
cd p2p
go test -c -race
stress ./p2p.test -test.run=TestServerPortMapping
...
5m0s: 4556 runs so far, 0 failures
```

The issue described above can technically lead to sporadic failures on
systems that specify a listen address via the `--port` flag of 0 while
not setting `--discovery.port`. Since the default is using port `30303`
and using a random ephemeral port is likely not used much to begin with,
not addressing the root cause might be acceptable.
2024-07-30 07:31:27 -06:00
dknopik b0f66e34ca
p2p/nat: return correct port for ExtIP NAT (#30234)
Return the actually requested external port instead of 0 in the
AddMapping implementation for `--nat extip:<IP>`.
2024-07-27 10:18:05 +02:00
yukionfire 4dfc75deef
beacon/types, cmd/devp2p, p2p/enr: clean up uses of fmt.Errorf (#30182) 2024-07-25 00:32:58 +02:00
Felix Lange ad49c708f5
p2p/discover: remove type encPubkey (#30172)
The pubkey type was moved to package v4wire a long time ago. Remaining uses of
encPubkey were probably left in due to laziness.
2024-07-18 11:09:02 +02:00
Nathan Jo 4bbe993252
p2p: fix ip change log parameter (#30158) 2024-07-15 10:15:35 +02:00
Halimao a71f6f91fd
p2p/discover: improve flaky revalidation tests (#30023) 2024-06-21 15:29:07 +02:00
David Theodore 27654d3022
p2p/rlpx: 2KB maximum size for handshake messages (#30029)
Co-authored-by: Felix Lange <fjl@twurst.com>
2024-06-20 14:08:54 +02:00
bugmaker9371 b6f2bbd417
p2p/simulations: update doc of HTTP endpoints (#29894) 2024-06-11 19:41:17 +02:00
Gealber Morales 8bda642963
p2p: use package slices to sort in PeersInfo (#29957) 2024-06-09 22:50:22 +02:00
Gealber Morales 349fcdd22d
p2p/discover: add missing lock when calling tab.handleAddNode (#29960) 2024-06-09 22:47:51 +02:00
Felix Lange 85459e1439
p2p/discover: unwrap 4-in-6 UDP source addresses (#29944)
Fixes an issue where discovery responses were not recognized.
2024-06-06 16:15:22 +03:00
Hteev Oli 0750cb0c8f
p2p/netutil: fix comments (#29942) 2024-06-06 10:56:41 +03:00
Felix Lange bc6569462d
p2p: use netip.Addr where possible (#29891)
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.

This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.

There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA lists it as special-purpose and invalid as a destination for a long time, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.
2024-06-05 19:31:04 +02:00
Felix Lange 94a8b296e4
p2p/discover: refactor node and endpoint representation (#29844)
Here we clean up internal uses of type discover.node, converting most code to use
enode.Node instead. The discover.node type used to be the canonical representation of
network hosts before ENR was introduced. Most code worked with *node to avoid conversions
when interacting with Table methods. Since *node also contains internal state of Table and
is a mutable type, using *node outside of Table code is prone to data races. It's also
cleaner not having to wrap/unwrap *enode.Node all the time.

discover.node has been renamed to tableNode to clarify its purpose.

While here, we also change most uses of net.UDPAddr into netip.AddrPort. While this is
technically a separate refactoring from the *node -> *enode.Node change, it is more
convenient because *enode.Node handles IP addresses as netip.Addr. The switch to package
netip in discovery would've happened very soon anyway.

The change to netip.AddrPort stops at certain interface points. For example, since package
p2p/netutil has not been converted to use netip.Addr yet, we still have to convert to
net.IP/net.UDPAddr in a few places.
2024-05-29 15:02:26 +02:00
lilasxie 153f8da887
p2p/nodestate: remove unused package (#29872) 2024-05-29 12:11:18 +02:00
bugmaker9371 daf4f72077
p2p/simulations: remove stale information about docker adapter (#29874) 2024-05-29 12:09:58 +02:00
lightclient cc22e0cdf0
p2p/discover: fix update logic in handleAddNode (#29836)
It seems the semantic differences between addFoundNode and addInboundNode were lost in
#29572. My understanding is addFoundNode is for a node you have not contacted directly
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-05-28 21:30:17 +02:00
Felix Lange af0a3274be
p2p/discover: fix crash when revalidated node is removed (#29864)
In #29572, I assumed the revalidation list that the node is contained in could only ever
be changed by the outcome of a revalidation request. But turns out that's not true: if the
node gets removed due to FINDNODE failure, it will also be removed from the list it is in.
This causes a crash.

The invariant is: while node is in table, it is always in exactly one of the two lists. So
it seems best to store a pointer to the current list within the node itself.
2024-05-28 18:13:03 +02:00
gitglorythegreat 64b1cd8aaf
p2p: fix typos (#29828) 2024-05-24 11:33:19 +02:00
Aaron Chen 61b3d93bb0
p2p/enode: fix TCPEndpoint (#29827) 2024-05-23 23:17:51 +02:00
Felix Lange cc9e2bd9dd
p2p/enode: fix endpoint determination for IPv6 (#29801)
enode.Node has separate accessor functions for getting the IP, UDP port and TCP port.
These methods performed separate checks for attributes set in the ENR.

With this PR, the accessor methods will now return cached information, and the endpoint is
determined when the node is created. The logic to determine the preferred endpoint is now
more correct, and considers how 'global' each address is when both IPv4 and IPv6 addresses
are present in the ENR.
2024-05-23 14:27:03 +02:00
Felix Lange 6a9158bb1b
p2p/discover: improved node revalidation (#29572)
Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.

In this change, the revalidation process is improved with the following logic:

- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
  faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
  Once validation passes, it transfers to the 'slow' list. If a request fails, or the
  node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
  we will not drop the node on the first failing check. We instead quickly decay the
  livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.

I am also adding a debug API endpoint to dump the node table content.

Co-authored-by: Martin HS <martin@swende.se>
2024-05-23 14:26:09 +02:00
Kiarash Hajian 905e325cd8
p2p/discover/v5wire: add tests for invalid handshake and auth data size (#29708) 2024-05-06 13:17:19 +02:00
Aaron Chen 8c3fc56d7f
p2p/simulations/adapters: use maps.Clone (#29626) 2024-04-29 19:44:41 +02:00
Undefinedor a13b92524d
eth/protocols/eth,p2p/discover: remove unnecessary checks (#29590)
fix useless condition
2024-04-25 08:40:29 +02:00
bugmaker9371 98f504f69f
p2p/discover: fix test error messages (#29592) 2024-04-21 11:13:36 +02:00
Seungbae Yu 67422e2a56
p2p/nat: fix typos in comments (#29536) 2024-04-15 14:58:17 +02:00
Abirdcfly b179b7b8e7
all: remove duplicate word in comments (#29531)
This change removes some duplicate words in in comments
2024-04-15 08:34:31 +02:00
Bin 0bbd88bda0
all: use timer instead of time.After in loops, to avoid memleaks (#29241)
time.After is equivalent to NewTimer(d).C, and does not call Stop if the timer is no longer needed. This can cause memory leaks. This change changes many such occations to use NewTimer instead, and calling Stop once the timer is no longer needed.
2024-04-09 08:51:54 +02:00
cui 9dfe728909
p2p/discover: using slices.Contains (#29395) 2024-04-04 12:24:49 +02:00
cui 2e0c5e05ba
p2p/dnsdisc: using clear builtin func (#29418)
Co-authored-by: Felix Lange <fjl@twurst.com>
2024-04-04 12:19:48 +02:00
Ng Wei Han dfb3d46098
p2p: add inbound and outbound peers metric (#29424) 2024-04-02 21:18:28 +02:00
cui 3754a6cc92
p2p/dnsdisc: using maps.Copy (#29377) 2024-03-28 12:07:38 +01:00
Aaron Chen 723b1e36ad
all: fix mismatched names in comments (#29348)
* all: fix mismatched names in comments

* metrics: fix mismatched name in UpdateIfGt
2024-03-26 21:01:28 +01:00
Martin HS 14cc967d19
all: remove dependency on golang.org/exp (#29314)
This change includes a leftovers from https://github.com/ethereum/go-ethereum/pull/29307
- using the [new `slices` package](https://go.dev/doc/go1.21#slices) and
- using the [new `cmp.Ordered`](https://go.dev/doc/go1.21#cmp) instead of exp `constraints.Ordered`
2024-03-25 07:50:18 +01:00
Martin HS d9bde37ac3
log: use native log/slog instead of golang/exp (#29302) 2024-03-22 13:17:59 +01:00
Martin HS 14eb8967be
all: use min/max/clear from go1.21 (#29307) 2024-03-21 13:50:13 +01:00
Martin HS ab49f228ad
all: update to go version 1.22.1 (#28946)
Since Go 1.22 has deprecated certain elliptic curve operations, this PR removes 
references to the affected functions and replaces them with a custom implementation
in package crypto. This causes backwards-incompatible changes in some places.

---------

Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
2024-03-18 17:36:50 +01:00