Commit Graph

36 Commits

Author SHA1 Message Date
Lucas 804d45cc2e
p2p: DNS resolution for static nodes (#30822)
Closes #23210 

# Context 
When deploying Geth in Kubernetes with ReplicaSets, we encountered two
DNS-related issues affecting node connectivity. First, during startup,
Geth tries to resolve DNS names for static nodes too early in the config
unmarshaling phase. If peer nodes aren't ready yet (which is common in
Kubernetes rolling deployments), this causes an immediate failure:


```
INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host
``` 

The second issue comes up when pods get rescheduled to different nodes -
their IPs change but peers keep using the initially resolved IP, never
updating the DNS mapping.

This PR adds proper DNS support for enode:// URLs by deferring resolution
to connection time. It also handles DNS failures gracefully instead of failing
fatally during startup, making it work better in container environments where
IPs are dynamic and peers come and go during rollouts.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 12:46:12 +01:00
rene 129cf075e9
p2p: move rlpx into separate package (#21464)
This change moves the RLPx protocol implementation into a separate package,
p2p/rlpx. The new package can be used to establish RLPx connections for
protocol testing purposes.

Co-authored-by: Felix Lange <fjl@twurst.com>
2020-09-22 10:17:39 +02:00
Felix Lange 90caa2cabb
p2p: new dial scheduler (#20592)
* p2p: new dial scheduler

This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:

- The time between discovery of a node and dialing that node is
  significantly lower in the new version. The old dialState kept
  a buffer of nodes and launched a task to refill it whenever the buffer
  became empty. This worked well with the discovery interface we used to
  have, but doesn't really work with the new iterator-based discovery
  API.

- Selection of static dial candidates (created by Server.AddPeer or
  through static-nodes.json) performs much better for large amounts of
  static peers. Connections to static nodes are now limited like dynanic
  dials and can no longer overstep MaxPeers or the dial ratio.

* p2p/simulations/adapters: adapt to new NodeDialer interface

* p2p: re-add check for self in checkDial

* p2p: remove peersetCh

* p2p: allow static dials when discovery is disabled

* p2p: add test for dialScheduler.removeStatic

* p2p: remove blank line

* p2p: fix documentation of maxDialPeers

* p2p: change "ok" to "added" in static node log

* p2p: improve dialTask docs

Also increase log level for "Can't resolve node"

* p2p: ensure dial resolver is truly nil without discovery

* p2p: add "looking for peers" log message

* p2p: clean up Server.run comments

* p2p: fix maxDialedConns for maxpeers < dialRatio

Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.

* p2p: fix RemovePeer to disconnect the peer again

Also make RemovePeer synchronous and add a test.

* p2p: remove "Connection set up" log message

* p2p: clean up connection logging

We previously logged outgoing connection failures up to three times.

- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."

This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).

Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.

* p2p: document that RemovePeer blocks
2020-02-13 11:10:03 +01:00
Rick 6f1a600f6c p2p: fix bug in TestPeerDisconnect (#20277) 2019-11-13 12:01:52 +01:00
Felix Lange c420dcb39c
p2p: enforce connection retry limit on server side (#19684)
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.

Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
2019-06-11 12:45:33 +02:00
Felix Lange 30cd5c1854
all: new p2p node representation (#17643)
Package p2p/enode provides a generalized representation of p2p nodes
which can contain arbitrary information in key/value pairs. It is also
the new home for the node database. The "v4" identity scheme is also
moved here from p2p/enr to remove the dependency on Ethereum crypto from
that package.

Record signature handling is changed significantly. The identity scheme
registry is removed and acceptable schemes must be passed to any method
that needs identity. This means records must now be validated explicitly
after decoding.

The enode API is designed to make signature handling easy and safe: most
APIs around the codebase work with enode.Node, which is a wrapper around
a valid record. Going from enr.Record to enode.Node requires a valid
signature.

* p2p/discover: port to p2p/enode

This ports the discovery code to the new node representation in
p2p/enode. The wire protocol is unchanged, this can be considered a
refactoring change. The Kademlia table can now deal with nodes using an
arbitrary identity scheme. This requires a few incompatible API changes:

  - Table.Lookup is not available anymore. It used to take a public key
    as argument because v4 protocol requires one. Its replacement is
    LookupRandom.
  - Table.Resolve takes *enode.Node instead of NodeID. This is also for
    v4 protocol compatibility because nodes cannot be looked up by ID
    alone.
  - Types Node and NodeID are gone. Further commits in the series will be
    fixes all over the the codebase to deal with those removals.

* p2p: port to p2p/enode and discovery changes

This adapts package p2p to the changes in p2p/discover. All uses of
discover.Node and discover.NodeID are replaced by their equivalents from
p2p/enode.

New API is added to retrieve the enode.Node instance of a peer. The
behavior of Server.Self with discovery disabled is improved. It now
tries much harder to report a working IP address, falling back to
127.0.0.1 if no suitable address can be determined through other means.
These changes were needed for tests of other packages later in the
series.

* p2p/simulations, p2p/testing: port to p2p/enode

No surprises here, mostly replacements of discover.Node, discover.NodeID
with their new equivalents. The 'interesting' API changes are:

 - testing.ProtocolSession tracks complete nodes, not just their IDs.
 - adapters.NodeConfig has a new method to create a complete node.

These changes were needed to make swarm tests work.

Note that the NodeID change makes the code incompatible with old
simulation snapshots.

* whisper/whisperv5, whisper/whisperv6: port to p2p/enode

This port was easy because whisper uses []byte for node IDs and
URL strings in the API.

* eth: port to p2p/enode

Again, easy to port because eth uses strings for node IDs and doesn't
care about node information in any way.

* les: port to p2p/enode

Apart from replacing discover.NodeID with enode.ID, most changes are in
the server pool code. It now deals with complete nodes instead
of (Pubkey, IP, Port) triples. The database format is unchanged for now,
but we should probably change it to use the node database later.

* node: port to p2p/enode

This change simply replaces discover.Node and discover.NodeID with their
new equivalents.

* swarm/network: port to p2p/enode

Swarm has its own node address representation, BzzAddr, containing both
an overlay address (the hash of a secp256k1 public key) and an underlay
address (enode:// URL).

There are no changes to the BzzAddr format in this commit, but certain
operations such as creating a BzzAddr from a node ID are now impossible
because node IDs aren't public keys anymore.

Most swarm-related changes in the series remove uses of
NewAddrFromNodeID, replacing it with NewAddr which takes a complete node
as argument. ToOverlayAddr is removed because we can just use the node
ID directly.
2018-09-25 00:59:00 +02:00
Felix Lange 96ae35e2ac p2p, p2p/discover, p2p/nat: rework logging using context keys 2017-02-28 10:20:29 +01:00
Felix Lange 35a7dcb162 all: gofmt -w -s 2017-01-06 15:52:03 +01:00
Felix Lange bfbcfbe4a9 all: fix license headers one more time
I forgot to update one instance of "go-ethereum" in commit 3f047be5a.
2015-07-23 18:35:11 +02:00
Felix Lange 3f047be5aa all: update license headers to distiguish GPL/LGPL
All code outside of cmd/ is licensed as LGPL. The headers
now reflect this by calling the whole work "the go-ethereum library".
2015-07-22 18:51:45 +02:00
Felix Lange ea54283b30 all: update license information 2015-07-07 14:12:44 +02:00
Péter Szilágyi 216fc267fa p2p: fix local/remote cap/protocol mixup 2015-06-26 20:45:13 +03:00
Péter Szilágyi d84638bd31 p2p: support protocol version negotiation 2015-06-26 15:48:50 +03:00
Felix Lange 70da79f04c p2p: improve disconnect logging 2015-06-15 15:03:46 +02:00
Felix Lange 1440f9a37a p2p: new dialer, peer management without locks
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.

Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).

Most of the handshake stuff is now part of the RLPx code. It could be
exported or move to its own package because it is no longer entangled
with Server logic.
2015-05-25 01:17:14 +02:00
Felix Lange dbdc5fd4b3 p2p: delete Server.Broadcast 2015-05-25 01:17:14 +02:00
Felix Lange 206fe25971 p2p: remove testlog 2015-05-14 14:56:34 +02:00
Felix Lange 691cb90284 p2p: log remote reason when disconnect is requested
The returned reason is currently not used except for the log
message. This change makes the log messages a bit more useful.
The handshake code also returns the remote reason.
2015-05-14 14:53:29 +02:00
Felix Lange f1d710af00 p2p: fix Peer shutdown deadlocks
There were multiple synchronization issues in the disconnect handling,
all caused by the odd special-casing of Peer.readLoop errors. Remove the
special handling of read errors and make readLoop part of the Peer
WaitGroup.

Thanks to @Gustav-Simonsson for pointing at arrows in a diagram
and playing rubber-duck.
2015-04-10 13:26:27 +02:00
Felix Lange 5ba51594c7 p2p: use package rlp to encode messages
Message encoding functions have been renamed to catch any uses.
The switch to the new encoder can cause subtle incompatibilities.
If there are any users outside of our tree, they will at least be
alerted that there was a change.

NewMsg no longer exists. The replacements for EncodeMsg are called
Send and SendItems.
2015-03-19 15:11:02 +01:00
Felix Lange 4811f460e7 p2p: export ExpectMsg (for eth protocol testing) 2015-03-19 15:08:04 +01:00
Felix Lange 7964f30dcb p2p: msg.Payload contains list data
With RLPx frames, the message code is contained in the
frame and is no longer part of the encoded data.

EncodeMsg, Msg.Decode have been updated to match.
Code that decodes RLP directly from Msg.Payload will need
to change.
2015-03-04 12:27:24 +01:00
Felix Lange 736e632215 p2p: use RLPx frames for messaging 2015-03-04 12:27:23 +01:00
Felix Lange 73f94f3755 p2p: disable encryption handshake
The diff is a bit bigger than expected because the protocol handshake
logic has moved out of Peer. This is necessary because the protocol
handshake will have custom framing in the final protocol.
2015-02-19 16:54:53 +01:00
Felix Lange e34d134102 p2p: fixes for actual connections
The unit test hooks were turned on 'in production'.
2015-02-07 00:43:52 +01:00
Felix Lange 5bdc115943 p2p: integrate p2p/discover
Overview of changes:

- ClientIdentity has been removed, use discover.NodeID
- Server now requires a private key to be set (instead of public key)
- Server performs the encryption handshake before launching Peer
- Dial logic takes peers from discover table
- Encryption handshake code has been cleaned up a bit
- baseProtocol is gone because we don't exchange peers anymore
- Some parts of baseProtocol have moved into Peer instead
2015-02-06 00:00:36 +01:00
Felix Lange eb0e7b1b81 eth, p2p: remove EncodeMsg from p2p.MsgWriter
...and make it a top-level function instead.

The original idea behind having EncodeMsg in the interface was that
implementations might be able to encode RLP data to their underlying
writer directly instead of buffering the encoded data. The encoder
will buffer anyway, so that doesn't matter anymore.

Given the recent problems with EncodeMsg (copy-pasted implementation
bug) I'd rather implement once, correctly.
2015-01-06 12:23:38 +01:00
obscuren 6abf8ef78f Merge 2015-01-05 17:10:42 +01:00
Felix Lange e28c60caf9 p2p: improve and test eofSignal 2014-12-12 11:40:02 +01:00
Felix Lange cfd7e74c25 p2p: add test for NewPeer 2014-11-26 22:49:40 +01:00
Felix Lange 9b85002b70 p2p: remove Msg.Value and MsgLoop 2014-11-25 16:01:39 +01:00
Felix Lange 6049fcd52a p2p: use package rlp for baseProtocol 2014-11-25 12:25:31 +01:00
Felix Lange c1fca72552 p2p: use package rlp 2014-11-24 19:03:20 +01:00
Felix Lange 59b63caf5e p2p: API cleanup and PoC 7 compatibility
Whoa, one more big commit. I didn't manage to untangle the
changes while working towards compatibility.
2014-11-21 21:52:45 +01:00
Felix Lange f38052c499 p2p: rework protocol API 2014-11-21 21:52:45 +01:00
zelig 771fbcc02e initial commit of p2p package 2014-10-23 16:57:54 +01:00