Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
* p2p/discover: add liveness check in collectTableNodes
* p2p/discover: fix test
* p2p/discover: rename to appendLiveNodes
* p2p/discover: add dedup logic back
* p2p/discover: simplify
* p2p/discover: fix issue found by test
* p2p/discover: remove ReadRandomNodes
Even though it's public, this method is not callable by code outside of
package p2p/discover because one can't get a valid instance of Table.
* p2p/discover: add Table.Nodes
* p2p/discover: make Table settings configurable
In unit tests and externally developed cmd/devp2p test runs, it can be
useful to tune the timer intervals used by Table.
This changes TALKREQ message processing to run the handler on separate goroutine,
instead of running on the main discv5 dispatcher goroutine. It's better this way because
it allows the handler to perform blocking actions.
I'm also adding a new method TalkRequestToID here. The method allows implementing
a request flow where one node A sends TALKREQ to another node B, and node B later
sends a TALKREQ back. With TalkRequestToID, node B does not need the ENR of A to
send its request.
* p2p/discover: add more packet information in logs
This adds more fields to discv5 packet logs. These can be useful when
debugging multi-packet interactions.
The FINDNODE message also gets an additional field, OpID for debugging
purposes. This field is not encoded onto the wire.
I'm also removing topic system related message types in this change.
These will come back in the future, where support for them will be
guarded by a config flag.
* p2p/discover/v5wire: rename 'Total' to 'RespCount'
The new name captures the meaning of this field better.
Instead of using a limit of three nodes per message, we can pack more nodes
into each message based on ENR size. In my testing, this halves the number
of sent NODES messages, because ENR size is usually < 300 bytes.
This also adds RLP helper functions that compute the encoded size of
[]byte and string.
Co-authored-by: Martin Holst Swende <martin@swende.se>
Noticed that lookupDistances for FINDNODE requests didn't consider 256 a valid
distance. This is actually part of the example in the comment above the
function, surprised that wasn't tested before.
When receiving PING from an IPv4 address over IPv6, the implementation sent
back a IPv4-in-IPv6 address. This change makes it reflect the IPv4 address.
This PR implements the first one of the "lespay" UDP queries which
is already useful in itself: the capacity query. The server pool is making
use of this query by doing a cheap UDP query to determine whether it is
worth starting the more expensive TCP connection process.
This fixes a deadlock that could occur when a response packet arrived
after a call had already received enough responses and was about to
signal completion to the dispatch loop.
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds two new methods to UDPv5, AllNodes and LocalNode.
AllNodes returns all the nodes stored in the local table; this is
useful for the purposes of metrics collection and also debugging any
potential issues with other discovery v5 implementations.
LocalNode returns the local node object. The reason for exposing this
is so that users can modify and set/delete new key-value entries in
the local record.
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.