This PR adds service value measurement statistics to the light client. It
also adds a private API that makes these statistics accessible. A follow-up
PR will add the new server pool which uses these statistics to select
servers with good performance.
This document describes the function of the new components:
https://gist.github.com/zsfelfoldi/3c7ace895234b7b345ab4f71dab102d4
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.
This adds additional logic to re-resolve the root name of a tree when a
couple of leaf requests have failed. We need this change to avoid
getting into a failure state where leaf requests keep failing for half
an hour when the tree has been updated.
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
This is a temporary fix for a problem which started happening when the
dialer was changed to read nodes from an enode.Iterator. Before the
iterator change, discovery queries would always return within a couple
seconds even if there was no Internet access. Since the iterator won't
return unless a node is actually found, discoverTask can take much
longer. This means that the 'emergency connect' logic might not execute
in time, leading to a stuck node.
* p2p/dnsdisc: add support for enode.Iterator
This changes the dnsdisc.Client API to support the enode.Iterator
interface.
* p2p/dnsdisc: rate-limit DNS requests
* p2p/dnsdisc: preserve linked trees across root updates
This improves the way links are handled when the link root changes.
Previously, sync would simply remove all links from the current tree and
garbage-collect all unreachable trees before syncing the new list of
links.
This behavior isn't great in certain cases: Consider a structure where
trees A, B, and C reference each other and D links to A. If D's link
root changed, the sync code would first remove trees A, B and C, only to
re-sync them later when the link to A was found again.
The fix for this problem is to track the current set of links in each
clientTree and removing old links only AFTER all links are synced.
* p2p/dnsdisc: deflake iterator test
* cmd/devp2p: adapt dnsClient to new p2p/dnsdisc API
* p2p/dnsdisc: tiny comment fix
* build: use golangci-lint
This changes build/ci.go to download and run golangci-lint instead
of gometalinter.
* core/state: fix unnecessary conversion
* p2p/simulations: fix lock copying (found by go vet)
* signer/core: fix unnecessary conversions
* crypto/ecies: remove unused function cmpPublic
* core/rawdb: remove unused function print
* core/state: remove unused function xTestFuzzCutter
* core/vm: disable TestWriteExpectedValues in a different way
* core/forkid: remove unused function checksum
* les: remove unused type proofsData
* cmd/utils: remove unused functions prefixedNames, prefixFor
* crypto/bn256: run goimports
* p2p/nat: fix goimports lint issue
* cmd/clef: avoid using unkeyed struct fields
* les: cancel context in testRequest
* rlp: delete unreachable code
* core: gofmt
* internal/build: simplify DownloadFile for Go 1.11 compatibility
* build: remove go test --short flag
* .travis.yml: disable build cache
* whisper/whisperv6: fix ineffectual assignment in TestWhisperIdentityManagement
* .golangci.yml: enable goconst and ineffassign linters
* build: print message when there are no lint issues
* internal/build: refactor download a bit
* rpc: improve codec abstraction
rpc.ServerCodec is an opaque interface. There was only one way to get a
codec using existing APIs: rpc.NewJSONCodec. This change exports
newCodec (as NewFuncCodec) and NewJSONCodec (as NewCodec). It also makes
all codec methods non-public to avoid showing internals in godoc.
While here, remove codec options in tests because they are not
supported anymore.
* p2p/simulations: use github.com/gorilla/websocket
This package was the last remaining user of golang.org/x/net/websocket.
Migrating to the new library wasn't straightforward because it is no
longer possible to treat WebSocket connections as a net.Conn.
* vendor: delete golang.org/x/net/websocket
* rpc: fix godoc comments and run gofmt
This adds all dashboard changes from the last couple months.
We're about to remove the dashboard, but decided that we should
get all the recent work in first in case anyone wants to pick up this
project later on.
* cmd, dashboard, eth, p2p: send peer info to the dashboard
* dashboard: update npm packages, improve UI, rebase
* dashboard, p2p: remove println, change doc
* cmd, dashboard, eth, p2p: cleanup after review
* dashboard: send current block to the dashboard client
This adds an implementation of node discovery via DNS TXT records to the
go-ethereum library. The implementation doesn't match EIP-1459 exactly,
the main difference being that this implementation uses separate merkle
trees for tree links and ENRs. The EIP will be updated to match p2p/dnsdisc.
To maintain DNS trees, cmd/devp2p provides a frontend for the p2p/dnsdisc
library. The new 'dns' subcommands can be used to create, sign and deploy DNS
discovery trees.
Most of these changes are related to the Go 1.13 changes to test binary
flag handling.
* cmd/geth: make attach tests more reliable
This makes the test wait for the endpoint to come up by polling
it instead of waiting for two seconds.
* tests: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* crypto/ecies: remove useless -dump flag in tests
* p2p/simulations: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* build: remove workaround for ./... vendor matching
This workaround was necessary for Go 1.8. The Go 1.9 release changed
the expansion rules to exclude vendored packages.
* Makefile: use relative path for GOBIN
This makes the "Run ./build/bin/..." line look nicer.
* les: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* eth: chain config (genesis + fork) ENR entry
* core/forkid, eth: protocol independent fork ID, update to CRC32 spec
* core/forkid, eth: make forkid a struct, next uint64, enr struct, RLP
* core/forkid: change forkhash rlp encoding from int to [4]byte
* eth: fixup eth entry a bit and update it every block
* eth: fix lint
* eth: fix crash in ethclient tests
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.
Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
* p2p/enr: add entries for for IPv4/IPv6 separation
This adds entry types for "ip6", "udp6", "tcp6" keys. The IP type stays
around because removing it would break a lot of code and force everyone
to care about the distinction.
* p2p/enode: track IPv4 and IPv6 address separately
LocalNode predicts the local node's UDP endpoint and updates the record.
This change makes it predict IPv4 and IPv6 endpoints separately since
they can now be in the record at the same time.
* p2p/enode: implement base64 text format
* all: switch to enode.Parse(...)
This allows passing base64-encoded node records to all the places that
previously accepted enode:// URLs. The URL format is still supported.
* cmd/bootnode, p2p: log node URL instead of ENR
...and return the base64 record in NodeInfo.
* p2p/discover: export Ping and RequestENR
These two are useful for checking the status of a node.
* cmd/devp2p: add devp2p debug tool
This is a new tool for debugging p2p issues. It supports a few
basic tasks for now, but many more things can and will be added
in the near future.
devp2p enrdump -- prints ENRs readably
devp2p discv4 ping -- checks if a node is up
devp2p discv4 requestenr -- gets a node's record
devp2p discv4 resolve -- finds a node through the DHT
This change implements EIP-868. The UDPv4 transport announces support
for the extension in ping/pong and handles enrRequest messages.
There are two uses of the extension: If a remote node announces support
for EIP-868 in their pong, node revalidation pulls the node's record.
The Resolve method requests the record unconditionally.
cmd/swarm/swarm-smoke: improve smoke tests (#1337)
swarm/network: remove dead code (#1339)
swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
This change restructures the internals of p2p/discover to make room for
the discv5 code which will soon be added to this package.
- packet type names now have a "V4" suffix.
- ListenUDP returns *UDPv4 instead of *Table. This technically breaks
the API but the only caller in go-ethereum is package p2p, which uses
a compatible interface and doesn't need changes.
- The internal transport interface is changed to make Table reusable for v5.
- The 'lookup' code moves from table to transport. This required
updating the lookup unit test to use udpTest instead of a custom transport.
* swarm/network: fix data races in TestInitialPeersMsg test
* swarm/network: add Kademlia.Saturation method with lock
* swarm/network: add Hive.Peer method to safely retrieve a bzz peer
* swarm/network: remove duplicate comment
* p2p/testing: prevent goroutine leak in ProtocolTester
* swarm/network: fix data race in newBzzBaseTesterWithAddrs
* swarm/network: fix goroutone leaks in testInitialPeersMsg
* swarm/network: raise number of peer check attempts in testInitialPeersMsg
* swarm/network: use Hive.Peer in Hive.PeerInfo function
* swarm/network: reduce the scope of mutex lock in newBzzBaseTesterWithAddrs
* swarm/storage: disable TestCleanIndex with race detector
This resolves a minor issue where neighbors responses containing less
than 16 nodes would bump the failure counter, removing the node. One
situation where this can happen is a private deployment where the total
number of extant nodes is less than 16.
Issue found by @jsying.
dummyHook's fields were concurrently written by nodes and read by
the test. The simplest solution is to protect all fields with a mutex.
Enable: TestMultiplePeersDropSelf, TestMultiplePeersDropOther as they
seemingly accidentally stayed disabled during a refactor/rewrite
since 1836366ac1.
resolvesethersphere/go-ethereum#1286
* p2p/discover: remove unused function
* p2p/enode: use localItemKey for local sequence number
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
This change
- implements concurrent LES request serving even for a single peer.
- replaces the request cost estimation method with a cost table based on
benchmarks which gives much more consistent results. Until now the
allowed number of light peers was just a guess which probably contributed
a lot to the fluctuating quality of available service. Everything related
to request cost is implemented in a single object, the 'cost tracker'. It
uses a fixed cost table with a global 'correction factor'. Benchmark code
is included and can be run at any time to adapt costs to low-level
implementation changes.
- reimplements flowcontrol.ClientManager in a cleaner and more efficient
way, with added capabilities: There is now control over bandwidth, which
allows using the flow control parameters for client prioritization.
Target utilization over 100 percent is now supported to model concurrent
request processing. Total serving bandwidth is reduced during block
processing to prevent database contention.
- implements an RPC API for the LES servers allowing server operators to
assign priority bandwidth to certain clients and change prioritized
status even while the client is connected. The new API is meant for
cases where server operators charge for LES using an off-protocol mechanism.
- adds a unit test for the new client manager.
- adds an end-to-end test using the network simulator that tests bandwidth
control functions through the new API.
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination
This change clears up confusion around the two ways in which nodes
can be added to the table.
When a neighbors packet is received as a reply to findnode, the nodes
contained in the reply are added as 'seen' entries if sufficient space
is available.
When a ping is received and the endpoint verification has taken place,
the remote node is added as a 'verified' entry or moved to the front of
the bucket if present. This also updates the node's IP address and port
if they have changed.
This change resolves multiple issues around handling of endpoint proofs.
The proof is now done separately for each IP and completing the proof
requires a matching ping hash.
Also remove waitping because it's equivalent to sleep. waitping was
slightly more efficient, but that may cause issues with findnode if
packets are reordered and the remote end sees findnode before pong.
Logging of received packets was hitherto done after handling the packet,
which meant that sent replies were logged before the packet that
generated them. This change splits up packet handling into 'preverify'
and 'handle'. The error from 'preverify' is logged, but 'handle' happens
after the message is logged. This fixes the order. Packet logs now
contain the node ID.
* p2p/simulation: WIP minimal snapshot test
* p2p/simulation: Add snapshot create, load and verify to snapshot test
* build: add test tag for tests
* p2p/simulations, build: Revert travis change, build test sym always
* p2p/simulations: Add comments, timeout check on additional events
* p2p/simulation: Add benchmark template for minimal peer protocol init
* p2p/simulations: Remove unused code
* p2p/simulation: Correct timer reset
* p2p/simulations: Put snapshot check events in buffer and call blocking
* p2p/simulations: TestSnapshot fail if Load function returns early
* p2p/simulations: TestSnapshot wait for all connections before returning
* p2p/simulation: Revert to before wait for snap load (5e75594)
* p2p/simulations: add "conns after load" subtest to TestSnapshot
and nudge
* swarm/network: Hive - do not notify peer if discovery is disabled
* p2p/simulations: validate all connections on loading a snapshot
* p2p/simulations: track all connections in on snapshot loading
* p2p/simulations: add snapshotLoadTimeout variable
* p2p/simulations: ignore control events in snapshot load
* p2p/simulations: simplify event loop synchronization
* p2p/simulations: return already connected error from Load function
* p2p/simulations: log warning on snapshot loading disconnection
This change extends the peer metrics collection:
- traces the life-cycle of the peers
- meters the peer traffic separately for every peer
- creates event feed for the peer events
- emits the peer events
This PR adds enode.LocalNode and integrates it into the p2p
subsystem. This new object is the keeper of the local node
record. For now, a new version of the record is produced every
time the client restarts. We'll make it smarter to avoid that in
the future.
There are a couple of other changes in this commit: discovery now
waits for all of its goroutines at shutdown and the p2p server
now closes the node database after discovery has shut down. This
fixes a leveldb crash in tests. p2p server startup is faster
because it doesn't need to wait for the external IP query
anymore.
This fixes a rare deadlock with the inproc adapter:
- A node is stopped, which acquires Network.lock.
- The protocol code being simulated (swarm/network in my case)
waits for its goroutines to shut down.
- One of those goroutines calls into the simulation to add a peer,
which waits for Network.lock.
The fix for the deadlock is really simple, just release the lock
before stopping the simulation node.
Other changes in this PR clean up the exec adapter so it reports
node startup errors better and remove the docker adapter because
it just adds overhead.
In the exec adapter, node information is now posted to a one-shot
server. This avoids log parsing and allows reporting startup
errors to the simulation host.
A small change in package node was needed because simulation
nodes use port zero. Node.{HTTP,WS}Endpoint now return the live
endpoints after startup by checking the TCP listener.
Package p2p/enode provides a generalized representation of p2p nodes
which can contain arbitrary information in key/value pairs. It is also
the new home for the node database. The "v4" identity scheme is also
moved here from p2p/enr to remove the dependency on Ethereum crypto from
that package.
Record signature handling is changed significantly. The identity scheme
registry is removed and acceptable schemes must be passed to any method
that needs identity. This means records must now be validated explicitly
after decoding.
The enode API is designed to make signature handling easy and safe: most
APIs around the codebase work with enode.Node, which is a wrapper around
a valid record. Going from enr.Record to enode.Node requires a valid
signature.
* p2p/discover: port to p2p/enode
This ports the discovery code to the new node representation in
p2p/enode. The wire protocol is unchanged, this can be considered a
refactoring change. The Kademlia table can now deal with nodes using an
arbitrary identity scheme. This requires a few incompatible API changes:
- Table.Lookup is not available anymore. It used to take a public key
as argument because v4 protocol requires one. Its replacement is
LookupRandom.
- Table.Resolve takes *enode.Node instead of NodeID. This is also for
v4 protocol compatibility because nodes cannot be looked up by ID
alone.
- Types Node and NodeID are gone. Further commits in the series will be
fixes all over the the codebase to deal with those removals.
* p2p: port to p2p/enode and discovery changes
This adapts package p2p to the changes in p2p/discover. All uses of
discover.Node and discover.NodeID are replaced by their equivalents from
p2p/enode.
New API is added to retrieve the enode.Node instance of a peer. The
behavior of Server.Self with discovery disabled is improved. It now
tries much harder to report a working IP address, falling back to
127.0.0.1 if no suitable address can be determined through other means.
These changes were needed for tests of other packages later in the
series.
* p2p/simulations, p2p/testing: port to p2p/enode
No surprises here, mostly replacements of discover.Node, discover.NodeID
with their new equivalents. The 'interesting' API changes are:
- testing.ProtocolSession tracks complete nodes, not just their IDs.
- adapters.NodeConfig has a new method to create a complete node.
These changes were needed to make swarm tests work.
Note that the NodeID change makes the code incompatible with old
simulation snapshots.
* whisper/whisperv5, whisper/whisperv6: port to p2p/enode
This port was easy because whisper uses []byte for node IDs and
URL strings in the API.
* eth: port to p2p/enode
Again, easy to port because eth uses strings for node IDs and doesn't
care about node information in any way.
* les: port to p2p/enode
Apart from replacing discover.NodeID with enode.ID, most changes are in
the server pool code. It now deals with complete nodes instead
of (Pubkey, IP, Port) triples. The database format is unchanged for now,
but we should probably change it to use the node database later.
* node: port to p2p/enode
This change simply replaces discover.Node and discover.NodeID with their
new equivalents.
* swarm/network: port to p2p/enode
Swarm has its own node address representation, BzzAddr, containing both
an overlay address (the hash of a secp256k1 public key) and an underlay
address (enode:// URL).
There are no changes to the BzzAddr format in this commit, but certain
operations such as creating a BzzAddr from a node ID are now impossible
because node IDs aren't public keys anymore.
Most swarm-related changes in the series remove uses of
NewAddrFromNodeID, replacing it with NewAddr which takes a complete node
as argument. ToOverlayAddr is removed because we can just use the node
ID directly.
* cmd/swarm: minor cli flag text adjustments
* cmd/swarm, swarm/storage, swarm: fix mingw on windows test issues
* cmd/swarm: support for smoke tests on the production swarm cluster
* cmd/swarm/swarm-smoke: simplify cluster logic as per suggestion
* changed colour of landing page
* landing page reacts to enter keypress
* swarm/api/http: sticky footer for swarm landing page using flex
* swarm/api/http: sticky footer for error pages and fix for multiple choices
* swarm: propagate ctx to internal apis (#754)
* swarm/simnet: add basic node/service functions
* swarm/netsim: add buckets for global state and kademlia health check
* swarm/netsim: Use sync.Map as bucket and provide cleanup function for...
* swarm, swarm/netsim: adjust SwarmNetworkTest
* swarm/netsim: fix tests
* swarm: added visualization option to sim net redesign
* swarm/netsim: support multiple services per node
* swarm/netsim: remove redundant return statement
* swarm/netsim: add comments
* swarm: shutdown HTTP in Simulation.Close
* swarm: sim HTTP server timeout
* swarm/netsim: add more simulation methods and peer events examples
* swarm/netsim: add WaitKademlia example
* swarm/netsim: fix comments
* swarm/netsim: terminate peer events goroutines on simulation done
* swarm, swarm/netsim: naming updates
* swarm/netsim: return not healthy kademlias on WaitTillHealthy
* swarm: fix WaitTillHealthy call in testSwarmNetwork
* swarm/netsim: allow bucket to have any type for a key
* swarm: Added snapshots to new netsim
* swarm/netsim: add more tests for bucket
* swarm/netsim: move http related things into separate files
* swarm/netsim: add AddNodeWithService option
* swarm/netsim: add more tests and Start* methods
* swarm/netsim: add peer events and kademlia tests
* swarm/netsim: fix some tests flakiness
* swarm/netsim: improve random nodes selection, fix TestStartStop* tests
* swarm/netsim: remove time measurement from TestClose to avoid flakiness
* swarm/netsim: builder pattern for netsim HTTP server (#773)
* swarm/netsim: add connect related tests
* swarm/netsim: add comment for TestPeerEvents
* swarm: rename netsim package to network/simulation
* p2p/discover: move bond logic from table to transport
This commit moves node endpoint verification (bonding) from the table to
the UDP transport implementation. Previously, adding a node to the table
entailed pinging the node if needed. With this change, the ping-back
logic is embedded in the packet handler at a lower level.
It is easy to verify that the basic protocol is unchanged: we still
require a valid pong reply from the node before findnode is accepted.
The node database tracked the time of last ping sent to the node and
time of last valid pong received from the node. Node endpoints are
considered verified when a valid pong is received and the time of last
pong was called 'bond time'. The time of last ping sent was unused. In
this commit, the last ping database entry is repurposed to mean last
ping _received_. This entry is now used to track whether the node needs
to be pinged back.
The other big change is how nodes are added to the table. We used to add
nodes in Table.bond, which ran when a remote node pinged us or when we
encountered the node in a neighbors reply. The transport now adds to the
table directly after the endpoint is verified through ping. To ensure
that the Table can't be filled just by pinging the node repeatedly, we
retain the isInitDone check. During init, only nodes from neighbors
replies are added.
* p2p/discover: reduce findnode failure counter on success
* p2p/discover: remove unused parameter of loadSeedNodes
* p2p/discover: improve ping-back check and comments
* p2p/discover: add neighbors reply nodes always, not just during init
These RPC calls are analogous to Parity's parity_addReservedPeer and
parity_removeReservedPeer.
They are useful for adjusting the trusted peer set during runtime,
without requiring restarting the server.
This commit adds all changes needed for the merge of swarm-network-rewrite.
The changes:
- build: increase linter timeout
- contracts/ens: export ensNode
- log: add Output method and enable fractional seconds in format
- metrics: relax test timeout
- p2p: reduced some log levels, updates to simulation packages
- rpc: increased maxClientSubscriptionBuffer to 20000
ToECDSAPub was unsafe because it returned a non-nil key with nil X, Y in
case of invalid input. This change replaces ToECDSAPub with
UnmarshalPubkey across the codebase.
This applies spec changes from ethereum/EIPs#1049 and adds support for
pluggable identity schemes.
Some care has been taken to make the "v4" scheme standalone. It uses
public APIs only and could be moved out of package enr at any time.
A couple of minor changes were needed to make identity schemes work:
- The sequence number is now updated in Set instead of when signing.
- Record is now copy-safe, i.e. calling Set on a shallow copy doesn't
modify the record it was copied from.
This change removes a peer information from dialing history
when peer is removed from static list. It allows to force a
server to re-dial concrete peer if it is needed.
In our case we are running geth node on mobile devices, and
it is common for a network connection to flap on mobile.
Almost every time it flaps or network connection is changed
from cellular to wifi peers are disconnected with read
timeout. And usually it takes 30 seconds (default expiration
timeout) to recover connection with static peers after
connectivity is restored.
This change allows us to reconnect with peers almost
immediately and it seems harmless enough.
I forgot to change the check in udp.go when I changed Table.bond to be
based on lastPong instead of node presence in db. Rename lastPong to
bondTime and add hasBond so it's clearer what this DB key is used for
now.
* cmd,node,rpc: add allowedHosts to prevent dns rebinding attacks
* p2p,node: Fix bug with dumpconfig introduced in r54aeb8e4c0bb9f0e7a6c67258af67df3b266af3d
* rpc: add wildcard support for rpcallowedhosts + go fmt
* cmd/geth, cmd/utils, node, rpc: ignore direct ip(v4/6) addresses in rpc virtual hostnames check
* http, rpc, utils: make vhosts into map, address review concerns
* node: change log messages to use geth standard (not sprintf)
* rpc: fix spelling
* p2p: add DialRatio for configuration of inbound vs. dialed connections
* p2p: add connection flags to PeerInfo
* p2p/netutil: add SameNet, DistinctNetSet
* p2p/discover: improve revalidation and seeding
This changes node revalidation to be periodic instead of on-demand. This
should prevent issues where dead nodes get stuck in closer buckets
because no other node will ever come along to replace them.
Every 5 seconds (on average), the last node in a random bucket is
checked and moved to the front of the bucket if it is still responding.
If revalidation fails, the last node is replaced by an entry of the
'replacement list' containing recently-seen nodes.
Most close buckets are removed because it's very unlikely we'll ever
encounter a node that would fall into any of those buckets.
Table seeding is also improved: we now require a few minutes of table
membership before considering a node as a potential seed node. This
should make it less likely to store short-lived nodes as potential
seeds.
* p2p/discover: fix nits in UDP transport
We would skip sending neighbors replies if there were fewer than
maxNeighbors results and CheckRelayIP returned an error for the last
one. While here, also resolve a TODO about pong reply tokens.
This commit affects p2p/discv5 "topic discovery" by running it on
the same UDP port where the old discovery works. This is realized
by giving an "unhandled" packet channel to the old v4 discovery
packet handler where all invalid packets are sent. These packets
are then processed by v5. v5 packets are always invalid when
interpreted by v4 and vice versa. This is ensured by adding one
to the first byte of the packet hash in v5 packets.
DiscoveryV5Bootnodes is also changed to point to new bootnodes
that are implementing the changed packet format with modified
hash. Existing and new v5 bootnodes are both running on different
ports ATM.
* core/types, core/vm, eth, tests: regenerate gencodec files
* Makefile: update devtools target
Install protoc-gen-go and print reminders about npm, solc and protoc.
Also switch to github.com/kevinburke/go-bindata because it's more
maintained.
* contracts/ens: update contracts and regenerate with solidity v0.4.19
The newer upstream version of the FIFSRegistrar contract doesn't set the
resolver anymore. The resolver is now deployed separately.
* contracts/release: regenerate with solidity v0.4.19
* contracts/chequebook: fix fallback and regenerate with solidity v0.4.19
The contract didn't have a fallback function, payments would be rejected
when compiled with newer solidity. References to 'mortal' and 'owned'
use the local file system so we can compile without network access.
* p2p/discv5: regenerate with recent stringer
* cmd/faucet: regenerate
* dashboard: regenerate
* eth/tracers: regenerate
* internal/jsre/deps: regenerate
* dashboard: avoid sed -i because it's not portable
* accounts/usbwallet/internal/trezor: fix go generate warnings
p2p/simulations: introduce dialBan
- Refactor simulations/network connection getters to support
avoiding simultaneous dials between two peers If two peers dial
simultaneously, the connection will be dropped to help avoid
that, we essentially lock the connection object with a
timestamp which serves as a ban on dialing for a period of time
(dialBanTimeout).
- The connection getter InitConn can be wrapped and passed to the
nodes via adapters.NodeConfig#Reachable field and then used by
the respective services when they initiate connections. This
massively stablise the emerging connectivity when running with
hundreds of nodes bootstrapping a network.
p2p: add Inbound public method to p2p.Peer
p2p/simulations: Add server id to logs to support debugging
in-memory network simulations when multiple peers are logging.
p2p: SetupConn now returns error. The dialer checks the error and
only calls resolve if the actual TCP dial fails.
This commit introduces a network simulation framework which
can be used to run simulated networks of devp2p nodes. The
intention is to use this for testing protocols, performing
benchmarks and visualising emergent network behaviour.
Using a Timer over Ticker seems to be a lot better, though I cannot fully
account for why that it behaves so (since Ticker should be more bursty, but not
necessarily more active over time, but that may depend on how long window it
uses to decide on when to tick next)
* p2p/discover, p2p/discv5: add marshaling methods to Node
* p2p/netutil: make Netlist decodable from TOML
* common/math: encode nil HexOrDecimal256 as 0x0
* cmd/geth: add --config file flag
* cmd/geth: add missing license header
* eth: prettify Config again, fix tests
* eth: use gasprice.Config instead of duplicating its fields
* eth/gasprice: hide nil default from dumpconfig output
* cmd/geth: hide genesis block in dumpconfig output
* node: make tests compile
* console: fix tests
* cmd/geth: make TOML keys look exactly like Go struct fields
* p2p: use discovery by default
This makes the zero Config slightly more useful. It also fixes package
node tests because Node detects reuse of the datadir through the
NodeDatabase.
* cmd/geth: make ethstats URL settable through config file
* cmd/faucet: fix configuration
* cmd/geth: dedup attach tests
* eth: add comment for DefaultConfig
* eth: pass downloader.SyncMode in Config
This removes the FastSync, LightSync flags in favour of a more
general SyncMode flag.
* cmd/utils: remove jitvm flags
* cmd/utils: make mutually exclusive flag error prettier
It now reads:
Fatal: flags --dev, --testnet can't be used at the same time
* p2p: fix typo
* node: add DefaultConfig, use it for geth
* mobile: add missing NoDiscovery option
* cmd/utils: drop MakeNode
This exposed a couple of places that needed to be updated to use
node.DefaultConfig.
* node: fix typo
* eth: make fast sync the default mode
* cmd/utils: remove IPCApiFlag (unused)
* node: remove default IPC path
Set it in the frontends instead.
* cmd/geth: add --syncmode
* cmd/utils: make --ipcdisable and --ipcpath mutually exclusive
* cmd/utils: don't enable WS, HTTP when setting addr
* cmd/utils: fix --identity
The p2p packages can now be configured to restrict all communication to
a certain subset of IP networks. This feature is meant to be used for
private networks.
The discovery DHT contains a number of hosts with LAN and loopback IPs.
These get relayed because some implementations do not perform any checks
on the IP.
go-ethereum already prevented relay in most cases because it verifies
that the host actually exists before adding it to the local table. But
this verification causes other issues. We have received several reports
where people's VPSs got shut down by hosting providers because sending
packets to random LAN hosts is indistinguishable from a slow port scan.
The new check prevents sending random packets to LAN by discarding LAN
IPs sent by Internet hosts (and loopback IPs from LAN and Internet
hosts). The new check also blacklists almost all currently registered
special-purpose networks assigned by IANA to avoid inciting random
responses from services in the LAN.
As another precaution against abuse of the DHT, ports below 1024 are now
considered invalid.
The new package contains three things for now:
- IP network list parsing and matching
- The WSAEMSGSIZE workaround, which is duplicated in p2p/discover and
p2p/discv5.
Port mapper auto discovery used to run immediately after parsing the
--nat flag, giving it a slight performance boost. But this is becoming
inconvenient because we create node.Node for all geth operations
including account management and bare chain interaction. Delay
autodiscovery until the first use instead, which avoids any network
interaction until the node is actually started.
On Windows, UDPConn.ReadFrom returns an error for packets larger
than the receive buffer. The error is not marked temporary, causing
our loop to exit when the first oversized packet arrived. The fix
is to treat this particular error as temporary.
Fixes: #1579, #2087
Updates: #2082
This change makes it possible to add peers without providing their IP
address. The endpoint of the target node is resolved using the discovery
protocol.