Commit Graph

263 Commits

Author SHA1 Message Date
Felix Lange 8dcbdcad0a p2p: track write errors and prevent writes during shutdown
As of this commit, we no longer rely on the protocol handler to report
write errors in a timely fashion. When a write fails, shutdown is
initiated immediately and no new writes can start. This will also
prevent new writes from starting after Server.Stop has been called.
2015-06-15 15:03:46 +02:00
Felix Lange a8e4cb6dfe p2p/discover: use separate rand.Source instances in tests
rand.Source isn't safe for concurrent use.
2015-06-10 15:18:01 +02:00
Felix Lange 261a8077c4 p2p/discover: deflake TestUDP_successfulPing 2015-06-10 13:08:21 +02:00
Péter Szilágyi 1cbbfbe7fa p2p: fix a close race in the dial test 2015-06-09 22:26:26 +03:00
Felix Lange 3239aca69b p2p: bump global write timeout to 20s
The previous value of 5 seconds causes timeouts for legitimate messages
if large messages are sent.
2015-06-09 17:07:10 +02:00
Péter Szilágyi ff84352fb7 p2p: fix close data race 2015-06-09 16:12:24 +03:00
Felix Lange fc6a5ae3ec p2p/nat: add timeout for UPnP SOAP requests 2015-06-04 22:25:43 +02:00
Felix Lange 8e4512a5e7 p2p/nat: bump timeout in TestAutoDiscRace 2015-05-28 01:09:26 +02:00
Péter Szilágyi 612f01400f p2p/discover: bond with seed nodes too (runs only if findnode failed) 2015-05-26 23:30:41 +02:00
Péter Szilágyi 3630432dfb p2p/discovery: fix a cornercase loop if no seeds or bootnodes are known 2015-05-26 23:30:40 +02:00
Péter Szilágyi f539ed1e66 p2p/discover: force refresh if the table is empty 2015-05-26 23:30:40 +02:00
Péter Szilágyi 5076170f34 p2p/discover: permit temporary bond failures for previously known nodes 2015-05-26 23:30:40 +02:00
Péter Szilágyi 6078aa08eb p2p/discover: watch find failures, evacuate on too many, rebond if failed 2015-05-26 23:30:40 +02:00
Péter Szilágyi 64174f196f p2p/discover: add support for counting findnode failures 2015-05-26 23:30:40 +02:00
Péter Szilágyi 68898a4d6b p2p: fix Self() panic if listening is disabled 2015-05-26 19:16:05 +03:00
Péter Szilágyi e1a0ee8fc5 cmd/geth, cmd/utils, eth, p2p: pass and honor a no discovery flag 2015-05-26 19:07:24 +03:00
Péter Szilágyi 278183c7e7 eth, p2p: start the p2p server even if maxpeers == 0 2015-05-26 17:49:37 +03:00
Felix Lange 9e1fd70b50 p2p: decrease frameReadTimeout to 30s
This detects hanging connections sooner. We send a ping every 15s and
other implementation have similar limits.
2015-05-25 01:17:14 +02:00
Felix Lange 1440f9a37a p2p: new dialer, peer management without locks
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.

Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).

Most of the handshake stuff is now part of the RLPx code. It could be
exported or move to its own package because it is no longer entangled
with Server logic.
2015-05-25 01:17:14 +02:00
Felix Lange 9f38ef5d97 p2p/discover: add ReadRandomNodes 2015-05-25 01:17:14 +02:00
Felix Lange 64564da20b p2p: decrease maximum message size for devp2p to 1kB
The previous limit was 10MB which is unacceptable for all kinds
of reasons, the most important one being that we don't want to
allow the remote side to make us allocate 10MB at handshake time.
2015-05-25 01:17:14 +02:00
Felix Lange dbdc5fd4b3 p2p: delete Server.Broadcast 2015-05-25 01:17:14 +02:00
Péter Szilágyi cbd3ae6906 p2p/discover: fix #838, evacuate self entries from the node db 2015-05-21 19:41:46 +03:00
Péter Szilágyi af24c271c7 p2p/discover: fix database presistency test folder 2015-05-21 19:28:10 +03:00
Jeffrey Wilcke 90b94e64fc Merge pull request #971 from fjl/p2p-limit-tweaks
p2p: tweak connection limits
2015-05-14 08:15:51 -07:00
Felix Lange d2f119cf9b p2p/discover: limit open files for node database 2015-05-14 15:01:13 +02:00
Felix Lange 206fe25971 p2p: remove testlog 2015-05-14 14:56:34 +02:00
Felix Lange 7fa2607bd1 p2p/discover: bump maxBondingPingPongs to 16
This should increase the speed a bit because all findnode
results (up to 16) can be verified at the same time.
2015-05-14 14:53:29 +02:00
Felix Lange 691cb90284 p2p: log remote reason when disconnect is requested
The returned reason is currently not used except for the log
message. This change makes the log messages a bit more useful.
The handshake code also returns the remote reason.
2015-05-14 14:53:29 +02:00
Felix Lange c14de2e973 p2p/nat: tweak port mapping log messages and levels
People stil get confused about the messages. This commit changes
the levels so that the only thing printed at the default level (info)
is a successful mapping.
2015-05-14 12:54:59 +02:00
Felix Lange 663d4e0aff p2p/nat: add test for UPnP auto discovery via SSDP
The test listens for multicast UDP packets on the default interface
because I couldn't get it to work reliably on loopback without massive
changes to goupnp. This means that the test might fail when there is a
UPnP-enabled router attached on that interface. I checked that locally
by looping the test and it passes reliably because the local SSDP server
always responds faster.
2015-05-14 12:13:19 +02:00
Felix Lange 983f5a717a p2p/nat: fix concurrent access to autodisc Interface
Concurrent calls to Interface methods on autodisc could return a "not
discovered" error if the discovery did not finish before the call.
autodisc.wait expected the done channel to carry the found Interface
but it was closed instead.

The fix is to use sync.Once for now, which is easier to get right.
And there is a test. Finally.

This will have to change again when we introduce re-discovery.
2015-05-14 03:53:11 +02:00
Felix Lange 7efeb4bd96 p2p: bump maxAcceptConns and defaultDialTimout
On the test network, we've seen that it becomes harder to connect
if the queues are so short.
2015-05-14 03:48:28 +02:00
Felix Lange 251846d65a p2p/discover: fix out-of-bounds slicing for chunked neighbors packets
The code assumed that Table.closest always returns at least 13 nodes.
This is not true for small tables (e.g. during bootstrap).
2015-05-13 21:49:04 +02:00
subtly 8eef2b765a fix test. 2015-05-13 20:15:01 +02:00
subtly a32693770c Manual send of multiple neighbours packets. Test receiving multiple neighbours packets. 2015-05-13 20:03:17 +02:00
subtly 7473c93668 UDP Interop. Limit datagrams to 1280bytes.
We don't have a UDP which specifies any messages that will be 4KB. Aside from being implemented for months and a necessity for encryption and piggy-backing packets, 1280bytes is ideal, and, means this TODO can be completed!

Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while OS waits for a dropped fragment to never arrive (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is minimum datagram size and MTU for IPv6 -- on IPv6, a datagram < 1280bytes will *never* be fragmented.

UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a datagram has a 30% chance of being dropped, then a fragmented datagram has a 60% chance of being dropped. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.

So the solution here to picking an ideal buffer size for receiving datagrams is a number under 1400bytes. And the lower-bound value for IPv6 of 1280 bytes make's it a non-decision. On IPv4 most ISPs and 3g/4g/let networks have an MTU just over 1400 -- and *never* over 1500. Never -- that means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.

Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:

On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1


Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
2015-05-13 19:03:00 +02:00
Bas van Kervel 95773b9673 removed redundant newlines in import block 2015-05-12 15:20:53 +02:00
Bas van Kervel b79dd188d9 replaced several path.* with filepath.* which is platform independent 2015-05-12 14:24:11 +02:00
Felix Lange d4f0a67323 p2p: drop connections with no matching protocols 2015-05-08 16:09:55 +02:00
Felix Lange 9c0f36c46d p2p: use maxDialingConns instead of maxAcceptConns as dial limit 2015-05-08 16:09:55 +02:00
Felix Lange 914e57e49b p2p: fix disconnect at capacity
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
2015-05-08 16:09:54 +02:00
Péter Szilágyi 8735e5addd p2p: increase the handshake timeout in the tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi 4d5a719f25 cmd, eth, p2p: introduce pending peer cli arg, add tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi af93217775 p2p: reduce the concurrent handshakes to 10/10 in/out 2015-05-07 15:22:09 +03:00
Péter Szilágyi 2060bc8bac p2p: fix dial throttling race condition 2015-05-07 15:22:08 +03:00
Péter Szilágyi 29fef349ef p2p: fix a dialing race in the throttler 2015-05-07 15:22:08 +03:00
Péter Szilágyi 3953bf0031 p2p: limit the outbound dialing too 2015-05-07 15:22:08 +03:00
Jeffrey Wilcke a0cb1945ae Merge pull request #866 from fjl/p2p-last-minute
Last minute p2p fixes
2015-05-06 14:49:52 -07:00
Felix Lange 3e2a928caa p2p: stop dialing at half the maximum peer count 2015-05-06 23:44:51 +02:00
Felix Lange 6a2fec5309 p2p, whisper: use glog for peer-level logging 2015-05-06 23:19:14 +02:00
Felix Lange bcfd788661 p2p/discover: bump packet timeouts to 500ms 2015-05-06 22:59:00 +02:00
Felix Lange fd4b75cfa8 p2p/nat: less confusing error logging 2015-05-06 22:58:03 +02:00
obscuren 062fa049d0 fixed merge issue 2015-05-06 22:54:21 +02:00
Felix Lange 2adcc31bb4 p2p/discover: new distance metric based on sha3(id)
The previous metric was pubkey1^pubkey2, as specified in the Kademlia
paper. We missed that EC public keys are not uniformly distributed.
Using the hash of the public keys addresses that. It also makes it
a bit harder to generate node IDs that are close to a particular node.
2015-05-06 16:10:41 +02:00
Péter Szilágyi 4accc187d5 eth, p2p: add trusted node list beside static list 2015-05-04 13:59:51 +03:00
Péter Szilágyi 54db54931e p2p: add static node dialing test 2015-05-04 13:08:42 +03:00
Péter Szilágyi e82ddd9198 p2p: correct a leftover trusted -> static 2015-04-30 19:34:33 +03:00
Péter Szilágyi 413ace37d3 eth, p2p: rename trusted nodes to static, drop inbound extra slots 2015-04-30 19:32:48 +03:00
Péter Szilágyi 701591b403 cmd, eth, p2p: fix review issues enumerated by Felix 2015-04-30 16:15:29 +03:00
Péter Szilágyi 1528dbc171 p2p: add trust check to handshake, test privileged connectivity
Conflicts:
	p2p/server_test.go
2015-04-30 16:06:47 +03:00
Péter Szilágyi 14f32a0c3a p2p: reduce the severity of a debug log 2015-04-30 16:04:09 +03:00
Péter Szilágyi de0549fabb cmd/geth, cmd/mist, cmd/utils, eth, p2p: support trusted peers 2015-04-30 16:03:10 +03:00
Felix Lange 72ab6d3255 p2p/discover: track sha3(ID) in Node 2015-04-30 15:02:23 +02:00
Felix Lange b34a8ef624 p2p, p2p/discover: protocol version 4 2015-04-30 14:57:34 +02:00
Felix Lange fc747ef4a6 p2p/discover: new endpoint format
This commit changes the discovery protocol to use the new "v4" endpoint
format, which allows for separate UDP and TCP ports and makes it
possible to discover the UDP address after NAT.
2015-04-30 14:57:33 +02:00
obscuren 01e3d694a6 p2p: added received at to peer message
p2p.Msg.ReceivedAt can be used for determining block propagation from
begining to end.
2015-04-29 22:49:58 +02:00
Péter Szilágyi b569550a39 p2p/discover: fix api issues caused by leveldb update 2015-04-28 13:57:57 +03:00
Péter Szilágyi 4992765032 p2p/discover: fix goroutine leak due to blocking on sync.Once 2015-04-28 10:28:04 +03:00
Péter Szilágyi 437cf4b3ac p2p/discover: add node expirer and related tests 2015-04-27 17:38:28 +03:00
Péter Szilágyi a136e2bb22 p2p/discover: parametrize nodedb version, add persistency tests 2015-04-27 15:28:17 +03:00
Péter Szilágyi 75fd738dea p2p/discover: drop a superfluous warning 2015-04-27 15:06:31 +03:00
Péter Szilágyi 706da56f75 p2p/discover: wrap the pinger to update the node db too 2015-04-27 14:56:42 +03:00
Péter Szilágyi 85b4b44235 p2p/discover: use iterator based seeding, drop old protocol test 2015-04-27 14:45:35 +03:00
Péter Szilágyi 8de8f61d36 p2p/discover: write the basic tests, catch RLP bug 2015-04-27 12:33:06 +03:00
Péter Szilágyi 0201c04b95 p2p/discovery: fix issues raised in the nodeDb PR 2015-04-27 10:19:16 +03:00
Péter Szilágyi 8646365b42 cmd/bootnode, eth, p2p, p2p/discover: use a fancier db design 2015-04-24 18:04:41 +03:00
Péter Szilágyi 6def110c37 cmd/bootnode, eth, p2p, p2p/discover: clean up the seeder and mesh into eth. 2015-04-24 11:33:55 +03:00
Péter Szilágyi 971702e7a1 p2p/discovery: fix broken tests due to API update 2015-04-24 11:23:20 +03:00
Péter Szilágyi af923c965f p2p/discovery: use the seed table for finding nodes, auto drop stale ones 2015-04-24 11:23:20 +03:00
Péter Szilágyi 5f735d6fce cmd, eth, p2p, p2p/discover: init and clean up the seed cache 2015-04-24 11:23:20 +03:00
Felix Lange 936c8e19ff p2p/discover: store nodes in leveldb 2015-04-24 11:23:20 +03:00
Felix Lange 635b66acdc p2p: return zero node from Self if the server is not running
This helps with fixing the tests for cmd/geth to run without networking.
2015-04-22 12:31:19 +02:00
Felix Lange 9c7281c17e p2p: make DiscReason bigger than byte
We decode into [1]DiscReason in a few places. That doesn't work anymore
because package rlp no longer accepts RLP lists for byte arrays.
2015-04-17 14:45:10 +02:00
Felix Lange eedbb1ee9a p2p/discover: use rlp.DecodeBytes 2015-04-17 14:45:09 +02:00
Felix Lange 56a48101dc cmd/rlpdump, cmd/utils, eth, p2p, whisper: use rlp input limit 2015-04-17 14:45:09 +02:00
Felix Lange 5528abc795 p2p: fix the dial timer
The dial timer was not reset properly when the peer count reached
MaxPeers.
2015-04-17 08:17:01 +02:00
obscuren 474aa924ca p2p: added limiter function to limit package broadcasting 2015-04-14 12:47:31 +02:00
Felix Lange 0217652d1b p2p/discover: improve timer handling for reply timeouts 2015-04-13 18:08:11 +02:00
Felix Lange b8aeb04f6f p2p/discover: remove unused field Node.activeStamp 2015-04-13 17:44:14 +02:00
Felix Lange b9929d289d p2p: fix unsynchronized map access during Server shutdown
removePeer can be called even after listenLoop and dialLoop have returned.
2015-04-13 17:37:32 +02:00
Felix Lange 995fab2ebc p2p: fix yet another disconnect hang
Peer.readLoop will only terminate if the connection is closed. Fix the
hang by closing the connection before waiting for readLoop to terminate.

This also removes the british disconnect procedure where we're waiting
for the remote end to close the connection. I have confirmed with
@subtly that cpp-ethereum doesn't adhere to it either.
2015-04-13 17:34:08 +02:00
Felix Lange 79a6782c1c p2p: fix goroutine leak when handshake read fails
This regression was introduced in b3c058a9e4.
2015-04-13 17:06:19 +02:00
Felix Lange c5332537f5 p2p: limit number of lingering inbound pre-handshake connections
This is supposed to apply some back pressure so Server is not accepting
more connections than it can actually handle. The current limit is 50.
This doesn't really need to be configurable, but we'll see how it
behaves in our test nodes and adjust accordingly.
2015-04-10 17:24:41 +02:00
Felix Lange 56977c225e p2p: use RLock instead of Lock for pre-dial checks 2015-04-10 17:23:09 +02:00
Felix Lange b3c058a9e4 p2p: improve disconnect signaling at handshake time
As of this commit, p2p will disconnect nodes directly after the
encryption handshake if too many peer connections are active.
Errors in the protocol handshake packet are now handled more politely
by sending a disconnect packet before closing the connection.
2015-04-10 16:57:56 +02:00
Felix Lange 99a1db2d40 p2p: don't mess with the socket deadline in Peer.readLoop
netWrapper already sets a read deadline in ReadMsg.
2015-04-10 13:26:28 +02:00
Felix Lange 145330fdf2 p2p: properly decrement peer wait group counter for setup errors 2015-04-10 13:26:27 +02:00
Felix Lange f1d710af00 p2p: fix Peer shutdown deadlocks
There were multiple synchronization issues in the disconnect handling,
all caused by the odd special-casing of Peer.readLoop errors. Remove the
special handling of read errors and make readLoop part of the Peer
WaitGroup.

Thanks to @Gustav-Simonsson for pointing at arrows in a diagram
and playing rubber-duck.
2015-04-10 13:26:27 +02:00
Felix Lange 22d1f0faf1 p2p: improve peer selection logic
This commit introduces a new (temporary) peer selection
strategy based on random lookups.

While we're here, also implement the TODOs in dialLoop.
2015-04-10 13:26:27 +02:00