Windows socket error numbers are not reported using errno, but with
WSAGetLastError(). _getdns_errnostr() and friends as implemented on
Windows don't work for errors resulting from file open/close/read/write
etc.
So add a parallel set of functions specifically for file errors.
When using Stubby as a system DNS over TLS resolver with a Internet
connection that disconnects and reconnects from time to time there is often
a long waiting time (~20 minutes) after the connection reconnects before
DNS queries start to work again.
This is because in this particular case all the upstream TLS TCP
connections in Stubby are stuck waiting for upstream server response.
Which will never arrive since the host external IP address might have
changed and / or NAT router connection tracking entries for these TCP
connections might have been removed when the Internet connection
reconnected.
By default Linux tries to retransmit data on a TCP connection 15 times
before finally terminating it.
This takes 16 - 20 minutes, which is obviously a very long time to wait for
system DNS resolving to work again.
This is a real problem on weak mobile connections.
Thankfully, there is a "TCP_USER_TIMEOUT" per-socket option that allows
explicitly setting how long the network stack will wait in such cases.
Let's add a matching "tcp_send_timeout" option to getdns that allows
setting this option on outgoing TCP sockets.
For backward compatibility the code won't try to set it by default.
With this option set to, for example, 15 seconds Stubby recovers pretty
much instantly in such cases.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Most of the time we only need a read _or_ a write callback registered
with libuv - for example, on a UDP request a write callback is
registered, when executed the write callback performs the write,
deregisters itself, and registers a read callback.
However there is one case where getdns registers both read and write
callbacks: when a backlog of TCP requests is going to the same upstream
resolver, we use a single fd and queue the requests. In this instance we
want to listen for both read (to get responses for requests we've
already sent) and write (to continue to send our pending requests).
libuv, like most event libraries, only allows one callback to be
registered per fd. To get notification for both reads and writes, you
should examine the event flags and have appropriate conditional logic
within the single callback. Today getdns incorrectly tries to register
two separate poll_t with libuv, one for read and one for write - this
results in a crash (internal libuv assertion guaranteeing that only a
single poll_t is registered per fd).
Testing was done by using flamethrower
(https://github.com/DNS-OARC/flamethrower) to toss queries at a program
that embeds getdns.
Note that a higher qps trigger a _different_ getdns/libuv crashing bug
that occurs when the TCP backlog grows so large that requests start to
time out. That crash is not addressed in this PR, and will be more
involved to fix.