From 27dbee1b66bbdea7cf6d8168138a497c32cb0af4 Mon Sep 17 00:00:00 2001
From: David Fifield
Date: Sun, 1 Aug 2021 19:18:14 -0600
Subject: [PATCH] Make pollChan buffered, and send on it only once.

Formerly we sent twice on pollChan, but because it was unbuffered, the
second send was almost always dropped. KCP's own ACK packets should also
serve as another source of what are effectively polling queries, at a
rate proportional to the rate at which we are receiving.

I did some performance tests of downloading 10 MiB between two servers
with 100 ms RTT between them.

Server:
	dnstt-server -udp :53 -privkey-file server.key t.example.com 127.0.0.1:9321
	ncat -l -k -v 9321 --send-only --sh-exec 'dd bs=1M count=10 if=/dev/urandom'
Client:
	dnstt-client -pubkey-file server.pub t.example.com 127.0.0.1:7000
	ncat --recv-only 127.0.0.1 7000 | pv -t -r -a -b -i 0.2 > /dev/null

I did the download under each treatment twice and recorded the download
rate in KiB/s.

First, the results for the commit before this one. I also hacked in
fewer sends on pollChan for each packet received. The results for 1 poll
are about the same as for 2 polls, which is expected, given the
observation that the unbuffered pollChan was usually dropping the second
send. 0 polls results in a very slow rate (possibly driven only by smux
keepalive packets). ("~" means I stopped waiting for the download after
about 2 minutes.)
resolver    method  pollChan cap  SetACKNoDelay  polls  KiB/s  KiB/s
--------    ------  ------------  -------------  -----  -----  -----
direct      udp     unbuffered    false          2      146    159   (status quo before this commit)
dns.google  udp     unbuffered    false          2      58.3   60.2  (status quo before this commit)
dns.google  doh     unbuffered    false          2      125    126   (status quo before this commit)
direct      udp     unbuffered    false          1      155    165
dns.google  udp     unbuffered    false          1      57.5   58.7
dns.google  doh     unbuffered    false          1      124    123
direct      udp     unbuffered    false          0      ~7     ~11
dns.google  udp     unbuffered    false          0      ~6     ~5
dns.google  doh     unbuffered    false          0      ~6     ~6

Now, the results after this commit. A buffered pollChan with 1 poll per
receive has almost identical performance to an unbuffered pollChan with
1 or 2 polls per receive, which is expected. Using 2 polls rather than 1
actually helps performance a fair bit in the direct/UDP and Google/UDP
treatments, but hurts performance in the Google/DoH treatment. 0 polls
still yields poor performance.

resolver    method  pollChan cap  SetACKNoDelay  polls  KiB/s  KiB/s
--------    ------  ------------  -------------  -----  -----  -----
direct      udp     16            false          2      174    175
dns.google  udp     16            false          2      76.0   75.3
dns.google  doh     16            false          2      68.7   68.0
direct      udp     16            false          1      149    151   (this commit)
dns.google  udp     16            false          1      60.2   60.3  (this commit)
dns.google  doh     16            false          1      123    121   (this commit)
direct      udp     16            false          0      ~7     ~8
dns.google  udp     16            false          0      ~5     ~5
dns.google  doh     16            false          0      ~6     ~7

I tried the additional modification of calling conn.SetACKNoDelay(true).
My guess was that this would cause every received data packet to be
ACKed immediately, which should have the same effect as a poll.
Unexpectedly for me, SetACKNoDelay(true) actually slows down the
direct/UDP and Google/DoH cases a lot. But strangely, Google/UDP becomes
faster (and 1 poll is even faster than 2 polls in that case). Strangest
of all, SetACKNoDelay(true) makes the 0-poll Google/UDP and Google/DoH
treatments run reasonably fast.
My best guess as to why that is the case is that routing through a
Google resolver tends to reorder the packet sequence, which perhaps
results in more ACKs, which effectively act as polls.

resolver    method  pollChan cap  SetACKNoDelay  polls  KiB/s  KiB/s
--------    ------  ------------  -------------  -----  -----  -----
direct      udp     16            true           2      87.7   84.2
dns.google  udp     16            true           2      82.7   68.9
dns.google  doh     16            true           2      59.8   60.8
direct      udp     16            true           1      54.1   47.8
dns.google  udp     16            true           1      97.5   100
dns.google  doh     16            true           1      71.4   70.9
direct      udp     16            true           0      ~9     ~8
dns.google  udp     16            true           0      48.0   48.6
dns.google  doh     16            true           0      59.2   59.2
---
 dnstt-client/dns.go | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/dnstt-client/dns.go b/dnstt-client/dns.go
index 564d2cf..b15f211 100644
--- a/dnstt-client/dns.go
+++ b/dnstt-client/dns.go
@@ -33,6 +33,10 @@ const (
 	initPollDelay       = 500 * time.Millisecond
 	maxPollDelay        = 10 * time.Second
 	pollDelayMultiplier = 2.0
+
+	// A limit on the number of empty poll requests we may send in a burst
+	// as a result of receiving data.
+	pollLimit = 16
 )
 
 // base32Encoding is a base32 encoding without padding.
@@ -74,7 +78,7 @@ func NewDNSPacketConn(transport net.PacketConn, addr net.Addr, domain dns.Name)
 	c := &DNSPacketConn{
 		clientID:        clientID,
 		domain:          domain,
-		pollChan:        make(chan struct{}),
+		pollChan:        make(chan struct{}, pollLimit),
 		QueuePacketConn: turbotunnel.NewQueuePacketConn(clientID, 0),
 	}
 	go func() {
@@ -153,20 +157,27 @@ func nextPacket(r *bytes.Reader) ([]byte, error) {
 // extracts its payload and breaks it into packets, and stores the packets in a
 // queue to be returned from a future call to c.ReadFrom.
 //
-// Whenever we receive a response with a non-empty payload, we send twice on
-// c.pollChan to permit sendLoop to send two immediate polling queries. The
-// intuition behind polling immediately after receiving is that we know the
-// server has just had something to send, it may need to send more, and the only
-// way it can send is if we give it a query to respond to. The intuition behind
-// doing *two* polls when we receive is similar to TCP slow start: we want to
-// maintain some number of queries "in flight", and the faster the server is
-// sending, the higher that number should be. If we polled only once in response
-// to received data, we would tend to have only one query in flight at a time,
-// ping-pong style. The first polling request replaces the in-flight request
-// that has just finished in our receiving data; the second grows the effective
-// in-flight window proportionally to the rate at which data-carrying responses
-// are being received. Compare to Eq. (2) of
-// https://tools.ietf.org/html/rfc5681#section-3.1; the differences are that we
+// Whenever we receive a DNS response containing at least one data packet, we
+// send on c.pollChan to permit sendLoop to send an immediate polling query.
+// KCP itself will also send an ACK packet for incoming data, which is
+// effectively a second poll. Therefore, each time we receive data, we send up
+// to 2 polling queries (or 1 + f polling queries, if KCP only ACKs an f
+// fraction of incoming data). We say "up to" because sendLoop will discard an
+// empty polling query if it has an organic non-empty packet to send (this goes
+// also for KCP's organic ACK packets).
+//
+// The intuition behind polling immediately after receiving is that if the
+// server has just had something to send, it may have more to send, and in
+// order for the server to send anything, we must give it a query to respond
+// to. The intuition behind polling *2 times* (or 1 + f times) is similar to
+// TCP slow start: we want to maintain some number of queries "in flight", and
+// the faster the server is sending, the higher that number should be. If we
+// polled only once for each received packet, we would tend to have only one
+// query in flight at a time, ping-pong style. The first polling query replaces
+// the in-flight query that has just finished its duty in returning data to us;
+// the second grows the effective in-flight window in proportion to the rate at
+// which data-carrying responses are being received. Compare to Eq. (2) of
+// https://tools.ietf.org/html/rfc5681#section-3.1. The differences are that we
 // count messages, not bytes, and we don't maintain an explicit window. If a
 // response comes back without data, or if a query or response is dropped by the
 // network, then we don't poll again, which decreases the effective in-flight
@@ -205,16 +216,14 @@ func (c *DNSPacketConn) recvLoop(transport net.PacketConn) error {
 		}
 
 		// If the payload contained one or more packets, permit sendLoop
-		// to poll immediately.
+		// to poll immediately. ACKs on received data will effectively
+		// serve as another stream of polls whose rate is proportional
+		// to the rate of incoming packets.
 		if any {
 			select {
 			case c.pollChan <- struct{}{}:
 			default:
 			}
-			select {
-			case c.pollChan <- struct{}{}:
-			default:
-			}
 		}
 	}
 }
-- 
2.11.4.GIT