Filename: 203-https-frontend.txt
Title: Avoiding censorship by impersonating an HTTPS server

One frequently proposed approach for censorship resistance is that
Tor bridges ought to act like another TLS-based service, and deliver
traffic to Tor only if the client can demonstrate some shared
knowledge with the bridge.

In this document, I discuss some design considerations for building
such systems, and propose a few possible architectures and designs.

Most of our previous work on censorship resistance has focused on
preventing passive attackers from identifying Tor bridges, or from
doing so cheaply. But active attackers exist, and exist in the wild:
right now, the most sophisticated censors use their anti-Tor passive
attacks only as a first round of filtering before launching a
secondary active attack to confirm suspected Tor nodes.

One idea we've been talking about for a while is that of having a
service that looks like an HTTPS service unless a client does some
particular secret thing to prove it is allowed to use it as a Tor
bridge. Such a system would still succumb to passive traffic
analysis attacks (since the packet timings and sizes for HTTPS don't
look that much like Tor), but it would be enough to beat many current
censors.

Goals and requirements:

We should make it impossible for a passive attacker who examines only
a few packets at a time to distinguish Tor->Bridge traffic from an
HTTPS client talking to an HTTPS server.

We should make it impossible for an active attacker talking to the
server to tell a Tor bridge server from a regular HTTPS server.

We should make it impossible for an active attacker who can MITM the
server to learn from the client whether it thought it was connecting
to an HTTPS server or a Tor bridge. (This implies that an MITM
attacker shouldn't be able to learn anything that would help it
convince the server to act like a bridge.)

It would be nice to minimize the required code changes to Tor, and
the required code changes to any other software.

It would be good to avoid any requirement of close integration with
any particular HTTP or HTTPS implementation.

If we're replacing our own profile with that of an HTTPS service, we
should do so in a way that lets us use the profile of a popular
HTTPS implementation.

Efficiency would be good: layering TLS inside TLS is best avoided if
we can.

We need an actual web server; HTTP and HTTPS are so complicated that
there's no practical way to behave in a bug-compatible way with any
popular webserver short of running that webserver.

More obviously, we need a TLS implementation (or we can't implement
HTTPS), and we need a Tor bridge (since that's the whole point of
this exercise).

So from a top-level point of view, the question becomes: how shall we
wire these together?

There are three obvious ways; I'll discuss them in turn below.

Design #1: TLS in Tor

Under this design, Tor accepts HTTPS connections, decides which ones
don't look like the Tor protocol, and relays them to a webserver.

                 +------------------------------------+
+------+   TLS   | +------------+  http +-----------+ |
| User |<------> | | Tor Bridge |<----->| Webserver | |
+------+         | +------------+       +-----------+ |
                 |        trusted host/network        |
                 +------------------------------------+

This approach would let us use a completely unmodified webserver
implementation, but would require the most extensive changes in Tor:
we'd need to add yet another flavor to Tor's TLS ice cream parlor,
and try to emulate a popular webserver's TLS behavior even more
thoroughly.

To authenticate, we would need to take a hybrid approach, and begin
forwarding traffic to the webserver as soon as a webserver
might respond to the traffic. This could be pretty complicated,
since it requires us to have a model of how the webserver would
respond to any given set of bytes. As a workaround, we might try
relaying _all_ input to the webserver, and only replying as Tor in
the cases where the webserver hasn't replied. (This would likely
create recognizable timing patterns, though.)

The authentication itself could use a system akin to Tor proposals
189/190, where an early AUTHORIZE cell shows knowledge of a shared
secret if the client is a Tor client.
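
As a rough illustration of the shared-secret check described above,
here is a minimal Python sketch. It is not the AUTHORIZE cell format
from proposals 189/190: the function names, the choice of HMAC-SHA256,
and the session_binding input (some per-connection value that both
sides already know) are all assumptions made for the example.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA256 tag, in bytes


def make_authenticator(shared_secret: bytes, session_binding: bytes) -> bytes:
    """Client side: compute a tag that shows knowledge of the secret.

    session_binding is a hypothetical per-connection value; what it
    should actually be is left open here.
    """
    return hmac.new(shared_secret, session_binding, hashlib.sha256).digest()


def is_tor_client(shared_secret: bytes, session_binding: bytes,
                  first_bytes: bytes) -> bool:
    """Bridge side: True only if the connection opens with a valid tag."""
    if len(first_bytes) < TAG_LEN:
        return False
    expected = make_authenticator(shared_secret, session_binding)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(first_bytes[:TAG_LEN], expected)
```

A bridge built along these lines would pass any connection that fails
the check to the webserver unchanged. The sketch makes no claim about
MITM resistance.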

Design #2: TLS in the web server

                 +-------------------------------+
+------+   TLS   | +-----------+  tor0   +-----+ |
| User |<------> | | Webserver |<------->| Tor | |
+------+         | +-----------+         +-----+ |
                 |     trusted host/network      |
                 +-------------------------------+

In this design, we write an Apache module or something that can
recognize an authenticator of some kind in an HTTPS header, or
recognize a valid AUTHORIZE cell, and respond by forwarding the
traffic to a Tor instance.

To avoid the efficiency issue of doing an extra local
encrypt/decrypt, we need to have the webserver talk to Tor over a
local unencrypted connection. (I've denoted this as "tor0" in the
diagram above.) For implementation convenience, we might want to
implement that as a NULL TLS connection, so that the Tor server code
wouldn't have to change except to allow local NULL TLS connections in
this configuration.

For the Tor handshake to work properly here, we'll need a way for the
Tor instance to know which public key the webserver is configured to
use.

We wouldn't need to support the parts of the Tor link protocol used
to authenticate clients to servers: relays shouldn't be using this
subsystem at all.

The Tor client would need to connect and prove its status as a Tor
client. If the client uses some means other than AUTHORIZE cells, or
if we want to do the authentication in a pluggable transport, we
might be tempted to hand responsibility for TLS itself to the
pluggable transport. That would scare me: supporting pluggable
transports that are responsible for TLS would make it fairly easy to
mess up the crypto, and I'd rather not make it so easy to write a
pluggable transport that accidentally makes Tor less secure.

Design #3: Reverse proxy

                 +--------------------------------+
                 | +-------+  http  +-----------+ |
                 | |       |<------>| Webserver | |
+------+   TLS   | |       |        +-----------+ |
| User |<------> | | Proxy |                      |
+------+         | |       |  tor0  +-----------+ |
                 | |       |<------>|    Tor    | |
                 | +-------+        +-----------+ |
                 |      trusted host/network      |
                 +--------------------------------+

In this design, we write a server-side proxy to sit in front of Tor
and a webserver, or repurpose some existing HTTPS proxy. Its role
will be to do TLS, and then forward connections to Tor or the
webserver as appropriate. (In the web world, this kind of thing is
called a "reverse proxy", so that's the term I'm using here.)

To avoid fingerprinting, we should choose a proxy that's already in
common use as a TLS front-end for webservers -- nginx, perhaps.
Unfortunately, the more popular tools here seem to be pretty complex,
and the simpler tools less widely deployed. More investigation would
be needed.

The authorization considerations would be as in Design #2 above; for
the reasons discussed there, it's probably a good idea to build the
necessary authorization into Tor itself.

I generally like this design best: it lets us isolate the "Check for
a valid authenticator and/or a valid or invalid HTTP header, and
react accordingly" question to a single program.
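
To make the "single program" point concrete, the decision the proxy
has to make could be as small as the following Python sketch. The
Backend names and the is_valid_authenticator callback are hypothetical;
nothing here is taken from nginx or any other real proxy.

```python
from enum import Enum
from typing import Callable


class Backend(Enum):
    TOR = "tor"
    WEBSERVER = "webserver"


def choose_backend(decrypted_prefix: bytes,
                   is_valid_authenticator: Callable[[bytes], bool]) -> Backend:
    """Pick where a freshly decrypted connection should be forwarded."""
    if is_valid_authenticator(decrypted_prefix):
        return Backend.TOR
    # Everything else, including malformed HTTP, goes to the real web
    # server, so that its own error behavior is what a prober sees.
    return Backend.WEBSERVER
```

The proxy never answers on its own in this sketch: every response a
non-Tor client sees comes from the webserver.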

How to authenticate: the easiest way

Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
HTTP header, is an open problem that we should solve in proposals
190 and 191 and their successors. I'm calling it out-of-scope here;
please see those proposals, their attendant discussion, and their
successors.

How to authenticate: a slightly harder way

Some proposals in this vein have in the past suggested a special
HTTP header to distinguish Tor connections from non-Tor connections.
This could work too, though it would require substantially larger
changes on the Tor client's part, would still require the client to
take measures to avoid MITM attacks, and would also require the
client to implement a particular browser's HTTP profile.

Some considerations on distinguishability

Against a passive eavesdropper, the easiest way to avoid
distinguishability in server responses will be to use an actual web
server or reverse web proxy's TLS implementation.
(Distinguishability based on client TLS use is another topic
entirely.)

Against an active non-MITM attacker, the best probing attacks will be
ones designed to provoke the system into acting in ways different from
those in which a webserver would act: responding earlier than a web
server would respond, or later, or differently. We need to make sure
that, whatever the front-end program is, it answers anything that
would qualify as a well-formed or ill-formed HTTP request whenever
the web server would. This means, for example, that whatever the
correct form of client authorization turns out to be, no prefix of
that authorization is ever something that the webserver would respond
to. With some web servers (I believe), that's as easy as making sure
that any valid authenticator isn't too long, and doesn't contain a CR
or LF character. With others, the authenticator would need to be a
valid HTTP request, with all the attendant difficulty that would
entail.
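
For the simpler case, where the web server is assumed to say nothing
until it has seen a line ending, the constraints above can be sketched
in Python as follows. MAX_AUTH_LEN, the hex encoding, and the three
step names are assumptions for illustration; a real front-end would
have to be checked against the behavior of the actual web server.

```python
from enum import Enum

MAX_AUTH_LEN = 64  # assumed upper bound on authenticator length, in bytes


class Step(Enum):
    WAIT = "wait for more bytes"
    CHECK = "check the authenticator"
    WEBSERVER = "hand everything to the web server"


def encode_authenticator(raw_tag: bytes) -> bytes:
    """Hex-encode a raw tag so that it can never contain CR or LF."""
    return raw_tag.hex().encode("ascii")


def is_safe_authenticator(auth: bytes) -> bool:
    """True if auth is short enough and contains no CR or LF byte."""
    return (0 < len(auth) <= MAX_AUTH_LEN
            and b"\r" not in auth and b"\n" not in auth)


def next_step(buffered: bytes, auth_len: int) -> Step:
    """Decide what the front-end does with the bytes received so far."""
    head = buffered[:auth_len]
    if b"\r" in head or b"\n" in head:
        # The web server might answer this input, so it must see it now.
        return Step.WEBSERVER
    if len(buffered) < auth_len:
        # No line ending yet: the assumed web server would also be silent.
        return Step.WAIT
    return Step.CHECK
```

A raw 32-byte MAC can contain CR or LF bytes by chance, which is why
the sketch hex-encodes it; the encoded form is exactly 64 bytes long
and fits the assumed bound.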

Against an attacker who can MITM the bridge, the best attacks will be
to wait for clients to connect and see how they behave. In this
case, the client probably needs to be able to authenticate the bridge
certificate as presented in the initial TLS handshake -- or some
other aspect of the TLS handshake if we're feeling insane. If the
certificate or handshake isn't as expected, the client should behave
as a web browser that's just received a bad TLS certificate. (The
alternative there would be to try to impersonate an HTTPS client that
has just accepted a self-signed certificate. But that would probably
require the Tor client to impersonate a full web browser, which isn't
realistic.)
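
One way to sketch the client's side of this, assuming the client was
given a SHA-256 fingerprint of the bridge's certificate out of band,
is shown below. The pinning scheme, the function names, and the action
strings are assumptions for the example. In Python, the DER bytes
could come from ssl.SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
import hmac


def certificate_matches(der_cert: bytes, expected_sha256: bytes) -> bool:
    """True if the presented certificate hashes to the pinned value."""
    actual = hashlib.sha256(der_cert).digest()
    return hmac.compare_digest(actual, expected_sha256)


def client_action(der_cert: bytes, expected_sha256: bytes) -> str:
    """Decide what the client does after the TLS handshake."""
    if certificate_matches(der_cert, expected_sha256):
        # Only now is it safe to reveal that we want a bridge.
        return "send-authenticator"
    # Act like a browser that has just rejected a bad certificate,
    # without sending anything that identifies us as a Tor client.
    return "abort-like-browser"
```

The point of the ordering is that the authenticator is never sent
until the certificate check has passed.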

Side note: What to put on the webserver?

To credibly pretend not to be ourselves, we must pretend to be
something else in particular -- and something not easily identifiable
or inherently worthless. We should not, for example, have all
deployments of this kind use a fixed website, even if that website is
the default "Welcome to Apache" configuration: A censor would
probably feel that they weren't breaking anything important by
blocking all unconfigured websites with nothing on them.

Therefore, we should probably conceive of a system like this as
"Something to add to your HTTPS website" rather than as a standalone
installation.