Filename: 203-https-frontend.txt
Title: Avoiding censorship by impersonating an HTTPS server
Author: Nick Mathewson
Created: 24 Jun 2012
Status: Draft


Overview:

   One frequently proposed approach for censorship resistance is that
   Tor bridges ought to act like another TLS-based service, and deliver
   traffic to Tor only if the client can demonstrate some shared
   knowledge with the bridge.

   In this document, I discuss some design considerations for building
   such systems, and propose a few possible architectures and designs.

Background:

   Most of our previous work on censorship resistance has focused on
   preventing passive attackers from identifying Tor bridges, or from
   doing so cheaply.  But active attackers exist, and exist in the wild:
   right now, the most sophisticated censors use their anti-Tor passive
   attacks only as a first round of filtering before launching a
   secondary active attack to confirm suspected Tor nodes.

   One idea we've been talking about for a while is that of having a
   service that looks like an HTTPS service unless a client does some
   particular secret thing to prove it is allowed to use it as a Tor
   bridge.  Such a system would still succumb to passive traffic
   analysis attacks (since the packet timings and sizes for HTTPS don't
   look that much like Tor), but it would be enough to beat many current
   censors.

Goals and requirements:

   We should make it impossible for a passive attacker who examines only
   a few packets at a time to distinguish traffic between a Tor client
   and a bridge from traffic between an HTTPS client and an HTTPS
   server.

   We should make it impossible for an active attacker talking to the
   server to tell a Tor bridge server from a regular HTTPS server.

   We should make it impossible for an active attacker who can MITM the
   server to learn from the client whether it thought it was connecting
   to an HTTPS server or a Tor bridge.  (This implies that an MITM
   attacker shouldn't be able to learn anything that would help it
   convince the server to act like a bridge.)

   It would be nice to minimize the required code changes to Tor, and
   the required code changes to any other software.

   It would be good to avoid any requirement of close integration with
   any particular HTTP or HTTPS implementation.

   If we're replacing our own profile with that of an HTTPS service, we
   should do so in a way that lets us use the profile of a popular
   HTTPS implementation.

   Efficiency would be good: we should avoid layering TLS inside TLS if
   we can.

Discussion:

   We need an actual web server; HTTP and HTTPS are so complicated that
   there's no practical way to behave in a bug-compatible way with any
   popular webserver short of running that webserver.

   More obviously, we need a TLS implementation (or we can't implement
   HTTPS), and we need a Tor bridge (since that's the whole point of
   this exercise).

   So from a top-level point of view, the question becomes: how shall we
   wire these together?

   There are three obvious ways; I'll discuss them in turn below.

Design #1: TLS in Tor

   Under this design, Tor accepts HTTPS connections, decides which ones
   don't look like the Tor protocol, and relays them to a webserver.

                   +--------------------------------------+
     +------+  TLS |  +------------+  http +-----------+  |
     | User |<------> | Tor Bridge |<----->| Webserver |  |
     +------+      |  +------------+       +-----------+  |
                   |     trusted host/network             |
                   +--------------------------------------+

   This approach would let us use a completely unmodified webserver
   implementation, but would require the most extensive changes in Tor:
   we'd need to add yet another flavor to Tor's TLS ice cream parlor,
   and try to emulate a popular webserver's TLS behavior even more
   thoroughly.

   To authenticate, we would need to take a hybrid approach, and begin
   forwarding traffic to the webserver as soon as a webserver
   might respond to the traffic.  This could be pretty complicated,
   since it requires us to have a model of how the webserver would
   respond to any given set of bytes.  As a workaround, we might try
   relaying _all_ input to the webserver, and only replying as Tor in
   the cases where the webserver hasn't replied.  (This would likely
   create recognizable timing patterns, though.)

   The authentication itself could use a system akin to Tor proposals
   189/190, where an early AUTHORIZE cell shows knowledge of a shared
   secret if the client is a Tor client.

Design #2: TLS in the web server

                   +----------------------------------+
     +------+  TLS |  +------------+  tor0   +-----+  |
     | User |<------> | Webserver  |<------->| Tor |  |
     +------+      |  +------------+         +-----+  |
                   |     trusted host/network         |
                   +----------------------------------+

   In this design, we write an Apache module or something that can
   recognize an authenticator of some kind in an HTTPS header, or
   recognize a valid AUTHORIZE cell, and respond by forwarding the
   traffic to a Tor instance.

   To avoid the efficiency issue of doing an extra local
   encrypt/decrypt, we need to have the webserver talk to Tor over a
   local unencrypted connection. (I've denoted this as "tor0" in the
   diagram above.)  For implementation convenience, we might want to
   implement that as a NULL TLS connection, so that the Tor server code
   wouldn't have to change except to allow local NULL TLS connections in
   this configuration.

   For the Tor handshake to work properly here, we'll need a way for the
   Tor instance to know which public key the webserver is configured to
   use.

   We wouldn't need to support the parts of the Tor link protocol used
   to authenticate clients to servers: relays shouldn't be using this
   subsystem at all.

   The Tor client would need to connect and prove its status as a Tor
   client.  If the client used some means other than AUTHORIZE cells,
   or if we wanted to do the authentication in a pluggable transport,
   we might decide to offload the responsibility for TLS itself to the
   pluggable transport.  That would scare me: supporting pluggable
   transports that have the responsibility for TLS would make it fairly
   easy to mess up the crypto, and I'd rather not have it be so easy to
   write a pluggable transport that accidentally makes Tor less secure.

Design #3: Reverse proxy

                   +----------------------------------+
                   |  +-------+  http  +-----------+  |
                   |  |       |<------>| Webserver |  |
     +------+  TLS |  |       |        +-----------+  |
     | User |<------> | Proxy |                       |
     +------+      |  |       |  tor0  +-----------+  |
                   |  |       |<------>|    Tor    |  |
                   |  +-------+        +-----------+  |
                   |     trusted host/network         |
                   +----------------------------------+

   In this design, we write a server-side proxy to sit in front of Tor
   and a webserver, or repurpose some existing HTTPS proxy. Its role
   will be to do TLS, and then forward connections to Tor or the
   webserver as appropriate.  (In the web world, this kind of thing is
   called a "reverse proxy", so that's the term I'm using here.)

   To avoid fingerprinting, we should choose a proxy that's already in
   common use as a TLS front-end for webservers -- nginx, perhaps.
   Unfortunately, the more popular tools here seem to be pretty complex,
   and the simpler tools less widely deployed.  More investigation would
   be needed.

   The authorization considerations would be as in Design #2 above; for
   the reasons discussed there, it's probably a good idea to build the
   necessary authorization into Tor itself.

   I generally like this design best: it lets us isolate the "Check for
   a valid authenticator and/or a valid or invalid HTTP header, and
   react accordingly" question to a single program.
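
   As a rough illustration of that single decision point, the sketch
   below classifies the first decrypted bytes of a connection as
   belonging to Tor, to the webserver, or as not yet decidable.
   Everything named here is an assumption made for the example: the
   fixed-length hex token, the label string, and the function names are
   hypothetical and are not taken from any Tor proposal.  A static
   token like this one can be replayed by a MITM, so it only stands in
   for whatever authenticator the authorization proposals eventually
   define.

```python
import hashlib
import hmac

# Hypothetical token length: a hex-encoded HMAC-SHA256 is 64 ASCII
# characters and can never contain CR or LF.
AUTH_LEN = 64


def expected_token(secret: bytes) -> bytes:
    """Derive the illustrative authenticator from the shared secret."""
    digest = hmac.new(secret, b"hypothetical-bridge-auth", hashlib.sha256)
    return digest.hexdigest().encode("ascii")


def classify(buffered: bytes, secret: bytes) -> str:
    """Return "tor", "web", or "wait" for the bytes received so far."""
    head = buffered[:AUTH_LEN]
    # Assumption: the webserver starts reacting at the first line ending,
    # so anything containing CR or LF this early goes straight to it.
    if b"\r" in head or b"\n" in head:
        return "web"
    # Too short to be a full token, and nothing the webserver would have
    # answered yet: keep buffering.
    if len(buffered) < AUTH_LEN:
        return "wait"
    # Constant-time comparison avoids leaking how many bytes matched.
    if hmac.compare_digest(head, expected_token(secret)):
        return "tor"
    return "web"
```

   A real proxy would also have to mirror the webserver's timeouts and
   line-length limits while it is in the "wait" state; the sketch
   ignores timing entirely.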

How to authenticate: the easiest way

   Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
   HTTP header, is an open problem that we should solve in proposals
   190 and 191.  I'm calling it out-of-scope here; please see those
   proposals, their attendant discussion, and their eventual
   successors.

How to authenticate: a slightly harder way

   Some proposals in this vein have in the past suggested a special
   HTTP header to distinguish Tor connections from non-Tor connections.
   This could work too, though it would require substantially larger
   changes on the Tor client's part, would still require the client
   to take measures to avoid MITM attacks, and would also require the
   client to implement a particular browser's HTTP profile.

Some considerations on distinguishability

   Against a passive eavesdropper, the easiest way to avoid
   distinguishability in server responses will be to use the TLS
   implementation of an actual webserver or reverse web proxy.
   (Distinguishability based on client TLS use is another topic
   entirely.)

   Against an active non-MITM attacker, the best probing attacks will be
   ones designed to provoke the system into acting in ways different from
   those in which a webserver would act: responding earlier than a
   webserver would respond, or later, or differently.  We need to make
   sure that, whatever the front-end program is, it answers anything
   that would qualify as a well-formed or ill-formed HTTP request
   whenever the webserver would.  This means, for example, that
   whatever the correct form of client authorization turns out to be, no
   prefix of that authorization is ever something that the webserver
   would respond to.  With some webservers (I believe), that's as easy
   as making sure that any valid authenticator isn't too long, and
   doesn't contain a CR or LF character.  With others, the authenticator
   would need to be a valid HTTP request, with all the attendant
   difficulty that would raise.
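
   For the simple case only (a webserver that reacts at a line ending
   or when its line-length limit is reached), a candidate authenticator
   could be screened along the following lines.  This is a sketch, not
   a complete test: max_len is a hypothetical parameter standing for
   the server's limit, which would have to be measured for the server
   actually deployed, and the function name is made up for the example.

```python
from typing import List


def authenticator_problems(auth: bytes, max_len: int) -> List[str]:
    """List the reasons a candidate authenticator could be probed for."""
    problems = []
    if not auth:
        problems.append("empty")
    # Assumption: past max_len bytes the webserver would answer with an
    # error, so a longer authenticator has a prefix the server reacts to.
    if len(auth) > max_len:
        problems.append("too long")
    # A CR or LF would complete a line and may trigger a server response.
    if b"\r" in auth or b"\n" in auth:
        problems.append("contains CR or LF")
    return problems
```

   An empty result means only that these two checks passed; it says
   nothing about servers that need the authenticator to be a valid
   HTTP request.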

   Against an attacker who can MITM the bridge, the best attacks will be
   to wait for clients to connect and see how they behave.  In this
   case, the client probably needs to be able to authenticate the bridge
   certificate as presented in the initial TLS handshake -- or some
   other aspect of the TLS handshake if we're feeling insane.  If the
   certificate or handshake isn't as expected, the client should behave
   as a web browser that's just received a bad TLS certificate.  (The
   alternative there would be to try to impersonate an HTTPS client that
   has just accepted a self-signed certificate.  But that would probably
   require the Tor client to impersonate a full web browser, which isn't
   realistic.)
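
   One way the client might authenticate the certificate is to compare
   its SHA-256 fingerprint against a value it learned out of band.
   The sketch below assumes such a pinned fingerprint exists; this
   proposal does not say how the client would learn it, and the
   function name is hypothetical.  In Python, the DER-encoded
   certificate could be obtained from an established connection with
   ssl.SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
import hmac


def cert_matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Check a DER-encoded certificate against a pinned SHA-256 digest."""
    actual = hashlib.sha256(der_cert).hexdigest()
    # Normalize case, then compare in constant time.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

   On a mismatch the client would not send any authenticator, and
   would instead close the connection the way a browser does after a
   certificate error.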

Side note: What to put on the webserver?

   To credibly pretend not to be ourselves, we must pretend to be
   something else in particular -- and something not easily identifiable
   or inherently worthless.  We should not, for example, have all
   deployments of this kind use a fixed website, even if that website is
   the default "Welcome to Apache" configuration: A censor would
   probably feel that they weren't breaking anything important by
   blocking all unconfigured websites with nothing on them.

   Therefore, we should probably conceive of a system like this as
   "Something to add to your HTTPS website" rather than as a standalone
   installation.