   1 Filename: 180-pluggable-transport.txt
   2 Title: Pluggable transports for circumvention
   3 Author: Jacob Appelbaum, Nick Mathewson
   4 Created: 15-Oct-2010
   5 Status: Open
   6
   7 Overview
   8
   9   This proposal describes a way to decouple protocol-level obfuscation
  10   from the core Tor protocol in order to better resist client-bridge
  11   censorship.  Our approach is to specify a means to add pluggable
  12   transport implementations to Tor clients and bridges so that they can
  13   negotiate a superencipherment for the Tor protocol.
  14
  15 Scope
  16
  17   This is a document about transport plugins; it does not cover
  18   discovery improvements, or bridgedb improvements.  While these
  19   requirements might be solved by a program that also functions as a
  20   transport plugin, this proposal only covers the requirements and
  21   operation of transport plugins.
  22
  23 Motivation
  24
  25   Frequently, people want to try a novel circumvention method to help
  26   users connect to Tor bridges.  Some of these methods are already
  27   pretty easy to deploy: if the user knows an unblocked VPN or open
  28   SOCKS proxy, they can just use that with the Tor client today.
  29
  30   Less easy to deploy are methods that require participation by both the
  31   client and the bridge.  In order of increasing sophistication, we
  32   might want to support:
  33
  34   1. A protocol obfuscation tool that transforms the output of a TLS
  35      connection into something that looks like HTTP as it leaves the
  36      client, and back to TLS as it arrives at the bridge.
  37   2. An additional authentication step that a client would need to
  38      perform for a given bridge before being allowed to connect.
  39   3. An information passing system that uses a side-channel in some
  40      existing protocol to convey traffic between a client and a bridge
  41      without the two of them ever communicating directly.
  42   4. A set of clients to tunnel client->bridge traffic over an existing
  43      large p2p network, such that the bridge is known by an identifier
  44      in that network rather than by an IP address.
  45
  46   We could in theory support these fairly well with Tor as it stands
  47   today: every Tor client can take a SOCKS proxy to use for its outgoing
  48   traffic, so a suitable client proxy could handle the client's traffic
  49   and connections on its behalf, while a corresponding program on the
  50   bridge side could handle the bridge's side of the protocol
  51   transformation.  Nevertheless, there are some reasons to add support
  52   for transport plugins to Tor itself:
  53
  54   1. It would be good for bridges to have a standard way to advertise
  55      which transports they support, so that clients can have multiple
  56      local transport proxies, and automatically use the right one for
  57      the right bridge.
  58
  59   2. There are some changes to our architecture that we'll need for a
  60      system like this to work.  For testing purposes, if a bridge blocks
  61      off its regular ORPort and instead has an obfuscated ORPort, the
  62      bridge authority has no way to test it.  Also, unless the bridge
  63      has some way to tell that the bridge-side proxy at 127.0.0.1 is not
  64      the origin of all the connections it is relaying, it might decide
  65      that there are too many connections from 127.0.0.1, and start
  66      paring them down to avoid a DoS.
  67
  68   3. Censorship and anticensorship techniques often evolve faster than
  69      the typical Tor release cycle.  As such, it's a good idea to
  70      provide ways to test out new anticensorship mechanisms on a more
  71      rapid basis.
  72
  73   4. Transport obfuscation is a relatively distinct problem
  74      from the other privacy problems that Tor tries to solve, and it
  75      requires a fairly distinct skill-set from hacking the rest of Tor.
  76      By decoupling transport obfuscation from the Tor core, we hope to
  77      encourage people working on transport obfuscation who would
  78      otherwise not be interested in hacking Tor.
  79
  80   5. Finally, we hope that defining a generic transport obfuscation plugin
  81      mechanism will be useful to other anticensorship projects.
  82
  83 Non-Goals
  84
  85   We're not going to talk about automatic verification of plugin
  86   correctness and safety via sandboxing, proof-carrying code, or
  87   whatever.
  88
  89   We need to do more with discovery and distribution, but that's not
  90   what this proposal is about.  We're pretty convinced that the problems
  91   are sufficiently orthogonal that we should be fine so long as we don't
  92   preclude a single program from implementing both transport and
  93   discovery extensions.
  94
  95   This proposal is not about what transport plugins are the best ones
  96   for people to write.  We do, however, make some general
  97   recommendations for plugin authors in an appendix.
  98
  99   We've considered issues involved with completely replacing Tor's TLS
 100   with another encryption layer, rather than layering it inside the
 101   obfuscation layer.  We describe how to do this in an appendix to the
 102   current proposal, though we are not currently sure whether it's a good
 103   idea to implement.
 104
 105   We deliberately reject any design that would involve linking more code
 106   into Tor's process space.
 107
 108 Design overview
 109
 110   To write a new transport protocol, an implementer must provide two
 111   pieces: a "Client Proxy" to run at the initiator side, and a "Server
 112   Proxy" to run at the server side.  These two pieces may or may not be
 113   implemented by the same program.
 114
 115   Each client may run any number of Client Proxies.  Each one acts like
 116   a SOCKS proxy that accepts connections on localhost.  Each one
 117   runs on a different port, and implements one or more transport
 118   methods.  If the protocol has any parameters, they are passed from Tor
 119   inside the regular username/password parts of the SOCKS protocol.
 120
 121   Bridges (and maybe relays) may run any number of Server Proxies: these
 122   programs provide an interface like stunnel: they get connections from the
 123   network (typically by listening for connections on the network) and relay
 124   them to the Bridge's real ORPort.
 125
 126   To configure one of these programs, it should be sufficient simply to
 127   list it in your torrc.  The program tells Tor which transports it
 128   provides.  The Tor consensus should carry a new approved version number that
  129   is specific to pluggable transports; this will allow Tor to know when a
 130   particular transport is known to be unsafe, safe, or non-functional.
 131
 132   Bridges (and maybe relays) report in their descriptors which transport
 133   protocols they support.  This information can be copied into bridge
 134   lines.  Bridges using a transport protocol may have multiple bridge
 135   lines.
 136
 137   Any methods that are wildly successful, we can bake into Tor.
 138
 139 Specifications: Client behavior
 140
 141   We extend the bridge line format to allow you to say which method
 142   to use to connect to a bridge.
 143
 144   The new format is:
 145      "bridge method address:port [[keyid=]id-fingerprint] [k=v] [k=v] [k=v]"
 146
 147   To connect to such a bridge, the Tor program needs to know which
 148   local SOCKS proxy will support the transport called "method".  It
 149   then connects to this proxy, and asks it to connect to
 150   address:port.  If [id-fingerprint] is provided, Tor should expect
 151   the public identity key on the TLS connection to match the digest
 152   provided in [id-fingerprint].  If any [k=v] items are provided,
 153   they are configuration parameters for the proxy: Tor should
 154   separate them with semicolons and put them in the user and
 155   password fields of the request, splitting them across the fields
  156   as necessary.  If a key or value must contain a semicolon or
 157   a backslash, it is escaped with a backslash.
 158
 159   The "id-fingerprint" field is always provided in a field named
 160   "keyid", if it was given.  Method names must be C identifiers.
 161
 162   Example: if the bridge line is "bridge trebuchet www.example.com:3333
 163      rocks=20 height=5.6m" AND if the Tor client knows that the
 164      'trebuchet' method is provided by a SOCKS5 proxy on
 165      127.0.0.1:19999, the client should connect to that proxy, ask it to
  166      connect to www.example.com:3333, and provide the string
 167      "rocks=20;height=5.6m" as the username, the password, or split
 168      across the username and password.
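
       The encoding in the example above can be sketched as follows.
       This is a minimal sketch, not part of the proposal: the function
       names are hypothetical, the 255-byte field limit is taken from
       SOCKS5 username/password authentication (RFC 1929), and filling
       the username before spilling into the password is only one of
       the splits the text allows.

```python
# Hypothetical helpers; the proposal fixes only the resulting string.
SOCKS5_FIELD_MAX = 255  # RFC 1929: username and password are at most 255 bytes


def escape(text):
    """Backslash-escape backslashes and semicolons in a key or value."""
    return text.replace("\\", "\\\\").replace(";", "\\;")


def encode_bridge_args(params):
    """Join (key, value) pairs as k=v items separated by semicolons."""
    return ";".join(f"{escape(k)}={escape(v)}" for k, v in params)


def split_for_socks5(encoded):
    """Split the encoded string across the username and password fields.

    Assumption: fill the username first, then the password.
    """
    data = encoded.encode("utf-8")
    if len(data) > 2 * SOCKS5_FIELD_MAX:
        raise ValueError("parameters do not fit in the SOCKS5 auth fields")
    return data[:SOCKS5_FIELD_MAX], data[SOCKS5_FIELD_MAX:]
```

       For the trebuchet bridge line, encode_bridge_args() is called
       with [("rocks", "20"), ("height", "5.6m")].  How an empty
       password field should be sent is not covered by this sketch.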
 169
 170   There are two ways to tell Tor clients about protocol proxies:
 171   external proxies and managed proxies.  An external proxy is configured
 172   with
 173      ClientTransportPlugin method socks4 address:port [auth=X]
 174   or
 175      ClientTransportPlugin method socks5 address:port [username=X] [password=Y]
 176   as in
 177      "ClientTransportPlugin trebuchet socks5 127.0.0.1:9999".
 178   This example tells Tor that another program is already running to handle
  179   'trebuchet' connections, and Tor doesn't need to worry about it.
 180
 181   A managed proxy is configured with
 182      ClientTransportPlugin <method> exec <path> [options]
 183   as in
 184     "ClientTransportPlugin trebuchet exec /usr/libexec/trebuchet --managed"
 185   This example tells Tor to launch an external program to provide a
  186   SOCKS proxy for 'trebuchet' connections. The Tor client only
 187   launches one instance of each external program with a given set of
 188   options, even if the same executable and options are listed for
 189   more than one method.
 190
 191   If instead of a transport name, the torrc lists "*" for a managed proxy,
  192   Tor uses that proxy for all transports that it supports.  So
 193   "ClientTransportPlugin * exec /usr/libexec/tor/foobar" tells Tor
 194   that it should use the foobar plugin for everything that it supports.
 195
 196   If two proxies support the same method, Tor should use whichever
 197   one is listed first.
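
       The "first listed wins" rule, together with the "*" wildcard,
       might be sketched like this.  The data structures are
       assumptions made for illustration: plugin_lines stands for the
       ClientTransportPlugin entries in torrc order, and advertised
       maps each proxy to the methods it is known to provide (for an
       external proxy, that would simply be the method named in torrc).

```python
def pick_proxy(method, plugin_lines, advertised):
    """Return the first configured proxy that provides `method`, or None.

    plugin_lines: ordered (method_or_star, proxy_id) pairs from torrc.
    advertised:   proxy_id -> set of methods the proxy provides.
    Both structures are hypothetical; the proposal does not define them.
    """
    for configured, proxy_id in plugin_lines:
        if configured not in ("*", method):
            continue  # this torrc line is for some other method
        if method in advertised.get(proxy_id, set()):
            return proxy_id  # earliest matching line wins
    return None
```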
 198
 199   The same program can implement a managed or an external proxy: it just
 200   needs to take an argument saying which one to be.
 201
 202   See "Managed proxy behavior" for more information on the managed
 203   proxy interface.
 204
 205 Server behavior
 206
 207   Server proxies are configured similarly to client proxies.  When
 208   launching a proxy, the server must tell it what ORPort it has
 209   configured, and what address (if any) it can listen on.  The
 210   server must tell the proxy which (if any) methods it should
 211   provide if it can; the proxy needs to tell the server which
 212   methods it is actually providing, and on what ports.
 213
 214   When a client connects to the proxy, the proxy may need a way to
 215   tell the server some identifier for the client address.  It does
 216   this in-band.
 217
 218   As before, the server lists proxies in its torrc.  These can be
  219   external proxies that run on their own, or managed proxies that Tor
 220   launches.
 221
 222   An external server proxy is configured as
 223      ServerTransportPlugin method proxy address:port param=val..
 224   as in
 225      ServerTransportPlugin trebuchet proxy 127.0.0.1:999 rocks=heavy
 226   The param=val pairs and the address are used to make the bridge
 227   configuration information that we'll tell users.
 228
 229   A managed proxy is configured as
 230       ServerTransportPlugin method exec /path/to/binary [options]
 231   or
 232       ServerTransportPlugin * exec /path/to/binary [options]
 233
  234   When possible, Tor should launch only one instance of each binary/option
 235   pair configured.  So if the torrc contains
 236
 237      ClientTransportPlugin foo exec /usr/bin/megaproxy --foo
 238      ClientTransportPlugin bar exec /usr/bin/megaproxy --bar
 239      ServerTransportPlugin * exec /usr/bin/megaproxy --foo
 240
 241   then Tor will launch the megaproxy binary twice: once with the option
 242   --foo and once with the option --bar.
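
       One way to compute the set of programs to launch is to treat
       the binary plus its options as the key.  The sketch below makes
       that assumption and uses a hypothetical function name; it
       ignores external proxies, since Tor does not launch those.

```python
import shlex


def managed_commands(torrc_lines):
    """Return the distinct command lines to launch, in first-seen order."""
    seen = []
    for line in torrc_lines:
        fields = shlex.split(line)
        # fields: directive, method, kind, then the binary and its options
        if len(fields) < 4 or fields[2] != "exec":
            continue  # not a managed proxy line
        if fields[0] not in ("ClientTransportPlugin", "ServerTransportPlugin"):
            continue
        command = tuple(fields[3:])
        if command not in seen:
            seen.append(command)
    return seen
```

       Applied to the three torrc lines above, this yields two
       commands: megaproxy with --foo and megaproxy with --bar.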
 243
 244 Managed proxy interface
 245
 246    When the Tor client launches a client proxy from the command
 247    line, it communicates via environment variables.  At a minimum,
 248    it sets:
 249
 250       {Client and server}
 251       HOME, PATH -- as you'd expect.
 252
 253       "STATE_LOCATION" -- a directory where the proxy should store
 254        state if it wants to.  This directory is not required to
 255        exist, but the proxy SHOULD be able to create it if it
 256        doesn't.  The proxy SHOULD NOT store state elsewhere.
 257
 258       "MANAGED_TRANSPORT_VER=1" -- To tell the proxy which versions
 259        of this configuration protocol Tor supports.  Future versions
 260        will give a comma-separated list.  Clients MUST accept
 261        comma-separated lists containing any version that they
 262        recognize, and MUST work correctly even if some of the
 263        versions they don't recognize are non-numeric.  Valid version
 264        characters are non-space, non-comma printing ASCII characters.
 265
 266       {Client only}
 267
 268       "CLIENT_TRANSPORTS" -- a comma-separated list of which methods
 269         this client should enable, or * if all methods should be
 270         enabled.  The proxy SHOULD ignore methods that it doesn't
 271         recognize.
 272
 273       {Server only}
 274
 275       "EXT_SERVER_PORT=addr:portnum" -- A port (probably on localhost) that
 276         speaks the extended server protocol.
 277
 278       "ORPORT=addr:portnum" -- Our regular ORPort in a form suitable
 279         for local connections.
 280
 281       "BINDADDR=addr" -- An address on which to listen for local
 282          connections.  This might be the advertised address, or might
 283          be a local address that Tor will forward ports to.  It MUST
 284          be an address that will work with bind().
 285
  286       "SERVER_TRANSPORTS=..." -- A comma-separated list of server
  287           methods that the proxy should support, or * for all methods.
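
       On the proxy side, reading the transport lists could look
       roughly like this.  It is a sketch under assumptions: the
       function name is hypothetical, and "*" is assumed to mean "all
       methods" for SERVER_TRANSPORTS in the same way as for
       CLIENT_TRANSPORTS.

```python
def methods_to_enable(environ, variable, implemented):
    """Return the implemented methods that Tor asked for.

    environ:     mapping of environment variables (e.g. os.environ).
    variable:    "CLIENT_TRANSPORTS" or "SERVER_TRANSPORTS".
    implemented: list of method names this proxy can provide.
    """
    requested = environ.get(variable, "")
    if requested == "*":
        return list(implemented)
    # Methods the proxy does not recognize are silently ignored.
    return [m for m in requested.split(",") if m in implemented]
```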
 288
 289   The transport proxy replies by writing NL-terminated lines to
 290   stdout.  The metaformat is
 291
 292       Keyword OptArgs NL
 293       OptArgs = Args |
 294       Args = SP ArgChar | Args ArgChar
 295       ArgChar = Any character but NUL or NL
 296       Keyword = KeywordChar | Keyword KeywordChar
  297       KeywordChar = All alphanumeric characters, dash, and underscore.
 298
 299   Tor MUST ignore lines with keywords that it doesn't recognize.
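
       A reader for this metaformat might be sketched as below.  The
       names are hypothetical, "alphanumeric" is assumed to mean ASCII
       letters and digits, and the KNOWN set is just the keywords that
       appear in this proposal.

```python
import re

# Keyword, optionally followed by one space and at least one ArgChar.
LINE_RE = re.compile(r"([A-Za-z0-9_-]+)(?: ([^\x00\n]+))?")

KNOWN = {"VERSION", "ERROR", "CMETHOD", "CMETHODS", "CMETHOD-ERROR",
         "SMETHOD", "SMETHODS", "SMETHOD-ERROR"}


def parse_line(line):
    """Split one NL-terminated line into (keyword, args); args may be ''."""
    if not line.endswith("\n"):
        raise ValueError("line is not NL-terminated")
    match = LINE_RE.fullmatch(line[:-1])
    if match is None:
        raise ValueError("malformed line")
    return match.group(1), match.group(2) or ""


def recognized(lines):
    """Parse lines and drop those whose keyword is not in KNOWN."""
    parsed = (parse_line(line) for line in lines)
    return [(kw, args) for kw, args in parsed if kw in KNOWN]
```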
 300
 301   First, the proxy writes "VERSION 1" to say that it supports this
 302   protocol. It must either pick a version that Tor told it about, or
 303   pick no version at all, and say "ERROR no-version\n" and exit.
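
       The version handshake from the proxy's point of view can be
       sketched as follows.  SUPPORTED_VERSIONS and the function names
       are assumptions; the proposal defines only version 1 and the
       two reply lines.

```python
SUPPORTED_VERSIONS = ("1",)  # versions this hypothetical proxy implements


def choose_version(environ):
    """Pick a version that both sides know, or None if there is none."""
    offered = environ.get("MANAGED_TRANSPORT_VER", "").split(",")
    for version in SUPPORTED_VERSIONS:
        if version in offered:
            return version
    return None


def version_reply(environ):
    """Return the line the proxy writes to stdout first."""
    version = choose_version(environ)
    if version is None:
        return "ERROR no-version\n"  # the proxy should then exit
    return f"VERSION {version}\n"
```

       Because offered versions are compared as strings, unknown or
       non-numeric entries in the list are skipped without error.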
 304
 305   The proxy should then open its ports.  If running as a client
 306   proxy, it should not use fixed ports; instead it should autoselect
 307   ports to avoid conflicts.  A client proxy should by default only
 308   listen on localhost for connections.
 309
  310   A server proxy SHOULD try to listen on a consistent port, though it
 311   SHOULD pick a different one if the port it last used is now allocated.
 312
 313   A client or server proxy then should tell which methods it has
 314   made available and how.  It does this by printing zero or more
 315   CMETHOD and SMETHOD lines to its stdout.  These lines look like:
 316
 317    CMETHOD methodname SOCKS4/SOCKS5 address:port [ARGS=arglist] \
 318         [OPT-ARGS=arglist]
 319
 320   as in
 321
 322    CMETHOD trebuchet SOCKS5 127.0.0.1:19999 ARGS=rocks,height \
 323               OPT-ARGS=tensile-strength
 324
 325   The ARGS field lists mandatory parameters that must appear in
 326   every bridge line for this method. The OPT-ARGS field lists
 327   optional parameters.  If no ARGS or OPT-ARGS field is provided,
 328   Tor should not check the parameters in bridge lines for this
 329   method.
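
       A sketch of how Tor might parse a CMETHOD line and check a
       bridge line against it.  Several points are assumptions: the
       function names, that the line arrives as a single line (the
       backslash above is only for layout), and that a key listed in
       neither ARGS nor OPT-ARGS is rejected, which the text does not
       state.

```python
def parse_cmethod(args):
    """Parse the text that follows the CMETHOD keyword."""
    fields = args.split()
    if len(fields) < 3:
        raise ValueError("need a method name, SOCKS version, and address")
    name, socks, addrport = fields[:3]
    if socks not in ("SOCKS4", "SOCKS5"):
        raise ValueError("unknown SOCKS version")
    info = {"method": name, "socks": socks, "addrport": addrport,
            "required": None, "optional": None}
    for extra in fields[3:]:
        key, _, value = extra.partition("=")
        names = set(value.split(",")) if value else set()
        if key == "ARGS":
            info["required"] = names
        elif key == "OPT-ARGS":
            info["optional"] = names
    return info


def bridge_args_ok(info, bridge_args):
    """Check the k=v keys of a bridge line against ARGS and OPT-ARGS."""
    if info["required"] is None and info["optional"] is None:
        return True  # nothing advertised, so Tor should not check
    required = info["required"] or set()
    optional = info["optional"] or set()
    keys = set(bridge_args)
    # Assumption: keys outside ARGS and OPT-ARGS are treated as errors.
    return required <= keys and keys <= required | optional
```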
 330
 331   The proxy should print a single "CMETHODS DONE" line after it is
 332   finished telling Tor about the client methods it provides.  If it
 333   tries to supply a client method but can't for some reason, it
 334   should say:
 335     CMETHOD-ERROR methodname "Message"
 336
 337   A proxy should tell Tor about the server methods it is providing
 338   by printing zero or more SMETHOD lines.  These lines look like:
 339
 340     SMETHOD methodname address:port  [Options]
 341
 342   If there's an error setting up a configured server method, the
 343   proxy should say:
 344     SMETHOD-ERROR methodname "message"
 345
 346   The 'address:port' part of an SMETHOD line is the address to put
 347   in the bridge line.  The ARGS: part is a list of key-value pairs
 348   that the client needs to know.  The Options part is a list of
 349   space-separated K:V flags that Tor should know about.  Recognized
 350   options are:
 351
 352       - FORWARD:1
 353
 354         If this option is set, and address:port is not a publicly
 355         accessible address, then the bridge needs to forward some
 356         other address:port to address:port via upnp-helper.
 357
 358       - ARGS:k=v,k=v,k=v
 359
 360         If this option is set, the K=V arguments are added to the
 361         extrainfo document.
 362
 363       - DECLARE:K=V,...
 364
 365         If this option is set, all the K=V options should be
 366         added as extension entries to the router descriptor.  (See
 367         below)
 368
 369       - USE-EXTPORT:1
 370
 371         If this option is set, the server plugin is using the
 372         extended server port.
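
       The SMETHOD options above could be parsed along these lines.
       This is a sketch with hypothetical names; it assumes options
       are separated by whitespace, that values contain no spaces, and
       that options Tor does not recognize are dropped.

```python
def kv_list(text):
    """Turn 'k=v,k=v' into a dict; an empty string gives an empty dict."""
    pairs = (item.partition("=") for item in text.split(",") if item)
    return {k: v for k, _, v in pairs}


def parse_smethod(args):
    """Parse the text that follows the SMETHOD keyword."""
    fields = args.split()
    if len(fields) < 2:
        raise ValueError("need a method name and an address:port")
    name, addrport = fields[:2]
    options = {}
    for field in fields[2:]:
        key, sep, value = field.partition(":")
        if not sep:
            raise ValueError("option is not in K:V form")
        options[key] = value
    return {
        "method": name,
        "addrport": addrport,
        "forward": options.get("FORWARD") == "1",
        "use_extport": options.get("USE-EXTPORT") == "1",
        "args": kv_list(options.get("ARGS", "")),
        "declare": kv_list(options.get("DECLARE", "")),
    }
```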
 373
  374   SMETHOD and CMETHOD lines may be interspersed.  After the last
  375   SMETHOD line, the proxy says "SMETHODS DONE".
 376
 377   The proxy SHOULD NOT tell Tor about a server or client method
 378   unless it is actually open and ready to use.
 379
 380   Tor clients SHOULD NOT use any method from a client proxy or
 381   advertise any method from a server proxy UNLESS it is listed as a
 382   possible method for that proxy in torrc, and it is listed by the
 383   proxy as a method it supports.
 384
 385   Proxies should respond to a single INT signal by closing their
 386   listener ports and not accepting any new connections, but keeping
 387   all connections open, then terminating when connections are all
 388   closed.  Proxies should respond to a second INT signal by shutting
 389   down cleanly.
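
       The two-stage INT handling might be organized as a small state
       machine, sketched here.  The class and its attribute names are
       hypothetical; how listeners and connections are actually closed
       is left to the proxy.

```python
import signal


class ShutdownState:
    """Tracks how many INT signals have arrived and what they imply."""

    def __init__(self):
        self.int_count = 0
        self.accepting = True    # listener ports are open
        self.must_exit = False   # set by the second INT

    def on_int(self, signum=None, frame=None):
        self.int_count += 1
        if self.int_count == 1:
            self.accepting = False  # stop accepting, keep connections
        else:
            self.must_exit = True   # shut down cleanly now

    def should_exit(self, open_connections):
        """True once the proxy has nothing left to wait for."""
        return self.must_exit or (not self.accepting and open_connections == 0)


def install(state):
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, state.on_int)
```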
 390
 391 The extended ORPort protocol.
 392
 393   Server transports may need to connect to the bridge and pass
 394   additional information about client connections that the bridge
  395   would ordinarily receive from the kernel's TCP stack.  To do this,
  396   they connect to the "extended server port" as given in
  397   EXT_SERVER_PORT, send a short amount of information, wait for a
 398   response, and then send the user traffic on that port.
 399
 400   The extended server port protocol is as follows:
 401
 402      COMMAND [2 bytes, big-endian]
 403      BODYLEN [2 bytes, big-endian]
 404      BODY [Bodylen bytes]
 405
 406      Commands sent from the transport to the server are:
 407
 408      [0x0000] DONE: There is no more information to give. (body ignored)
 409
 410      [0x0001] USERADDR: an address:port string that represents the user's
 411        address.  If the transport doesn't actually do addresses,
 412        this shouldn't be sent.
 413
 414      Replies sent from tor to the proxy are:
 415
 416      [0x1001] OKAY: Send the user's traffic. (body ignored)
 417
 418      [0x1002] DENY: Tor would prefer not to get more traffic from
 419        this address for a while. (body ignored)
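
       The framing above maps directly onto two big-endian 16-bit
       fields followed by the body.  A minimal sketch, with
       hypothetical constant and function names, and assuming the
       USERADDR body is plain ASCII:

```python
import struct

CMD_DONE = 0x0000
CMD_USERADDR = 0x0001
REPLY_OKAY = 0x1001
REPLY_DENY = 0x1002


def pack_message(command, body=b""):
    """Encode COMMAND, BODYLEN, and BODY as one message."""
    if len(body) > 0xFFFF:
        raise ValueError("body too long for a 2-byte length")
    return struct.pack(">HH", command, len(body)) + body


def unpack_message(data):
    """Decode one message; return (command, body, remaining bytes)."""
    if len(data) < 4:
        raise ValueError("incomplete header")
    command, bodylen = struct.unpack(">HH", data[:4])
    if len(data) < 4 + bodylen:
        raise ValueError("incomplete body")
    return command, data[4:4 + bodylen], data[4 + bodylen:]


def transport_preamble(user_addr=None):
    """Bytes a transport sends before waiting for OKAY or DENY."""
    out = b""
    if user_addr is not None:
        out += pack_message(CMD_USERADDR, user_addr.encode("ascii"))
    return out + pack_message(CMD_DONE)
```

       After sending the preamble, the transport would read one reply
       with unpack_message() and relay user traffic only on OKAY.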
 420
 421   [We could also use an out-of-band signalling method to tell Tor
 422   about client addresses, but that's a historically error-prone way
 423   to go about annotating connections.]
 424
 425 Advertising bridge methods:
 426
 427   Bridges put the 'method' lines in their extra-info documents.
 428
 429      method SP methodname SP address:port SP arglist NL
 430
  431   The address:port part is as returned from an SMETHOD line.  The
  432   arglist is a K=V,... list as returned in the ARGS part of the
 433   SMETHOD line.
 434
 435   If the SMETHOD line includes a DECLARE: part, the routerinfo gets
 436   a new line:
 437
 438      method-info SP methodname SP arglist NL
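
       Generating these two lines from parsed SMETHOD data could look
       like this.  The function names are hypothetical, and what to
       emit when the ARGS list is empty is not specified; this sketch
       simply leaves the arglist empty.

```python
def format_arglist(args):
    """Render a dict as a K=V,... list."""
    return ",".join(f"{k}={v}" for k, v in args.items())


def extra_info_line(method, addrport, args):
    """Build the 'method' line for the extra-info document."""
    return f"method {method} {addrport} {format_arglist(args)}\n"


def router_info_line(method, declare):
    """Build the 'method-info' line, or None if nothing was declared."""
    if not declare:
        return None
    return f"method-info {method} {format_arglist(declare)}\n"
```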
 439
 440 Bridge authority behavior
 441
 442   We need to specify a way to test different transport methods that
 443   bridges claim to support.  We should test as many as possible.  We
 444   should NOT require that we have a way to test every possible
 445   transport method before we allow its use: the point of this design
 446   is to remove bottlenecks in transport deployment.
 447
 448 Bridgedb behavior:
 449
 450   Bridgedb can, given a set of router descriptors and their
 451   corresponding extrainfo documents, generate a set of bridge lines
 452   for each descriptor.  Bridgedb may want to avoid handing out
 453   methods that seem to get bridges blocked quickly.
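
       Turning one extra-info 'method' line back into a bridge line
       might be sketched as follows.  The function name is
       hypothetical, and the sketch assumes the fingerprint is taken
       from the matching router descriptor by the caller.

```python
def bridge_line(method_line, fingerprint=None):
    """Build a bridge line from one extra-info 'method' line."""
    fields = method_line.split()
    if len(fields) < 3 or fields[0] != "method":
        raise ValueError("not a method line")
    name, addrport = fields[1], fields[2]
    arglist = fields[3] if len(fields) > 3 else ""
    parts = ["bridge", name, addrport]
    if fingerprint:
        parts.append(f"keyid={fingerprint}")
    # Each K=V item from the arglist becomes one [k=v] field.
    parts.extend(item for item in arglist.split(",") if item)
    return " ".join(parts)
```

       A policy for withholding methods that get bridges blocked
       quickly would sit on top of this and is not shown.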
 454
 455 Implementation plan
 456
 457   First, we should implement per-bridge proxies via the "external
  458   proxy" method described in "Specifications: Client behavior", along
  459   with the extended-server-port mechanism.  This will let bridges run
  460   transport proxies such that they can generate bridge lines to
  461   give to clients for testing, so long as the user configures and
 462   launches their proxies on their own.
 463
 464   Once that's done, we can see if we need any managed proxies, or if
 465   the whole idea there is silly.
 466
 467   If we do, the next most important part seems to be getting
 468   the client-side automatic part written.  And once that's done, we
 469   can evaluate how much of the server side is easy for people to do
 470   and how much is hard.
 471
 472   The "obfsproxy" obfuscating proxy is a likely candidate for an
 473   initial transport, as is Steven Murdoch's http thing or something
 474   similar.
 475
 476 Notes on plugins to write:
 477
 478    We should ship a couple of null plugin implementations in one or two
 479    popular, portable languages so that people get an idea of how to
 480    write the stuff.
 481
  482    1. We should have one that's just a proof of concept that does
  483       nothing but transfer bytes back and forth.
  484
  485    2. We should not do a rot13 one.
  486
  487    3. We should implement a basic proxy that does not transform the bytes at all.
  488
  489    4. We should implement DNS or HTTP using other software (as Goodell
  490       did years ago with DNS) as an example of wrapping existing code into
  491       our plugin model.
  492
  493    5. The obfuscated-ssh superencipherment is pretty trivial and pretty
  494       useful.  It makes the protocol stringwise unfingerprintable.
  495
  496       1. Nick needs to be told firmly not to bikeshed the obfuscated-ssh
  497          superencipherment too badly.
  498
  499          1. Go ahead, bikeshed my day.
  500
  501    6. If we do a raw-traffic proxy, openssh tunnels would be the logical choice.
 502
 503 Appendix: recommendations for transports
 504
 505   Be free/open-source software.  Also, if you think your code might
 506   someday do so well at circumvention that it should be implemented
 507   inside Tor, it should use the same license as Tor.
 508
 509   Use libraries that Tor already requires. (You can rely on openssl and
 510   libevent being present if current Tor is present.)
 511
 512   Be portable: most Tor users are on Windows, and most Tor developers
 513   are not, so designing your code for just one of these platforms will
 514   make it either get a small userbase, or poor auditing.
 515
 516   Think secure: if your code is in a C-like language, and it's hard to
 517   read it and become convinced it's safe, then it's probably not safe.
 518
 519   Think small: we want to minimize the bytes that a Windows user needs
 520   to download for a transport client.
 521
 522   Avoid security-through-obscurity if possible.  Specify.
 523
 524   Resist trivial fingerprinting: There should be no good string or regex
 525   to search for to distinguish your protocol from protocols permitted by
 526   censors.
 527
 528   Imitate a real profile: There are many ways to implement most
 529   protocols -- and in many cases, most possible variants of a given
 530   protocol won't actually exist in the wild.
 531
 532
 533