Filename: 163-detecting-clients.txt
Title: Detecting whether a connection comes from a client
Author: Nick Mathewson
Created: 22-May-2009
Target: 0.2.2
Status: Superseded

[Note: Actually, this is partially done, partially superseded
       -nickm, 9 May 2011]

Overview:

   Some aspects of Tor's design require relays to distinguish
   connections that come from clients from connections that come
   from relays.  The existing method for doing this is easy to
   spoof.  We propose a better approach.

Motivation:

   There are at least two reasons why Tor servers want to tell
   which connections come from clients and which come from other
   servers:

     1) Some exits, proposal 152 notwithstanding, want to disallow
        their use as single-hop proxies.
     2) Some performance-related proposals involve prioritizing
        traffic from relays, or limiting traffic per client (but not
        per relay).

   Right now, we detect whether a connection comes from a client or
   a server based on how the connecting node opens circuits.  (Check
   out the code that implements the AllowSingleHopExits option if
   you want all the details.)  This method is depressingly easy to
   fake, though.  This document proposes better methods.

Goals:

   To make obtaining relay privileges at least as difficult as
   simply running a relay.

   In the analysis below, "using server privileges" means taking any
   action that only servers are supposed to take, such as delivering
   a BEGIN cell to an exit node that doesn't allow single-hop exits,
   or claiming server-like amounts of bandwidth.

Passive detection:

   A connection is definitely a client connection if, during setup,
   it uses one of the TLS handshake methods that does not establish
   an identity key.

   A circuit is definitely a client circuit if it is initiated with
   a CREATE_FAST cell, though the node that sent it could be either
   a client or a server.

   A node that is listed in a recent consensus is probably a server.

   A node to which we have successfully extended circuits from
   multiple origins is probably a server.

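   The passive rules above can be sketched as a small classifier.
   This is a minimal Python sketch under stated assumptions, not
   Tor's implementation: the ConnInfo fields, the Verdict names, and
   reading "multiple origins" as "at least two distinct origins" are
   all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Verdict(Enum):
    DEFINITELY_CLIENT = "definitely a client"
    PROBABLY_SERVER = "probably a server"
    UNKNOWN = "unknown"


# Hypothetical reading of "multiple origins": at least two distinct ones.
MIN_DISTINCT_ORIGINS = 2


@dataclass
class ConnInfo:
    """Hypothetical summary of what a relay knows about a peer."""
    # False if the TLS handshake used a method that sets up no identity key.
    tls_identity_established: bool
    # True if the peer is listed in a recent consensus.
    in_recent_consensus: bool
    # Distinct origins whose circuits we have successfully extended to the peer.
    origins_extended_to_peer: Set[str] = field(default_factory=set)


def classify_connection(conn: ConnInfo) -> Verdict:
    # "Definitely a client" is checked first so it overrides other evidence.
    if not conn.tls_identity_established:
        return Verdict.DEFINITELY_CLIENT
    if conn.in_recent_consensus:
        return Verdict.PROBABLY_SERVER
    if len(conn.origins_extended_to_peer) >= MIN_DISTINCT_ORIGINS:
        return Verdict.PROBABLY_SERVER
    return Verdict.UNKNOWN


def classify_circuit(create_cell_type: str) -> Verdict:
    # A CREATE_FAST circuit is a client circuit, even though the node that
    # sent it may still be a relay acting as a client.
    if create_cell_type == "CREATE_FAST":
        return Verdict.DEFINITELY_CLIENT
    return Verdict.UNKNOWN
```

   Anything that matches none of the rules comes back as UNKNOWN,
   which is the case the active-detection step has to resolve.
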
Active detection:

   If a node doesn't try to use server privileges at all, we never
   need to care whether it's a server.

   When a node or circuit tries to use server privileges, if it is
   "definitely a client" as per above, we can refuse it immediately.

   If it's "probably a server" as per above, we can accept it.

   Otherwise, we have either a client, or a server that is neither
   listed in any consensus nor used by any other clients; in other
   words, a new or private server.

   For these nodes, we should attempt to build one or more test
   circuits through them.  If enough of the circuits succeed, we
   treat the node as a real relay.  If not, it is probably a client.

   While we are waiting for the test circuits to complete, we should
   allow a short grace period in which server privileges are
   permitted.  When a test is done, we should remember its outcome
   for a while, so we don't need to repeat it.

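   The decision procedure above can be sketched as follows.  This is
   a minimal sketch, assuming a hypothetical start_test() hook that
   launches test circuits and a caller that later reports the outcome
   through record_test_result(); the class and method names are
   hypothetical.  The 10-minute grace period and 48-hour memory are
   taken from the suggested answers under "Open questions", and the
   Verdict values restate the passive-detection categories.

```python
import time
from enum import Enum
from typing import Callable, Dict, Tuple


class Verdict(Enum):
    # Same categories as the passive-detection rules.
    DEFINITELY_CLIENT = "definitely a client"
    PROBABLY_SERVER = "probably a server"
    UNKNOWN = "unknown"


GRACE_PERIOD = 10 * 60         # seconds (suggested answer: 10 minutes)
RESULT_MEMORY = 48 * 60 * 60   # seconds (suggested answer: 48 hours)


class ServerPrivilegeGate:
    """Hypothetical helper deciding whether a node may use server privileges."""

    def __init__(self, start_test: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._start_test = start_test                       # launches test circuits
        self._clock = clock
        self._pending: Dict[str, float] = {}                # node -> test start time
        self._results: Dict[str, Tuple[bool, float]] = {}   # node -> (is_relay, when)

    def record_test_result(self, node_id: str, is_relay: bool) -> None:
        self._pending.pop(node_id, None)
        self._results[node_id] = (is_relay, self._clock())

    def may_use_server_privileges(self, node_id: str, verdict: Verdict) -> bool:
        if verdict is Verdict.DEFINITELY_CLIENT:
            return False                    # refuse immediately
        if verdict is Verdict.PROBABLY_SERVER:
            return True                     # accept
        now = self._clock()
        cached = self._results.get(node_id)
        if cached is not None and now - cached[1] < RESULT_MEMORY:
            return cached[0]                # reuse a remembered outcome
        started = self._pending.get(node_id)
        if started is None:
            self._pending[node_id] = now    # no test running: start one
            self._start_test(node_id)
            return True                     # the grace period begins now
        return now - started < GRACE_PERIOD
```

   One design choice here is an assumption, not part of the
   proposal: once a pending test outlives the grace period, the sketch
   refuses server privileges until a result is recorded, rather than
   counting the timed-out test as a failure.
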
Why it's hard to do good testing:

   Building a test circuit that starts at an unlisted router requires
   only that we have an open connection to it.  Building a test
   circuit that starts elsewhere and passes _through_ an unlisted
   router, though more reliable, would require that we know the
   router's address, port, identity key, and onion key.  In the
   current Tor protocol, only the address and identity key are
   easily available in all cases.

   We could fix this part by requiring that all servers support
   BEGIN_DIR and allow downloading of at least their own current
   descriptor.

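   The asymmetry between the two kinds of test can be made concrete
   with a hypothetical record of what we know about an unlisted
   router.  This sketch only reports which kind of test circuit is
   possible and builds no circuits; the UnlistedRouterInfo fields and
   the "start_at"/"through" labels are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnlistedRouterInfo:
    """Hypothetical record of what we know about an unlisted router."""
    has_open_connection: bool
    address: Optional[str] = None
    or_port: Optional[int] = None
    identity_key: Optional[bytes] = None
    onion_key: Optional[bytes] = None


def possible_tests(info: UnlistedRouterInfo) -> List[str]:
    tests = []
    # A circuit whose first hop is the router only needs the open connection.
    if info.has_open_connection:
        tests.append("start_at")
    # A circuit built elsewhere and extended through the router needs all four
    # fields; the port and onion key are the ones usually missing.
    needed = (info.address, info.or_port, info.identity_key, info.onion_key)
    if all(value is not None for value in needed):
        tests.append("through")
    return tests
```

   With an open connection but only the address and identity key
   known, possible_tests() returns just ["start_at"].  A router
   descriptor carries the OR port and onion key, which is why being
   able to fetch one over BEGIN_DIR would enable the "through" test.
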
Open questions:

   How many successful test circuits do we need before we decide
   that a node is a relay?

      [Suggested answer: two circuits from two distinct hosts.]

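   One reading of the suggested threshold, sketched under
   assumptions: each test circuit is reported as a (host, succeeded)
   pair, where "host" is a hypothetical label for whichever distinct
   host the circuit involved.  The proposal does not say which host
   that is, so the labeling is left to the caller.

```python
from typing import Iterable, Tuple

# Suggested answer: two circuits from two distinct hosts.  Requiring two
# distinct hosts already implies at least two successful circuits.
MIN_DISTINCT_HOSTS = 2


def looks_like_relay(results: Iterable[Tuple[str, bool]]) -> bool:
    """results holds one (host, succeeded) pair per test circuit."""
    succeeded_hosts = {host for host, ok in results if ok}
    return len(succeeded_hosts) >= MIN_DISTINCT_HOSTS
```

   Two successes through the same host do not meet this threshold,
   so repeated tests from a single vantage point cannot promote a
   node on their own.
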
   How do we pick grace periods?  How long do we remember the
   outcome of a test?

      [Suggested answer: 10-minute grace period; 48-hour memory of
      test outcomes.]

   If we can build circuits starting at a suspect node, but we don't
   have enough information to try extending circuits elsewhere
   through the node, should we conclude that the node is
   "server-like" or not?

      [Suggested answer: for now, just try making circuits through
      the node.  Add tests that extend circuits through it later, as
      needed.]
