Filename: 158-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Version: $Revision$
Last-Modified: $Date$
Author: Roger Dingledine
Created: 17-Jan-2009
Status: Open

1. Overview

  This proposal replaces section 3.2 of proposal 141, which was
  called "Fetching descriptors on demand". Rather than modifying the
  circuit-building protocol to fetch a server descriptor inline at each
  circuit extend, we instead put all of the information that clients need
  either into the consensus itself or into a new set of data about each
  relay called a microdescriptor. The microdescriptor is derived directly
  from the relay descriptor, so relays don't even need to know that this
  is happening.

  Descriptor elements that are small and frequently changing should go
  in the consensus itself, and descriptor elements that are small and
  relatively static should go in the microdescriptor. If we ever end up
  with descriptor elements that clients need to know but that aren't
  small, we'll need to revisit a design like the one in proposal 141.

2. Motivation

  See
  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
  for a discussion of the options and why this is currently the best
  approach.

3. Design

  There are three pieces to the proposal. First, authorities will list in
  their votes (and thus in the consensus) which relay descriptor elements
  are included in the microdescriptor, and will also list the expected
  hash of the microdescriptor for each relay. Second, directory mirrors
  will serve microdescriptors. Third, clients will ask for them and cache
  them.

3.1. Consensus changes

  V3 votes should include a new line:
    microdescriptor-elements bar baz foo
  listing each descriptor element (sorted alphabetically) that the
  authority included when it calculated its expected microdescriptor
  hashes.

  We also need to include the hash of each expected microdescriptor in
  the routerstatus section. I suggest a new "m" line for each stanza,
  with the base64 of the hash of the elements that the authority voted
  for above.
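
  To illustrate what one authority would write in a relay's "m" line,
  here is a minimal Python sketch. It assumes the hashed body is the
  voted elements concatenated in alphabetical order of keyword (matching
  the format in Section 3.2), and it uses SHA-256 with unpadded base64
  purely for illustration; this proposal does not name a hash function.
  The function name and the dict-of-element-text representation are
  hypothetical.

```python
import base64
import hashlib


def m_line_for_relay(descriptor_elements: dict, voted_elements: list) -> str:
    """Return a hypothetical "m" line for one relay's routerstatus stanza.

    descriptor_elements maps a keyword (e.g. "onion-key") to the exact text of
    that element in the relay descriptor, keyword line included and
    newline-terminated. Only the elements this authority voted for are hashed;
    voted elements missing from the descriptor are skipped.
    """
    body = "".join(
        descriptor_elements[keyword]
        for keyword in sorted(voted_elements)
        if keyword in descriptor_elements
    )
    # Assumption: SHA-256, base64-encoded with trailing "=" padding removed.
    digest = hashlib.sha256(body.encode("ascii")).digest()
    return "m " + base64.b64encode(digest).decode("ascii").rstrip("=")


relay = {
    "family": "family $AAAA $BBBB\n",
    "onion-key": "onion-key\n-----BEGIN RSA PUBLIC KEY-----\n"
                 "PLACEHOLDERKEYMATERIAL\n-----END RSA PUBLIC KEY-----\n",
    "platform": "platform Tor 0.2.1.10-alpha on Linux\n",
}
print(m_line_for_relay(relay, ["onion-key", "family"]))
```

  Because the elements are sorted before hashing, the order in which an
  authority lists them does not change the result, and descriptor
  elements outside the voted set (here "platform") are ignored.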

  The consensus microdescriptor-elements and "m" lines are then computed
  as described in Section 3.1.2 below.

  I believe that means we need a new consensus-method "6" that knows
  how to compute the microdescriptor-elements and add "m" lines.

3.1.1. Descriptor elements to include for now

  To start, the element list that authorities suggest should be
    family onion-key

  (Note that the or-dev posts above only mention onion-key, but if
  we don't also include family then clients will never learn it. Family
  seems likely to be relatively static, so putting it in the
  microdescriptor is smarter than trying to fit it into the consensus.)

  We could imagine a config option whose value is "family,onion-key",
  so that authorities could change their voted preferences without
  needing to upgrade.
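
  The proposal does not name this config option, so the sketch below
  only handles its value. It is a hypothetical helper that turns a
  comma-separated value such as "family,onion-key" into the sorted
  vote line from Section 3.1. It also refuses the reserved
  "microdescriptor-header" keyword (see Section 3.2).

```python
# Keywords that may never be voted as microdescriptor elements (Section 3.2).
RESERVED_KEYWORDS = {"microdescriptor-header"}


def vote_line_from_option(option_value: str) -> str:
    """Turn a value like "family,onion-key" into a sorted vote line.

    Whitespace around items and duplicate items are tolerated; the result
    lists each element once, sorted alphabetically.
    """
    elements = sorted({item.strip() for item in option_value.split(",") if item.strip()})
    if not elements:
        raise ValueError("no microdescriptor elements configured")
    reserved = RESERVED_KEYWORDS.intersection(elements)
    if reserved:
        raise ValueError(f"reserved keyword(s) not allowed: {sorted(reserved)}")
    return "microdescriptor-elements " + " ".join(elements)


print(vote_line_from_option("onion-key,family"))
```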

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

  One approach is for the consensus microdescriptor-elements line to
  include every element listed by a majority of authorities, sorted. The
  problem is that this set may not match any single authority's vote, so
  the votes no longer determine what the correct hash for the "m" line
  should be. We could imagine telling each authority to go look in its
  descriptor and produce the right hash itself, but we don't want
  consensus calculation to be based on external data like that. (Plus,
  the authority may not have the descriptor that everybody else voted
  to use.)

  A better approach is to take the exact set that has the most votes
  (breaking ties by the set that has the most elements, and breaking
  ties after that by whichever is alphabetically first). That will
  increase the odds that we actually get a microdescriptor hash that
  is both a) for the descriptor we're putting in the consensus, and b)
  over the elements that we're declaring it should be for.
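
  The selection rule above can be sketched in a few lines of Python. As
  an assumption, "alphabetically first" is read here as comparing each
  set's sorted, space-joined form, which is exactly how it would appear
  on the vote line. The function name is hypothetical.

```python
from collections import Counter


def choose_consensus_elements(votes):
    """Pick the microdescriptor-elements set for the consensus.

    votes: one iterable of element keywords per authority.
    The exact set with the most votes wins; ties go to the set with more
    elements, then to the set whose sorted, space-joined form sorts first.
    """
    counts = Counter(tuple(sorted(set(vote))) for vote in votes)
    if not counts:
        raise ValueError("no votes")
    return min(counts, key=lambda s: (-counts[s], -len(s), " ".join(s)))


votes = [["family", "onion-key"], ["onion-key"],
         ["onion-key", "family"], ["onion-key"]]
print(choose_consensus_elements(votes))
```

  In this example both sets receive two votes, so the tie is broken in
  favor of the larger set ("family", "onion-key").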

  Then the "m" line for a given relay is the one that gets the most votes
  from authorities that both a) voted for the microdescriptor-elements
  line we're using, and b) voted for the descriptor we're using.

  (If there's a tie, use the smaller hash. But really, if there are
  multiple such votes and they differ about a microdescriptor, we have
  caught at least one of them lying or being buggy. We should log it so
  we can track down why.)

  If there are no such votes, then we leave out the "m" line for that
  relay, which means clients should avoid it for this period. (As an
  extension, it could instead mean that clients should fetch the
  descriptor and compute its microdescriptor themselves. But let's not
  get ahead of ourselves.)
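
  A minimal Python sketch of this per-relay rule follows. It assumes a
  hypothetical RelayVote record holding what one authority said about
  one relay. It compares hashes as strings when breaking ties, which is
  one possible reading of "the smaller hash". Returning None stands for
  omitting the "m" line.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RelayVote:
    """What one authority said about one relay (hypothetical representation)."""
    authority: str
    elements: Tuple[str, ...]   # that authority's microdescriptor-elements line
    descriptor_digest: str      # which relay descriptor it voted for
    m_hash: Optional[str]       # its "m" value, or None if it gave none


def consensus_m_hash(votes: List[RelayVote], chosen_elements: Tuple[str, ...],
                     chosen_descriptor: str) -> Optional[str]:
    """Return the consensus "m" value for one relay, or None to omit the line."""
    eligible = [
        v.m_hash for v in votes
        if v.elements == chosen_elements
        and v.descriptor_digest == chosen_descriptor
        and v.m_hash is not None
    ]
    if not eligible:
        return None  # no usable votes: leave out the "m" line for this relay
    counts = Counter(eligible)
    if len(counts) > 1:
        # Eligible authorities disagree, so at least one is lying or buggy.
        logging.warning("microdescriptor hash disagreement for descriptor %s: %s",
                        chosen_descriptor, dict(counts))
    # Most votes wins; ties go to the smaller hash (string comparison here).
    return min(counts, key=lambda h: (-counts[h], h))
```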

  It would be nice to have a more foolproof way to agree on what
  microdescriptor hash each authority should vote for, so we can avoid
  missing "m" lines. Just switching to a new consensus-method each time
  we change the set of microdescriptor-elements won't help, though, since
  each authority will still have to decide what hash to vote for before
  knowing what consensus-method will be used.

  Here's one way we could do it. Each vote and each consensus includes
  the microdescriptor-elements that were used to compute the hashes,
  and also a preferred-microdescriptor-elements set. If an authority
  has a consensus from the previous period, then it should use that
  consensus's preferred-microdescriptor-elements when computing its
  microdescriptor-elements line and the corresponding hashes for the
  upcoming period. (If it has no previous consensus, then it just writes
  its own preferences in both lines.)
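
  A sketch of how an authority might fill in both lines under this
  scheme is below. The function name is hypothetical. One reading is
  assumed: the preferred line always carries the authority's own
  preference forward, while the elements it hashes over follow the
  previous consensus whenever it has one.

```python
from typing import Optional, Tuple


def lines_for_next_vote(own_preference: Tuple[str, ...],
                        previous_preferred: Optional[Tuple[str, ...]]) -> Tuple[str, str]:
    """Return (microdescriptor-elements line, preferred-microdescriptor-elements line)."""
    if previous_preferred is None:
        # No previous consensus: write our own preference in both lines.
        used = own_preference
    else:
        # Hash over the set the previous consensus preferred, so that all
        # authorities following this rule vote hashes over the same elements.
        used = previous_preferred
    return ("microdescriptor-elements " + " ".join(sorted(used)),
            "preferred-microdescriptor-elements " + " ".join(sorted(own_preference)))
```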

3.2. Directory mirrors serve microdescriptors

  Directory mirrors should then read the microdescriptor-elements line
  from the consensus and use it to learn which elements go into the
  microdescriptors they serve. (Directory mirrors continue to serve
  normal relay descriptors too, a) to serve old clients and b) to be
  able to construct microdescriptors on the fly.)

  The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
    http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
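
  One wrinkle: the standard base64 alphabet itself contains "+", which
  this URL uses as a separator. The sketch below therefore assumes that
  digests in URLs are written in the URL-safe base64 alphabet of RFC 4648
  ("-" and "_" in place of "+" and "/") with padding removed. That
  encoding choice and both function names are assumptions, not part of
  this proposal.

```python
import base64
from typing import List


def micro_fetch_url(hostname: str, digests: List[bytes]) -> str:
    """Build a /tor/micro/d/ URL naming each wanted microdescriptor."""
    # Assumption: URL-safe base64, so "+" stays an unambiguous separator.
    encoded = [base64.urlsafe_b64encode(d).decode("ascii").rstrip("=") for d in digests]
    return f"http://{hostname}/tor/micro/d/{'+'.join(encoded)}.z"


def parse_micro_fetch_path(path: str) -> List[bytes]:
    """Recover the requested digests from a path like /tor/micro/d/<D1>+<D2>.z."""
    prefix, suffix = "/tor/micro/d/", ".z"
    if not (path.startswith(prefix) and path.endswith(suffix)):
        raise ValueError(f"not a microdescriptor fetch path: {path}")
    names = path[len(prefix):-len(suffix)].split("+")
    # Restore the "=" padding stripped by micro_fetch_url before decoding.
    return [base64.urlsafe_b64decode(n + "=" * (-len(n) % 4)) for n in names]
```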

  All the microdescriptors from the current consensus should also be
  available at:
    http://<hostname>/tor/micro/all.z
  so a client that's bootstrapping doesn't need to send a 70KB URL just
  to name every microdescriptor it's looking for.

  The format of a microdescriptor is the header line
  "microdescriptor-header"
  followed by each element (keyword and body), in alphabetical order of
  keyword. There's no need to mention what hash it's for, since it's
  self-identifying: you can hash the elements to learn this.

  (Do we need a footer line to show that it's over, or is the next
  "microdescriptor-header" line or EOF enough of a hint? A footer line
  wouldn't hurt much. Also, no fair voting for "microdescriptor-header"
  as a microdescriptor element.)

  The hash of the microdescriptor is simply the hash of the concatenated
  elements, not counting the header line or the hypothetical footer
  line. (Or would you prefer that the hash cover those lines too?)
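
  The format and hashing rule above can be sketched as follows. The
  sketch assumes no footer line: a microdescriptor ends at the next
  header line or at EOF, which is one answer to the open question
  above. It uses SHA-256 as a stand-in for the unspecified hash
  function. All names are hypothetical.

```python
import hashlib
from typing import Dict, List

HEADER = "microdescriptor-header\n"


def serialize_microdescriptor(elements: Dict[str, str]) -> str:
    """Header line followed by each element, in alphabetical order of keyword.

    Each value is the element's full text (keyword line and any object),
    newline-terminated. No footer line is written.
    """
    return HEADER + "".join(elements[k] for k in sorted(elements))


def microdescriptor_digest(microdesc: str) -> bytes:
    """Hash the concatenated elements only, excluding the header line."""
    if not microdesc.startswith(HEADER):
        raise ValueError("missing microdescriptor-header line")
    # Assumption: SHA-256; the proposal leaves the hash function open.
    return hashlib.sha256(microdesc[len(HEADER):].encode("ascii")).digest()


def split_microdescriptors(stream: str) -> List[str]:
    """Split a concatenated response; the next header line or EOF ends each one."""
    docs: List[str] = []
    for line in stream.splitlines(keepends=True):
        if line == HEADER:
            docs.append(line)
        elif docs:
            docs[-1] += line
        else:
            raise ValueError("data before the first microdescriptor-header")
    return docs
```

  Because the digest is computed from the text itself, a receiver can
  split a response and identify each microdescriptor without any extra
  labels, which is the "self-identifying" property described above.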

  Is there a reasonable way to version these things? We could say that
  the microdescriptor-header line can contain arguments which clients
  must ignore if they don't understand them. Any better ways?

  Directory mirrors should check that the microdescriptors they're about
  to serve match the right hashes: the hashes named in the fetch URL for
  a "d/" request, or the hashes listed in the consensus for an "all"
  request.
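
  A mirror-side check might look like the sketch below. It recomputes
  each candidate's digest and serves only those in the expected set. As
  before, SHA-256 over everything after the header line is an assumption,
  and the function names are hypothetical.

```python
import hashlib
import logging
from typing import Iterable, List

HEADER = "microdescriptor-header\n"


def digest_of(microdesc: str) -> bytes:
    # Assumed rule: SHA-256 over everything after the header line.
    if not microdesc.startswith(HEADER):
        raise ValueError("missing microdescriptor-header line")
    return hashlib.sha256(microdesc[len(HEADER):].encode("ascii")).digest()


def checked_for_serving(candidates: Iterable[str], expected: Iterable[bytes]) -> List[str]:
    """Return only the microdescriptors whose digest we were asked for.

    expected is the digest list from a /tor/micro/d/ URL, or every digest in
    the current consensus when answering /tor/micro/all.z.
    """
    wanted = set(expected)
    served = []
    for md in candidates:
        d = digest_of(md)
        if d in wanted:
            served.append(md)
        else:
            logging.warning("not serving microdescriptor with unexpected digest %s", d.hex())
    return served
```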

  We will probably want some sort of index that can quickly map
  microdescriptor hashes to the corresponding microdescriptors. Clients
  will want this anyway when they load their microdescriptor cache and
  want to match it up with the consensus to see what's missing.
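
  A hash table keyed by digest is the simplest such index. In the
  hypothetical class below, each key is computed from the stored text
  itself, so every hit is already verified. The same structure also
  answers "which consensus digests am I missing?". SHA-256 is again
  assumed.

```python
import hashlib
from typing import Dict, Iterable, List, Optional

HEADER = "microdescriptor-header\n"


class MicrodescIndex:
    """Hypothetical in-memory index from digest to microdescriptor text."""

    def __init__(self) -> None:
        self._by_digest: Dict[bytes, str] = {}

    def add(self, microdesc: str) -> bytes:
        """Store a microdescriptor under its own (assumed SHA-256) digest."""
        if not microdesc.startswith(HEADER):
            raise ValueError("missing microdescriptor-header line")
        digest = hashlib.sha256(microdesc[len(HEADER):].encode("ascii")).digest()
        self._by_digest[digest] = microdesc
        return digest

    def get(self, digest: bytes) -> Optional[str]:
        return self._by_digest.get(digest)

    def missing(self, consensus_digests: Iterable[bytes]) -> List[bytes]:
        """Digests named by the consensus "m" lines that we don't have yet."""
        return [d for d in consensus_digests if d not in self._by_digest]
```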

3.3. Clients fetch them and cache them

  When a client gets a new consensus, it looks to see if there are any
  microdescriptors it needs to learn. If it is missing more than some
  threshold fraction of them (half?), it requests "all"; otherwise it
  requests only the missing ones.
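
  The client's decision can be sketched as below. The function name is
  hypothetical, and the default threshold of 0.5 simply mirrors the
  "(half?)" suggestion. Section 3.3.1 phrases the rule as "at least
  half", so the comparison here is ">=".

```python
from typing import List, Sequence, Union


def plan_microdesc_fetch(consensus_digests: Sequence[str], cached: Sequence[str],
                         threshold: float = 0.5) -> Union[str, List[str]]:
    """Return "all" to fetch /tor/micro/all.z, or the list of digests to request.

    An empty list means nothing is missing and no fetch is needed.
    """
    listed = set(consensus_digests)
    have = set(cached)
    missing = sorted(listed - have)
    if not missing:
        return []
    if len(missing) >= threshold * len(listed):
        return "all"
    return missing
```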

  Clients maintain a cache of microdescriptors along with metadata such
  as when each one was last referenced by a consensus. They keep a
  microdescriptor until it hasn't been mentioned in any consensus for a
  week. Future clients might cache them for longer or shorter times.
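
  The expiry rule can be sketched with one "last listed" timestamp per
  entry. The class below is hypothetical. Times are plain seconds passed
  in by the caller, so the logic is easy to test, and max_age defaults
  to one week to match the text.

```python
from typing import Dict, Iterable, List

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


class MicrodescCache:
    """Hypothetical client cache that remembers when each entry was last listed."""

    def __init__(self) -> None:
        self.bodies: Dict[str, str] = {}
        self.last_listed: Dict[str, float] = {}

    def store(self, digest: str, body: str, now: float) -> None:
        # A freshly fetched entry was just named by the current consensus.
        self.bodies[digest] = body
        self.last_listed[digest] = now

    def note_consensus(self, digests: Iterable[str], now: float) -> None:
        """Refresh the timestamp of every cached entry the new consensus mentions."""
        for d in digests:
            if d in self.bodies:
                self.last_listed[d] = now

    def expire(self, now: float, max_age: float = ONE_WEEK) -> List[str]:
        """Drop entries no consensus has mentioned for more than max_age seconds."""
        stale = [d for d, t in self.last_listed.items() if now - t > max_age]
        for d in stale:
            del self.bodies[d]
            del self.last_listed[d]
        return stale
```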

3.3.1. Information leaks from clients

  If a client asks you for a set of microdescriptors, then you know she
  didn't have them cached before. How much does that leak? What about
  when we're all using our entry guards as directory guards, and we've
  seen that user make a bunch of circuits already?

  Fetching "all" when you need at least half is a good first-order fix,
  but might not be all there is to it.

  Another future option would be to fetch some of the microdescriptors
  anonymously (via a Tor circuit).

4. Transition and deployment

  In phase one, the directory authorities should start voting on
  microdescriptors and microdescriptor elements, and putting them in the
  consensus. This should happen during the 0.2.1.x series, and should
  be relatively easy to do.

  In phase two, directory mirrors should learn how to serve them, and
  learn how to read the consensus to find out what they should be
  serving. This phase could be done either in 0.2.1.x or early in
  0.2.2.x, depending on how messy it turns out to be and how quickly we
  get around to it.

  In phase three, clients should start fetching and caching them instead
  of normal descriptors. This should happen after the 0.2.1.x series.