Filename: 158-microdescriptors.txt
Title: Clients download consensus + microdescriptors
Author: Roger Dingledine

This proposal replaces section 3.2 of proposal 141, which was
called "Fetching descriptors on demand". Rather than modifying the
circuit-building protocol to fetch a server descriptor inline at each
circuit extend, we instead put all of the information that clients need
either into the consensus itself, or into a new set of data about each
relay called a microdescriptor. The microdescriptor is a direct
transform from the relay descriptor, so relays don't even need to know

Descriptor elements that are small and frequently changing should go
in the consensus itself, and descriptor elements that are small and
relatively static should go in the microdescriptor. If we ever end up
with descriptor elements that aren't small yet clients need to know
them, we'll need to resume considering some design like the one in

http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
http://archives.seul.org/or/dev/Nov-2008/msg00007.html
for a discussion of the options and why this is currently the best

There are three pieces to the proposal. First, authorities will list in
their votes (and thus in the consensus) what relay descriptor elements
are included in the microdescriptor, and also list the expected hash
of the microdescriptor for each relay. Second, directory mirrors will
serve microdescriptors. Third, clients will ask for them and cache them.

3.1. Consensus changes

V3 votes should include a new line:
    microdescriptor-elements bar baz foo
listing each descriptor element (sorted alphabetically) that authority
included when it calculated its expected microdescriptor hashes.

We also need to include the hash of each expected microdescriptor in
the routerstatus section. I suggest a new "m" line for each stanza,
with the base64 of the hash of the elements that the authority voted

The consensus microdescriptor-elements and "m" lines are then computed
as described in Section 3.1.2 below.

I believe that means we need a new consensus-method "6" that knows
how to compute the microdescriptor-elements and add "m" lines.

3.1.1. Descriptor elements to include for now

To start, the element list that authorities suggest should be

(Note that the or-dev posts above only mention onion-key, but if
we don't also include family then clients will never learn it. It
seemed like it should be relatively static, so putting it in the
microdescriptor is smarter than trying to fit it into the consensus.)

We could imagine a config option "family,onion-key" so authorities
could change their voted preferences without needing to upgrade.

3.1.2. Computing consensus for microdescriptor-elements and "m" lines

One approach is for the consensus microdescriptor-elements line to
include every element listed by a majority of authorities, sorted. The
problem here is that it will no longer be deterministic what the correct
hash for the "m" line should be. We could imagine telling the authority
to go look in its descriptor and produce the right hash itself, but
we don't want consensus calculation to be based on external data like
that. (Plus, the authority may not have the descriptor that everybody

The better approach is to take the exact set that has the most votes
(breaking ties by the set that has the most elements, and breaking
ties after that by whichever is alphabetically first). That will
increase the odds that we actually get a microdescriptor hash that
is both a) for the descriptor we're putting in the consensus, and b)
over the elements that we're declaring it should be for.
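
The selection rule above could be sketched roughly as follows. This is
only an illustration, not part of the proposal: the function name
choose_elements and the input shape (one list of element keywords per
authority) are hypothetical, and "alphabetically first" is assumed to
mean ordinary lexicographic comparison of the sorted element lists.

```python
from collections import Counter

def choose_elements(votes):
    """Pick the consensus microdescriptor-elements set.

    votes: one list of element keywords per authority (hypothetical shape).
    Returns a sorted tuple of keywords, or None if there are no votes.
    """
    # Normalize each vote so that identical sets compare equal.
    counts = Counter(tuple(sorted(set(v))) for v in votes)
    if not counts:
        return None
    # Most votes first, then most elements, then alphabetically first
    # (assumed here to be lexicographic order on the sorted tuple).
    return min(counts, key=lambda s: (-counts[s], -len(s), s))
```

For example, if two authorities vote for "onion-key" alone and two vote
for "family onion-key", the tie goes to the larger set.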

Then the "m" line for a given relay is the one that gets the most votes
from authorities that both a) voted for the microdescriptor-elements
line we're using, and b) voted for the descriptor we're using.

(If there's a tie, use the smaller hash. But really, if there are
multiple such votes and they differ about a microdescriptor, we caught
one of them lying or being buggy. We should log it to track down why.)

If there are no such votes, then we leave out the "m" line for that
relay. That means clients should avoid it for this time period. (As
an extension it could instead mean that clients should fetch the
descriptor and figure out its microdescriptor themselves. But let's
not get ahead of ourselves.)
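
A minimal sketch of the "m" line rule for a single relay, under some
assumptions: the name choose_m_line and the (elements, descriptor,
m_hash) tuple shape are hypothetical, and "smaller hash" is assumed to
mean ordinary string comparison of the base64 values, which the text
above does not pin down.

```python
from collections import Counter

def choose_m_line(votes, consensus_elements, consensus_descriptor):
    """Pick the "m" hash for one relay, or None to leave the line out.

    votes: (elements, descriptor_id, m_hash) per authority (hypothetical).
    """
    wanted = tuple(sorted(consensus_elements))
    # Only count authorities that voted for both the elements line and
    # the descriptor that the consensus is using.
    eligible = [m for elements, desc, m in votes
                if tuple(sorted(elements)) == wanted
                and desc == consensus_descriptor]
    if not eligible:
        return None  # no such votes: omit the "m" line
    counts = Counter(eligible)
    # Most votes wins; ties go to the smaller hash (string order assumed).
    return min(counts, key=lambda m: (-counts[m], m))
```

A real implementation would also log whenever more than one distinct
hash shows up among the eligible votes, since that means an authority
is lying or buggy.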

It would be nice to have a more foolproof way to agree on what
microdescriptor hash each authority should vote for, so we can avoid
missing "m" lines. Just switching to a new consensus-method each time
we change the set of microdescriptor-elements won't help though, since
each authority will still have to decide what hash to vote for before
knowing what consensus-method will be used.

Here's one way we could do it. Each vote / consensus includes
the microdescriptor-elements that were used to compute the hashes,
and also a preferred-microdescriptor-elements set. If an authority
has a consensus from the previous period, then it should use the
consensus preferred-microdescriptor-elements when computing its votes
for microdescriptor-elements and the appropriate hashes in the upcoming
period. (If it has no previous consensus, then it just writes its
own preferences in both lines.)
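
To make that rule concrete, here is a small sketch. The function name
and arguments are hypothetical, and it assumes that an authority always
writes its own preferences on the preferred-microdescriptor-elements
line, which the text above implies but does not state outright.

```python
def elements_to_vote(own_preference, previous_consensus_preferred=None):
    """Return (microdescriptor_elements, preferred_elements) for a vote."""
    preferred = sorted(own_preference)
    if previous_consensus_preferred is None:
        # No previous consensus: own preferences go in both lines.
        return preferred, preferred
    # Otherwise hash over what the last consensus said it preferred,
    # while still advertising our own preference for next time.
    return sorted(previous_consensus_preferred), preferred
```

The point of the indirection is that every authority holding the same
previous consensus computes its hashes over the same element set.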

3.2. Directory mirrors serve microdescriptors

Directory mirrors should then read the microdescriptor-elements line
from the consensus, and learn how to answer requests. (Directory mirrors
continue to serve normal relay descriptors too, a) to serve old clients
and b) to be able to construct microdescriptors on the fly.)

The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
    http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z

All the microdescriptors from the current consensus should also be
available at:
    http://<hostname>/tor/micro/all.z
so a client that's bootstrapping doesn't need to send a 70KB URL just
to name every microdescriptor it's looking for.
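
As an illustration, a client might build the two request URLs like
this. The function names are hypothetical, and the sketch treats each
hash as an opaque string that is already safe to put in a URL; how the
hashes are encoded in the URL is not specified above (plain base64 can
itself contain "+" and "/").

```python
def micro_fetch_url(hostname, digests):
    """URL for a specific set of microdescriptors, named by hash."""
    if not digests:
        raise ValueError("need at least one microdescriptor hash")
    # Hashes are joined with "+" and the response is compressed (".z").
    return "http://%s/tor/micro/d/%s.z" % (hostname, "+".join(digests))

def micro_all_url(hostname):
    """URL for every microdescriptor in the current consensus."""
    return "http://%s/tor/micro/all.z" % hostname
```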
The format of a microdescriptor is the header line
    "microdescriptor-header"
followed by each element (keyword and body), alphabetically. There's
no need to mention what hash it's for, since it's self-identifying:
you can hash the elements to learn this.

(Do we need a footer line to show that it's over, or is the next
microdescriptor line or EOF enough of a hint? A footer line wouldn't
hurt much. Also, no fair voting for the microdescriptor-element
"microdescriptor-header".)

The hash of the microdescriptor is simply the hash of the concatenated
elements -- not counting the header line or hypothetical footer line.
Unless you prefer that?
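
The format and hash described above might look like this in code. This
is a sketch under stated assumptions: the proposal does not name the
hash function, so SHA-1 is used here purely as a placeholder, and the
helper names and the input shape (a mapping from keyword to the full
text of that element, newline-terminated) are hypothetical.

```python
import base64
import hashlib

HEADER = "microdescriptor-header\n"

def element_text(elements):
    """Concatenate the elements, ordered alphabetically by keyword."""
    return "".join(elements[k] for k in sorted(elements))

def build_microdescriptor(elements):
    """Header line followed by each element (keyword and body)."""
    return HEADER + element_text(elements)

def microdescriptor_hash(elements):
    """Base64 hash over the elements only, not the header line.

    SHA-1 is an assumption; the proposal leaves the hash unspecified.
    """
    digest = hashlib.sha1(element_text(elements).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because the header is excluded from the hash, anyone holding a
microdescriptor can drop the first line, hash the rest, and check the
result against the "m" line in the consensus.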

Is there a reasonable way to version these things? We could say that
the microdescriptor-header line can contain arguments which clients
must ignore if they don't understand them. Any better ways?

Directory mirrors should check to make sure that the microdescriptors
they're about to serve match the right hashes (either the hashes from
the fetch URL or the hashes from the consensus, respectively).

We will probably want to consider some sort of smart data structure to
be able to quickly convert microdescriptor hashes into the appropriate
microdescriptor. Clients will want this anyway when they load their
microdescriptor cache and want to match it up with the consensus to

3.3. Clients fetch them and cache them

When a client gets a new consensus, it looks to see if there are any
microdescriptors it needs to learn. If it needs to learn more than
some threshold of the microdescriptors (half?), it requests 'all',
else it requests only the missing ones.
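
The fetch decision above is simple enough to sketch directly. The
function name and return convention are hypothetical, and the default
threshold of one half is just the tentative value floated in the text.

```python
def plan_fetch(consensus_hashes, cached_hashes, threshold=0.5):
    """Decide what to request after receiving a new consensus.

    Returns None (nothing to fetch), the string "all", or a sorted
    list of the missing hashes.
    """
    wanted = set(consensus_hashes)
    missing = wanted - set(cached_hashes)
    if not missing:
        return None
    # More than the threshold fraction missing: just ask for everything.
    if len(missing) > threshold * len(wanted):
        return "all"
    return sorted(missing)
```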

Clients maintain a cache of microdescriptors along with metadata like
when each one was last referenced by a consensus. They keep a
microdescriptor until it hasn't been mentioned in any consensus for a
week. Future clients might cache them for longer or shorter times.
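
One possible shape for such a cache, as a rough sketch only: the class
and method names are hypothetical, timestamps are plain seconds passed
in by the caller, and a dict keyed by hash stands in for whatever
lookup structure a real client would use.

```python
WEEK = 7 * 24 * 60 * 60  # one week, in seconds

class MicrodescCache:
    def __init__(self):
        # hash -> [microdescriptor text, time last referenced]
        self._entries = {}

    def store(self, md_hash, text, now):
        self._entries[md_hash] = [text, now]

    def get(self, md_hash):
        entry = self._entries.get(md_hash)
        return entry[0] if entry else None

    def note_consensus(self, consensus_hashes, now):
        """Refresh the last-referenced time for every listed hash we hold."""
        for h in consensus_hashes:
            if h in self._entries:
                self._entries[h][1] = now

    def expire(self, now):
        """Drop entries not mentioned in any consensus for over a week."""
        stale = [h for h, (_, last) in self._entries.items()
                 if now - last > WEEK]
        for h in stale:
            del self._entries[h]
        return stale
```

A client would call note_consensus each time a new consensus arrives
and expire periodically.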

3.3.1. Information leaks from clients

If a client asks you for a set of microdescs, then you know she didn't
have them cached before. How much does that leak? What about when
we're all using our entry guards as directory guards, and we've seen
that user make a bunch of circuits already?

Fetching "all" when you need at least half is a good first order fix,
but might not be all there is to it.

Another future option would be to fetch some of the microdescriptors
anonymously (via a Tor circuit).

4. Transition and deployment

Phase one, the directory authorities should start voting on
microdescriptors and microdescriptor elements, and putting them in the
consensus. This should happen during the 0.2.1.x series, and should
be relatively easy to do.

Phase two, directory mirrors should learn how to serve them, and learn
how to read the consensus to find out what they should be serving. This
phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
on how messy it turns out to be and how quickly we get around to it.

Phase three, clients should start fetching and caching them instead
of normal descriptors. This should happen post 0.2.1.x.