Filename: 143-distributed-storage-improvements.txt
Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing

  28-Jun-2008  Initial proposal for or-dev

An evaluation of the distributed storage for Tor hidden service
descriptors and subsequent discussions have brought up a few improvements
to proposal 114. All improvements are backwards compatible with the
implementation of proposal 114.

1. Report Bad Directory Nodes

Bad hidden service directory nodes could deny the existence of previously
stored descriptors. A bad directory node that does this with all stored
descriptors causes harm to the distributed storage in general, but
replication will cope with this problem in most cases. However, an
adversary that attempts to make a specific hidden service unavailable by
running relays that become responsible for all of a service's
descriptors poses a more serious threat. The distributed storage needs to
defend against this attack by detecting and removing bad directory nodes.

As a countermeasure, hidden services try to download their descriptors
every hour at random times from the hidden service directories that are
responsible for storing them. If a directory node replies with 404 (Not
found), the hidden service reports the supposedly bad directory node to
a random selection of half of the directory authorities (with version
numbers equal to or higher than the first version that implements this
proposal). The hidden service posts a complaint message using HTTP 'POST'
to a URL "/tor/rendezvous/complain" with the following message format:

  "hidden-service-directory-complaint" identifier NL

    [At start, exactly once]

    The identifier of the hidden service directory node to be

  "rendezvous-service-descriptor" descriptor NL

    [At end, exactly once]

    The hidden service descriptor that the supposedly bad directory node
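
As an illustration of the message format above, the following is a minimal
sketch of how a hidden service might assemble the complaint body. The
function name build_complaint is hypothetical, and the sketch assumes that
NL is a single line feed and that each argument follows its keyword on the
same line; the proposal text does not spell out these details.

```python
def build_complaint(identifier: str, descriptor: str) -> str:
    """Assemble a complaint body with the two items of the format above.

    Hypothetical helper: assumes NL is "\n" and that the identifier is a
    single whitespace-free token.
    """
    if not identifier or any(c.isspace() for c in identifier):
        raise ValueError("identifier must be a single non-empty token")
    if not descriptor.strip():
        raise ValueError("descriptor must not be empty")
    # First item: at start, exactly once.
    first = f"hidden-service-directory-complaint {identifier}"
    # Second item: at end, exactly once. Trailing line feeds of the
    # descriptor are dropped so that the message ends with a single NL.
    second = f"rendezvous-service-descriptor {descriptor.rstrip(chr(10))}"
    return first + "\n" + second + "\n"
```

The resulting string would be sent as the body of the HTTP POST request to
"/tor/rendezvous/complain".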

The directory authority checks whether the descriptor is valid and
whether the hidden service directory is responsible for storing it. It
then waits for a random time of up to 30 minutes before posting the
descriptor to the hidden service directory. If the publication is
acknowledged, the directory authority waits another random time of up to
30 minutes before attempting to request the descriptor that it has
posted. If the directory node replies with 404 (Not found), it will be
blacklisted from being a hidden service directory node for the next 48
hours.
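
The verification steps performed by the directory authority could be
sketched as follows. This is only an illustration: verify_complaint and
its parameters (post, fetch, clock, sleep) are hypothetical callables
supplied by the caller, the validity and responsibility checks are assumed
to have passed already, and the behavior for an unacknowledged publication
is an assumption because the text does not specify it.

```python
import random

MAX_DELAY_SECONDS = 30 * 60        # "a random time of up to 30 minutes"
BLACKLIST_SECONDS = 48 * 60 * 60   # "for the next 48 hours"


def verify_complaint(post, fetch, clock, sleep, rng=None):
    """Return the time until which the node is blacklisted, or None.

    post()  -> True if the directory node acknowledged the publication
    fetch() -> HTTP status code of the subsequent descriptor request
    clock() -> current time in seconds
    sleep(s) waits for s seconds
    """
    rng = rng or random.Random()
    # Wait a random time before posting the descriptor.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    if not post():
        # Assumption: without an acknowledgement, no verdict is reached.
        return None
    # Wait another random time before requesting the descriptor.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    if fetch() == 404:
        return clock() + BLACKLIST_SECONDS
    return None
```

Passing the callables in keeps the sketch independent of any particular
networking or scheduling code.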

A blacklisted hidden service directory is assigned the new flag BadHSDir
instead of the HSDir flag in the vote that a directory authority creates.
In a consensus, a relay is only assigned the HSDir flag if the majority of
votes contain the HSDir flag and no more than one third of votes contain
the BadHSDir flag. As a result, clients do not have to learn about the
BadHSDir flag. A blacklisted directory node will simply not be assigned
the HSDir flag in the consensus.
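
The consensus rule above can be expressed as a small predicate. The
following sketch assumes that each vote is represented as a set of flag
names for the relay in question and that "majority" means strictly more
than half of all votes; the function name is hypothetical.

```python
def consensus_has_hsdir(votes):
    """Decide whether a relay gets the HSDir flag in the consensus.

    votes: one set of flag names per directory authority vote.
    """
    total = len(votes)
    if total == 0:
        return False
    hsdir_votes = sum(1 for flags in votes if "HSDir" in flags)
    bad_votes = sum(1 for flags in votes if "BadHSDir" in flags)
    # Integer arithmetic avoids rounding issues with fractions:
    # majority of votes with HSDir, and at most one third with BadHSDir.
    return 2 * hsdir_votes > total and 3 * bad_votes <= total
```

For example, with 6 votes of which 4 contain HSDir and 2 contain BadHSDir,
the relay keeps the HSDir flag; with 3 and 3, it does not.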

In order to prevent an attacker from setting up new nodes as replacements
for blacklisted directory nodes, all directory nodes in the same /24
subnet are blacklisted, too. Furthermore, if two or more directory nodes
are blacklisted in the same /16 subnet concurrently, all other directory
nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
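
The subnet rules could be sketched as follows for IPv4 addresses. The
function name expand_blacklist is hypothetical, and the sketch assumes
that only nodes blacklisted after a failed verification count towards the
"two or more" threshold for a /16 subnet, not nodes that were added by the
/24 rule.

```python
import ipaddress
from collections import Counter


def expand_blacklist(blacklisted, all_dirs):
    """Apply the /24 and /16 rules to a set of blacklisted addresses.

    blacklisted: IPv4 addresses of nodes that failed verification
    all_dirs:    IPv4 addresses of all hidden service directory nodes
    """
    # Every /24 subnet that contains a blacklisted node.
    nets24 = {ipaddress.ip_network(f"{ip}/24", strict=False)
              for ip in blacklisted}
    # Every /16 subnet that contains two or more blacklisted nodes.
    counts16 = Counter(ipaddress.ip_network(f"{ip}/16", strict=False)
                       for ip in blacklisted)
    nets16 = {net for net, count in counts16.items() if count >= 2}

    result = set(blacklisted)
    for ip in all_dirs:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in nets24 | nets16):
            result.add(ip)
    return result
```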

2. Publish Fewer Replicas

The evaluation has shown that the probability that a directory node
serves a previously stored descriptor is 85.7% (more precisely, this is
the 0.001-quantile of the empirical distribution, with the rationale that
it holds for 99.9% of all empirical cases). If descriptors are replicated
to x directory nodes, the probability that at least one of the replicas
is available to clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
overall availability of 99.9%, x = 3.55 replicas need to be stored. From
this follows that 4 replicas are sufficient, rather than the currently
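
The replica count can be reproduced by solving 1 - (1 - p) ^ x = a for x,
which gives x = log(1 - a) / log(1 - p). The following short calculation
is a sketch of that arithmetic; the function name is hypothetical.

```python
import math


def replicas_needed(node_availability, target_availability):
    """Solve 1 - (1 - p) ^ x = a for x and round up to whole replicas."""
    x = (math.log(1 - target_availability)
         / math.log(1 - node_availability))
    return x, math.ceil(x)


# Values from the evaluation: p = 85.7%, target a = 99.9%.
exact, whole = replicas_needed(0.857, 0.999)
print(f"x = {exact:.2f}, replicas = {whole}")
```

With 3 replicas the availability would be about 99.7%, which is below the
target, while 4 replicas yield about 99.96%.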

Further, the current design stores 2 sets of descriptors on 3 directory
nodes with consecutive identities. Originally, this was meant to
facilitate replication between directory nodes, which has not been and
will not be implemented (the selection criterion of 24 hours uptime
makes it unnecessary). As a result, storing descriptors on directory
nodes with consecutive identities is not required. In fact, it should be
avoided so as not to enable an attacker to create "black holes" in the
identifier

Hidden services should store their descriptors on 4 non-consecutive
directory nodes, and clients should request descriptors from these
directory nodes only. For compatibility reasons, hidden services also
store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
clients will be able to retrieve 4 out of 6 descriptors, but will fail
for the remaining 2 descriptors, which is sufficient for reliability. As
soon as 0.2.0.x is deprecated, hidden services can stop publishing the
additional 2 replicas.

3. Change Default Value of Being Hidden Service Directory

The requirements for becoming a hidden service directory node are an open
directory port and an uptime of at least 24 hours. The evaluation has
shown that there are 300 hidden service directory candidates on average,
but only 6 of them are configured to act as hidden service directories.
This is a problem because those 6 nodes need to serve a large share of
all hidden service descriptors. Optimally, there should be hundreds of
hidden service directories. Having a large number of 0.2.1.x directory
nodes also has a positive effect on 0.2.0.x hidden services and clients.

Therefore, the new default of HidServDirectoryV2 should be 1, so that a
Tor relay that has an open directory port automatically accepts and
serves v2 hidden service descriptors. A relay operator can still opt out
of running a hidden service directory by changing HidServDirectoryV2 to
0. The additional bandwidth requirements for running a hidden service
directory node in addition to being a directory cache are negligible.

4. Make Descriptors Persistent on Directory Nodes

Hidden service directories that are restarted by their operators or after
a failure will not be selected as hidden service directories within the
next 24 hours. However, some clients might still think that these nodes
are responsible for certain descriptors, because they work on the basis
of network consensuses that are up to three hours old. The directory
nodes should be able to serve the previously received descriptors to
these clients. Therefore, directory nodes make all received descriptors
persistent and load previously received descriptors on startup.
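
A minimal sketch of such persistence is shown below. It is not how Tor
stores descriptors; the JSON file format, the function names, and the
mapping from descriptor ID to descriptor text are assumptions made only
for illustration.

```python
import json
import os


def save_descriptors(path, descriptors):
    """Write all received descriptors to disk.

    descriptors: dict mapping descriptor ID to descriptor text.
    The file is written to a temporary name first and then renamed, so
    that a crash during writing does not leave a truncated file behind.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(descriptors, f)
    os.replace(tmp_path, path)


def load_descriptors(path):
    """Load previously received descriptors on startup."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # First start, or nothing has been stored yet.
        return {}
```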

5. Store and Serve Descriptors Regardless of Responsibility

Currently, directory nodes only accept descriptors for which they think
they are responsible. This may lead to problems when a directory node
uses an older or newer network consensus than the hidden service or
client, or when a directory node has been restarted recently. In fact,
there are no security issues in storing or serving descriptors for which
a directory node thinks it is not responsible. On the contrary, doing so
may improve reliability in border cases. As a result, a directory node
does not pay attention to responsibility when receiving a publication or
fetch request, but stores or serves the requested descriptor. Likewise,
the directory node does not remove descriptors when it thinks it is not
responsible for them any more.

6. Avoid Periodic Descriptor Re-Publication

In the current implementation, a hidden service re-publishes its
descriptor either when its content changes or when an hour elapses.
However, the evaluation has shown that failures of hidden service
directory nodes, i.e. of nodes that have not failed within the last 24
hours, are very rare. Combined with making descriptors persistent on
directory nodes, this removes the need to re-publish descriptors hourly.

The only two events leading to descriptor re-publication should be a
change of the descriptor content and a new directory node becoming
responsible for the descriptor. Hidden services should therefore consider
re-publication every time they learn about a new network consensus
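
The re-publication decision described above could be sketched as follows.
The function name and the representation of responsible directory nodes
as sets of identifiers are assumptions for illustration; how the
responsible nodes are determined from a consensus is outside the scope of
this sketch.

```python
def directories_to_publish(content_changed, previously_responsible,
                           now_responsible):
    """Return the directory nodes that need a (re-)publication.

    content_changed:        True if the descriptor content has changed
    previously_responsible: nodes the descriptor was last published to
    now_responsible:        nodes responsible under the new consensus
    """
    now_responsible = set(now_responsible)
    if content_changed:
        # A changed descriptor must reach every responsible node.
        return now_responsible
    # Otherwise only nodes that newly became responsible need it.
    return now_responsible - set(previously_responsible)
```

An empty result means that the hidden service does not need to publish
anything for the new consensus.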

7. Discard Expired Descriptors

The current implementation lets directory nodes keep a descriptor for two
days before discarding it. However, with the v2 design, descriptors are
only valid for at most one day. Directory nodes should determine the
validity of stored descriptors and discard them one hour after they have
expired (to compensate for wrong clocks on clients).
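
The discard rule could look like the following sketch. It assumes that
the directory node has already determined an expiry time for each stored
descriptor and keeps it in a field named expires_at; both the field name
and the function name are hypothetical.

```python
GRACE_SECONDS = 60 * 60  # keep descriptors for one hour after expiry


def discard_expired(store, now):
    """Return the stored descriptors that should still be kept.

    store: dict mapping descriptor ID to a dict with an "expires_at"
           timestamp in seconds
    now:   current time in seconds
    """
    return {
        desc_id: entry
        for desc_id, entry in store.items()
        if now < entry["expires_at"] + GRACE_SECONDS
    }
```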

8. Shorten Client-Side Descriptor Fetch History

When clients try to download a hidden service descriptor, they memorize
fetch requests to directory nodes for up to 15 minutes. This allows them
to request all replicas of a descriptor to avoid bad or failing directory
nodes, without querying the same directory node twice.

The downside is that a client that has requested a descriptor without
success will not be able to find a hidden service that is started within
the 15 minutes following the client's last request.

This can be improved by shortening the fetch history to only 5 minutes.
This time should be sufficient to complete requests for all replicas of a
descriptor without ending in an infinite request loop.
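
A client-side fetch history with the shortened window could be sketched
as follows. The class and method names are hypothetical, and timestamps
are passed in explicitly so that the behavior does not depend on the
system clock.

```python
class FetchHistory:
    """Remember recent fetch requests to avoid asking a node twice."""

    def __init__(self, window_seconds=5 * 60):
        self.window_seconds = window_seconds
        self._requests = {}  # (directory, descriptor_id) -> request time

    def _purge(self, now):
        # Forget requests that are older than the history window.
        expired = [key for key, t in self._requests.items()
                   if now - t >= self.window_seconds]
        for key in expired:
            del self._requests[key]

    def should_request(self, directory, descriptor_id, now):
        """True if this node was not asked for this descriptor recently."""
        self._purge(now)
        return (directory, descriptor_id) not in self._requests

    def record(self, directory, descriptor_id, now):
        """Memorize a fetch request that was just sent."""
        self._requests[(directory, descriptor_id)] = now
```

A client would call should_request before contacting a directory node and
record after sending the request, so that each replica is tried at most
once within the window.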

All proposed improvements are compatible with the currently implemented
design as described in proposal 114.