Filename: 143-distributed-storage-improvements.txt
Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing

28-Jun-2008 Initial proposal for or-dev

An evaluation of the distributed storage for Tor hidden service
descriptors and subsequent discussions have brought up a few improvements
to proposal 114. All improvements are backwards compatible with the
implementation of proposal 114.

1. Report Bad Directory Nodes

Bad hidden service directory nodes could deny the existence of previously
stored descriptors. A bad directory node that does this with all stored
descriptors causes harm to the distributed storage in general, but
replication will cope with this problem in most cases. However, an
adversary that attempts to make a specific hidden service unavailable by
running relays that become responsible for all of a service's
descriptors poses a more serious threat. The distributed storage needs to
defend against this attack by detecting and removing bad directory nodes.

As a countermeasure, hidden services try to download their descriptors
every hour at random times from the hidden service directories that are
responsible for storing them. If a directory node replies with 404 (Not
found), the hidden service reports the supposedly bad directory node to
a random selection of half of the directory authorities (with version
numbers equal to or higher than the first version that implements this
proposal). The hidden service posts a complaint message using HTTP 'POST'
to a URL "/tor/rendezvous/complain" with the following message format:

  "hidden-service-directory-complaint" identifier NL

    [At start, exactly once]

    The identifier of the hidden service directory node to be

  "rendezvous-service-descriptor" descriptor NL

    [At end, exactly once]

    The hidden service descriptor that the supposedly bad directory node

The directory authority checks whether the descriptor is valid and
whether the hidden service directory is responsible for storing it. It
waits for a random time of up to 30 minutes before posting the descriptor
to the hidden service directory. If the publication is acknowledged, the
directory authority waits another random time of up to 30 minutes before
attempting to request the descriptor that it has posted. If the directory
node replies with 404 (Not found), it will be blacklisted from being a
hidden service directory node for the next 48 hours.
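
The authority-side check described above can be sketched as follows. This
is only an illustration under stated assumptions, not the Tor
implementation: the callables post, fetch, sleep and now are hypothetical
stand-ins for the real network and timer code, and the sketch assumes the
validity and responsibility checks have already passed. The proposal does
not say what happens when the publication is not acknowledged; the sketch
simply takes no action in that case.

```python
import random

MAX_DELAY_SECONDS = 30 * 60        # "a random time of up to 30 minutes"
BLACKLIST_SECONDS = 48 * 60 * 60   # "for the next 48 hours"


def investigate(post, fetch, sleep, now, rng=random):
    """Return the time until which the node is blacklisted, or None.

    All four callables are hypothetical helpers supplied by the caller:
      post()  -> bool   True if the directory acknowledged the publication
      fetch() -> int    HTTP status code of the follow-up descriptor request
      sleep(seconds)    waits (or schedules) for the given delay
      now()   -> float  current time in seconds
    """
    # First random delay before posting the descriptor.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    if not post():
        # Assumption: an unacknowledged publication leads to no action.
        return None
    # Second random delay before requesting the descriptor again.
    sleep(rng.uniform(0, MAX_DELAY_SECONDS))
    if fetch() == 404:
        return now() + BLACKLIST_SECONDS
    return None
```

Passing the timer and network operations in as parameters keeps the
sketch testable without real 30-minute waits.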

A blacklisted hidden service directory is assigned the new flag BadHSDir
instead of the HSDir flag in the vote that a directory authority creates.
In a consensus a relay is only assigned an HSDir flag if the majority of
votes contains an HSDir flag and no more than one third of votes contains
a BadHSDir flag. As a result, clients do not have to learn about the
BadHSDir flag. A blacklisted directory node will simply not be assigned
the HSDir flag in the consensus.
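
The consensus rule above can be written down as a small predicate. This
is a sketch, not the actual consensus code: the function name and the
representation of a vote as a set of flag names are assumptions made for
illustration. It reads "majority" as strictly more than half of the votes
and "no more than one third" as at most one third of the votes.

```python
def consensus_has_hsdir(votes):
    """Decide whether a relay gets the HSDir flag in the consensus.

    votes: one set of flag names per directory authority vote (a
    hypothetical representation chosen for this sketch).
    """
    total = len(votes)
    hsdir = sum(1 for flags in votes if "HSDir" in flags)
    bad = sum(1 for flags in votes if "BadHSDir" in flags)
    # Integer arithmetic avoids rounding issues with fractions.
    return 2 * hsdir > total and 3 * bad <= total
```

For example, with six votes of which four contain HSDir and two contain
BadHSDir, the relay still receives the HSDir flag; with three of each it
does not.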

In order to prevent an attacker from setting up new nodes as replacement
for blacklisted directory nodes, all directory nodes in the same /24
subnet are blacklisted, too. Furthermore, if two or more directory nodes
are blacklisted in the same /16 subnet concurrently, all other directory
nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at

2. Publish Fewer Replicas

The evaluation has shown that the probability that a directory node
serves a previously stored descriptor is 85.7% (more precisely, this is
the 0.001-quantile of the empirical distribution with the rationale that
it holds for 99.9% of all empirical cases). If descriptors are replicated
to x directory nodes, the probability that at least one of the replicas
is available to clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
overall availability of 99.9%, x = ln(0.001) / ln(0.143), or roughly
x = 3.55, replicas need to be stored. From
this follows that 4 replicas are sufficient, rather than the currently

Further, the current design stores 2 sets of descriptors on 3 directory
nodes with consecutive identities. Originally, this was meant to
facilitate replication between directory nodes, which has not been and
will not be implemented (the selection criterion of 24 hours uptime does
not make it necessary). As a result, storing descriptors on directory
nodes with consecutive identities is not required. In fact, it should be
avoided so that an attacker cannot create "black holes" in the identifier

Hidden services should store their descriptors on 4 non-consecutive
directory nodes, and clients should request descriptors from these
directory nodes only. For compatibility reasons, hidden services also
store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
clients will be able to retrieve 4 out of 6 descriptors, but will fail
for the remaining 2 descriptors, which is sufficient for reliability. As
soon as 0.2.0.x is deprecated, hidden services can stop publishing the
additional 2 replicas.
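
The replica count used in this section can be re-derived numerically.
The following sketch only restates the formula 1 - (1 - p) ^ x from the
text; the function and constant names are chosen for illustration.

```python
import math

P_SERVE = 0.857  # per-node probability of serving a stored descriptor
TARGET = 0.999   # desired overall availability


def availability(replicas, p=P_SERVE):
    """Probability that at least one of `replicas` nodes serves the descriptor."""
    return 1 - (1 - p) ** replicas


def replicas_needed(target=TARGET, p=P_SERVE):
    """Solve 1 - (1 - p) ** x = target for x (a real number)."""
    return math.log(1 - target) / math.log(1 - p)
```

Rounding replicas_needed() up to the next integer gives 4; three
replicas fall short of the 99.9% target, while four exceed it.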

3. Change Default Value of Being Hidden Service Directory

The requirements for becoming a hidden service directory node are an open
directory port and an uptime of at least 24 hours. The evaluation has
shown that there are 300 hidden service directory candidates on average,
but only 6 of them are configured to act as hidden service directories.
This is problematic because those 6 nodes need to serve a large share of
all hidden service descriptors. Optimally, there should be hundreds of
hidden service directories. Having a large number of 0.2.1.x directory
nodes also has a positive effect on 0.2.0.x hidden services and clients.

Therefore, the new default of HidServDirectoryV2 should be 1, so that a
Tor relay that has an open directory port automatically accepts and
serves v2 hidden service descriptors. A relay operator can still opt out
of running a hidden service directory by changing HidServDirectoryV2 to 0.
The additional bandwidth requirements for running a hidden service
directory node in addition to being a directory cache are negligible.

4. Make Descriptors Persistent on Directory Nodes

Hidden service directories that are restarted by their operators or after
a failure will not be selected as hidden service directories within the
next 24 hours. However, some clients might still think that these nodes
are responsible for certain descriptors, because they work on the basis
of network consensuses that are up to three hours old. The directory
nodes should be able to serve the previously received descriptors to
these clients. Therefore, directory nodes make all received descriptors
persistent and load previously received descriptors on startup.
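
A minimal sketch of such persistence is shown below. The proposal does
not prescribe an on-disk format; storing a JSON mapping from descriptor
identifier to descriptor text, as well as the function names, are
assumptions made purely for illustration.

```python
import json
import os
import tempfile


def save_descriptors(path, descriptors):
    """Atomically write a {descriptor_id: descriptor_text} mapping to path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a
    # half-written store behind, then rename it into place.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(descriptors, f)
    os.replace(tmp, path)


def load_descriptors(path):
    """Return previously stored descriptors, or an empty mapping on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A directory node would call load_descriptors() once on startup and
save_descriptors() whenever it accepts a descriptor.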

5. Store and Serve Descriptors Regardless of Responsibility

Currently, directory nodes only accept descriptors for which they think
they are responsible. This may lead to problems when a directory node
uses an older or newer network consensus than the hidden service or
client, or when a directory node has been restarted recently. In fact,
there are no security issues in storing or serving descriptors for which
a directory node thinks it is not responsible. On the contrary, doing so
may improve reliability in border cases. As a result, a directory node
does not pay attention to responsibility when receiving a publication or
fetch request, but stores or serves the requested descriptor. Likewise,
the directory node does not remove descriptors when it thinks it is not
responsible for them any more.

6. Avoid Periodic Descriptor Re-Publication

In the current implementation a hidden service re-publishes its
descriptor either when its content changes or when an hour elapses.
However, the evaluation has shown that failures of hidden service
directory nodes, i.e. of nodes that have not failed within the last 24
hours, are very rare. Combined with making descriptors persistent on
directory nodes, this removes the need to re-publish descriptors hourly.

The only two events leading to descriptor re-publication should be a
change of the descriptor content and a new directory node becoming
responsible for the descriptor. Hidden services should therefore consider
re-publication every time they learn about a new network consensus
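
The two re-publication triggers named above can be sketched as a simple
decision function. How the set of responsible directory nodes is computed
from a consensus is outside the scope of this sketch; the function names
and the use of plain sets of node identities are assumptions.

```python
def newly_responsible(old_responsible, new_responsible):
    """Directory nodes that are responsible now but were not before."""
    return set(new_responsible) - set(old_responsible)


def should_republish(old_content, new_content, old_responsible, new_responsible):
    """Return True if either re-publication trigger applies.

    old_responsible / new_responsible: identities of the directory nodes
    responsible for the descriptor under the previous and the newly
    learned consensus (hypothetical inputs for this sketch).
    """
    content_changed = old_content != new_content
    new_node = bool(newly_responsible(old_responsible, new_responsible))
    return content_changed or new_node
```

A node dropping out of the responsible set does not by itself trigger
re-publication in this sketch, since no new node needs the descriptor.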

7. Discard Expired Descriptors

The current implementation lets directory nodes keep a descriptor for two
days before discarding it. However, with the v2 design, descriptors are
only valid for at most one day. Directory nodes should determine the
validity of stored descriptors and discard them one hour after they have
expired (to compensate for wrong clocks on clients).
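
The discard rule can be illustrated as follows. The sketch assumes that
the expiry time of each stored descriptor has already been determined and
is available in seconds; the store layout and names are hypothetical.

```python
GRACE_SECONDS = 60 * 60  # keep descriptors for one hour past expiry


def discard_expired(store, now):
    """Return the entries of `store` that should be kept at time `now`.

    store: {descriptor_id: expiry_time_in_seconds} (assumed layout).
    A descriptor is dropped once `now` reaches expiry plus the grace period.
    """
    return {d: exp for d, exp in store.items() if now < exp + GRACE_SECONDS}
```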

8. Shorten Client-Side Descriptor Fetch History

When clients try to download a hidden service descriptor, they memorize
fetch requests to directory nodes for up to 15 minutes. This allows them
to request all replicas of a descriptor to avoid bad or failing directory
nodes, but without querying the same directory node twice.

The downside is that a client that has requested a descriptor without
success will not be able to find a hidden service that is started within
15 minutes after the client's last request.

This can be improved by shortening the fetch history to only 5 minutes.
This time should be sufficient to complete requests for all replicas of a
descriptor, but without ending in an infinite request loop.
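
A fetch history with a 5-minute lifetime could look like the sketch
below. The class and method names are hypothetical, and keying the
history by (directory node, descriptor) pairs is an assumption; the
sketch only shows the idea of forgetting requests after a fixed time.

```python
HISTORY_SECONDS = 5 * 60  # proposed lifetime of a remembered request


class FetchHistory:
    """Remembers recent fetch requests so a directory is not asked twice."""

    def __init__(self, ttl=HISTORY_SECONDS):
        self.ttl = ttl
        self._last_request = {}  # (directory_id, descriptor_id) -> time

    def may_request(self, directory_id, descriptor_id, now):
        """True if this directory was not asked within the last ttl seconds."""
        last = self._last_request.get((directory_id, descriptor_id))
        return last is None or now - last >= self.ttl

    def record(self, directory_id, descriptor_id, now):
        """Remember that the request was sent at time `now`."""
        self._last_request[(directory_id, descriptor_id)] = now
```

With this history a client that failed to find a descriptor can retry
the same directory node after 5 minutes instead of 15.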

All proposed improvements are compatible with the currently implemented
design as described in proposal 114.