Filename: 114-distributed-storage.txt
Title: Distributed Storage for Tor Hidden Service Descriptors
Author: Karsten Loesing
Created: 13-May-2007
Status: Closed
Implemented-In: 0.2.0.x

Change history:

  13-May-2007  Initial proposal
  14-May-2007  Added changes suggested by Lasse Øverlier
  30-May-2007  Changed descriptor format, key length discussion, typos
  09-Jul-2007  Incorporated suggestions by Roger, added status of specification
               and implementation for upcoming GSoC mid-term evaluation
  11-Aug-2007  Updated implementation statuses, included non-consecutive
               replication to descriptor format
  20-Aug-2007  Renamed config option HSDir to HidServDirectoryV2
  02-Dec-2007  Closed proposal

Overview:

  The basic idea of this proposal is to distribute the tasks of storing and
  serving hidden service descriptors, currently handled by three authoritative
  directory nodes, among a large subset of all onion routers. The three
  reasons to do this are better robustness (availability), better
  scalability, and improved security properties. Further, this proposal
  suggests changes to the hidden service descriptor format that prevent new
  security threats arising from decentralization and yield even better
  security properties.

Status:

  As of December 2007, the new hidden service descriptor format is implemented
  and usable. However, servers and clients do not yet make use of descriptor
  cookies, because there are open usability issues with this feature that
  might be resolved in proposal 121. Further, hidden service directories do
  not perform replication by themselves, because (unauthorized) replica fetch
  requests would allow any attacker to fetch all hidden service descriptors in
  the system. As neither issue is critical to the functioning of v2
  descriptors and their distribution, this proposal is considered Closed.

Motivation:

  The current design of hidden services exhibits the following performance and
  security problems:

  First, the three hidden service authoritative directories constitute a
  performance bottleneck in the system. The directory nodes are responsible for
  storing and serving all hidden service descriptors. As of May 2007 there are
  about 1000 descriptors at a time, but this number is expected to increase in
  the future. Further, there is no replication protocol for descriptors between
  the three directory nodes, so hidden services must ensure the availability
  of their descriptors by manually publishing them on all directory nodes.
  Whenever a fourth or fifth hidden service authoritative directory is added,
  hidden services will need to maintain a correspondingly larger number of
  replicas. These scalability issues affect the current usage of hidden
  services and put an even higher burden on the development of new kinds of
  applications for hidden services that might require storing even more
  descriptors.

  Second, besides limiting scalability, storing all hidden service descriptors
  on three directory nodes also constitutes a security risk. The directory
  node operators could easily analyze the publish and fetch requests to derive
  information on service activity and usage, and read the descriptor contents
  to determine which onion routers work as introduction points for a given
  hidden service and need to be attacked or threatened to shut it down.
  Furthermore, the contents of a hidden service descriptor offer only minimal
  security properties to the hidden service. Anyone who learns the service ID
  can easily find out whether the service is currently active and which
  introduction points it has. This applies to (former) clients, (former)
  introduction points, and of course to the directory nodes. All it takes is
  a request for the descriptor with the given service ID, which anyone can
  make anonymously.

  This proposal suggests two major changes to address the described
  performance and security problems:

  The first change affects the storage location for hidden service descriptors.
  Descriptors are distributed among a large subset of all onion routers instead
  of three fixed directory nodes. Each storing node is responsible for a subset
  of descriptors for a limited time only. It is not able to choose which
  descriptors it stores at a certain time, because this is determined by its
  onion ID, which is hard to change frequently and in time (only routers that
  have been stable for a given time are accepted as storing nodes). In order to
  resist single node failures and untrustworthy nodes, descriptors are
  replicated among a certain number of storing nodes. A first replication
  protocol makes sure that descriptors don't get lost when the node population
  changes; to this end, a storing node periodically requests the descriptors
  from its siblings. A second replication protocol distributes descriptors
  among non-consecutive nodes of the ID ring to prevent a group of adversaries
  from generating new onion keys until they have consecutive IDs, creating a
  'black hole' in the ring that makes random services unavailable. Connections
  to storing nodes are established by extending existing circuits by one hop
  to the storing node. This also ensures that contents are encrypted. The
  effect of this first change is that the probability that a single node
  operator learns about a certain hidden service is very small, and that it is
  very hard to track a service over time, even when the operator collaborates
  with other node operators.

  The second change concerns the content of hidden service descriptors.
  Obviously, security problems cannot be solved only by decentralizing storage;
  in fact, they could also get worse if done without caution. First, a
  descriptor ID needs to change periodically in order to be stored on changing
  nodes over time. Next, the descriptor ID needs to be computable only by the
  service's clients, but should be unpredictable for all other nodes. Further,
  the storing node needs to be able to verify that the hidden service is the
  true originator of the descriptor with the given ID even though it is not a
  client. Finally, a storing node should learn as little information as
  necessary by storing a descriptor, because it might not be as trustworthy as
  a directory node; for example, it does not need to know the list of
  introduction points. Therefore, a second key is introduced that is known only
  to the hidden service provider and its clients and that is not included in
  the descriptor. It is used to calculate descriptor IDs and to encrypt the
  introduction points. This second key can either be given to all clients
  together with the hidden service ID, or to a group or a single client as
  an authentication token. In the future, this second key could be the result
  of some key agreement protocol between the hidden service and one or more
  clients. A new text-based format is proposed for descriptors instead of an
  extension of the existing binary format for reasons of future extensibility.

Design:

  The proposed design is described by the required changes to the current
  design. These requirements are grouped by content, rather than by affected
  specification documents or code files, and numbered for reference below.

  Hidden service clients, servers, and directories:

  /1/ Create routing list

    All participants can filter the consensus status document received from the
    directory authorities to one routing list containing only those servers
    that store and serve hidden service descriptors and that have been running
    for at least 24 hours. A participant only trusts its own routing list and
    never learns about routing information from other parties.
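
    The filtering step might be sketched as follows. This is a minimal Python
    sketch, not Tor's implementation: the RouterStatus record, its fields
    (including an uptime value assumed to be known to the participant), and
    build_routing_list are hypothetical simplifications of a consensus entry;
    "HSDir" is the flag described in /9/.

```python
from dataclasses import dataclass

MIN_UPTIME = 24 * 60 * 60  # "running for at least 24 hours", in seconds

@dataclass(frozen=True)
class RouterStatus:
    """Hypothetical, stripped-down view of one consensus entry."""
    identity: bytes   # router ID; determines its position on the ID ring
    flags: frozenset  # e.g. frozenset({"Running", "HSDir"})
    uptime: int       # seconds the router has been running (assumed known)

def build_routing_list(consensus):
    """Keep routers that store and serve v2 descriptors and have been up for
    at least 24 hours, sorted by ID so the result can be walked as a ring."""
    return sorted(
        (r for r in consensus if "HSDir" in r.flags and r.uptime >= MIN_UPTIME),
        key=lambda r: r.identity,
    )
```

    Because authorities assign "HSDir" only after 24 hours of uptime (/9/),
    the explicit uptime check is redundant in practice; it is kept here only
    to mirror the wording of /1/.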

  /2/ Determine responsible hidden service directory

    All participants can determine the hidden service directory that is
    responsible for storing and serving a given ID, as well as the hidden
    service directories that replicate its content. Every hidden service
    directory is responsible for the descriptor IDs in the interval from
    its predecessor, exclusive, to its own ID, inclusive. Further, a hidden
    service directory holds replicas for its n predecessors, where n denotes
    the number of consecutive replicas. (requires /1/)
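
    The ring lookup can be sketched in Python, treating directory IDs as byte
    strings that compare lexicographically and are kept in a sorted list. The
    function names are hypothetical. Following /6/ and /12/ (three
    consecutive nodes per replica), this sketch assumes that n counts the
    responsible directory itself, so a descriptor is held by n consecutive
    directories starting at the first ID that is not smaller than it.

```python
import bisect

def responsible_directory(ring, descriptor_id):
    """Return the directory whose interval (predecessor, own ID] contains
    descriptor_id. `ring` is a sorted list of distinct directory IDs."""
    i = bisect.bisect_left(ring, descriptor_id)  # first ID >= descriptor_id
    return ring[i % len(ring)]                   # wrap past the highest ID

def replica_holders(ring, descriptor_id, n):
    """Return the n consecutive directories that store descriptor_id,
    starting with the responsible one (fewer if the ring is smaller)."""
    i = bisect.bisect_left(ring, descriptor_id)
    return [ring[(i + k) % len(ring)] for k in range(min(n, len(ring)))]
```

    For example, with directory IDs 0x10, 0x20, 0x30, 0x40 and n = 3, a
    descriptor ID of 0x35 is stored on 0x40, 0x10, and 0x20, wrapping around
    the top of the ID space.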

  [/3/ and /4/ were requirements to use BEGIN_DIR cells for directory
   requests; they were not fulfilled in the course of implementing this
   proposal, but elsewhere.]

  Hidden service directory nodes:

  /5/ Advertise hidden service directory functionality

    Every onion router that has its directory port open can decide whether it
    wants to store and serve hidden service descriptors by setting a new config
    option "HidServDirectoryV2" 0|1 to 1. An onion router with this config
    option set includes the flag "hidden-service-dir" in the router
    descriptors that it sends to directory authorities.

  /6/ Accept v2 publish requests, parse and store v2 descriptors

    Hidden service directory nodes accept publish requests for hidden service
    descriptors and store them in local memory. (It is not necessary to make
    descriptors persistent: after disconnecting, the onion router would not be
    accepted as a storing node anyway, since it would not have been running
    for at least 24 hours.) All requests and replies are formatted as HTTP
    messages. Requests are directed to the router's directory port and are
    contained within BEGIN_DIR cells. A hidden service directory node stores a
    descriptor only when it thinks that it is responsible for storing that
    descriptor based on its own routing table. Every hidden service directory
    node is responsible for the descriptor IDs in the interval from its n-th
    predecessor in the ID circle up to its own ID (n denotes the number of
    consecutive replicas). (requires /1/)
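
    The storage decision a directory makes from its own routing list might
    look like the following sketch. The function names are hypothetical; the
    interval runs from the n-th predecessor (exclusive) to the directory's own
    ID (inclusive) and may wrap around the top of the ID space. The sketch
    assumes the directory's own ID appears in its routing list.

```python
def in_ring_interval(lower, upper, x):
    """True if x lies in the half-open ring interval (lower, upper]."""
    if lower < upper:
        return lower < x <= upper
    # The interval wraps around the highest ID back to the lowest one.
    return x > lower or x <= upper

def should_store(ring, own_id, descriptor_id, n):
    """Decide, from this directory's own sorted routing list, whether it is
    one of the n consecutive directories responsible for descriptor_id."""
    if len(ring) <= n:
        return True  # too few directories: every one of them stores everything
    i = ring.index(own_id)  # raises ValueError if own_id is not in the list
    nth_predecessor = ring[(i - n) % len(ring)]
    return in_ring_interval(nth_predecessor, own_id, descriptor_id)
```

    With the same four IDs as in /2/ (0x10, 0x20, 0x30, 0x40) and n = 3, only
    0x30 rejects a descriptor with ID 0x35, matching the three directories
    that a client or service would compute for that ID.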

  /7/ Accept v2 fetch requests

    Same as /6/, but with fetch requests for hidden service descriptors.
    (requires /2/)

  /8/ Replicate descriptors with neighbors

    A hidden service directory node replicates descriptors from its two
    predecessors by downloading them once an hour. Further, it checks its
    routing table periodically for changes. Whenever it realizes that a
    predecessor has left the network, it establishes a connection to the new
    n-th predecessor and requests its stored descriptors in the interval
    between its (n+1)-th predecessor and the requested n-th predecessor.
    Whenever it realizes that a new onion router has joined with an ID higher
    than its former n-th predecessor, it adds it to its predecessors and
    discards all descriptors in the interval between its (n+1)-th and its n-th
    predecessor. (requires /1/)

    [Dec 02: This function has not been implemented, because arbitrary nodes
     would have been able to download the entire set of v2 descriptors. An
     authorized replication request would be necessary. For the moment, the
     system runs without any directory-side replication. -KL]

  Authoritative directory nodes:

  /9/ Confirm a router's hidden service directory functionality

    Directory nodes include a new flag "HSDir" for routers that decided to
    provide storage for hidden service descriptors and that have been running
    for at least 24 hours. The latter requirement prevents a node from
    frequently changing its onion key to become responsible for an identifier
    it wants to target.

  Hidden service provider:

  /10/ Configure v2 hidden service

    Each hidden service provider that has set the config option
    "PublishV2HidServDescriptors" 0|1 to 1 is configured to publish v2
    descriptors and conform to the v2 connection establishment protocol. When
    configuring a hidden service, a hidden service provider checks if it has
    already created a random secret_cookie and a hostname2 file; if not, it
    creates both of them. (requires /2/)

  /11/ Establish introduction points with fresh key

    If configured to publish only v2 descriptors and no v0/v1 descriptors
    anymore, a hidden service provider that is setting up the hidden service at
    introduction points does not pass its own public key, but the public key
    of a freshly generated key pair. It also includes these fresh public keys
    in the hidden service descriptor together with the other introduction point
    information. The reason is that the introduction point does not need to
    know, and therefore should not know, for which hidden service it works, so
    that it cannot track the hidden service's activity. (If a hidden service
    provider supports both v0/v1 and v2 descriptors, v0/v1 clients rely on the
    fact that all introduction points accept the same public key, so this new
    feature cannot be used.)

  /12/ Encode v2 descriptors and send v2 publish requests

    If configured to publish v2 descriptors, a hidden service provider
    publishes a new descriptor whenever its content changes or a new
    publication period starts for this descriptor. If less than 60 minutes
    remain in the current publication period (= 2 x 30 minutes, to allow the
    server to be 30 minutes behind and the client 30 minutes ahead), the
    hidden service provider publishes both a current descriptor and one for
    the next period. Publication is performed by sending the descriptor to all
    hidden service directories that are responsible for keeping replicas for
    the descriptor ID. This includes two non-consecutive replicas that are
    stored at 3 consecutive nodes each. (requires /1/ and /2/)
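
    The 60-minute rule can be expressed as a small function. This is a
    sketch under stated assumptions: times are in seconds, periods are
    numbered consecutively, and the hypothetical period_offset stands for the
    per-service shift of the period boundary described under "Hidden service
    descriptor".

```python
PERIOD = 24 * 60 * 60      # a time period lasts 24 hours
SKEW_MARGIN = 2 * 30 * 60  # server up to 30 min behind, client up to 30 min ahead

def periods_to_publish(now: int, period_offset: int) -> list:
    """Return the time-period numbers a service should publish descriptors for.
    `now` and the hypothetical `period_offset` are both in seconds."""
    shifted = now + period_offset
    current = shifted // PERIOD
    seconds_left = PERIOD - shifted % PERIOD
    if seconds_left < SKEW_MARGIN:
        # Too close to the boundary: also publish for the next period so that
        # clients whose clocks run ahead still find a descriptor.
        return [current, current + 1]
    return [current]
```

    Each returned period yields its own descriptor IDs (one per
    non-consecutive replica), and each descriptor is then sent to the three
    consecutive directories responsible for that ID.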

  Hidden service client:

  /13/ Send v2 fetch requests

    A hidden service client that has set the config option
    "FetchV2HidServDescriptors" 0|1 to 1 handles SOCKS requests for v2 onion
    addresses by requesting a v2 descriptor from a randomly chosen hidden
    service directory that is responsible for keeping a replica of the
    descriptor ID. In total there are six replicas, of which the first three
    and the last three are stored on consecutive nodes. The probabilities of
    picking each of the three consecutive replicas are 1/6, 2/6, and 3/6,
    reflecting the fact that availability will be highest on the node with
    the next higher ID. A hidden service client relies on the hidden service
    provider to store two sets of descriptors to compensate for clock skew
    between service and client. (requires /1/ and /2/)
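
    The weighted choice can be sketched with Python's random.choices. The
    text does not say which of the three consecutive directories receives
    which weight; this sketch assumes that "the node with the next higher ID"
    is the immediate successor of the descriptor ID and gives it 3/6, the
    next one 2/6, and the third 1/6. The function name is hypothetical.

```python
import random

# Assumption: immediate successor 3/6, second successor 2/6, third 1/6.
CONSECUTIVE_WEIGHTS = (3, 2, 1)

def choose_directory(replica_sets, rng=random):
    """Pick one directory to fetch a descriptor from.

    replica_sets: one list per non-consecutive replica, each listing its
    consecutive directories in ring order, starting at the successor of the
    descriptor ID. When every set has three directories, each set is equally
    likely and CONSECUTIVE_WEIGHTS apply within a set."""
    candidates, weights = [], []
    for directories in replica_sets:
        for directory, weight in zip(directories, CONSECUTIVE_WEIGHTS):
            candidates.append(directory)
            weights.append(weight)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

    Passing a seeded random.Random instance as rng makes the selection
    reproducible, which is convenient for testing.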

  /14/ Process v2 fetch reply and parse v2 descriptors

    A hidden service client that has sent a request for a v2 descriptor can
    parse it and store it in the local cache of rendezvous service descriptors.

  /15/ Establish connection to v2 hidden service

    A hidden service client can establish a connection to a hidden service
    using a v2 descriptor. This includes using the secret cookie for decrypting
    the introduction points contained in the descriptor. When contacting an
    introduction point, the client does not use the public key of the hidden
    service provider, but the freshly generated public key that is included in
    the hidden service descriptor. Whether or not a fresh key is used instead
    of the key of the hidden service depends on the available protocol versions
    that are included in the descriptor; in this way, connection establishment
    is to a certain extent decoupled from fetching the descriptor.

  Hidden service descriptor:

  (Requirements concerning the descriptor format are contained in /6/ and /7/.)

    The new v2 hidden service descriptor format looks like this:

      onion-address = h(public-key) + cookie
      descriptor-id = h(h(public-key) + h(time-period + cookie + replica))
      descriptor-content = {
        descriptor-id,
        version,
        public-key,
        h(time-period + cookie + replica),
        timestamp,
        protocol-versions,
        { introduction-points } encrypted with cookie
      } signed with private-key

    The "descriptor-id" needs to change periodically in order for the
    descriptor to be stored on changing nodes over time. It may only be
    computable by a hidden service provider and all of its clients, to prevent
    unauthorized nodes from tracking the service activity by periodically
    checking whether there is a descriptor for this service. Finally, the
    hidden service directory needs to be able to verify that the hidden service
    provider is the true originator of the descriptor with the given ID.

    Therefore, "descriptor-id" is derived from the "public-key" of the hidden
    service provider, the current "time-period" which changes every 24 hours,
    a secret "cookie" shared between hidden service provider and clients, and
    a "replica" denoting the number of this non-consecutive replica. (The
    "time-period" is constructed so that time periods do not change at the
    same moment for all descriptors: a value between 0:00 and 23:59 hours is
    derived from h(public-key), and the descriptors of this hidden service
    provider expire at that time of day.) The "descriptor-id" is defined to be
    160 bits long. [extending the "descriptor-id" length suggested by LØ]
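
    The derivation above can be sketched in Python. Several details are
    assumptions, not part of this proposal: h is taken to be SHA-1 (which
    matches the 160-bit length), "+" is byte concatenation, the time period
    is encoded as a 4-byte big-endian integer, the replica as a single byte,
    and the per-service offset is the first byte of h(public-key) scaled to a
    fraction of a day. All function names are hypothetical.

```python
import hashlib
import struct

PERIOD = 24 * 60 * 60  # time periods last 24 hours

def h(data: bytes) -> bytes:
    """SHA-1 (assumed), giving the 160-bit digests the proposal asks for."""
    return hashlib.sha1(data).digest()

def time_period(now: int, public_key: bytes) -> int:
    # Assumption: shift each service's period boundary by a value in
    # [0, 24 h) derived from h(public-key), so services roll over at
    # different times of day.
    offset = h(public_key)[0] * PERIOD // 256
    return (now + offset) // PERIOD

def descriptor_id(public_key: bytes, cookie: bytes, now: int, replica: int) -> bytes:
    # h(time-period + cookie + replica) is the secret part that only the
    # service and its clients can compute, because they know the cookie.
    secret_part = h(struct.pack(">I", time_period(now, public_key))
                    + cookie + bytes([replica]))
    # descriptor-id = h(h(public-key) + secret_part); a directory can check
    # it because public-key and secret_part both appear in the descriptor.
    return h(h(public_key) + secret_part)
```

    Evaluating descriptor_id for replica 0 and replica 1 yields the two
    non-consecutive ring positions to which the descriptor is published.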

    To ensure that only the hidden service provider and its clients are able
    to generate future "descriptor-id"s, the "onion-address" is extended from
    the hash value of "public-key" alone to also include the secret "cookie".
    The hash of "public-key" is 80 bits long, whereas the "cookie" is
    dimensioned to be 120 bits long. This makes a total of 200 bits or 40
    base32 chars, which is quite a lot for a human to handle, but necessary to
    provide sufficient protection against an adversary generating a key pair
    with the same "public-key" hash or guessing the "cookie".
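
    The length arithmetic can be checked with a short sketch: 80 + 120 = 200
    bits = 25 bytes, and base32 encodes 5 bits per character, giving exactly
    40 characters with no padding. The choice of SHA-1 truncated to its first
    10 bytes, the ordering (hash first, then cookie), and lowercase output are
    assumptions; onion_address is a hypothetical name.

```python
import base64
import hashlib

def onion_address(public_key: bytes, cookie: bytes) -> str:
    """Build a 200-bit address: 80 bits of h(public-key) followed by the
    120-bit cookie, encoded as 40 lowercase base32 characters."""
    if len(cookie) != 15:
        raise ValueError("cookie must be 120 bits (15 bytes)")
    # Assumption: h is SHA-1 and the 80 bits are its first 10 bytes.
    permanent_id = hashlib.sha1(public_key).digest()[:10]
    # 25 bytes = 200 bits, a multiple of 5 bytes, so base32 needs no padding.
    return base64.b32encode(permanent_id + cookie).decode("ascii").lower()
```

    Since 10 bytes is also a multiple of 5 bytes, the first 16 characters of
    the address encode the public-key hash alone and the remaining 24
    characters encode the cookie.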

    A hidden service directory can verify that a descriptor was created by the
    hidden service provider by checking if the "descriptor-id" corresponds to
    the "public-key" and if the signature can be verified with the
    "public-key".

    The "introduction-points" that are included in the descriptor are encrypted
    using the same "cookie" that is shared between hidden service provider and
    clients. [correction to use another key than h(time-period + cookie) as
    encryption key for introduction points made by LØ]

    A new text-based format is proposed for descriptors instead of an extension
    of the existing binary format for reasons of future extensibility.

Security implications:

  The security implications of the proposed changes are grouped by the roles of
  nodes that could perform attacks or on which attacks could be performed.

  Attacks by authoritative directory nodes

    Authoritative directory nodes are no longer the single point in the
    network that knows about a hidden service's activity and introduction
    points. Thus, they cannot perform attacks using this information, e.g.
    track a hidden service's activity or usage pattern or attack its
    introduction points. Formerly, it would only require a single corrupted
    authoritative directory operator to perform such an attack.

  Attacks by hidden service directory nodes

    A hidden service directory node could misuse a stored descriptor to track a
    hidden service's activity and usage pattern by clients. Though there is no
    countermeasure against this kind of attack, it is very expensive to track a
    certain hidden service over time. An attacker would need to run a large
    number of stable onion routers that work as hidden service directory nodes
    to have a good probability of becoming responsible for its changing
    descriptor IDs. For each period, the probability is:

      P = 1 - (N-c choose r) / (N choose r)    if N-c >= r
      P = 1                                    otherwise

    with N as the total number of hidden service directories, c as the number
    of compromised nodes, and r as the number of replicas.
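
    The formula can be evaluated directly with math.comb. Like the formula,
    this sketch models the r directories holding a descriptor as a uniformly
    random subset of the N directories; p_track is a hypothetical name.

```python
from math import comb

def p_track(N: int, c: int, r: int) -> float:
    """Probability that at least one of the r directories storing a descriptor
    in a given period is among the c compromised ones (out of N)."""
    if N - c < r:
        return 1.0  # not enough honest directories to hold all r replicas
    return 1 - comb(N - c, r) / comb(N, r)
```

    For instance, with N = 1000, c = 10, and r = 6, p_track returns roughly
    0.06 per period, so an attacker controlling 1% of the directories sees a
    given service in about 6% of the periods.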

    The hidden service directory nodes could try to make a certain hidden
    service unavailable to its clients. To do so, they could discard all
    stored descriptors for that hidden service and reply to clients that there
    is no descriptor for the given ID, or return an old or false descriptor.
    The client would detect a false descriptor, because it would not carry a
    valid signature. But an old descriptor or an empty reply could confuse
    the client. Therefore, the countermeasure is to replicate descriptors
    among a small number of hidden service directories, e.g. 5. The
    probability that a group of collaborating nodes makes a hidden service
    completely unavailable is in each period:

      P = (c choose r) / (N choose r)    if c >= r and N >= r
      P = 0                              otherwise

    with N as the total number of hidden service directories, c as the number
    of compromised nodes, and r as the number of replicas.
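
    This formula can be evaluated the same way. Again, the sketch assumes the
    r replica holders form a uniformly random subset of the N directories;
    p_block is a hypothetical name.

```python
from math import comb

def p_block(N: int, c: int, r: int) -> float:
    """Probability that all r replicas of a descriptor land on compromised
    directories in a given period, making the service unavailable."""
    if c >= r and N >= r:
        return comb(c, r) / comb(N, r)
    return 0.0  # fewer compromised nodes than replicas: some replica is honest
```

    With N = 1000, c = 10, and r = 6, the result is on the order of 1e-13,
    which illustrates why a handful of replicas already suffices against
    this attack under the uniform-placement model.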

    A hidden service directory could try to find out which introduction points
    are working on behalf of a hidden service. In contrast to the previous
    design, this is no longer possible, because this information is encrypted
    to the clients of a hidden service.

  Attacks on hidden service directory nodes

    An anonymous attacker could try to swamp a hidden service directory with
    false descriptors for a given descriptor ID. This is prevented by requiring
    that descriptors are signed.

    Anonymous attackers could swamp a hidden service directory with correct
    descriptors for non-existing hidden services. There is no countermeasure
    against this attack. However, the creation of valid descriptors is more
    expensive than verification and storage in local memory. This should make
    this kind of attack unattractive.

  Attacks by introduction points

    Current or former introduction points could try to gain information on the
    hidden service they serve. But due to the fresh key pair that is used by
    the hidden service, this attack is no longer possible.

  Attacks by clients

    Current or former clients could track a hidden service's activity, attack
    its introduction points, or determine the responsible hidden service
    directory nodes and attack them. Nothing can prevent them from doing so,
    because honest clients need the full descriptor content to establish a
    connection to the hidden service. At the moment, the only countermeasure
    against dishonest clients is to change the secret cookie and pass it only
    to the honest clients.

Compatibility:

  The proposed design is meant to replace the current design for hidden service
  descriptors and their storage in the long run.

  There should be a first transition phase in which both the current design
  and the proposed design are served in parallel. Onion routers should start
  serving as hidden service directories, and hidden service providers and
  clients should make use of the new design if both sides support it. Hidden
  service providers should be allowed to publish descriptors of the current
  format in parallel, and authoritative directories should continue storing and
  serving these descriptors.

  After the first transition phase, hidden service providers should stop
  publishing descriptors on authoritative directories, and hidden service
  clients should not try to fetch descriptors from the authoritative
  directories. However, the authoritative directories should continue serving
  hidden service descriptors for a second transition phase. At this point,
  all v2 config options should be set to a default value of 1.

  After the second transition phase, the authoritative directories should stop
  serving hidden service descriptors.