doc/incentives.txt

   1
   2                  Tor Incentives Design Brainstorms
   3
   4 1. Goals: what do we want to achieve with an incentive scheme?
   5
   6 1.1. Encourage users to provide good relay service (throughput, latency).
   7 1.2. Encourage users to allow traffic to exit the Tor network from
   8      their node.
   9
  10 2. Approaches to learning who should get priority.
  11
  12 2.1. "Hard" or quantitative reputation tracking.
  13
  14    In this design, we track the number of bytes and throughput in and
  15    out of nodes we interact with. When a node asks to send or receive
  16    bytes, we provide service proportional to our current record of the
  17    node's value. One approach is to let each circuit be either a normal
  18    circuit or a premium circuit, and nodes can "spend" their value by
  19    sending and receiving bytes on premium circuits: see section 4.1 for
  20    details of this design. Another approach (section 4.2) would treat
  21    all traffic from the node with the same priority class, and so nodes
  22    that provide resources will get and provide better service on average.
  23
  24    This approach could be complemented with an anonymous e-cash
  25    implementation to let people spend reputations gained from one context
  26    in another context.
  27
  28 2.2. "Soft" or qualitative reputation tracking.
  29
  30    Rather than accounting for every byte (if I owe you a byte, I don't
  31    owe it anymore once you've spent it), instead I keep a general opinion
  32    about each server: my opinion increases when they do good work for me,
  33    and it decays with time, but it does not decrease as they send traffic.
  34    Therefore we reward servers who provide value to the system without
  35    nickle and diming them at each step. We also let them benefit from
  36    relaying traffic for others without having to "reserve" some of the
  37    payment for their own use. See section 4.3 for a possible design.
  38
  39 2.3. Centralized opinions from the reputation servers.
  40
  41    The above approaches are complex and we don't have all the answers
  42    for them yet. A simpler approach is just to let some central set
  43    of trusted servers (say, the Tor directory servers) measure whether
  44    people are contributing to the network, and provide a signal about
  45    which servers should be rewarded. They can even do the measurements
  46    via Tor so servers can't easily perform only when they're being
  47    tested. See section 4.4.
  48
  49 2.4. Reputation servers that aggregate opinions.
  50
  51    The option above has the directory servers doing all of the
  52    measurements. This doesn't scale. We can set it up so we have "deputy
  53    testers" -- trusted other nodes that do performance testing and report
  54    their results.
  55
  56    If we want to be really adventurous, we could even
  57    accept claims from every Tor user and build a complex weighting /
  58    reputation system to decide which claims are "probably" right.
  59    One possible way to implement the latter is something similar to
  60    EigenTrust [http://www.stanford.edu/~sdkamvar/papers/eigentrust.pdf],
  61    where the opinion of nodes with high reputation more is weighted
  62    higher.
  63
  64 3. Related issues we need to keep in mind.
  65
  66 3.1. Relay and exit configuration needs to be easy and usable.
  67
  68    Implicit in all of the above designs is the need to make it easy to
  69    run a Tor server out of the box. We need to make it stable on all
  70    common platforms (including XP), it needs to detect its available
  71    bandwidth and not overreach that, and it needs to help the operator
  72    through opening up ports on his firewall. Then we need a slick GUI
  73    that lets people click a button or two rather than editing text files.
  74
  75    Once we've done all this, we'll hit our first big question: is
  76    most of the barrier to growth caused by the unusability of the current
  77    software? If so, are the rest of these incentive schemes superfluous?
  78
  79 3.2. The network effect: how many nodes will you interact with?
  80
  81    One of the concerns with pairwise reputation systems is that as the
  82    network gets thousands of servers, the chance that you're going to
  83    interact with a given server decreases. So if 90% of interactions
  84    don't have any prior information, the "local" incentive schemes above
  85    are going to degrade. This doesn't mean they're pointless -- it just
  86    means we need to be aware that this is a limitation, and plan in the
  87    background for what step to take next. (It seems that e-cash solutions
  88    would scale better, though they have issues of their own.)
  89
  90 3.3. Guard nodes
  91
  92    As of Tor 0.1.1.11, Tor users pick from a small set of semi-permanent
  93    "guard nodes" for their first hop of each circuit. This seems like it
  94    would have a big impact on pairwise reputation systems since you
  95    will only be cashing in on your reputation to a few people, and it is
  96    unlikely that a given pair of nodes will use each other as guard nodes.
  97
  98    What does this imply? For one, it means that we don't care at all
  99    about the opinions of most of the servers out there -- we should
 100    focus on keeping our guard nodes happy with us.
 101
 102    One conclusion from that is that our design needs to judge performance
 103    not just through direct interaction (beginning of the circuit) but
 104    also through indirect interaction (middle of the circuit). That way
 105    you can never be sure when your guards are measuring you.
 106
 107    Both 3.2 and 3.3 may be solved by having a global notion of reputation,
 108    as in 2.3 and 2.4. However, computing the global reputation from local
 109    views could be expensive (O(n^2)) when the network is really large.
 110
 111 3.4. Restricted topology: benefits and roadmap.
 112
 113    As the Tor network continues to grow, we will need to make design
 114    changes to the network topology so that each node does not need
 115    to maintain connections to an unbounded number of other nodes. For
 116    anonymity's sake, we may partition the network such that all
 117    the nodes have the same belief about the divisions and each node is
 118    in only one partition. (The alternative is that every user fetches
 119    his own random subset of the overall node list -- this is bad because
 120    of intersection attacks.)
 121
 122    Therefore the "network horizon" for each user will stay bounded,
 123    which helps against the above issues in 3.2 and 3.3.
 124
 125    It could be that the core of long-lived servers will all get to know
 126    each other, and so the critical point that decides whether you get
 127    good service is whether the core likes you. Or perhaps it will turn
 128    out to work some other way.
 129
 130    A special case here is the social network, where the network isn't
 131    partitioned randomly but instead based on some external properties.
 132    Social network topologies can provide incentives in other ways, because
 133    people may be more inclined to help out their friends, and more willing
 134    to relay traffic if most of the traffic they are relaying comes
 135    from their friends. It also opens the door for out-of-band incentive
 136    schemes because of the out-of-band links in the graph.
 137
 138 3.5. Profit-maximizing vs. Altruism.
 139
 140    There are some interesting game theory questions here.
 141
 142    First, in a volunteer culture, success is measured in public utility
 143    or in public esteem. If we add a reward mechanism, there's a risk that
 144    reward-maximizing behavior will surpass utility- or esteem-maximizing
 145    behavior.
 146
 147    Specifically, if most of our servers right now are relaying traffic
 148    for the good of the community, we may actually *lose* those volunteers
 149    if we turn the act of relaying traffic into a selfish act.
 150
 151    I am not too worried about this issue for now, since we're aiming
 152    for an incentive scheme so effective that it produces tens of
 153    thousands of new servers.
 154
 155 3.6. What part of the node's performance do you measure?
 156
 157    We keep referring to having a node measure how well the other nodes
 158    receive bytes. But don't leeching clients receive bytes just as well
 159    as servers?
 160
 161    Further, many transactions in Tor involve fetching lots of
 162    bytes and not sending very many. So it seems that we want to turn
 163    things around: we need to measure how quickly a node is _sending_
 164    us bytes, and then only send it bytes in proportion to that.
 165
 166    However, a sneaky user could simply connect to a node and send some
 167    traffic through it, and voila, he has performed for the network. This
 168    is no good. The first fix is that we only count if you're receiving
 169    bytes "backwards" in the circuit. Now the sneaky user needs to
 170    construct a circuit such that his node appears later in the circuit,
 171    and then send some bytes back quickly.
 172
 173    Maybe that complexity is sufficient to deter most lazy users. Or
 174    maybe it's an argument in favor of a more penny-counting reputation
 175    approach.
 176
 177    Addendum: I was more thinking of measuring based on who is the service
 178    provider and service receiver for the circuit. Say Alice builds a
 179    circuit to Bob. Then Bob is providing service to Alice, since he
 180    otherwise wouldn't need to spend is bandwidth. So traffic in either
 181    direction should be charged to Alice. Of course, the same attack would
 182    work, namely, Bob could cheat by sending bytes back quickly. So someone
 183    close to the origin needs to detect this and close the circuit, if
 184    necessary. -JN
 185
 186 3.7. What is the appropriate resource balance for servers vs. clients?
 187
 188    If we build a good incentive system, we'll still need to tune it
 189    to provide the right bandwidth allocation -- if we reserve too much
 190    bandwidth for fast servers, then we're wasting some potential, but
 191    if we reserve too little, then fewer people will opt to become servers.
 192    In fact, finding an optimum balance is especially hard because it's
 193    a moving target: the better our incentive mechanism (and the lower
 194    the barrier to setup), the more servers there will be. How do we find
 195    the right balance?
 196
 197    One answer is that it doesn't have to be perfect: we can err on the
 198    side of providing extra resources to servers. Then we will achieve our
 199    desired goal -- when people complain about speed, we can tell them to
 200    run a server, and they will in fact get better performance.
 201
 202 3.8. Anonymity attack: fast connections probably come from good servers.
 203
 204    If only fast servers can consistently get good performance in the
 205    network, they will stand out. "Oh, that connection probably came from
 206    one of the top ten servers in the network." Intersection attacks over
 207    time can improve the certainty of the attack.
 208
 209    I'm not too worried about this. First, in periods of low activity,
 210    many different people might be getting good performance. This dirties
 211    the intersection attack. Second, with many of these schemes, we will
 212    still be uncertain whether the fast node originated the traffic, or
 213    was the entry node for some other lucky user -- and we already accept
 214    this level of attack in other cases such as the Murdoch-Danezis attack
 215    [http://freehaven.net/anonbib/#torta05].
 216
 217 3.9. How do we allocate bandwidth over the course of a second?
 218
 219    This may be a simple matter of engineering, but it still needs to be
 220    addressed. Our current token bucket design refills each bucket once a
 221    second. If we have N tokens in our bucket, and we don't know ahead of
 222    time how many connections are going to want to send out how many bytes,
 223    how do we balance providing quick service to the traffic that is
 224    already here compared to providing service to potential high-importance
 225    future traffic?
 226
 227    If we have only two classes of service, here is a simple design:
 228    At each point, when we are 1/t through the second, the total number
 229    of non-priority bytes we are willing to send out is N/t. Thus if N
 230    priority bytes are waiting at the beginning of the second, we drain
 231    our whole bucket then, and otherwise we provide some delayed service
 232    to the non-priority bytes.
 233
 234    Does this design expand to cover the case of three priority classes?
 235    Ideally we'd give each remote server its own priority number. Or
 236    hopefully there's an easy design in the literature to point to --
 237    this is clearly not my field.
 238
 239    Is our current flow control mechanism (each circuit and each stream
 240    start out with a certain window, and once they've exhausted it they
 241    need to receive an ack before they can send more) going to have
 242    problems with this new design now that we'll be queueing more bytes
 243    for less preferred nodes? If it turns out we do, the first fix is
 244    to have the windows start out at zero rather than start out full --
 245    it will slow down the startup phase but protect us better.
 246
 247    While we have outgoing cells queued for a given server, we have the
 248    option of reordering them based on the priority of the previous hop.
 249    Is this going to turn out to be useful? If we're the exit node (that
 250    is, there is no previous hop) what priority do those cells get?
 251
 252    Should we do this prioritizing just for sending out bytes (as I've
 253    described here) or would it help to do it also for receiving bytes?
 254    See next section.
 255
 256 3.10. Different-priority cells arriving on the same TCP connection.
 257
 258    In some of the proposed designs, servers want to give specific circuits
 259    priority rather than having all circuits from them get the same class
 260    of service.
 261
 262    Since Tor uses TCP's flow control for rate limiting, this constraints
 263    our design choices -- it is easy to give different TCP connections
 264    different priorities, but it is hard to give different cells on the
 265    same connection priority, because you have to read them to know what
 266    priority they're supposed to get.
 267
 268    There are several possible solutions though. First is that we rely on
 269    the sender to reorder them so the highest priority cells (circuits) are
 270    more often first. Second is that if we open two TCP connections -- one
 271    for the high-priority cells, and one for the low-priority cells. (But
 272    this prevents us from changing the priority of a circuit because
 273    we would need to migrate it from one connection to the other.) A
 274    third approach is to remember which connections have recently sent
 275    us high-priority cells, and preferentially read from those connections.
 276
 277    Hopefully we can get away with not solving this section at all. But if
 278    necessary, we can consult Ed Knightly, a Professor at Rice
 279    [http://www.ece.rice.edu/~knightly/], for his extensive experience on
 280    networking QoS.
 281
 282 3.11. Global reputation system: Congestion on high reputation servers?
 283
 284    If the notion of reputation is global (as in 2.3 or 2.4), circuits that
 285    go through successive high reputation servers would be the fastest and
 286    most reliable. This would incentivize everyone, regardless of their own
 287    reputation, to choose only the highest reputation servers in its
 288    circuits, causing an over-congestion on those servers.
 289
 290    One could argue, though, that once those servers are over-congested,
 291    their bandwidth per circuit drops, which would in turn lower their
 292    reputation in the future. A question is whether this would overall
 293    stablize.
 294
 295    Another possible way is to keep a cap on reputation. In this way, a
 296    fraction of servers would have the same high reputation, thus balancing
 297    such load.
 298
 299 3.12. Another anonymity attack: learning from service levels.
 300
 301    If reputation is local, it may be possible for an evil node to learn
 302    the identity of the origin through provision of differential service.
 303    For instance, the evil node provides crappy bandwidth to everyone,
 304    until it finds a circuit that it wants to trace the origin, then it
 305    provides good bandwidth. Now, as only those directly or indirectly
 306    observing this circuit would like the evil node, it can test each node
 307    by building a circuit via each node to another evil node. If the
 308    bandwidth is high, it is (somewhat) likely that the node was a part of
 309    the circuit.
 310
 311    This problem does not exist if the reputation is global and nodes only
 312    follow the global reputation, i.e., completely ignore their own view.
 313
 314 3.13. DoS through high priority traffic.
 315
 316    Assume there is an evil node with high reputation (or high value on
 317    Alice) and this evil node wants to deny the service to Alice. What it
 318    needs to do is to send a lot of traffic to Alice. To Alice, all traffic
 319    from this evil node is of high priority. If the choice of circuits are
 320    too based toward high priority circuits, Alice would spend most of her
 321    available bandwidth on this circuit, thus providing poor bandwidth to
 322    everyone else. Everyone else would start to dislike Alice, making it
 323    even harder for her to forward other nodes' traffic. This could cause
 324    Alice to have a low reputation, and the only high bandwidth circuit
 325    Alice could use would be via the evil node.
 326
 327 3.14. If you run a fast server, can you run your client elsewhere?
 328
 329    A lot of people want to run a fast server at a colocation facility,
 330    and then reap the rewards using their cablemodem or DSL Tor client.
 331
 332    If we use anonymous micropayments, where reputation can literally
 333    be transferred, this is trivial.
 334
 335    If we pick a design where servers accrue reputation and can only
 336    use it themselves, though, the clients can configure the servers as
 337    their entry nodes and "inherit" their reputation. In this approach
 338    we would let servers configure a set of IP addresses or keys that get
 339    "like local" service.
 340
 341 4. Sample designs.
 342
 343 4.1. Two classes of service for circuits.
 344
 345    Whenever a circuit is built, it is specified by the origin which class,
 346    either "premium" or "normal", this circuit belongs. A premium circuit
 347    gets preferred treatment at each node. A node "spends" its value, which
 348    it earned a priori by providing service, to the next node by sending
 349    and receiving bytes. Once a node has overspent its values, the circuit
 350    cannot stay as premium. It can either breaks or converts into a normal
 351    circuit. Each node also reserves a small portion of bandwidth for
 352    normal circuits to prevent starvation.
 353
 354    Pro: Even if a node has no value to spend, it can still use normal
 355    circuits. This allow casual user to use Tor without forcing them to run
 356    a server.
 357
 358    Pro: Nodes have incentive to forward traffic as quick and as much as
 359    possible to accumulate value.
 360
 361    Con: There is no proactive method for a node to rebalance its debt. It
 362    has to wait until there happens to be a circuit in the opposite
 363    direction.
 364
 365    Con: A node needs to build circuits in such a way that each node in the
 366    circuit has to have good values to the next node. This requires
 367    non-local knowledge and makes circuits less reliable as the values are
 368    used up in the circuit.
 369
 370    Con: May discourage nodes to forward traffic in some circuits, as they
 371    worry about spending more useful values to get less useful values in
 372    return.
 373
 374 4.2. Treat all the traffic from the node with the same service;
 375      hard reputation system.
 376
 377    This design is similar to 4.1, except that instead of having two
 378    classes of circuits, there is only one. All the circuits are
 379    prioritized based on the value of the interacting node.
 380
 381    Pro: It is simpler to design and give priority based on connections,
 382    not circuits.
 383
 384    Con: A node only needs to keep a few guard nodes happy to forward their
 385    traffic.
 386
 387    Con: Same as in 4.1, may discourage nodes to forward traffic in some
 388    circuits, as they worry about spending more useful values to get less
 389    useful values in return.
 390
 391 4.3. Treat all the traffic from the node with the same service;
 392      soft reputation system.
 393
 394    Rather than a guaranteed system with accounting (as 4.1 and 4.2),
 395    we instead try for a best-effort system. All bytes are in the same
 396    class of service. You keep track of other Tors by key, and give them
 397    service proportional to the service they have given you. That is, in
 398    the past when you have tried to push bytes through them, you track the
 399    number of bytes and the average bandwidth, and use that to weight the
 400    priority of their connections if they try to push bytes through you.
 401
 402    Now you're going to get minimum service if you don't ever push bytes
 403    for other people, and you get increasingly improved service the more
 404    active you are. We should have memories fade over time (we'll have
 405    to tune that, which could be quite hard).
 406
 407    Pro: Sybil attacks are pointless because new identities get lowest
 408    priority.
 409
 410    Pro: Smoothly handles periods of both low and high network load. Rather
 411    than keeping track of the ratio/difference between what he's done for
 412    you and what you've done for him, simply keep track of what he's done
 413    for you, and give him priority based on that.
 414
 415    Based on 3.3 above, it seems we should reward all the nodes in our
 416    path, not just the first one -- otherwise the node can provide good
 417    service only to its guards. On the other hand, there might be a
 418    second-order effect where you want nodes to like you so that *when*
 419    your guards choose you for a circuit, they'll be able to get good
 420    performance. This tradeoff needs more simulation/analysis.
 421
 422    This approach focuses on incenting people to relay traffic, but it
 423    doesn't do much for incenting them to allow exits. It may help in
 424    one way through: if there are few exits, then they will attract a
 425    lot of use, so lots of people will like them, so when they try to
 426    use the network they will find their first hop to be particularly
 427    pleasant. After that they're like the rest of the world though. (An
 428    alternative would be to reward exit nodes with higher values. At the
 429    extreme, we could even ask the directory servers to suggest the extra
 430    values, based on the current availability of exit nodes.)
 431
 432    Pro: this is a pretty easy design to add; and it can be phased in
 433    incrementally simply by having new nodes behave differently.
 434
 435 4.4. Centralized opinions from the reputation servers.
 436
 437    Have a set of official measurers who spot-check servers from the
 438    directory to see if they really do offer roughly the bandwidth
 439    they advertise. Include these observations in the directory. (For
 440    simplicity, the directory servers could be the measurers.) Then Tor
 441    servers give priority to other servers. We'd like to weight the
 442    priority by advertised bandwidth to encourage people to donate more,
 443    but it seems hard to distinguish between a slow server and a busy
 444    server.
 445
 446    The spot-checking can be done anonymously to prevent selectively
 447    performing only for the measurers, because hey, we have an anonymity
 448    network.
 449
 450    We could also reward exit nodes by giving them better priority, but
 451    like above this only will affect their first hop. Another problem
 452    is that it's darn hard to spot-check whether a server allows exits
 453    to all the pieces of the Internet that it claims to. If necessary,
 454    perhaps this can be solved by a distributed reporting mechanism,
 455    where clients that can reach a site from one exit but not another
 456    anonymously submit that site to the measurers, who verify.
 457
 458    A last problem is that since directory servers will be doing their
 459    tests directly (easy to detect) or indirectly (through other Tor
 460    servers), then we know that we can get away with poor performance for
 461    people that aren't listed in the directory. Maybe we can turn this
 462    around and call it a feature though -- another reason to get listed
 463    in the directory.
 464
 465 5. Recommendations and next steps.
 466
 467 5.1. Simulation.
 468
 469    For simulation trace, we can use two: one is what we obtained from Tor
 470    and one from existing web traces.
 471
 472    We want to simulate all the four cases in 4.1-4. For 4.4, we may want
 473    to look at two variations: (1) the directory servers check the
 474    bandwidth themselves through Tor; (2) each node reports their perceived
 475    values on other nodes, while the directory servers use EigenTrust to
 476    compute global reputation and broadcast those.
 477
 478 5.2. Deploying into existing Tor network.
 479