doc/tor-design.tex

   1
   2 \documentclass[twocolumn]{article}
   3 \usepackage{usenix}
   4
   5 %\documentclass[times,10pt,twocolumn]{article}
   6 %\usepackage{latex8}
   7 %\usepackage{times}
   8 \usepackage{url}
   9 \usepackage{graphics}
  10 \usepackage{amsmath}
  11 \usepackage{epsfig}
  12
  13 \pagestyle{plain}
  14
  15 \renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
  16 \newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
  17
  18 \newcommand{\workingnote}[1]{}        % The version that hides the note.
  19 %\newcommand{\workingnote}[1]{(**#1)}   % The version that makes the note visible.
  20
  21
  22 % If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
  23 % file* is too long, so break it there (it doesn't matter if the next line is
  24 % indented with spaces). -DH
  25
  26 %\newif\ifpdf
  27 %\ifx\pdfoutput\undefined
  28 %   \pdffalse
  29 %\else
  30 %   \pdfoutput=1
  31 %   \pdftrue
  32 %\fi
  33
  34 \newenvironment{tightlist}{\begin{list}{$\bullet$}{
  35   \setlength{\itemsep}{0mm}
  36     \setlength{\parsep}{0mm}
  37     %  \setlength{\labelsep}{0mm}
  38     %  \setlength{\labelwidth}{0mm}
  39     %  \setlength{\topsep}{0mm}
  40     }}{\end{list}}
  41
  42 % Cut down on whitespace above and below figures displayed at head/foot of
  43 % page.
  44 \setlength{\textfloatsep}{3mm}
  45 % Cut down on whitespace above and below figures displayed in middle of page
  46 \setlength{\intextsep}{3mm}
  47
  48 \begin{document}
  49
  50 %% Use dvipdfm instead. --DH
  51 %\ifpdf
  52 %  \pdfcompresslevel=9
  53 %  \pdfpagewidth=\the\paperwidth
  54 %  \pdfpageheight=\the\paperheight
  55 %\fi
  56
  57 \title{Tor: The Second-Generation Onion Router} %\\DRAFT VERSION}
  58 % Putting the 'Private' back in 'Virtual Private Network'
  59
  60 \author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
  61 Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
  62 Paul Syverson \\ Naval Research Lab \\ syverson@itd.nrl.navy.mil}
  63
  64 \maketitle
  65 \thispagestyle{empty}
  66
  67 \begin{abstract}
  68 We present Tor, a circuit-based low-latency anonymous communication
  69 service. This second-generation Onion Routing system addresses limitations
  70 in the original design. Tor adds perfect forward secrecy, congestion
  71 control, directory servers, integrity checking, configurable exit policies,
  72 and a practical design for rendezvous points. Tor works on the real-world
  73 Internet, requires no special privileges or kernel modifications, requires
  74 little synchronization or coordination between nodes, and provides a
  75 reasonable tradeoff between anonymity, usability, and efficiency.
  76 We briefly describe our experiences with an international network of
  77 more than a dozen hosts. % that has been running for several months.
  78 We close with a list of open problems in anonymous communication.
  79 \end{abstract}
  80
  81 %\begin{center}
  82 %\textbf{Keywords:} anonymity, peer-to-peer, remailer, nymserver, reply block
  83 %\end{center}
  84
  85 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  86
  87 \section{Overview}
  88 \label{sec:intro}
  89
  90 Onion Routing is a distributed overlay network designed to anonymize
  91 TCP-based applications like web browsing, secure shell,
  92 and instant messaging. Clients choose a path through the network and
  93 build a \emph{circuit}, in which each node (or ``onion router'' or ``OR'')
  94 in the path knows its predecessor and successor, but no other nodes in
  95 the circuit.  Traffic flows down the circuit in fixed-size
  96 \emph{cells}, which are unwrapped by a symmetric key at each node
  97 (like the layers of an onion) and relayed downstream. The
  98 Onion Routing project published several design and analysis
  99 papers \cite{or-ih96,or-jsac98,or-discex00,or-pet00}. While a wide area Onion
 100 Routing network was deployed briefly, the only long-running
 101 public implementation was a fragile
 102 proof-of-concept that ran on a single machine. Even this simple deployment
 103 processed connections from over sixty thousand distinct IP addresses from
 104 all over the world at a rate of about fifty thousand per day.
 105 But many critical design and deployment issues were never
 106 resolved, and the design has not been updated in several years. Here
 107 we describe Tor, a protocol for asynchronous, loosely federated onion
 108 routers that provides the following improvements over the old Onion
 109 Routing design:
 110
 111 \textbf{Perfect forward secrecy:} Onion Routing
 112 was originally vulnerable to a single hostile node recording traffic and
 113 later compromising successive nodes in the circuit and forcing them
 114 to decrypt it. Rather than using a single multiply encrypted data
 115 structure (an \emph{onion}) to lay each circuit,
 116 Tor now uses an incremental or \emph{telescoping} path-building design,
 117 where the initiator negotiates session keys with each successive hop in
 118 the circuit.  Once these keys are deleted, subsequently compromised nodes
 119 cannot decrypt old traffic.  As a side benefit, onion replay detection
 120 is no longer necessary, and the process of building circuits is more
 121 reliable, since the initiator knows when a hop fails and can then try
 122 extending to a new node.
 123
 124 \textbf{Separation of ``protocol cleaning'' from anonymity:}
 125 Onion Routing originally required a separate ``application
 126 proxy'' for each supported application protocol---most of which were
 127 never written, so many applications were never supported.  Tor uses the
 128 standard and near-ubiquitous SOCKS~\cite{socks4} proxy interface, allowing
 129 us to support most TCP-based programs without modification.  Tor now
 130 relies on the filtering features of privacy-enhancing
 131 application-level proxies such as Privoxy~\cite{privoxy}, without trying
 132 to duplicate those features itself.
 133
 134 \textbf{No mixing, padding, or traffic shaping (yet):} Onion
 135 Routing originally called for batching and reordering cells as they arrived,
 136 assumed padding between ORs, and in
 137 later designs added padding between onion proxies (users) and
 138 ORs~\cite{or-ih96,or-jsac98}.  Tradeoffs between padding protection
 139 and cost were discussed, and \emph{traffic shaping} algorithms were
 140 theorized~\cite{or-pet00} to provide good security without expensive
 141 padding, but no concrete padding scheme was suggested.
 142 Recent research~\cite{econymics}
 143 and deployment experience~\cite{freedom21-security} suggest that this
 144 level of resource use is not practical or economical; and even full
 145 link padding is still vulnerable~\cite{defensive-dropping}. Thus,
 146 until we have a proven and convenient design for traffic shaping or
 147 low-latency mixing that improves anonymity against a realistic
 148 adversary, we leave these strategies out.
 149
 150 \textbf{Many TCP streams can share one circuit:} Onion Routing originally
 151 built a separate circuit for each
 152 application-level request, but this required
 153 multiple public key operations for every request, and also presented
 154 a threat to anonymity from building so many circuits; see
 155 Section~\ref{sec:maintaining-anonymity}.  Tor multiplexes multiple TCP
 156 streams along each circuit to improve efficiency and anonymity.
 157
 158 \textbf{Leaky-pipe circuit topology:} Through in-band signaling
 159 within the circuit, Tor initiators can direct traffic to nodes partway
 160 down the circuit. This novel approach
 161 allows traffic to exit the circuit from the middle---possibly
 162 frustrating traffic shape and volume attacks based on observing the end
 163 of the circuit. (It also allows for long-range padding if
 164 future research shows this to be worthwhile.)
 165
 166 \textbf{Congestion control:} Earlier anonymity designs do not
 167 address traffic bottlenecks. Unfortunately, typical approaches to
 168 load balancing and flow control in overlay networks involve inter-node
 169 control communication and global views of traffic. Tor's decentralized
 170 congestion control uses end-to-end acks to maintain anonymity
 171 while allowing nodes at the edges of the network to detect congestion
 172 or flooding and send less data until the congestion subsides.
 173
 174 \textbf{Directory servers:} The earlier Onion Routing design
 175 planned to flood state information through the network---an approach
 176 that can be unreliable and complex. % open to partitioning attacks.
 177 Tor takes a simplified view toward distributing this
 178 information. Certain more trusted nodes act as \emph{directory
 179 servers}: they provide signed directories describing known
 180 routers and their current state. Users periodically download them
 181 via HTTP.
 182
 183 \textbf{Variable exit policies:} Tor provides a consistent mechanism
 184 for each node to advertise a policy describing the hosts
 185 and ports to which it will connect. These exit policies are critical
 186 in a volunteer-based distributed infrastructure, because each operator
 187 is comfortable with allowing different types of traffic to exit the Tor
 188 network from his node.
 189
 190 \textbf{End-to-end integrity checking:} The original Onion Routing
 191 design did no integrity checking on data. Any node on the
 192 circuit could change the contents of data cells as they passed by---for
 193 example, to alter a connection request so it would connect
 194 to a different webserver, or to `tag' encrypted traffic and look for
 195 corresponding corrupted traffic at the network edges~\cite{minion-design}.
 196 Tor hampers these attacks by verifying data integrity before it leaves
 197 the network.
 198
 199 %\textbf{Improved robustness to failed nodes:} A failed node
 200 %in the old design meant that circuit building failed, but thanks to
 201 %Tor's step-by-step circuit building, users notice failed nodes
 202 %while building circuits and route around them. Additionally, liveness
 203 %information from directories allows users to avoid unreliable nodes in
 204 %the first place.
 205 %% Can't really claim this, now that we've found so many variants of
 206 %% attack on partial-circuit-building. -RD
 207
 208 \textbf{Rendezvous points and hidden services:}
 209 Tor provides an integrated mechanism for responder anonymity via
 210 location-protected servers.  Previous Onion Routing designs included
 211 long-lived ``reply onions'' that could be used to build circuits
 212 to a hidden server, but these reply onions did not provide forward
 213 security, and became useless if any node in the path went down
 214 or rotated its keys.  In Tor, clients negotiate {\it rendezvous points}
 215 to connect with hidden servers; reply onions are no longer required.
 216
 217 Unlike Freedom~\cite{freedom2-arch}, Tor does not require OS kernel
 218 patches or network stack support.  This prevents us from anonymizing
 219 non-TCP protocols, but has greatly helped our portability and
 220 deployability.
 221
 222 %Unlike Freedom~\cite{freedom2-arch}, Tor only anonymizes
 223 %TCP-based protocols---not requiring patches (or built-in support) in an
 224 %operating system's network stack has been valuable to Tor's
 225 %portability and deployability.
 226
 227 We have implemented all of the above features except rendezvous
 228 points. Our source code is
 229 available under a free license, and Tor
 230 %, as far as we know, is unencumbered by patents.
 231 is not covered by the patent that affected distribution and use of
 232 earlier versions of Onion Routing.
 233 We have deployed a wide-area alpha network
 234 to test the design, to get more experience with usability
 235 and users, and to provide a research platform for experimentation.
 236 As of this writing, the network stands at eighteen nodes in thirteen
 237 distinct administrative domains on two continents.
 238
 239 We review previous work in Section~\ref{sec:related-work}, describe
 240 our goals and assumptions in Section~\ref{sec:assumptions},
 241 and then address the above list of improvements in
 242 Sections~\ref{sec:design} and~\ref{sec:other-design}. We summarize
 243 in Section~\ref{sec:attacks} how our design stands up to
 244 known attacks, and talk about our early deployment experiences in
 245 Section~\ref{sec:in-the-wild}. We conclude with a list of open problems in
 246 Section~\ref{sec:maintaining-anonymity} and future work for the Onion
 247 Routing project in Section~\ref{sec:conclusion}.
 248
 249 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 250
 251 \section{Related work}
 252 \label{sec:related-work}
 253
 254 Modern anonymity systems date to Chaum's {\bf Mix-Net}
 255 design~\cite{chaum-mix}. Chaum
 256 proposed hiding the correspondence between sender and recipient by
 257 wrapping messages in layers of public-key cryptography, and relaying them
 258 through a path composed of ``mixes.''  Each mix in turn
 259 decrypts, delays, and re-orders messages, before relaying them
 260 onward.
 261 %toward their destinations.
 262
 263 Subsequent relay-based anonymity designs have diverged in two
 264 main directions. Systems like {\bf Babel}~\cite{babel},
 265 {\bf Mixmaster}~\cite{mixmaster-spec},
 266 and {\bf Mixminion}~\cite{minion-design} have tried
 267 to maximize anonymity at the cost of introducing comparatively large and
 268 variable latencies. Because of this decision, these \emph{high-latency}
 269 networks resist strong global adversaries,
 270 but introduce too much lag for interactive tasks like web browsing,
 271 Internet chat, or SSH connections.
 272
 273 Tor belongs to the second category: \emph{low-latency} designs that
 274 try to anonymize interactive network traffic. These systems handle
 275 a variety of bidirectional protocols. They also provide more convenient
 276 mail delivery than the high-latency anonymous email
 277 networks, because the remote mail server provides explicit and timely
 278 delivery confirmation. But because these designs typically
 279 involve many packets that must be delivered quickly, it is
 280 difficult for them to prevent an attacker who can eavesdrop both ends of the
 281 communication from correlating the timing and volume
 282 of traffic entering the anonymity network with traffic leaving it \cite{SS03}.
 283 These
 284 protocols are similarly vulnerable to an active adversary who introduces
 285 timing patterns into traffic entering the network and looks
 286 for correlated patterns among exiting traffic.
 287 Although some work has been done to frustrate
 288 these attacks, %\footnote{
 289 %  The most common approach is to pad and limit communication to a constant
 290 %  rate, or to limit
 291 %  the variation in traffic shape.  Doing so can have prohibitive bandwidth
 292 %  costs and/or performance limitations.
 293 %}
 294 % Point in the footnote is covered above, yes? -PS
 295  most designs protect primarily against traffic analysis rather than traffic
 296 confirmation (see Section~\ref{subsec:threat-model}).
 297
 298 The simplest low-latency designs are single-hop proxies such as the
 299 {\bf Anonymizer}~\cite{anonymizer}: a single trusted server strips the
 300 data's origin before relaying it.  These designs are easy to
 301 analyze, but users must trust the anonymizing proxy.
 302 Concentrating the traffic to this single point increases the anonymity set
 303 (the people a given user is hiding among), but it is vulnerable if the
 304 adversary can observe all traffic going into and out of the proxy.
 305
 306 More complex are distributed-trust, circuit-based anonymizing systems.
 307 In these designs, a user establishes one or more medium-term bidirectional
 308 end-to-end circuits, and tunnels data in fixed-size cells.
 309 Establishing circuits is computationally expensive and typically
 310 requires public-key
 311 cryptography, whereas relaying cells is comparatively inexpensive and
 312 typically requires only symmetric encryption.
 313 Because a circuit crosses several servers, and each server only knows
 314 the adjacent servers in the circuit, no single server can link a
 315 user to her communication partners.
 316
 317 The {\bf Java Anon Proxy} (also known as JAP or Web MIXes) uses fixed shared
 318 routes known as \emph{cascades}.  As with a single-hop proxy, this
 319 approach aggregates users into larger anonymity sets, but again an
 320 attacker only needs to observe both ends of the cascade to bridge all
 321 the system's traffic.  The Java Anon Proxy's design
 322 calls for padding between end users and the head of the
 323 cascade~\cite{web-mix}. However, it is not demonstrated whether the current
 324 implementation's padding policy improves anonymity.
 325
 326 {\bf PipeNet}~\cite{back01, pipenet}, another low-latency design proposed
 327 around the same time as Onion Routing, gave
 328 stronger anonymity but allowed a single user to shut
 329 down the network by not sending. Systems like {\bf ISDN
 330 mixes}~\cite{isdn-mixes} were designed for other environments with
 331 different assumptions.
 332 %XXX please can we fix this sentence to something less demeaning
 333
 334 In P2P designs like {\bf Tarzan}~\cite{tarzan:ccs02} and
 335 {\bf MorphMix}~\cite{morphmix:fc04}, all participants both generate
 336 traffic and relay traffic for others. These systems aim to conceal
 337 whether a given peer originated a request
 338 or just relayed it from another peer. While Tarzan and MorphMix use
 339 layered encryption as above, {\bf Crowds}~\cite{crowds-tissec} simply assumes
 340 an adversary who cannot observe the initiator: it uses no public-key
 341 encryption, so any node on a circuit can read the circuit's traffic.
 342
 343 {\bf Hordes}~\cite{hordes-jcs} is based on Crowds but also uses multicast
 344 responses to hide the initiator. {\bf Herbivore}~\cite{herbivore} and
 345 $\mbox{\bf P}^{\mathbf 5}$~\cite{p5} go even further, requiring broadcast.
 346 These systems are designed primarily for communication between peers,
 347 although Herbivore users can make external connections by
 348 requesting a peer to serve as a proxy.
 349
 350 Systems like {\bf Freedom} and the original Onion Routing build circuits
 351 all at once, using a layered ``onion'' of public-key encrypted messages,
 352 each layer of which provides session keys and the address of the
 353 next server in the circuit. Tor as described herein, Tarzan, MorphMix,
 354 {\bf Cebolla}~\cite{cebolla}, and Rennhard's {\bf Anonymity Network}~\cite{anonnet}
 355 build circuits
 356 in stages, extending them one hop at a time.
 357 Section~\ref{subsubsec:constructing-a-circuit} describes how this
 358 approach enables perfect forward secrecy.
 359
 360 Circuit-based anonymity designs must choose which protocol layer
 361 to anonymize. They may choose to intercept IP packets directly, and
 362 relay them whole (stripping the source address) along the
 363 circuit~\cite{freedom2-arch,tarzan:ccs02}.  Alternatively, like
 364 Tor, they may accept TCP streams and relay the data in those streams
 365 along the circuit, ignoring the breakdown of that data into TCP
 366 segments~\cite{morphmix:fc04,anonnet}. Finally, they may accept
 367 application-level protocols (such as HTTP) and relay the application
 368 requests themselves along the circuit.
 369 Making this protocol-layer decision requires a compromise between flexibility
 370 and anonymity.  For example, a system that understands HTTP, such as Crowds,
 371 can strip
 372 identifying information from those requests, can take advantage of caching
 373 to limit the number of requests that leave the network, and can batch
 374 or encode those requests to minimize the number of connections.
 375 On the other hand, an IP-level anonymizer can handle nearly any protocol,
 376 even ones unforeseen by its designers (though these systems require
 377 kernel-level modifications to some operating systems, and so are more
 378 complex and less portable). TCP-level anonymity networks like Tor present
 379 a middle approach: they are fairly application neutral (so long as the
 380 application supports, or can be tunneled across, TCP), but by treating
 381 application connections as data streams rather than raw TCP packets,
 382 they avoid the well-known inefficiencies of tunneling TCP over
 383 TCP~\cite{tcp-over-tcp-is-bad}.
 384
 385 Distributed-trust anonymizing systems need to prevent attackers from
 386 adding too many servers and thus compromising user paths.
 387 Tor relies on a small set of well-known directory servers, run by
 388 independent parties, to decide which nodes can
 389 join. Tarzan and MorphMix allow unknown users to run servers, and use
 390 a limited resource (like IP addresses) to prevent an attacker from
 391 controlling too much of the network.  Crowds suggests requiring
 392 written, notarized requests from potential crowd members.
 393
 394 Anonymous communication is essential for censorship-resistant
 395 systems like Eternity~\cite{eternity}, Free~Haven~\cite{freehaven-berk},
 396 Publius~\cite{publius}, and Tangler~\cite{tangler}. Tor's rendezvous
 397 points enable connections between mutually anonymous entities; they
 398 are a building block for location-hidden servers, which are needed by
 399 Eternity and Free~Haven.
 400
 401 % didn't include rewebbers. No clear place to put them, so I'll leave
 402 % them out for now. -RD
 403
 404 \section{Design goals and assumptions}
 405 \label{sec:assumptions}
 406
 407 \noindent{\large\bf Goals}\\
 408 Like other low-latency anonymity designs, Tor seeks to frustrate
 409 attackers from linking communication partners, or from linking
 410 multiple communications to or from a single user.  Within this
 411 main goal, however, several considerations have directed
 412 Tor's evolution.
 413
 414 \textbf{Deployability:} The design must be deployed and used in the
 415 real world.  Thus it
 416 must not be expensive to run (for example, by requiring more bandwidth
 417 than volunteers are willing to provide); must not place a heavy
 418 liability burden on operators (for example, by allowing attackers to
 419 implicate onion routers in illegal activities); and must not be
 420 difficult or expensive to implement (for example, by requiring kernel
 421 patches, or separate proxies for every protocol).  We also cannot
 422 require non-anonymous parties (such as websites)
 423 to run our software.  (Our rendezvous point design does not meet
 424 this goal for non-anonymous users talking to hidden servers,
 425 however; see Section~\ref{subsec:rendezvous}.)
 426
 427 \textbf{Usability:} A hard-to-use system has fewer users---and because
 428 anonymity systems hide users among users, a system with fewer users
 429 provides less anonymity.  Usability is thus not only a convenience:
 430 it is a security requirement~\cite{econymics,back01}. Tor should
 431 therefore not
 432 require modifying applications; should not introduce prohibitive delays;
 433 and should require users to make as few configuration decisions
 434 as possible.  Finally, Tor should be easily implemented on all common
 435 platforms; we cannot require users to change their operating system
 436 to be anonymous.  (The current Tor implementation runs on Windows and
 437 assorted Unix clones including Linux, FreeBSD, and MacOS X.)
 438
 439 \textbf{Flexibility:} The protocol must be flexible and well-specified,
 440 so Tor can serve as a test-bed for future research.
 441 Many of the open problems in low-latency anonymity
 442 networks, such as generating dummy traffic or preventing Sybil
 443 attacks~\cite{sybil}, may be solvable independently from the issues
 444 solved by
 445 Tor. Hopefully future systems will not need to reinvent Tor's design.
 446 %(But note that while a flexible design benefits researchers,
 447 %there is a danger that differing choices of extensions will make users
 448 %distinguishable. Experiments should be run on a separate network.)
 449
 450 \textbf{Simple design:} The protocol's design and security
 451 parameters must be well-understood. Additional features impose implementation
 452 and complexity costs; adding unproven techniques to the design threatens
 453 deployability, readability, and ease of security analysis. Tor aims to
 454 deploy a simple and stable system that integrates the best accepted
 455 approaches to protecting anonymity.\\
 456
 457 \noindent{\large\bf Non-goals}\label{subsec:non-goals}\\
 458 In favoring simple, deployable designs, we have explicitly deferred
 459 several possible goals, either because they are solved elsewhere, or because
 460 they are not yet solved.
 461
 462 \textbf{Not peer-to-peer:} Tarzan and MorphMix aim to scale to completely
 463 decentralized peer-to-peer environments with thousands of short-lived
 464 servers, many of which may be controlled by an adversary.  This approach
 465 is appealing, but still has many open
 466 problems~\cite{tarzan:ccs02,morphmix:fc04}.
 467
 468 \textbf{Not secure against end-to-end attacks:} Tor does not claim
 469 to provide a definitive solution to end-to-end timing or intersection
 470 attacks. Some approaches, such as having users run their own onion routers,
 471 may help;
 472 see Section~\ref{sec:maintaining-anonymity} for more discussion.
 473
 474 \textbf{No protocol normalization:} Tor does not provide \emph{protocol
 475 normalization} like Privoxy or the Anonymizer. If senders want anonymity from
 476 responders while using complex and variable
 477 protocols like HTTP, Tor must be layered with a filtering proxy such
 478 as Privoxy to hide differences between clients, and expunge protocol
 479 features that leak identity.
 480 Note that by this separation Tor can also provide services that
 481 are anonymous to the network yet authenticated to the responder, like
 482 SSH. Similarly, Tor does not integrate
 483 tunneling for non-stream-based protocols like UDP; this must be
 484 provided by an external service if appropriate.
 485
 486 \textbf{Not steganographic:} Tor does not try to conceal who is connected
 487 to the network.
 488
 489 \subsection{Threat Model}
 490 \label{subsec:threat-model}
 491
 492 A global passive adversary is the most commonly assumed threat when
 493 analyzing theoretical anonymity designs. But like all practical
 494 low-latency systems, Tor does not protect against such a strong
 495 adversary. Instead, we assume an adversary who can observe some fraction
 496 of network traffic; who can generate, modify, delete, or delay
 497 traffic;  who can operate onion routers of his own; and who can
 498 compromise some fraction of the onion routers.
 499
 500 In low-latency anonymity systems that use layered encryption, the
 501 adversary's typical goal is to observe both the initiator and the
 502 responder. By observing both ends, passive attackers can confirm a
 503 suspicion that Alice is
 504 talking to Bob if the timing and volume patterns of the traffic on the
 505 connection are distinct enough; active attackers can induce timing
 506 signatures on the traffic to force distinct patterns. Rather
 507 than focusing on these \emph{traffic confirmation} attacks,
 508 we aim to prevent \emph{traffic
 509 analysis} attacks, where the adversary uses traffic patterns to learn
 510 which points in the network he should attack.
 511
 512 Our adversary might try to link an initiator Alice with her
 513 communication partners, or try to build a profile of Alice's
 514 behavior. He might mount passive attacks by observing the network edges
 515 and correlating traffic entering and leaving the network---by
 516 relationships in packet timing, volume, or externally visible
 517 user-selected
 518 options. The adversary can also mount active attacks by compromising
 519 routers or keys; by replaying traffic; by selectively denying service
 520 to trustworthy routers to move users to
 521 compromised routers, or denying service to users to see if traffic
 522 elsewhere in the
 523 network stops; or by introducing patterns into traffic that can later be
 524 detected. The adversary might subvert the directory servers to give users
 525 differing views of network state. Additionally, he can try to decrease
 526 the network's reliability by attacking nodes or by performing antisocial
 527 activities from reliable nodes and trying to get them taken down---making
 528 the network unreliable flushes users to other less anonymous
 529 systems, where they may be easier to attack. We summarize
 530 in Section~\ref{sec:attacks} how well the Tor design defends against
 531 each of these attacks.
 532
 533 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 534
 535 \section{The Tor Design}
 536 \label{sec:design}
 537
 538 The Tor network is an overlay network; each onion router (OR)
 539 runs as a normal
 540 user-level process without any special privileges.
 541 Each onion router maintains a TLS~\cite{TLS}
 542 connection to every other onion router.
 543 %(We discuss alternatives to this clique-topology assumption in
 544 %Section~\ref{sec:maintaining-anonymity}.)
 545 % A subset of the ORs also act as
 546 %directory servers, tracking which routers are in the network;
 547 %see Section~\ref{subsec:dirservers} for directory server details.
 548 Each user
 549 runs local software called an onion proxy (OP) to fetch directories,
 550 establish circuits across the network,
 551 and handle connections from user applications.  These onion proxies accept
 552 TCP streams and multiplex them across the circuits. The onion
 553 router on the other side
 554 of the circuit connects to the destinations of
 555 the TCP streams and relays data.
 556
 557 Each onion router maintains a long-term identity key and a short-term
 558 onion key. The identity
 559 key is used to sign TLS certificates, to sign the OR's \emph{router
 560 descriptor} (a summary of its keys, address, bandwidth, exit policy,
 561 and so on), and (by directory servers) to sign directories. %Changing
 562 %the identity key of a router is considered equivalent to creating a
 563 %new router.
 564 The onion key is used to decrypt requests
 565 from users to set up a circuit and negotiate ephemeral keys.
 566 The TLS protocol also establishes a short-term link key when communicating
 567 between ORs. Short-term keys are rotated periodically and
 568 independently, to limit the impact of key compromise.
 569
 570 Section~\ref{subsec:cells} presents the fixed-size
 571 \emph{cells} that are the unit of communication in Tor. We describe
 572 in Section~\ref{subsec:circuits} how circuits are
 573 built, extended, truncated, and destroyed. Section~\ref{subsec:tcp}
 574 describes how TCP streams are routed through the network.  We address
 575 integrity checking in Section~\ref{subsec:integrity-checking},
 576 and resource limiting in Section~\ref{subsec:rate-limit}.
 577 Finally,
 578 Section~\ref{subsec:congestion} talks about congestion control and
 579 fairness issues.
 580
 581 \subsection{Cells}
 582 \label{subsec:cells}
 583
 584 Onion routers communicate with one another, and with users' OPs, via
 585 TLS connections with ephemeral keys.  Using TLS conceals the data on
 586 the connection with perfect forward secrecy, and prevents an attacker
 587 from modifying data on the wire or impersonating an OR.
 588
 589 Traffic passes along these connections in fixed-size cells.  Each cell
 590 is 512 bytes, %(but see Section~\ref{sec:conclusion} for a discussion of
 591 %allowing large cells and small cells on the same network),
 592 and consists of a header and a payload. The header includes a circuit
 593 identifier (circID) that specifies which circuit the cell refers to
 594 (many circuits can be multiplexed over the single TLS connection), and
 595 a command to describe what to do with the cell's payload.  (Circuit
 596 identifiers are connection-specific: each single circuit has a different
 597 circID on each OP/OR or OR/OR connection it traverses.)
 598 Based on their command, cells are either \emph{control} cells, which are
 599 always interpreted by the node that receives them, or \emph{relay} cells,
 600 which carry end-to-end stream data.   The control cell commands are:
 601 \emph{padding} (currently used for keepalive, but also usable for link
 602 padding); \emph{create} or \emph{created} (used to set up a new circuit);
 603 and \emph{destroy} (to tear down a circuit).
 604
 605 Relay cells have an additional header (the relay header) after the
 606 cell header, containing a streamID (stream identifier: many streams can
 607 be multiplexed over a circuit); an end-to-end checksum for integrity
 608 checking; the length of the relay payload; and a relay command.
 609 The entire contents of the relay header and the relay cell payload
 610 are encrypted or decrypted together as the relay cell moves along the
 611 circuit, using the 128-bit AES cipher in counter mode to generate a
 612 cipher stream.  The relay commands are: \emph{relay
 613 data} (for data flowing down the stream), \emph{relay begin} (to open a
 614 stream), \emph{relay end} (to close a stream cleanly), \emph{relay
 615 teardown} (to close a broken stream), \emph{relay connected}
 616 (to notify the OP that a relay begin has succeeded), \emph{relay
 617 extend} and \emph{relay extended} (to extend the circuit by a hop,
 618 and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
 619 (to tear down only part of the circuit, and to acknowledge), \emph{relay
 620 sendme} (used for congestion control), and \emph{relay drop} (used to
 621 implement long-range dummies).
 622 We give a visual overview of cell structure plus the details of relay
 623 cell structure, and then describe each of these cell types and commands
 624 in more detail below.
 625
 626 \begin{figure}[h]
 627 \unitlength=1cm
 628 \centering
 629 \begin{picture}(8.0,1.5)
 630 \put(4,.5){\makebox(0,0)[c]{\epsfig{file=cell-struct,width=7cm}}}
 631 \end{picture}
 632 \end{figure}
 633
 634 \subsection{Circuits and streams}
 635 \label{subsec:circuits}
 636
 637 Onion Routing originally built one circuit for each
 638 TCP stream.  Because building a circuit can take several tenths of a
 639 second (due to public-key cryptography and network latency),
 640 this design imposed high costs on applications like web browsing that
 641 open many TCP streams.
 642
 643 In Tor, each circuit can be shared by many TCP streams.  To avoid
 644 delays, users construct circuits preemptively.  To limit linkability
 645 among their streams, users' OPs build a new circuit
 646 periodically if the previous one has been used,
 647 and expire old used circuits that no longer have any open streams.
 648 OPs consider making a new circuit once a minute: thus
 649 even heavy users spend negligible time
 650 building circuits, but a limited number of requests can be linked
 651 to each other through a given exit node. Also, because circuits are built
 652 in the background, OPs can recover from failed circuit creation
 653 without delaying streams and thereby harming user experience.\\
 654
 655 \begin{figure}[h]
 656 \centering
 657 \mbox{\epsfig{figure=interaction,width=8.75cm}}
 658 \caption{Alice builds a two-hop circuit and begins fetching a web page.}
 659 \label{fig:interaction}
 660 \end{figure}
 661
 662 \noindent{\large\bf Constructing a circuit}\label{subsubsec:constructing-a-circuit}\\
 663 %\subsubsection{Constructing a circuit}
 664 A user's OP constructs circuits incrementally, negotiating a
 665 symmetric key with each OR on the circuit, one hop at a time. To begin
 666 creating a new circuit, the OP (call her Alice) sends a
 667 \emph{create} cell to the first node in her chosen path (call him Bob).
 668 (She chooses a new
 669 circID $C_{AB}$ not currently used on the connection from her to Bob.)
 670 The \emph{create} cell's
 671 payload contains the first half of the Diffie-Hellman handshake
 672 ($g^x$), encrypted to the onion key of the OR (call him Bob). Bob
 673 responds with a \emph{created} cell containing the second half of the
 674 DH handshake, along with a hash of the negotiated key $K=g^{xy}$.
 675
 676 Once the circuit has been established, Alice and Bob can send one
 677 another relay cells encrypted with the negotiated
 678 key.\footnote{Actually, the negotiated key is used to derive two
 679   symmetric keys: one for each direction.}  More detail is given in
 680 the next section.
 681
 682 To extend the circuit further, Alice sends a \emph{relay extend} cell
 683 to Bob, specifying the address of the next OR (call her Carol), and
 684 an encrypted $g^{x_2}$ for her.  Bob copies the half-handshake into a
 685 \emph{create} cell, and passes it to Carol to extend the circuit.
 686 (Bob chooses a new circID $C_{BC}$ not currently used on the connection
 687 between him and Carol.  Alice never needs to know this circID; only Bob
 688 associates $C_{AB}$ on his connection with Alice to $C_{BC}$ on
 689 his connection with Carol.)
 690 When Carol responds with a \emph{created} cell, Bob wraps the payload
 691 into a \emph{relay extended} cell and passes it back to Alice.  Now
 692 the circuit is extended to Carol, and Alice and Carol share a common key
 693 $K_2 = g^{x_2 y_2}$.
 694
 695 To extend the circuit to a third node or beyond, Alice
 696 proceeds as above, always telling the last node in the circuit to
 697 extend one hop further.
 698
 699 This circuit-level handshake protocol achieves unilateral entity
 700 authentication (Alice knows she's handshaking with the OR, but
 701 the OR doesn't care who is opening the circuit---Alice uses no public key
 702 and is trying to remain anonymous) and unilateral key authentication
 703 (Alice and the OR agree on a key, and Alice knows only the OR learns
 704 it). It also achieves forward
 705 secrecy and key freshness. More formally, the protocol is as follows
 706 (where $E_{PK_{Bob}}(\cdot)$ is encryption with Bob's public key,
 707 $H$ is a secure hash function, and $|$ is concatenation):
 708 \begin{equation*}
 709 \begin{aligned}
 710 \mathrm{Alice} \rightarrow \mathrm{Bob}&: E_{PK_{Bob}}(g^x) \\
 711 \mathrm{Bob} \rightarrow \mathrm{Alice}&: g^y, H(K | \mathrm{``handshake"}) \\
 712 \end{aligned}
 713 \end{equation*}
 714
 715 \noindent In the second step, Bob proves that it was he who received $g^x$,
 716 and who chose $y$. We use PK encryption in the first step
 717 (rather than, say, using the first two steps of STS, which has a
 718 signature in the second step) because a single cell is too small to
 719 hold both a public key and a signature. Preliminary analysis with the
 720 NRL protocol analyzer~\cite{meadows96} shows this protocol to be
 721 secure (including perfect forward secrecy) under the
 722 traditional Dolev-Yao model.\\
 723
 724 \noindent{\large\bf Relay cells}\\
 725 %\subsubsection{Relay cells}
 726 %
 727 Once Alice has established the circuit (so she shares keys with each
 728 OR on the circuit), she can send relay cells.  Recall that every relay
 729 cell has a streamID that indicates to which
 730 stream the cell belongs.  %This streamID allows a relay cell to be
 731 %addressed to any OR on the circuit.
 732 Upon receiving a relay
 733 cell, an OR looks up the corresponding circuit, and decrypts the relay
 734 header and payload with the session key for that circuit.
 735 If the cell is headed away from Alice the OR then checks
 736 whether the decrypted streamID is recognized---either because it
 737 corresponds to an open stream at this OR for the given circuit, or because
 738 it is the control streamID (zero).  If the OR recognizes the
 739 streamID, it accepts the relay cell and processes it as described
 740 below.  Otherwise,
 741 the OR looks up the circID and OR for the
 742 next step in the circuit, replaces the circID as appropriate, and
 743 sends the decrypted relay cell to the next OR.  (If the OR at the end
 744 of the circuit receives an unrecognized relay cell, an error has
 745 occurred, and the cell is discarded.)
 746
 747 OPs treat incoming relay cells similarly: they iteratively unwrap the
 748 relay header and payload with the session keys shared with each
 749 OR on the circuit, from the closest to farthest. % (Because we use a
 750 %stream cipher, encryption operations may be inverted in any order.)
 751 If at any stage the OP recognizes the streamID, the cell must have
 752 originated at the OR whose encryption has just been removed.
 753
 754 To construct a relay cell addressed to a given OR, Alice iteratively
 755 encrypts the cell payload (that is, the relay header and payload) with
 756 the symmetric key of each hop up to that OR.  Because the streamID is
 757 encrypted to a different value at each step, only at the targeted OR
 758 will it have a meaningful value.\footnote{
 759   % Should we just say that 2^56 is itself negligible?
 760   % Assuming 4-hop circuits with 10 streams per hop, there are 33
 761   % possible bad streamIDs before the last circuit.  This still
 762   % gives an error only once every 2 million terabytes (approx).
 763 With 48 bits of streamID per cell, the probability of an accidental
 764 collision is far lower than the chance of hardware failure.}
 765 This \emph{leaky pipe} circuit topology
 766 allows Alice's streams to exit at different ORs on a single circuit.
 767 Alice may choose different exit points because of their exit policies,
 768 or to keep the ORs from knowing that two streams
 769 originate from the same person.
 770
 771 When an OR later replies to Alice with a relay cell, it
 772 encrypts the cell's relay header and payload with the single key it
 773 shares with Alice, and sends the cell back toward Alice along the
 774 circuit.  Subsequent ORs add further layers of encryption as they
 775 relay the cell back to Alice.
 776
 777 To tear down a circuit, Alice sends a \emph{destroy} control
 778 cell. Each OR in the circuit receives the \emph{destroy} cell, closes
 779 all streams on that circuit, and passes a new \emph{destroy} cell
 780 forward. But just as circuits are built incrementally, they can also
 781 be torn down incrementally: Alice can send a \emph{relay
 782 truncate} cell to a single OR on a circuit. That OR then sends a
 783 \emph{destroy} cell forward, and acknowledges with a
 784 \emph{relay truncated} cell. Alice can then extend the circuit to
 785 different nodes, without signaling to the intermediate nodes (or
 786 a limited observer) that she has changed her circuit.
 787 Similarly, if a node on the circuit goes down, the adjacent
 788 node can send a \emph{relay truncated} cell back to Alice.  Thus the
 789 ``break a node and see which circuits go down''
 790 attack~\cite{freedom21-security} is weakened.
 791
 792 \subsection{Opening and closing streams}
 793 \label{subsec:tcp}
 794
 795 When Alice's application wants a TCP connection to a given
 796 address and port, it asks the OP (via SOCKS) to make the
 797 connection. The OP chooses the newest open circuit (or creates one if
 798 none is available), and chooses a suitable OR on that circuit to be the
 799 exit node (usually the last node, but maybe others due to exit policy
 800 conflicts; see Section~\ref{subsec:exitpolicies}.) The OP then opens
 801 the stream by sending a \emph{relay begin} cell to the exit node,
 802 using a streamID of zero (so the OR will recognize it), containing as
 803 its relay payload a new random streamID, the destination
 804 address, and the destination port.  Once the
 805 exit node connects to the remote host, it responds
 806 with a \emph{relay connected} cell.  Upon receipt, the OP sends a
 807 SOCKS reply to notify the application of its success. The OP
 808 now accepts data from the application's TCP stream, packaging it into
 809 \emph{relay data} cells and sending those cells along the circuit to
 810 the chosen OR.
 811
 812 There's a catch to using SOCKS, however---some applications pass the
 813 alphanumeric hostname to the Tor client, while others resolve it into
 814 an IP address first and then pass the IP address to the Tor client. If
 815 the application does DNS resolution first, Alice thereby reveals her
 816 destination to the remote DNS server, rather than sending the hostname
 817 through the Tor network to be resolved at the far end. Common applications
 818 like Mozilla and SSH have this flaw.
 819
 820 With Mozilla, the flaw is easy to address: the filtering HTTP
 821 proxy called Privoxy gives a hostname to the Tor client, so Alice's
 822 computer never does DNS resolution.
 823 But a portable general solution, such as is needed for
 824 SSH, is
 825 an open problem. Modifying or replacing the local nameserver
 826 can be invasive, brittle, and unportable. Forcing the resolver
 827 library to do resolution via TCP rather than UDP is
 828 hard, and also has portability problems. We could also provide a
 829 tool similar to \emph{dig} to perform a private lookup through the
 830 Tor network. Currently, we encourage the use of
 831 privacy-aware proxies like Privoxy wherever possible.
 832
 833 Closing a Tor stream is analogous to closing a TCP stream: it uses a
 834 two-step handshake for normal operation, or a one-step handshake for
 835 errors. If the stream closes abnormally, the adjacent node simply sends a
 836 \emph{relay teardown} cell. If the stream closes normally, the node sends
 837 a \emph{relay end} cell down the circuit. When the other side has sent
 838 back its own \emph{relay end} cell, the stream can be torn down.  Because
 839 all relay cells use layered encryption, only the destination OR knows
 840 that a given relay cell is a request to close a stream.  This two-step
 841 handshake allows Tor to support TCP-based applications that use half-closed
 842 connections.
 843 % such as broken HTTP clients that close their side of the
 844 %stream after writing but are still willing to read.
 845
 846 \subsection{Integrity checking on streams}
 847 \label{subsec:integrity-checking}
 848
 849 Because the old Onion Routing design used a stream cipher without integrity
 850 checking, traffic was
 851 vulnerable to a malleability attack: though the attacker could not
 852 decrypt cells, any changes to encrypted data
 853 would create corresponding changes to the data leaving the network.
 854 This weakness allowed an adversary to change a padding cell to a destroy
 855 cell; change the destination address in a \emph{relay begin} cell to the
 856 adversary's webserver; or change an FTP command from
 857 {\tt dir} to {\tt rm~*}. Any adversary who could guess the encrypted
 858 content could introduce such corruption in a stream. (Even an external
 859 adversary could do this, because the link encryption similarly used a
 860 stream cipher.)
 861
 862 Tor uses TLS on its links---its integrity checking
 863 protects data from modification by external adversaries.
 864 Addressing the insider malleability attack, however, is
 865 more complex.
 866
 867 We could do integrity checking of the relay cells at each hop, either
 868 by including hashes or by using an authenticating cipher mode like
 869 EAX~\cite{eax}, but there are some problems. First, these approaches
 870 impose a message-expansion overhead at each hop, and so we would have to
 871 either leak the path length or waste bytes by padding to a maximum
 872 path length. Second, these solutions can only verify traffic coming
 873 from Alice: ORs would not be able to produce suitable hashes for
 874 the intermediate hops, since the ORs on a circuit do not know the
 875 other ORs' session keys. Third, we have already accepted that our design
 876 is vulnerable to end-to-end timing attacks; tagging attacks performed
 877 within the circuit provide no additional information to the attacker.
 878
 879 Thus, we check integrity only at the edges of each stream. When Alice
 880 negotiates a key with a new hop, they each initialize a SHA-1
 881 digest with a derivative of that key,
 882 thus beginning with randomness that only the two of them know. From
 883 then on they each incrementally add to the SHA-1 digest the contents of
 884 all relay cells they create, and include with each relay cell the
 885 first four bytes of the current digest.  Each also keeps a SHA-1
 886 digest of data received, to verify that the received hashes are correct.
 887
 888 To be sure of removing or modifying a cell, the attacker must be able
 889 to deduce the current digest state (which depends on all
 890 traffic between Alice and Bob, starting with their negotiated key).
 891 Attacks on SHA-1 where the adversary can incrementally add to a hash
 892 to produce a new valid hash don't work, because all hashes are
 893 end-to-end encrypted across the circuit.  The computational overhead
 894 of computing the digests is minimal compared to doing the AES
 895 encryption performed at each hop of the circuit. We use only four
 896 bytes per cell to minimize overhead; the chance that an adversary will
 897 correctly guess a valid hash
 898 %, plus the payload the current cell,
 899 is
 900 acceptably low, given that Alice or Bob tear down the circuit if they
 901 receive a bad hash.
 902
 903 \subsection{Rate limiting and fairness}
 904 \label{subsec:rate-limit}
 905
 906 Volunteers are more willing to run services that can limit
 907 their bandwidth usage. To accommodate them, Tor servers use a
 908 token bucket approach~\cite{tannenbaum96} to
 909 enforce a long-term average rate of incoming bytes, while still
 910 permitting short-term bursts above the allowed bandwidth.
 911 % Current bucket sizes are set to ten seconds' worth of traffic.
 912
 913 %Further, we want to avoid starving any Tor streams. Entire circuits
 914 %could starve if we read greedily from connections and one connection
 915 %uses all the remaining bandwidth. We solve this by dividing the number
 916 %of tokens in the bucket by the number of connections that want to read,
 917 %and reading at most that number of bytes from each connection. We iterate
 918 %this procedure until the number of tokens in the bucket is under some
 919 %threshold (currently 10KB), at which point we greedily read from connections.
 920
 921 Because the Tor protocol generates roughly the same number of outgoing
 922 bytes as incoming bytes, it is sufficient in practice to limit only
 923 incoming bytes.
 924 With TCP streams, however, the correspondence is not one-to-one:
 925 relaying a single incoming byte can require an entire 512-byte cell.
 926 (We can't just wait for more bytes, because the local application may
 927 be waiting for a reply.) Therefore, we treat this case as if the entire
 928 cell size had been read, regardless of the fullness of the cell.
 929
 930 Further, inspired by Rennhard et al's design in~\cite{anonnet}, a
 931 circuit's edges can heuristically distinguish interactive streams from bulk
 932 streams by comparing the frequency with which they supply cells.  We can
 933 provide good latency for interactive streams by giving them preferential
 934 service, while still giving good overall throughput to the bulk
 935 streams. Such preferential treatment presents a possible end-to-end
 936 attack, but an adversary observing both
 937 ends of the stream can already learn this information through timing
 938 attacks.
 939
 940 \subsection{Congestion control}
 941 \label{subsec:congestion}
 942
 943 Even with bandwidth rate limiting, we still need to worry about
 944 congestion, either accidental or intentional. If enough users choose the
 945 same OR-to-OR connection for their circuits, that connection can become
 946 saturated. For example, an attacker could send a large file
 947 through the Tor network to a webserver he runs, and then
 948 refuse to read any of the bytes at the webserver end of the
 949 circuit. Without some congestion control mechanism, these bottlenecks
 950 can propagate back through the entire network. We don't need to
 951 reimplement full TCP windows (with sequence numbers,
 952 the ability to drop cells when we're full and retransmit later, and so
 953 on),
 954 because TCP already guarantees in-order delivery of each
 955 cell.
 956 %But we need to investigate further the effects of the current
 957 %parameters on throughput and latency, while also keeping privacy in mind;
 958 %see Section~\ref{sec:maintaining-anonymity} for more discussion.
 959 We describe our response below.
 960
 961 \textbf{Circuit-level throttling:}
 962 To control a circuit's bandwidth usage, each OR keeps track of two
 963 windows. The \emph{packaging window} tracks how many relay data cells the OR is
 964 allowed to package (from incoming TCP streams) for transmission back to the OP,
 965 and the \emph{delivery window} tracks how many relay data cells it is willing
 966 to deliver to TCP streams outside the network. Each window is initialized
 967 (say, to 1000 data cells). When a data cell is packaged or delivered,
 968 the appropriate window is decremented. When an OR has received enough
 969 data cells (currently 100), it sends a \emph{relay sendme} cell towards the OP,
 970 with streamID zero. When an OR receives a \emph{relay sendme} cell with
 971 streamID zero, it increments its packaging window. Either of these cells
 972 increments the corresponding window by 100. If the packaging window
 973 reaches 0, the OR stops reading from TCP connections for all streams
 974 on the corresponding circuit, and sends no more relay data cells until
 975 receiving a \emph{relay sendme} cell.
 976
 977 The OP behaves identically, except that it must track a packaging window
 978 and a delivery window for every OR in the circuit. If a packaging window
 979 reaches 0, it stops reading from streams destined for that OR.
 980
 981 \textbf{Stream-level throttling}:
 982 The stream-level congestion control mechanism is similar to the
 983 circuit-level mechanism. ORs and OPs use \emph{relay sendme} cells
 984 to implement end-to-end flow control for individual streams across
 985 circuits. Each stream begins with a packaging window (currently 500 cells),
 986 and increments the window by a fixed value (50) upon receiving a \emph{relay
 987 sendme} cell. Rather than always returning a \emph{relay sendme} cell as soon
 988 as enough cells have arrived, the stream-level congestion control also
 989 has to check whether data has been successfully flushed onto the TCP
 990 stream; it sends the \emph{relay sendme} cell only when the number of bytes pending
 991 to be flushed is under some threshold (currently 10 cells' worth).
 992
 993 %% Maybe omit this next paragraph. -NM
 994 %Currently, non-data relay cells do not affect the windows. Thus we
 995 %avoid potential deadlock issues, for example, arising because a stream
 996 %can't send a \emph{relay sendme} cell when its packaging window is empty.
 997
 998 These arbitrarily chosen parameters seem to give tolerable throughput
 999 and delay; see Section~\ref{sec:in-the-wild}.
1000
1001 \subsection{Rendezvous Points and hidden services}
1002 \label{subsec:rendezvous}
1003
1004 Rendezvous points are a building block for \emph{location-hidden
1005 services} (also known as \emph{responder anonymity}) in the Tor
1006 network.  Location-hidden services allow Bob to offer a TCP
1007 service, such as a webserver, without revealing his IP address.
1008 This type of anonymity protects against distributed DoS attacks:
1009 attackers are forced to attack the onion routing network
1010 because they do not know Bob's IP address.
1011
1012 Our design for location-hidden servers has the following goals.
1013 \textbf{Access-controlled:} Bob needs a way to filter incoming requests,
1014 so an attacker cannot flood Bob simply by making many connections to him.
1015 \textbf{Robust:} Bob should be able to maintain a long-term pseudonymous
1016 identity even in the presence of router failure. Bob's service must
1017 not be tied to a single OR, and Bob must be able to tie his service
1018 to new ORs. \textbf{Smear-resistant:}
1019 A social attacker who offers an illegal or disreputable location-hidden
1020 service should not be able to ``frame'' a rendezvous router by
1021 making observers believe the router created that service.
1022 \textbf{Application-transparent:} Although we require users
1023 to run special software to access location-hidden servers, we must not
1024 require them to modify their applications.
1025
1026 We provide location-hiding for Bob by allowing him to advertise
1027 several onion routers (his \emph{introduction points}) as contact
1028 points. He may do this on any robust efficient
1029 key-value lookup system with authenticated updates, such as a
1030 distributed hash table (DHT) like CFS~\cite{cfs:sosp01}\footnote{
1031 Rather than rely on an external infrastructure, the Onion Routing network
1032 can run the DHT itself.  At first, we can run a simple lookup
1033 system on the
1034 directory servers.} Alice, the client, chooses an OR as her
1035 \emph{rendezvous point}. She connects to one of Bob's introduction
1036 points, informs him of her rendezvous point, and then waits for him
1037 to connect to the rendezvous point. This extra level of indirection
1038 helps Bob's introduction points avoid problems associated with serving
1039 unpopular files directly (for example, if Bob serves
1040 material that the introduction point's community finds objectionable,
1041 or if Bob's service tends to get attacked by network vandals).
1042 The extra level of indirection also allows Bob to respond to some requests
1043 and ignore others.
1044
1045 In Appendix~\ref{sec:rendezvous-specifics} we provide a more detailed
1046 description of the rendezvous protocol, integration issues, attacks,
1047 and related rendezvous work.
1048
1049 \section{Other design decisions}
1050 \label{sec:other-design}
1051
1052 \subsection{Resource management and denial-of-service}
1053 \label{subsec:dos}
1054
1055 Providing Tor as a public service creates many opportunities for
1056 denial-of-service attacks against the network.  While
1057 flow control and rate limiting (discussed in
1058 Section~\ref{subsec:congestion}) prevent users from consuming more
1059 bandwidth than routers are willing to provide, opportunities remain for
1060 users to
1061 consume more network resources than their fair share, or to render the
1062 network unusable for others.
1063
1064 First of all, there are several CPU-consuming denial-of-service
1065 attacks wherein an attacker can force an OR to perform expensive
1066 cryptographic operations.  For example, an attacker who sends a
1067 \emph{create} cell full of junk bytes can force an OR to perform an RSA
1068 decrypt.  Similarly, an attacker can
1069 fake the start of a TLS handshake, forcing the OR to carry out its
1070 (comparatively expensive) half of the handshake at no real computational
1071 cost to the attacker.
1072
1073 We have not yet implemented any defenses for these attacks, but several
1074 approaches are possible. First, ORs can
1075 require clients to solve a puzzle~\cite{puzzles-tls} while beginning new
1076 TLS handshakes or accepting \emph{create} cells.  So long as these
1077 tokens are easy to verify and computationally expensive to produce, this
1078 approach limits the attack multiplier.  Additionally, ORs can limit
1079 the rate at which they accept \emph{create} cells and TLS connections,
1080 so that
1081 the computational work of processing them does not drown out the
1082 symmetric cryptography operations that keep cells
1083 flowing.  This rate limiting could, however, allow an attacker
1084 to slow down other users when they build new circuits.
1085
1086 % What about link-to-link rate limiting?
1087
1088 Adversaries can also attack the Tor network's hosts and network
1089 links. Disrupting a single circuit or link breaks all streams passing
1090 along that part of the circuit. Users similarly lose service
1091 when a router crashes or its operator restarts it. The current
1092 Tor design treats such attacks as intermittent network failures, and
1093 depends on users and applications to respond or recover as appropriate. A
1094 future design could use an end-to-end TCP-like acknowledgment protocol,
1095 so that no streams are lost unless the entry or exit point itself is
1096 disrupted. This solution would require more buffering at the network
1097 edges, however, and the performance and anonymity implications from this
1098 extra complexity still require investigation.
1099
1100 \subsection{Exit policies and abuse}
1101 \label{subsec:exitpolicies}
1102
1103 % originally, we planned to put the "users only know the hostname,
1104 % not the IP, but exit policies are by IP" problem here too. Not
1105 % worth putting in the submission, but worth thinking about putting
1106 % in sometime somehow. -RD
1107
1108 Exit abuse is a serious barrier to wide-scale Tor deployment. Anonymity
1109 presents would-be vandals and abusers with an opportunity to hide
1110 the origins of their activities. Attackers can harm the Tor network by
1111 implicating exit servers for their abuse. Also, applications that commonly
1112 use IP-based authentication (such as institutional mail or webservers)
1113 can be fooled by the fact that anonymous connections appear to originate
1114 at the exit OR.
1115
1116 We stress that Tor does not enable any new class of abuse. Spammers
1117 and other attackers already have access to thousands of misconfigured
1118 systems worldwide, and the Tor network is far from the easiest way
1119 to launch attacks.
1120 %Indeed, because of its limited
1121 %anonymity, Tor is probably not a good way to commit crimes.
1122 But because the
1123 onion routers can be mistaken for the originators of the abuse,
1124 and the volunteers who run them may not want to deal with the hassle of
1125 explaining anonymity networks to irate administrators, we must block or limit
1126 the abuse that travels through the Tor network.
1127
1128 To mitigate abuse issues, in Tor, each onion router's \emph{exit policy}
1129 describes to which external addresses and ports the router will
1130 connect. On one end of the spectrum are \emph{open exit}
1131 nodes that will connect anywhere. On the other end are \emph{middleman}
1132 nodes that only relay traffic to other Tor nodes, and \emph{private exit}
1133 nodes that only connect to a local host or network.  Using a private
1134 exit (if one exists) is a more secure way for a client to connect to a
1135 given host or network---an external adversary cannot eavesdrop traffic
1136 between the private exit and the final destination, and so is less sure of
1137 Alice's destination and activities. Most onion routers in the current
1138 network function as
1139 \emph{restricted exits} that permit connections to the world at large,
1140 but prevent access to certain abuse-prone addresses and services such
1141 as SMTP.
1142 Additionally, in some cases the OR can authenticate clients to
1143 prevent exit abuse without harming anonymity~\cite{or-discex00}.
1144
1145 %The abuse issues on closed (e.g. military) networks are different
1146 %from the abuse on open networks like the Internet. While these IP-based
1147 %access controls are still commonplace on the Internet, on closed networks,
1148 %nearly all participants will be honest, and end-to-end authentication
1149 %can be assumed for important traffic.
1150
1151 Many administrators use port restrictions to support only a
1152 limited set of services, such as HTTP, SSH, or AIM.
1153 This is not a complete solution, of course, since abuse opportunities for these
1154 protocols are still well known.
1155
1156 We have not yet encountered any abuse in the deployed network, but if
1157 we do we should consider using proxies to clean traffic for certain
1158 protocols as it leaves the network.  For example, much abusive HTTP
1159 behavior (such as exploiting buffer overflows or well-known script
1160 vulnerabilities) can be detected in a straightforward manner.
1161 Similarly, one could run automatic spam filtering software (such as
1162 SpamAssassin) on email exiting the OR network.
1163
1164 ORs may also rewrite exiting traffic to append
1165 headers or other information indicating that the traffic has passed
1166 through an anonymity service.  This approach is commonly used
1167 by email-only anonymity systems.  ORs can also
1168 run on servers with hostnames like {\tt anonymous} to further
1169 alert abuse targets to the nature of the anonymous traffic.
1170
1171 A mixture of open and restricted exit nodes allows the most
1172 flexibility for volunteers running servers. But while having many
1173 middleman nodes provides a large and robust network,
1174 having only a few exit nodes reduces the number of points
1175 an adversary needs to monitor for traffic analysis, and places a
1176 greater burden on the exit nodes.  This tension can be seen in the
1177 Java Anon Proxy
1178 cascade model, wherein only one node in each cascade needs to handle
1179 abuse complaints---but an adversary only needs to observe the entry
1180 and exit of a cascade to perform traffic analysis on all that
1181 cascade's users. The hydra model (many entries, few exits) presents a
1182 different compromise: only a few exit nodes are needed, but an
1183 adversary needs to work harder to watch all the clients; see
1184 Section~\ref{sec:conclusion}.
1185
1186 Finally, we note that exit abuse must not be dismissed as a peripheral
1187 issue: when a system's public image suffers, it can reduce the number
1188 and diversity of that system's users, and thereby reduce the anonymity
1189 of the system itself.  Like usability, public perception is a
1190 security parameter.  Sadly, preventing abuse of open exit nodes is an
1191 unsolved problem, and will probably remain an arms race for the
1192 foreseeable future.  The abuse problems faced by Princeton's CoDeeN
1193 project~\cite{darkside} give us a glimpse of likely issues.
1194
1195 \subsection{Directory Servers}
1196 \label{subsec:dirservers}
1197
1198 First-generation Onion Routing designs~\cite{freedom2-arch,or-jsac98} used
1199 in-band network status updates: each router flooded a signed statement
1200 to its neighbors, which propagated it onward. But anonymizing networks
1201 have different security goals than typical link-state routing protocols.
1202 For example, delays (accidental or intentional)
1203 that can cause different parts of the network to have different views
1204 of link-state and topology are not only inconvenient: they give
1205 attackers an opportunity to exploit differences in client knowledge.
1206 We also worry about attacks to deceive a
1207 client about the router membership list, topology, or current network
1208 state. Such \emph{partitioning attacks} on client knowledge help an
1209 adversary to efficiently deploy resources
1210 against a target~\cite{minion-design}.
1211
1212 Tor uses a small group of redundant, well-known onion routers to
1213 track changes in network topology and node state, including keys and
1214 exit policies.  Each such \emph{directory server} acts as an HTTP
1215 server, so clients can fetch current network state
1216 and router lists, and so other ORs can upload
1217 state information.  Onion routers periodically publish signed
1218 statements of their state to each directory server. The directory servers
1219 combine this information with their own views of network liveness,
1220 and generate a signed description (a \emph{directory}) of the entire
1221 network state. Client software is
1222 pre-loaded with a list of the directory servers and their keys,
1223 to bootstrap each client's view of the network.
1224 % XXX this means that clients will be forced to upgrade as the
1225 % XXX dirservers change or get compromised. argue that this is ok.
1226
1227 When a directory server receives a signed statement for an OR, it
1228 checks whether the OR's identity key is recognized. Directory
1229 servers do not automatically advertise unrecognized ORs. (If they did,
1230 an adversary could take over the network by creating many
1231 servers~\cite{sybil}.) Instead, new nodes must be approved by the
1232 directory
1233 server administrator before they are included. Mechanisms for automated
1234 node approval are an area of active research, and are discussed more
1235 in Section~\ref{sec:maintaining-anonymity}.
1236
1237 Of course, a variety of attacks remain. An adversary who controls
1238 a directory server can track clients by providing them different
1239 information---perhaps by listing only nodes under its control, or by
1240 informing only certain clients about a given node. Even an external
1241 adversary can exploit differences in client knowledge: clients who use
1242 a node listed on one directory server but not the others are vulnerable.
1243
1244 Thus these directory servers must be synchronized and redundant, so
1245 that they can agree on a common directory.  Clients should only trust
1246 this directory if it is signed by a threshold of the directory
1247 servers.
1248
1249 The directory servers in Tor are modeled after those in
1250 Mixminion~\cite{minion-design}, but our situation is easier. First,
1251 we make the
1252 simplifying assumption that all participants agree on the set of
1253 directory servers. Second, while Mixminion needs to predict node
1254 behavior, Tor only needs a threshold consensus of the current
1255 state of the network.
1256
1257 Tor directory servers build a consensus directory through a simple
1258 four-round broadcast protocol.  In round one, each server dates and
1259 signs its current opinion, and broadcasts it to the other directory
1260 servers; then in round two, each server rebroadcasts all the signed
1261 opinions it has received.  At this point all directory servers check
1262 to see whether any server has signed multiple opinions in the same
1263 period. Such a server is either broken or cheating, so the protocol
1264 stops and notifies the administrators, who either remove the cheater
1265 or wait for the broken server to be fixed.  If there are no
1266 discrepancies, each directory server then locally computes an algorithm
1267 (described below)
1268 on the set of opinions, resulting in a uniform shared directory. In
1269 round three servers sign this directory and broadcast it; and finally
1270 in round four the servers rebroadcast the directory and all the
1271 signatures.  If any directory server drops out of the network, its
1272 signature is not included on the final directory.
1273
1274 The rebroadcast steps ensure that a directory server is heard by
1275 either all of the other servers or none of them, even when some links
1276 are down (assuming that any two directory servers can talk directly or
1277 via a third). Broadcasts are feasible because there are relatively few
1278 directory servers (currently 3, but we expect as many as 9 as the network
1279 scales). Computing the shared directory locally is a straightforward
1280 threshold voting process: we include an OR if a majority of directory
1281 servers believe it to be good.
1282
1283 To avoid attacks where a router connects to all the directory servers
1284 but refuses to relay traffic from other routers, the directory servers
1285 must build circuits and use them to anonymously test router
1286 reliability~\cite{mix-acc}. Unfortunately, this defense is not yet
1287 designed or
1288 implemented.
1289
1290 Using directory servers is simpler and more flexible than flooding.
1291 Flooding is expensive, and complicates the analysis when we
1292 start experimenting with non-clique network topologies. Signed
1293 directories can be cached by other
1294 onion routers,
1295 so directory servers are not a performance
1296 bottleneck when we have many users, and do not aid traffic analysis by
1297 forcing clients to periodically announce their existence to any
1298 central point.
1299
1300
1301 \section{Attacks and Defenses}
1302 \label{sec:attacks}
1303
1304 Below we summarize a variety of attacks, and discuss how well our
1305 design withstands them.\\
1306
1307 \noindent{\large\bf Passive attacks}\\
1308 \emph{Observing user traffic patterns.} Observing a user's connection
1309 will not reveal her destination or data, but it will
1310 reveal traffic patterns (both sent and received). Profiling via user
1311 connection patterns requires further processing, because multiple
1312 application streams may be operating simultaneously or in series over
1313 a single circuit.
1314
1315 \emph{Observing user content.} While content at the user end is encrypted,
1316 connections to responders may not be (indeed, the responding website
1317 itself may be hostile). While filtering content is not a primary goal
1318 of Onion Routing, Tor can directly use Privoxy and related
1319 filtering services to anonymize application data streams.
1320
1321 \emph{Option distinguishability.} We allow clients to choose
1322 configuration options. For example, clients concerned about request
1323 linkability should rotate circuits more often than those concerned
1324 about traceability. Allowing choice may attract users with different
1325 %There is economic incentive to attract users by
1326 %allowing this choice;
1327 needs; but clients who are
1328 in the minority may lose more anonymity by appearing distinct than they
1329 gain by optimizing their behavior~\cite{econymics}.
1330
1331 \emph{End-to-end timing correlation.}  Tor only minimally hides
1332 such correlations. An attacker watching patterns of
1333 traffic at the initiator and the responder will be
1334 able to confirm the correspondence with high probability. The
1335 greatest protection currently available against such confirmation is to hide
1336 the connection between the onion proxy and the first Tor node,
1337 by running the OP on the Tor node or behind a firewall. This approach
1338 requires an observer to separate traffic originating at the onion
1339 router from traffic passing through it: a global observer can do this,
1340 but it might be beyond a limited observer's capabilities.
1341
1342 \emph{End-to-end size correlation.} Simple packet counting
1343 will also be effective in confirming
1344 endpoints of a stream. However, even without padding, we have some
1345 limited protection: the leaky pipe topology means different numbers
1346 of packets may enter one end of a circuit than exit at the other.
1347
1348 \emph{Website fingerprinting.} All the effective passive
1349 attacks above are traffic confirmation attacks,
1350 which puts them outside our design goals. There is also
1351 a passive traffic analysis attack that is potentially effective.
1352 Rather than searching exit connections for timing and volume
1353 correlations, the adversary may build up a database of
1354 ``fingerprints'' containing file sizes and access patterns for
1355 targeted websites. He can later confirm a user's connection to a given
1356 site simply by consulting the database. This attack has
1357 been shown to be effective against SafeWeb~\cite{hintz-pet02}.
1358 It may be less effective against Tor, since
1359 streams are multiplexed within the same circuit, and
1360 fingerprinting will be limited to
1361 the granularity of cells (currently 512 bytes). Additional
1362 defenses could include
1363 larger cell sizes, padding schemes to group websites
1364 into large sets, and link
1365 padding or long-range dummies.\footnote{Note that this fingerprinting
1366 attack should not be confused with the much more complicated latency
1367 attacks of~\cite{back01}, which require a fingerprint of the latencies
1368 of all circuits through the network, combined with those from the
1369 network edges to the target user and the responder website.}\\
1370
1371 \noindent{\large\bf Active attacks}\\
1372 \emph{Compromise keys.} An attacker who learns the TLS session key can
1373 see control cells and encrypted relay cells on every circuit on that
1374 connection; learning a circuit
1375 session key lets him unwrap one layer of the encryption. An attacker
1376 who learns an OR's TLS private key can impersonate that OR for the TLS
1377 key's lifetime, but he must
1378 also learn the onion key to decrypt \emph{create} cells (and because of
1379 perfect forward secrecy, he cannot hijack already established circuits
1380 without also compromising their session keys). Periodic key rotation
1381 limits the window of opportunity for these attacks. On the other hand,
1382 an attacker who learns a node's identity key can replace that node
1383 indefinitely by sending new forged descriptors to the directory servers.
1384
1385 \emph{Iterated compromise.} A roving adversary who can
1386 compromise ORs (by system intrusion, legal coercion, or extralegal
1387 coercion) could march down the circuit compromising the
1388 nodes until he reaches the end.  Unless the adversary can complete
1389 this attack within the lifetime of the circuit, however, the ORs
1390 will have discarded the necessary information before the attack can
1391 be completed.  (Thanks to the perfect forward secrecy of session
1392 keys, the attacker cannot force nodes to decrypt recorded
1393 traffic once the circuits have been closed.)  Additionally, building
1394 circuits that cross jurisdictions can make legal coercion
1395 harder---this phenomenon is commonly called ``jurisdictional
1396 arbitrage.'' The Java Anon Proxy project recently experienced the
1397 need for this approach, when
1398 a German court forced them to add a backdoor to
1399 all of their nodes~\cite{jap-backdoor}.
1400
1401 \emph{Run a recipient.} An adversary running a webserver
1402 trivially learns the timing patterns of users connecting to it, and
1403 can introduce arbitrary patterns in its responses.
1404 End-to-end attacks become easier: if the adversary can induce
1405 users to connect to his webserver (perhaps by advertising
1406 content targeted to those users), she now holds one end of their
1407 connection.  There is also a danger that application
1408 protocols and associated programs can be induced to reveal information
1409 about the initiator. Tor depends on Privoxy and similar protocol cleaners
1410 to solve this latter problem.
1411
1412 \emph{Run an onion proxy.} It is expected that end users will
1413 nearly always run their own local onion proxy. However, in some
1414 settings, it may be necessary for the proxy to run
1415 remotely---typically, in institutions that want
1416 to monitor the activity of those connecting to the proxy.
1417 Compromising an onion proxy compromises all future connections
1418 through it.
1419
1420 \emph{DoS non-observed nodes.} An observer who can only watch some
1421 of the Tor network can increase the value of this traffic
1422 by attacking non-observed nodes to shut them down, reduce
1423 their reliability, or persuade users that they are not trustworthy.
1424 The best defense here is robustness.
1425
1426 \emph{Run a hostile OR.}  In addition to being a local observer,
1427 an isolated hostile node can create circuits through itself, or alter
1428 traffic patterns to affect traffic at other nodes. Nonetheless, a hostile
1429 node must be immediately adjacent to both endpoints to compromise the
1430 anonymity of a circuit. If an adversary can
1431 run multiple ORs, and can persuade the directory servers
1432 that those ORs are trustworthy and independent, then occasionally
1433 some user will choose one of those ORs for the start and another
1434 as the end of a circuit. If an adversary
1435 controls $m>1$ out of $N$ nodes, he can to correlate at most
1436 $\left(\frac{m}{N}\right)^2$ of the traffic in this way---although an
1437 adversary
1438 could possibly attract a disproportionately large amount of traffic
1439 by running an OR with a permissive exit policy, or by
1440 degrading the reliability of other routers.
1441
1442 \emph{Introduce timing into messages.} This is simply a stronger
1443 version of passive timing attacks already discussed earlier.
1444
1445 \emph{Tagging attacks.} A hostile node could ``tag'' a
1446 cell by altering it. If the
1447 stream were, for example, an unencrypted request to a Web site,
1448 the garbled content coming out at the appropriate time would confirm
1449 the association. However, integrity checks on cells prevent
1450 this attack.
1451
1452 \emph{Replace contents of unauthenticated protocols.}  When
1453 relaying an unauthenticated protocol like HTTP, a hostile exit node
1454 can impersonate the target server. Clients
1455 should prefer protocols with end-to-end authentication.
1456
1457 \emph{Replay attacks.} Some anonymity protocols are vulnerable
1458 to replay attacks.  Tor is not; replaying one side of a handshake
1459 will result in a different negotiated session key, and so the rest
1460 of the recorded session can't be used.
1461
1462 \emph{Smear attacks.} An attacker could use the Tor network for
1463 socially disapproved acts, to bring the
1464 network into disrepute and get its operators to shut it down.
1465 Exit policies reduce the possibilities for abuse, but
1466 ultimately the network requires volunteers who can tolerate
1467 some political heat.
1468
1469 \emph{Distribute hostile code.} An attacker could trick users
1470 into running subverted Tor software that did not, in fact, anonymize
1471 their connections---or worse, could trick ORs into running weakened
1472 software that provided users with less anonymity.  We address this
1473 problem (but do not solve it completely) by signing all Tor releases
1474 with an official public key, and including an entry in the directory
1475 that lists which versions are currently believed to be secure.  To
1476 prevent an attacker from subverting the official release itself
1477 (through threats, bribery, or insider attacks), we provide all
1478 releases in source code form, encourage source audits, and
1479 frequently warn our users never to trust any software (even from
1480 us) that comes without source.\\
1481
1482 \noindent{\large\bf Directory attacks}\\
1483 \emph{Destroy directory servers.}  If a few directory
1484 servers disappear, the others still decide on a valid
1485 directory.  So long as any directory servers remain in operation,
1486 they will still broadcast their views of the network and generate a
1487 consensus directory.  (If more than half are destroyed, this
1488 directory will not, however, have enough signatures for clients to
1489 use it automatically; human intervention will be necessary for
1490 clients to decide whether to trust the resulting directory.)
1491
1492 \emph{Subvert a directory server.}  By taking over a directory server,
1493 an attacker can partially influence the final directory.  Since ORs
1494 are included or excluded by majority vote, the corrupt directory can
1495 at worst cast a tie-breaking vote to decide whether to include
1496 marginal ORs.  It remains to be seen how often such marginal cases
1497 occur in practice.
1498
1499 \emph{Subvert a majority of directory servers.} An adversary who controls
1500 more than half the directory servers can include as many compromised
1501 ORs in the final directory as he wishes. We must ensure that directory
1502 server operators are independent and attack-resistant.
1503
1504 \emph{Encourage directory server dissent.}  The directory
1505 agreement protocol assumes that directory server operators agree on
1506 the set of directory servers.  An adversary who can persuade some
1507 of the directory server operators to distrust one another could
1508 split the quorum into mutually hostile camps, thus partitioning
1509 users based on which directory they use.  Tor does not address
1510 this attack.
1511
1512 \emph{Trick the directory servers into listing a hostile OR.}
1513 Our threat model explicitly assumes directory server operators will
1514 be able to filter out most hostile ORs.
1515 % If this is not true, an
1516 % attacker can flood the directory with compromised servers.
1517
1518 \emph{Convince the directories that a malfunctioning OR is
1519 working.}  In the current Tor implementation, directory servers
1520 assume that an OR is running correctly if they can start a TLS
1521 connection to it.  A hostile OR could easily subvert this test by
1522 accepting TLS connections from ORs but ignoring all cells. Directory
1523 servers must actively test ORs by building circuits and streams as
1524 appropriate.  The tradeoffs of a similar approach are discussed
1525 in~\cite{mix-acc}.\\
1526
1527 \section{Early experiences: Tor in the Wild}
1528 \label{sec:in-the-wild}
1529
1530 As of mid-January 2004, the Tor network consists of 18 nodes
1531 (16 in the US, 2 in Europe), and more are joining each week as the code
1532 matures.\footnote{For comparison, the current remailer network
1533 has about 30 reliable nodes. We haven't asked PlanetLab to provide
1534 Tor nodes, since their AUP wouldn't allow exit nodes (see
1535 also~\cite{darkside}) and because we aim to build a long-term community of
1536 node operators and developers.} Each node has at least a 768Kb/768Kb
1537 connection, and
1538 many have 10Mb. The number of users varies (and of course, it's hard to
1539 tell for sure), but we sometimes have several hundred users---administrators at
1540 several companies have begun sending their entire departments' web
1541 traffic through Tor, to block other divisions of
1542 their company from reading their traffic. Tor users have reported using
1543 the network for web browsing, FTP, IRC, AIM, Kazaa, and SSH.
1544
1545 Each Tor node currently processes roughly 800,000 relay
1546 cells (a bit under half a gigabyte) per week. On average, about 80\%
1547 of each 500-byte payload is full for cells going back to the client,
1548 whereas about 40\% is full for cells coming from the client. (The difference
1549 arises because most of the network's traffic is web browsing.) Interactive
1550 traffic like SSH brings down the average a lot---once we have more
1551 experience, and assuming we can resolve the anonymity issues, we may
1552 partition traffic into two relay cell sizes: one to handle
1553 bulk traffic and one for interactive traffic.
1554
1555 Based in part on our restrictive default exit policy (we
1556 reject SMTP requests) and our low profile, we have had no abuse
1557 issues since the network was deployed in October
1558 2003. Our slow growth rate gives us time to add features,
1559 resolve bugs, and get a feel for what users actually want from an
1560 anonymity system.  Even though having more users would bolster our
1561 anonymity sets, we are not eager to attract the Kazaa or warez
1562 communities---we feel that we must build a reputation for privacy, human
1563 rights, research, and other socially laudable activities.
1564
1565 As for performance, profiling shows that Tor spends almost
1566 all its CPU time in AES, which is fast.  Current latency is attributable
1567 to two factors. First, network latency is critical: we are
1568 intentionally bouncing traffic around the world several times. Second,
1569 our end-to-end congestion control algorithm focuses on protecting
1570 volunteer servers from accidental DoS rather than on optimizing
1571 performance. % Right now the first $500 \times 500\mbox{B}=250\mbox{KB}$
1572 %of the stream arrives
1573 %quickly, and after that throughput depends on the rate that \emph{relay
1574 %sendme} acknowledgments arrive.
1575 To quantify these effects, we did some informal tests using a network of 4
1576 nodes on the same machine (a heavily loaded 1GHz Athlon). We downloaded a 60
1577 megabyte file from {\tt debian.org} every 30 minutes for 54 hours (108 sample
1578 points). It arrived in about 300 seconds on average, compared to 210s for a
1579 direct download. We ran a similar test on the production Tor network,
1580 fetching the front page of {\tt cnn.com} (55 kilobytes): %every 10 minutes for
1581 %26 hours (156 sample points):
1582 while a direct
1583 download consistently took about 0.3s, the performance through Tor was highly
1584 variable. Some downloads were as fast as 0.6s, with a median at 2.7s, and
1585 80\% finishing within 5.7s.  It seems that as the network expands, the chance
1586 of building a slow circuit (one that includes a slow or heavily loaded node
1587 or link) is increasing.  On the other hand, as our users remain satisfied
1588 with this increased latency, we can address our performance incrementally as we
1589 proceed with development.\footnote{For example, we have just begun pushing
1590 a pipelining patch to the production network that seems to decrease
1591 latency for medium-to-large files; we will present revised benchmarks
1592 as they become available.}
1593
1594 %With the current network's topology and load, users can typically get 1-2
1595 %megabits sustained transfer rate, which is good enough for now.
1596 Indeed, the Tor
1597 design aims foremost to provide a security research platform; performance
1598 only needs to be sufficient to retain users~\cite{econymics,back01}.
1599 We can tweak the congestion control
1600 parameters to provide faster throughput at the cost of
1601 larger buffers at each node; adding the heuristics mentioned in
1602 Section~\ref{subsec:rate-limit} to favor low-volume
1603 streams may also help. More research remains to find the
1604 right balance.
1605 % We should say _HOW MUCH_ latency there is in these cases. -NM
1606
1607 %performs badly on lossy networks. may need airhook or something else as
1608 %transport alternative?
1609
1610 Although Tor's clique topology and full-visibility directories present
1611 scaling problems, we still expect the network to support a few hundred
1612 nodes and maybe 10,000 users before we're forced to become
1613 more distributed. With luck, the experience we gain running the current
1614 topology will help us choose among alternatives when the time comes.
1615
1616 \section{Open Questions in Low-latency Anonymity}
1617 \label{sec:maintaining-anonymity}
1618
1619 In addition to the non-goals in
1620 Section~\ref{subsec:non-goals}, many other questions must be solved
1621 before we can be confident of Tor's security.
1622
1623 Many of these open issues are questions of balance. For example,
1624 how often should users rotate to fresh circuits? Frequent rotation
1625 is inefficient, expensive, and may lead to intersection attacks and
1626 predecessor attacks~\cite{wright03}, but infrequent rotation makes the
1627 user's traffic linkable. Besides opening fresh circuits, clients can
1628 also exit from the middle of the circuit,
1629 or truncate and re-extend the circuit. More analysis is
1630 needed to determine the proper tradeoff.
1631
1632 %% Duplicated by 'Better directory distribution' in section 9.
1633 %
1634 %A similar question surrounds timing of directory operations: how often
1635 %should directories be updated?  Clients that update infrequently receive
1636 %an inaccurate picture of the network, but frequent updates can overload
1637 %the directory servers. More generally, we must find more
1638 %decentralized yet practical ways to distribute up-to-date snapshots of
1639 %network status without introducing new attacks.
1640
1641 How should we choose path lengths? If Alice only ever uses two hops,
1642 then both ORs can be certain that by colluding they will learn about
1643 Alice and Bob. In our current approach, Alice always chooses at least
1644 three nodes unrelated to herself and her destination.
1645 %% This point is subtle, but not IMO necessary.  Anybody who thinks
1646 %% about it will see that it's implied by the above sentence; anybody
1647 %% who doesn't think about it is safe in his ignorance.
1648 %
1649 %Thus normally she chooses
1650 %three nodes, but if she is running an OR and her destination is on an OR,
1651 %she uses five.
1652 Should Alice choose a random path length (say,
1653 increasing it from a geometric distribution) to foil an attacker who
1654 uses timing to learn that he is the fifth hop and thus concludes that
1655 both Alice and the responder are on ORs?
1656
1657 Throughout this paper, we have assumed that end-to-end traffic
1658 confirmation will immediately and automatically defeat a low-latency
1659 anonymity system. Even high-latency anonymity systems can be
1660 vulnerable to end-to-end traffic confirmation, if the traffic volumes
1661 are high enough, and if users' habits are sufficiently
1662 distinct~\cite{statistical-disclosure,limits-open}. Can anything be
1663 done to
1664 make low-latency systems resist these attacks as well as high-latency
1665 systems? Tor already makes some effort to conceal the starts and ends of
1666 streams by wrapping long-range control commands in identical-looking
1667 relay cells. Link padding could frustrate passive observers who count
1668 packets; long-range padding could work against observers who own the
1669 first hop in a circuit. But more research remains to find an efficient
1670 and practical approach. Volunteers prefer not to run constant-bandwidth
1671 padding; but no convincing traffic shaping approach has been
1672 specified. Recent work on long-range padding~\cite{defensive-dropping}
1673 shows promise. One could also try to reduce correlation in packet timing
1674 by batching and re-ordering packets, but it is unclear whether this could
1675 improve anonymity without introducing so much latency as to render the
1676 network unusable.
1677
1678 A cascade topology may better defend against traffic confirmation by
1679 aggregating users, and making padding and
1680 mixing more affordable.  Does the hydra topology (many input nodes,
1681 few output nodes) work better against some adversaries? Are we going
1682 to get a hydra anyway because most nodes will be middleman nodes?
1683
1684 Common wisdom suggests that Alice should run her own OR for best
1685 anonymity, because traffic coming from her node could plausibly have
1686 come from elsewhere. How much mixing does this approach need?  Is it
1687 immediately beneficial because of real-world adversaries that can't
1688 observe Alice's router, but can run routers of their own?
1689
1690 To scale to many users, and to prevent an attacker from observing the
1691 whole network, it may be necessary
1692 to support far more servers than Tor currently anticipates.
1693 This introduces several issues.  First, if approval by a central set
1694 of directory servers is no longer feasible, what mechanism should be used
1695 to prevent adversaries from signing up many colluding servers? Second,
1696 if clients can no longer have a complete picture of the network,
1697 how can they perform discovery while preventing attackers from
1698 manipulating or exploiting gaps in their knowledge?  Third, if there
1699 are too many servers for every server to constantly communicate with
1700 every other, which non-clique topology should the network use?
1701 (Restricted-route topologies promise comparable anonymity with better
1702 scalability~\cite{danezis-pets03}, but whatever topology we choose, we
1703 need some way to keep attackers from manipulating their position within
1704 it~\cite{casc-rep}.) Fourth, if no central authority is tracking
1705 server reliability, how do we stop unreliable servers from making
1706 the network unusable?  Fifth, do clients receive so much anonymity
1707 from running their own ORs that we should expect them all to do
1708 so~\cite{econymics}, or do we need another incentive structure to
1709 motivate them?  Tarzan and MorphMix present possible solutions.
1710
1711 % advogato, captcha
1712
1713 When a Tor node goes down, all its circuits (and thus streams) must break.
1714 Will users abandon the system because of this brittleness? How well
1715 does the method in Section~\ref{subsec:dos} allow streams to survive
1716 node failure? If affected users rebuild circuits immediately, how much
1717 anonymity is lost? It seems the problem is even worse in a peer-to-peer
1718 environment---such systems don't yet provide an incentive for peers to
1719 stay connected when they're done retrieving content, so we would expect
1720 a higher churn rate.
1721
1722 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1723
1724 \section{Future Directions}
1725 \label{sec:conclusion}
1726
1727 Tor brings together many innovations into a unified deployable system. The
1728 next immediate steps include:
1729
1730 \emph{Scalability:} Tor's emphasis on deployability and design simplicity
1731 has led us to adopt a clique topology, semi-centralized
1732 directories, and a full-network-visibility model for client
1733 knowledge. These properties will not scale past a few hundred servers.
1734 Section~\ref{sec:maintaining-anonymity} describes some promising
1735 approaches, but more deployment experience will be helpful in learning
1736 the relative importance of these bottlenecks.
1737
1738 \emph{Bandwidth classes:} This paper assumes that all ORs have
1739 good bandwidth and latency. We should instead adopt the MorphMix model,
1740 where nodes advertise their bandwidth level (DSL, T1, T3), and
1741 Alice avoids bottlenecks by choosing nodes that match or
1742 exceed her bandwidth. In this way DSL users can usefully join the Tor
1743 network.
1744
1745 \emph{Incentives:} Volunteers who run nodes are rewarded with publicity
1746 and possibly better anonymity~\cite{econymics}. More nodes means increased
1747 scalability, and more users can mean more anonymity. We need to continue
1748 examining the incentive structures for participating in Tor. Further,
1749 we need to explore more approaches to limiting abuse, and understand
1750 why most people don't bother using privacy systems.
1751
1752 \emph{Cover traffic:} Currently Tor omits cover traffic---its costs
1753 in performance and bandwidth are clear but its security benefits are
1754 not well understood. We must pursue more research on link-level cover
1755 traffic and long-range cover traffic to determine whether some simple padding
1756 method offers provable protection against our chosen adversary.
1757
1758 %%\emph{Offer two relay cell sizes:} Traffic on the Internet tends to be
1759 %%large for bulk transfers and small for interactive traffic. One cell
1760 %%size cannot be optimal for both types of traffic.
1761 % This should go in the spec and todo, but not the paper yet. -RD
1762
1763 \emph{Caching at exit nodes:} Perhaps each exit node should run a
1764 caching web proxy~\cite{shsm03}, to improve anonymity for cached pages
1765 (Alice's request never
1766 leaves the Tor network), to improve speed, and to reduce bandwidth cost.
1767 On the other hand, forward security is weakened because caches
1768 constitute a record of retrieved files.  We must find the right
1769 balance between usability and security.
1770
1771 \emph{Better directory distribution:}
1772 Clients currently download a description of
1773 the entire network every 15 minutes. As the state grows larger
1774 and clients more numerous, we may need a solution in which
1775 clients receive incremental updates to directory state.
1776 More generally, we must find more
1777 scalable yet practical ways to distribute up-to-date snapshots of
1778 network status without introducing new attacks.
1779
1780 \emph{Implement location-hidden services:} The design in
1781 Appendix~\ref{sec:rendezvous-specifics} has not yet been implemented.
1782 While doing
1783 so we are likely to encounter additional issues that must be resolved,
1784 both in terms of usability and anonymity.
1785
1786 \emph{Further specification review:} Our public
1787 byte-level specification~\cite{tor-spec} needs
1788 external review.  We hope that as Tor
1789 is deployed, more people will examine its
1790 specification.
1791
1792 \emph{Multisystem interoperability:} We are currently working with the
1793 designer of MorphMix to unify the specification and implementation of
1794 the common elements of our two systems. So far, this seems
1795 to be relatively straightforward.  Interoperability will allow testing
1796 and direct comparison of the two designs for trust and scalability.
1797
1798 \emph{Wider-scale deployment:} The original goal of Tor was to
1799 gain experience in deploying an anonymizing overlay network, and
1800 learn from having actual users.  We are now at a point in design
1801 and development where we can start deploying a wider network.  Once
1802 we have many actual users, we will doubtlessly be better
1803 able to evaluate some of our design decisions, including our
1804 robustness/latency tradeoffs, our performance tradeoffs (including
1805 cell size), our abuse-prevention mechanisms, and
1806 our overall usability.
1807
1808 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1809
1810 %% commented out for anonymous submission
1811 \section*{Acknowledgments}
1812  We thank Peter Palfrader, Geoff Goodell, Adam Shostack, Joseph Sokol-Margolis,
1813    John Bashinski, and Zack Brown
1814    for editing and comments;
1815  Matej Pfajfar, Andrei Serjantov, Marc Rennhard for design discussions;
1816  Bram Cohen for congestion control discussions;
1817  Adam Back for suggesting telescoping circuits; and
1818  Cathy Meadows for formal analysis of the \emph{extend} protocol.
1819  This work has been supported by ONR and DARPA.
1820
1821 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1822
1823 \bibliographystyle{latex8}
1824 \bibliography{tor-design}
1825
1826 \newpage
1827 \appendix
1828
1829 \section{Rendezvous points and hidden services}
1830 \label{sec:rendezvous-specifics}
1831
1832 In this appendix we provide specifics about the rendezvous points
1833 of Section~\ref{subsec:rendezvous}. % We also describe integration
1834 %issues and related work.
1835 %, and how well the rendezvous point design
1836 %withstands various attacks.
1837 %    (Not any more; it's currently commented out. -NM)
1838 %
1839 %\SubSection{Rendezvous protocol overview}
1840 %
1841 The following steps are
1842 %We give an overview of the steps of a rendezvous. These are
1843 performed on behalf of Alice and Bob by their local OPs;
1844 application integration is described more fully below.
1845
1846 \begin{tightlist}
1847 \item Bob chooses some introduction points, and advertises them on
1848       the DHT.  He can add more later.
1849 \item Bob builds a circuit to each of his introduction points,
1850       and waits for requests.
1851 \item Alice learns about Bob's service out of band (perhaps Bob told her,
1852       or she found it on a website). She retrieves the details of Bob's
1853       service from the DHT.
1854 \item Alice chooses an OR as the rendezvous point (RP) for this
1855       transaction. She builds a circuit to the RP, and gives it a
1856       rendezvous cookie to recognize Bob.
1857 \item Alice opens an anonymous stream to one of Bob's introduction
1858       points, and gives it a message (encrypted to Bob's public key)
1859       telling it about herself,
1860       her RP and rendezvous cookie, and the
1861       start of a DH
1862       handshake. The introduction point sends the message to Bob.
1863 \item If Bob wants to talk to Alice, he builds a circuit to Alice's
1864       RP and sends the rendezvous cookie, the second half of the DH
1865       handshake, and a hash of the session
1866       key they now share. By the same argument as in
1867       Section~\ref{subsubsec:constructing-a-circuit}, Alice knows she
1868       shares the key only with Bob.
1869 \item The RP connects Alice's circuit to Bob's. Note that RP can't
1870       recognize Alice, Bob, or the data they transmit.
1871 \item Alice sends a \emph{relay begin} cell along the circuit. It
1872       arrives at Bob's OP, which connects to Bob's
1873       webserver.
1874 \item An anonymous stream has been established, and Alice and Bob
1875       communicate as normal.
1876 \end{tightlist}
1877
1878 When establishing an introduction point, Bob provides the onion router
1879 with a public ``introduction'' key. The hash of this public key
1880 identifies a unique service, and (since Bob signs his
1881 messages) prevents anybody else from usurping Bob's introduction point
1882 in the future. Bob uses the same public key to establish the other
1883 introduction points for his service, and periodically refreshes his
1884 entry in the DHT.
1885
1886 The message that Alice gives
1887 the introduction point includes a hash of Bob's public key % to identify
1888 %the service, along with
1889 and an optional initial authorization token (the
1890 introduction point can do prescreening, for example to block replays). Her
1891 message to Bob may include an end-to-end authorization token so Bob
1892 can choose whether to respond.
1893 The authorization tokens can be used to provide selective access:
1894 important users can get uninterrupted access.
1895 %important users get tokens to ensure uninterrupted access. %to the
1896 %service.
1897 During normal situations, Bob's service might simply be offered
1898 directly from mirrors, while Bob gives out tokens to high-priority users. If
1899 the mirrors are knocked down,
1900 %by distributed DoS attacks or even
1901 %physical attack,
1902 those users can switch to accessing Bob's service via
1903 the Tor rendezvous system.
1904
1905 Bob's introduction points are themselves subject to DoS---he must
1906 open many introduction points or risk such an attack.
1907 He can provide selected users with a current list or future schedule of
1908 unadvertised introduction points;
1909 this is most practical
1910 if there is a stable and large group of introduction points
1911 available. Bob could also give secret public keys
1912 for consulting the DHT\@. All of these approaches
1913 limit exposure even when
1914 some selected users collude in the DoS\@.
1915
1916 \subsection{Integration with user applications}
1917
1918 Bob configures his onion proxy to know the local IP address and port of his
1919 service, a strategy for authorizing clients, and a public key. Bob
1920 publishes the public key, an expiration time, and
1921 the current introduction points for his service into the DHT, indexed
1922 by the hash of the public key.  Bob's webserver is unmodified,
1923 and doesn't even know that it's hidden behind the Tor network.
1924
1925 Alice's applications also work unchanged---her client interface
1926 remains a SOCKS proxy. We encode all of the necessary information
1927 into the fully qualified domain name Alice uses when establishing her
1928 connection. Location-hidden services use a virtual top level domain
1929 called {\tt .onion}: thus hostnames take the form {\tt x.y.onion} where
1930 {\tt x} is the authorization cookie and {\tt y} encodes the hash of
1931 the public key. Alice's onion proxy
1932 examines addresses; if they're destined for a hidden server, it decodes
1933 the key and starts the rendezvous as described above.
1934
1935 \subsection{Previous rendezvous work}
1936
1937 Rendezvous points in low-latency anonymity systems were first
1938 described for use in ISDN telephony~\cite{jerichow-jsac98,isdn-mixes}.
1939 Later low-latency designs used rendezvous points for hiding location
1940 of mobile phones and low-power location
1941 trackers~\cite{federrath-ih96,reed-protocols97}.  Rendezvous for
1942 low-latency
1943 Internet connections was suggested in early Onion Routing
1944 work~\cite{or-ih96}, but the first published design was by Ian
1945 Goldberg~\cite{ian-thesis}. His design differs from
1946 ours in three ways. First, Goldberg suggests that Alice should manually
1947 hunt down a current location of the service via Gnutella; our approach
1948 makes lookup transparent to the user, as well as faster and more robust.
1949 Second, in Tor the client and server negotiate session keys
1950 via Diffie-Hellman, so plaintext is not exposed even at the rendezvous point. Third,
1951 our design minimizes the exposure from running the
1952 service, to encourage volunteers to offer introduction and rendezvous
1953 services. Tor's introduction points do not output any bytes to the
1954 clients; the rendezvous points don't know the client or the server,
1955 and can't read the data being transmitted. The indirection scheme is
1956 also designed to include authentication/authorization---if Alice doesn't
1957 include the right cookie with her request for service, Bob need not even
1958 acknowledge his existence.
1959
1960 %\SubSection{Attacks against rendezvous points}
1961 %
1962 %We describe here attacks against rendezvous points and how well
1963 %the system protects against them.
1964 %
1965 %\emph{Make many introduction requests.}  An attacker could
1966 %try to deny Bob service by flooding his introduction points with
1967 %requests.  Because the introduction points can block requests that
1968 %lack authorization tokens, however, Bob can restrict the volume of
1969 %requests he receives, or require a certain amount of computation for
1970 %every request he receives.
1971 %
1972 %\emph{Attack an introduction point.} An attacker could
1973 %disrupt a location-hidden service by disabling its introduction
1974 %points.  But because a service's identity is attached to its public
1975 %key, the service can simply re-advertise
1976 %itself at a different introduction point. Advertisements can also be
1977 %done secretly so that only high-priority clients know the address of
1978 %Bob's introduction points or so that different clients know of different
1979 %introduction points. This forces the attacker to disable all possible
1980 %introduction points.
1981 %
1982 %\emph{Compromise an introduction point.} An attacker who controls
1983 %Bob's introduction point can flood Bob with
1984 %introduction requests, or prevent valid introduction requests from
1985 %reaching him. Bob can notice a flood, and close the circuit.  To notice
1986 %blocking of valid requests, however, he should periodically test the
1987 %introduction point by sending rendezvous requests and making
1988 %sure he receives them.
1989 %
1990 %\emph{Compromise a rendezvous point.}  A rendezvous
1991 %point is no more sensitive than any other OR on
1992 %a circuit, since all data passing through the rendezvous is encrypted
1993 %with a session key shared by Alice and Bob.
1994
1995 \end{document}
1996
1997 % Style guide:
1998 %     U.S. spelling
1999 %     avoid contractions (it's, can't, etc.)
2000 %     prefer ``for example'' or ``such as'' to e.g.
2001 %     prefer ``that is'' to i.e.
2002 %     'mix', 'mixes' (as noun)
2003 %     'mix-net'
2004 %     'mix', 'mixing' (as verb)
2005 %     'middleman'  [Not with a hyphen; the hyphen has been optional
2006 %         since Middle English.]
2007 %     'nymserver'
2008 %     'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
2009 %     'Onion Routing design', 'onion router' [note capitalization]
2010 %     'SOCKS'
2011 %     Try not to use \cite as a noun.
2012 %     'Authorizating' sounds great, but it isn't a word.
2013 %     'First, second, third', not 'Firstly, secondly, thirdly'.
2014 %     'circuit', not 'channel'
2015 %     Typography: no space on either side of an em dash---ever.
2016 %     Hyphens are for multi-part words; en dashs imply movement or
2017 %        opposition (The Alice--Bob connection); and em dashes are
2018 %        for punctuation---like that.
2019 %     A relay cell; a control cell; a \emph{create} cell; a
2020 %     \emph{relay truncated} cell.  Never ``a \emph{relay truncated}.''
2021 %
2022 %     'Substitute ``Damn'' every time you're inclined to write ``very;'' your
2023 %     editor will delete it and the writing will be just as it should be.'
2024 %     -- Mark Twain