% doc/design-paper/challenges.tex

\documentclass{llncs}

\usepackage{url}
\usepackage{amsmath}
\usepackage{epsfig}

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
    \setlength{\parsep}{0mm}
    %  \setlength{\labelsep}{0mm}
    %  \setlength{\labelwidth}{0mm}
    %  \setlength{\topsep}{0mm}
    }}{\end{list}}

\begin{document}

\title{Challenges in bringing low-latency stream anonymity to the masses (DRAFT)}

\author{Roger Dingledine and Nick Mathewson}
\institute{The Free Haven Project\\
\email{\{arma,nickm\}@freehaven.net}}

\maketitle
\pagestyle{empty}

\begin{abstract}
foo
\end{abstract}

\section{Introduction}

Anonymous communication on the Internet today


Tor is a low-latency anonymous communication overlay network
\cite{tor-design}. We have been operating a publicly deployed Tor network
since October 2003.

Tor aims to resist observers and insiders by distributing each transaction
over several nodes in the network.  This ``distributed trust'' approach
means the Tor network can be safely operated and used by a wide variety
of mutually distrustful users, providing more sustainability and security
than previous attempts at anonymizing networks.

The Tor network has a broad range of users, including ordinary citizens
who want to avoid being profiled for targeted advertisements, corporations
that don't want to reveal information to their competitors, and law
enforcement and government intelligence agencies that need
to conduct operations on the Internet without being noticed.

Tor has been funded both by the U.S. Navy, for use in securing government
communications, and by the Electronic Frontier Foundation, for use in
maintaining civil liberties for ordinary citizens online.
The Tor protocol is one of the leading choices
to be the anonymizing layer in the European Union's PRIME directive to
help maintain privacy in Europe. The University of Dresden in Germany
has integrated an independent implementation of the Tor protocol into
their popular Java Anon Proxy anonymizing client.  This wide variety of
interests helps maintain both the stability and the security of the
network.

We deployed this thing called Tor. It's got all these different types of
users. It's been backed by the Navy and the EFF, and PRIME and Anonymizer
looked at it. Because we're this cool, you should believe us when we tell
you stuff.

In this paper we give the reader an understanding of Tor's context
in the anonymity space, and then we describe the
practical challenges that stand in the way of moving from a practical,
useful network to a practical, useful anonymous network.

% The goal of the paper is to get the PET-audience reader up to speed
% on all the issues we have with Tor, so he can, if he wants,
% * understand the technical and policy and legal issues and why they're
%   tricky in practice
% * help us out with answering some of the technical decisions
%   (and in writing it, we'll clarify our own opinions about them)
% * help us out with answering some of the anonymity questions

\section{What Is Tor}

\subsection{Distributed trust: safety in numbers}

Tor provides \emph{forward privacy}, so that users can connect to
Internet sites without revealing their logical or physical locations
to those sites or to observers.  It also provides \emph{location-hidden
services}, so that critical servers can support authorized users without
giving adversaries an effective vector for physical or online attacks.
Our design provides this protection even when a portion of its own
infrastructure is controlled by an adversary.

To make private connections in Tor, users incrementally build a path or
\emph{circuit} of encrypted connections through servers on the network,
extending it one step at a time so that each server in the circuit only
learns which server extended to it and which server it has been asked
to extend to.  The client negotiates a separate set of encryption keys
for each step along the circuit.

Once a circuit has been established, the client software waits for
applications to request TCP connections, and directs these application
streams along the circuit.  Many streams can be multiplexed along a single
circuit, so applications don't need to wait for keys to be negotiated
every time they open a connection.  Because each server sees no
more than one end of the connection, a local eavesdropper or a compromised
server cannot use traffic analysis to link the connection's source and
destination.  The Tor client software rotates circuits periodically
to prevent long-term linkability between different actions by a
single user.
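
As a rough illustration of how much hostile infiltration a circuit can
tolerate, consider the following back-of-the-envelope sketch.  It rests on
simplifying assumptions rather than on measurements of the deployed
network: the adversary controls a fraction $c$ of the servers, clients
choose the first and last server of each circuit independently and
uniformly at random, and an adversary who observes both ends of a circuit
can link its source and destination.  Under those assumptions a single
circuit is compromised with probability
\begin{equation*}
  P_{\mathrm{circuit}} = c^2 ,
\end{equation*}
and a user who builds $k$ independently chosen circuits has at least one
of them compromised with probability
\begin{equation*}
  P_{\mathrm{any}} = 1 - (1 - c^2)^k .
\end{equation*}
For example, $c = 0.1$ gives $P_{\mathrm{circuit}} = 0.01$, while
$k = 100$ circuits give $P_{\mathrm{any}} = 1 - 0.99^{100} \approx 0.63$.
Real path selection is not uniform, so these figures only indicate the
shape of the tradeoff.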

Tor differs from other deployed systems for traffic analysis resistance
in its security and flexibility.  Mix networks such as Mixmaster or its
successor Mixminion \cite{minion-design}
gain the highest degrees of anonymity at the expense of introducing highly
variable delays, thus making them unsuitable for applications such as web
browsing that require quick response times.  Commercial single-hop proxies
such as {\url{anonymizer.com}} present a single point of failure, where
a single compromise can expose all users' traffic, and a single-point
eavesdropper can perform traffic analysis on the entire network.
Also, their proprietary implementations place any infrastructure that
depends on these single-hop solutions at the mercy of their providers'
financial health.  Tor can handle any TCP-based protocol, such as web
browsing, instant messaging and chat, and secure shell login; and it is
the only implemented anonymizing design with an integrated system for
secure location-hidden services.

No organization can achieve this security on its own.  If a single
corporation or government agency were to build a private network to
protect its operations, any connections entering or leaving that network
would be obviously linkable to the controlling organization.  The members
and operations of that agency would be easier, not harder, to distinguish.

Instead, to protect our networks from traffic analysis, we must
collaboratively blend the traffic from many organizations and private
citizens, so that an eavesdropper can't tell which users are which,
and who is looking for what information.  As more users join
the network, all users become more secure \cite{econymics}.
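
One common way to make this intuition concrete is to measure anonymity as
the entropy of the adversary's probability distribution over possible
senders.  The metric is chosen here purely for illustration; it assumes
the adversary assigns probability $p_i$ to user $i$ out of $N$ users:
\begin{equation*}
  H = - \sum_{i=1}^{N} p_i \log_2 p_i .
\end{equation*}
When all $N$ users look alike, $p_i = 1/N$ and $H = \log_2 N$, so
$N = 1000$ indistinguishable users give about $9.97$ bits, and doubling
the user base adds one bit.  The benefit shrinks if new users are easy
to tell apart from existing ones, since $H$ then falls below $\log_2 N$.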

Naturally, organizations will not want to depend on others for their
security.  If most participating providers are reliable, Tor tolerates
some hostile infiltration of the network.  For maximum protection,
the Tor design includes an enclave approach that lets data be encrypted
(and authenticated) end-to-end, so high-sensitivity users can be sure it
hasn't been read or modified.  This even works for Internet services that
don't have built-in encryption and authentication, such as unencrypted
HTTP or chat, and it requires no modification of those services to do so.

weasel's graph of \# nodes and of bandwidth, ideally from week 0.

Tor has the following goals.

and we made these assumptions when trying to design the thing.

\section{Tor's position in the anonymity field}

There are many other classes of systems: single-hop proxies, open proxies,
JAP, Mixminion, flash mixes, Freenet, I2P, MUTE/ANts/etc, Tarzan,
MorphMix, Freedom. Give brief descriptions and brief characterizations
of how we differ. This is not the breakthrough stuff and we only have
a page or two for it.

\section{Crossroads}

Discuss each item that Tor hasn't solved yet that isn't just coding
work.  Perhaps we'll have so many that we can pick out the best ones to
discuss, so it's a bit less of a laundry list. Maybe they'll even fit
into categories. The trick to making the paper good will be to find
the right balance between going into depth and breadth of coverage.

Peer-to-peer / practical issues:

Network discovery, Sybil, node admission, scaling. It seems that the code
will ship with something and that's our trust root. We could try to get
people to build a web of trust, but no. Where we go from here depends
on what threats we have in mind. Really decentralized if your threat is
the RIAA; less so if the threat is to application data or individuals or...

Making use of servers with little bandwidth. How to handle hammering by
certain applications.

Handling servers that are far away from the rest of the network, e.g.\ on
the continents that aren't North America and Europe. High latency,
often high packet loss.

Running Tor servers behind NATs, behind great-firewalls-of-China, etc.
Restricted routes. How to propagate the topology to everybody? BGP
style doesn't work because we don't want just \emph{one} path. Point to
Geoff's stuff.

Routing-zones. It seems that our threat model comes down to diversity and
dispersal. But it is hard for Alice to know how to act. Many questions remain.

The China problem. We have lots of users in Iran and similar (we stopped
logging, so it's hard to know now, but there are many Persian sites on how
to use Tor), and they seem to be doing ok. But the China problem is bigger.
Cite Stefan's paper, and talk about how we need to route through clients,
and maybe we should start with a time-release IP publishing system +
Advogato-based reputation system, to bound the number of IPs leaked to the
adversary.

Policy issues:

BitTorrent and the DMCA. Should we add an IDS to autodetect protocols and
snipe them? Takedowns and EFnet abuse and Wikipedia complaints and IRC
networks. Should we allow revocation of anonymity if a threshold of
servers want to?

Image: substantial non-infringing uses. Image is a security parameter,
since it impacts user base and perceived sustainability.

Sustainability. Previous attempts have been commercial, which we think
adds a lot of unnecessary complexity and accountability. Freedom didn't
collect enough money to pay its servers; JAP bandwidth is supported by
continued money, and they periodically ask what they will do when it
dries up.

Logging. Making logs not revealing. A happy coincidence that verbose
logging is our \#2 performance bottleneck. Is there a way to detect
modified servers, or to have them volunteer the information that they're
logging verbosely? Would that actually solve any attacks?

Anonymity issues:

Transporting the stream vs transporting the packets.

The DNS problem in practice.

Applications that leak data. We can say they're not our problem, but
they're somebody's problem.

How to measure performance without letting people selectively deny service
by distinguishing pings. Heck, just how to measure performance at all. In
practice people have funny firewalls that don't match up to their exit
policies and Tor doesn't deal with it.

Mid-latency. Can we do traffic shaping to get any defense against George's
PET2004 paper? Will padding or long-range dummies do anything then? Will
it kill the user base or can we get both approaches to play well together?

Does running a server help you or harm you? George's Oakland attack.
Plausible deniability -- without even running your traffic through Tor! We
have to pick the path length so the adversary can't distinguish client from
server (how many hops is good?).

When does fixing your entry or exit node help you?
Helper nodes in the literature don't deal with churn, and
especially active attacks to induce churn.

Survivable services are new in practice, yes? Hidden services seem
less hidden than we'd like, since they stay in one place and get used
a lot. They're the epitome of the need for helper nodes. This means
that using Tor as a building block for Free Haven is going to be really
hard. Also, they're brittle in terms of intersection and observation
attacks. Would be nice to have hot-swap services, but hard to design.

P2P + anonymity issues:

Incentives. Copy the page I wrote for the NSF proposal, and maybe extend
it if we're feeling smart.

Usability: the FC03 paper was great, except that the lower your latency,
the less useful it seems to be.
A Tor GUI; how JAP's GUI is nice but does not reflect the security
they provide.
Public perception, and thus advertising, is a security parameter.

Network investigation: Is all this bandwidth publishing thing a good idea?
How can we collect stats better? Note weasel's smokeping, at
\url{http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor}
which probably gives George and Steven enough info to break Tor?

Do general DoS attacks have anonymity implications? See e.g.\ Adam
Back's IH paper, but I think there's more to be pointed out here.

% need to do somewhere in the paper:

Have a serious discussion of MorphMix's assumptions, since they would
seem to be the direct competition. In fact Tor is a flexible architecture
that would encompass MorphMix, and they're nearly identical except for
path selection and node discovery. And the trust system MorphMix has
seems overkill (and/or insecure) based on the threat model we've picked.

Need to discuss how we take the approach of building the thing, and then
assuming that, how much anonymity can we get. We're not here to model or
to simulate or to produce equations and formulae. But those have their
roles too.

%%%

TCP vs UDP

\begin{enumerate}
\item We need to do IP-level packet normalization, to block things like IP
fingerprinting.
\item We still need to be easy to integrate with applications, so they can do
application-level scrubbing.
\item We need a block-level encryption approach that can provide security despite
packet loss and out-of-order delivery. I believe you that such a thing can be created,
but no such thing has yet been specified. So specify it for me if you want me to believe it.
(Freedom and Cebolla are vulnerable to tagging and malleability attacks, I believe.)
\item We still need to play with parameters for throughput, congestion control,
etc.\ -- since we need sequence numbers and maybe more to do replay detection,
and just to handle duplicate frames. So we would be reimplementing some subset of TCP
anyway.
\item TLS over UDP is not implemented or even specified.
\item Exit policies over arbitrary IP packets seem to be an IDS-hard problem. I
don't want to build an IDS into Tor.
\item Certain protocols are going to leak information at the IP layer anyway. For
example, if we anonymize your DNS requests, but they still go to Comcast's DNS servers,
that's bad.
\item Hidden services, .exit addresses, etc.\ are broken unless we have some way to
reach into the application-level protocol and decide the hostname it's trying to get.
\end{enumerate}

\bibliographystyle{plain} \bibliography{tor-design}

\end{document}