doc/roadmaps/roadmap-2007.tex

   1 \documentclass{article}
   2
   3 \usepackage{url}
   4
   5 \newenvironment{tightlist}{\begin{list}{$\bullet$}{
   6   \setlength{\itemsep}{0mm}
   7     \setlength{\parsep}{0mm}
   8     %  \setlength{\labelsep}{0mm}
   9     %  \setlength{\labelwidth}{0mm}
  10     %  \setlength{\topsep}{0mm}
  11     }}{\end{list}}
  12 \newcommand{\tmp}[1]{{\bf #1} [......] \\}
  13 \newcommand{\plan}[1]{ {\bf (#1)}}
  14
  15 \begin{document}
  16
  17 \title{Tor Development Roadmap: Wishlist for Nov 2006--Dec 2007}
  18 \author{Roger Dingledine \and Nick Mathewson \and Shava Nerad}
  19
  20 \maketitle
  21 \pagestyle{plain}
  22
  23 % TO DO:
  24 %   add cites
  25 %   add time estimates
  26
  27
  28 \section{Introduction}
  29 %Hi, Roger!  Hi, Shava.  This paragraph should get deleted soon.  Right now,
  30 %this document goes into about as much detail as I'd like to go into for a
  31 %technical audience, since that's the audience I know best.  It doesn't have
  32 %time estimates everywhere.  It isn't well prioritized, and it doesn't
  33 %distinguish well between things that need lots of research and things that
  34 %don't.  The breakdowns don't all make sense.  There are lots of things where
  35 %I don't make it clear how they fit into larger goals, and lots of larger
  36 %goals that don't break down into little things. It isn't all stuff we can do
  37 %for sure, and it isn't even all stuff we can do for sure in 2007.  The
  38 %tmp\{\} macro indicates stuff I haven't said enough about.  That said, here
  39 %goes...
  40
  41 Tor (the software) and Tor (the overall software/network/support/document
  42 suite) are now experiencing all the crises of success.  Over the next year,
  43 we're probably going to grow more in terms of users, developers, and funding
  44 than before.  This gives us the opportunity to perform long-neglected
  45 maintenance tasks.
  46
  47 \section{Code and design infrastructure}
  48
  49 \subsection{Protocol revision}
  50 To maintain backward compatibility, we've postponed major protocol
  51 changes and redesigns for a long time.  Because of this, there are a number
  52 of sensible revisions we've been putting off until we could deploy several of
  53 them at once.  To do each of these, we first need to discuss design
  54 alternatives with other cryptographers and outside collaborators to
  55 make sure that our choices are secure.
  56
  57 First of all, our protocol needs better {\bf versioning support} so that we
  58 can make backward-incompatible changes to our core protocol.  There are
  59 difficult anonymity issues here, since many naive designs would make it easy
  60 to tell clients apart (and then track them) based on their supported versions.
  61
  62 With protocol versioning support would come the ability to {\bf future-proof
  63   our ciphersuites}.  For example, not only our OR protocol, but also our
  64 directory protocol, is pretty firmly tied to the SHA-1 hash function, which,
  65 though not yet known to be insecure for our purposes, has begun to show
  66 its age.  We should
  67 remove from our design any assumption that public
  68 keys, secret keys, or digests will remain any particular size indefinitely.
  69
  70 Our OR {\bf authentication protocol}, though provably
  71 secure\cite{tap:pet2006}, relies more on particular aspects of RSA and our
  72 implementation thereof than we had initially believed.  To future-proof
  73 against changes, we should replace it with a less delicate approach.
  74
  75 \plan{For all the above: 2 person-months to specify, spread over several
  76   months with time for interaction with external participants.  One
  77   person-month to implement.  Start specifying in early 2007.}
  78
  79 We might design a {\bf stream migration} feature so that streams tunneled
  80 over Tor could be more resilient to dropped connections and changed IPs.
  81 \plan{Not in 2007.}
  82
  83 A new protocol could support {\bf multiple cell sizes}.  Right now, all data
  84 passes through the Tor network divided into 512-byte cells.  This is
  85 efficient for high-bandwidth protocols, but inefficient for protocols
  86 like SSH or AIM that send information in small chunks.  Of course, we need to
  87 investigate the extent to which multiple sizes could make it easier for an
  88 adversary to fingerprint a traffic pattern. \plan{Not in 2007.}
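
To make the inefficiency concrete, the following Python sketch estimates how
much of each fixed-size cell carries useful data for a given application write
size.  The 512-byte cell size comes from the text above; the 14-byte per-cell
header overhead and the function name \texttt{cell\_efficiency} are
illustrative assumptions, not values taken from the Tor specification.

\begin{verbatim}
```python
import math

CELL_SIZE = 512        # bytes per cell, as stated in the text above
HEADER_OVERHEAD = 14   # ASSUMED bytes of per-cell headers (illustrative only)


def cell_efficiency(write_size, cell_size=CELL_SIZE, overhead=HEADER_OVERHEAD):
    """Return (cells_needed, useful_fraction) for one application write.

    useful_fraction is the share of transmitted cell bytes that is
    application data; the rest is headers and padding.
    """
    if write_size <= 0:
        raise ValueError("write_size must be positive")
    payload = cell_size - overhead            # data bytes that fit in one cell
    cells = math.ceil(write_size / payload)   # every cell is sent full-size
    return cells, write_size / (cells * cell_size)


if __name__ == "__main__":
    # Small interactive writes (SSH, chat) versus bulk transfers.
    for size in (1, 48, 498, 4096):
        cells, fraction = cell_efficiency(size)
        print(f"{size:5d}-byte write: {cells} cell(s), {fraction:.1%} useful")
```
\end{verbatim}

Under these assumptions a one-byte keystroke still costs a full cell, while a
bulk write fills most of the cells it uses.  This is the gap that multiple
cell sizes would aim to close.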
  89
  90 As a part of our design, we should investigate possible {\bf cipher modes}
  91 other than counter mode.  For example, a mode with built-in integrity
  92 checking, error propagation, and random access could simplify our protocol
  93 significantly.  Sadly, many of these are patented and unavailable for us.
  94 \plan{Not in 2007.}
  95
  96 \subsection{Scalability}
  97
  98 \subsubsection{Improved directory efficiency}
  99 Right now, clients download a statement of the {\bf network status} made by
 100 each directory authority.  We could reduce network bandwidth significantly by
 101 having the authorities jointly sign a statement reflecting their vote on the
 102 current network status.  This would save clients up to 160K per hour, and
 103 make their view of the network more uniform.  Of course, we'd need to make
 104 sure the voting process was secure and resilient to failures in the
 105 network.\plan{Must do; specify in 2006. 2 weeks to specify, 3-4 weeks to
 106   implement.}
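
One way to picture the joint statement is a simple majority vote over the
authorities' individual opinions.  The Python sketch below is a toy model
only: the data layout, the function name \texttt{majority\_status}, and the
strict-majority rule are assumptions made for illustration.  It leaves out
signatures and failure handling, which would need real design work.

\begin{verbatim}
```python
from collections import Counter


def majority_status(votes):
    """Combine per-authority votes into one network status.

    votes maps an authority name to the set of router names that the
    authority considers running.  A router appears in the result only
    if a strict majority of the voting authorities listed it.
    (Hypothetical rule, chosen for illustration.)
    """
    if not votes:
        return set()
    counts = Counter(
        router for listed in votes.values() for router in set(listed)
    )
    threshold = len(votes) // 2 + 1
    return {router for router, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    example = {
        "auth1": {"routerA", "routerB"},
        "auth2": {"routerA"},
        "auth3": {"routerA", "routerB", "routerC"},
    }
    print(sorted(majority_status(example)))
```
\end{verbatim}

A client would then fetch this single combined document instead of one
statement per authority, which is where the bandwidth saving comes from.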
 107
 108 We should {\bf shorten router descriptors}, since the current format includes
 109 a great deal of information that's only of interest to the directory
 110 authorities, and not of interest to clients.  We can do this by having each
 111 router upload a short-form and a long-form signed descriptor, and having
 112 clients download only the short form.  Even a naive version of this would
 113 save about 40\% of the bandwidth currently spent by clients downloading
 114 descriptors.\plan{Must do; specify in 2006. 3-4 weeks.}
 115
 116 We should {\bf have routers upload their descriptors even less often}, so
 117 that clients do not need to download replacements every 18 hours whether any
 118 information has changed or not.  (As of Tor 0.1.2.3-alpha, clients tolerate
 119 routers that don't upload often, but routers still upload at least every 18
 120 hours to support older clients.) \plan{Must do, but not until 0.1.1.x is
 121 deprecated in mid 2007. 1 week.}
 122
 123 \subsubsection{Non-clique topology}
 124 Our current network design achieves a certain amount of its anonymity by
 125 making clients act like each other through the simple expedient of making
 126 sure that all clients know all servers, and that any server can talk to any
 127 other server.  But as the number of servers increases to serve an
 128 ever-greater number of clients, these assumptions become impractical.
 129
 130 At worst, if these scalability issues become troubling before a solution is
 131 found, we can design and build a solution to {\bf split the network into
 132 multiple slices} until a better solution comes along.  This is not ideal,
 133 since rather than looking like all other users from a point of view of path
 134 selection, users would ``only'' look like 200,000--300,000 other
 135 users.\plan{Not unless needed.}
 136
 137 We are in the process of designing {\bf improved schemes for network
 138   scalability}.  Some approaches focus on limiting what an adversary can know
 139 about what a user knows; others focus on reducing the extent to which an
 140 adversary can exploit this knowledge.  These are currently in their infancy,
 141 and will probably not be needed in 2007, but they must be designed in 2007 if
 142 they are to be deployed in 2008.\plan{Design in 2007; unknown difficulty.
 143   Write a paper.}
 144
 145 \subsubsection{Relay incentives}
 146 To support more users on the network, we need to get more servers.  So far,
 147 we've relied on volunteerism to attract server operators, and so far it's
 148 served us well.  But in the long run, we need to {\bf design incentives for
 149   users to run servers} and relay traffic for others.  Most obviously, we
 150 could try to build the network so that servers offered improved service for
 151 other servers, but we would need to do so without weakening anonymity or
 152 making it obvious which connections originate from users running servers.  We
 153 have some preliminary designs~\cite{incentives-txt,tor-challenges},
 154 but need to perform
 155 some more research to make sure they would be safe and effective.\plan{Write
 156   a draft paper; 2 person-months.}
 157
 158 \subsection{Portability}
 159 Our {\bf Windows implementation}, though much improved, continues to lag
 160 behind Unix and Mac OS X, especially when running as a server.  We hope to
 161 merge promising patches from Mike Chiussi to address this point, and bring
 162 Windows performance on par with other platforms.\plan{Do in 2007; 1.5 months
 163   to integrate not counting Mike's work.}
 164
 165 We should have {\bf better support for portable devices}, including modes of
 166 operation that require less RAM, and that write to disk less frequently (to
 167 avoid wearing out flash RAM).\plan{Optional; 2 weeks.}
 168
 169 We should {\bf stop using socketpair on Windows}; instead, we can use
 170 in-memory structures to communicate between cpuworkers and the main thread,
 171 and between connections.\plan{Optional; 1 week.}
 172
 173 \subsection{Performance: resource usage}
 174 We've been working on {\bf using less RAM}, especially on servers.  This has
 175 paid off a lot for directory caches in the 0.1.2.x series, which in some cases are
 176 using 90\% less memory than they used to require.  But we can do better,
 177 especially in the area around our buffer management algorithms, by using an
 178 approach more like the one the BSD and Linux kernels use instead of our current ring
 179 buffer approach.  (For OR connections, we can just use queues of cell-sized
 180 chunks produced with a specialized allocator.)  This could potentially save
 181 around 25 to 50\% of the memory currently allocated for network buffers, and
 182 make Tor a more attractive proposition for restricted-memory environments
 183 like old computers, mobile devices, and the like.\plan{Do in 2007; 2-3 weeks
 184   plus one week measurement.}
 185
 186 We should improve our {\bf bandwidth limiting}.  The current system has been
 187 crucial in making users willing to run servers: nobody is willing to run a
 188 server if it might use an unbounded amount of bandwidth, especially if they
 189 are charged for their usage.  We can make our system better by letting users
 190 configure bandwidth limits independently for their own traffic and traffic
 191 relayed for others; and by adding write limits for users running directory
 192 servers.\plan{Do in 2006; 2-3 weeks.}
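
Independent limits for a user's own traffic and for relayed traffic can be
modeled as two separate token buckets.  The Python sketch below is a minimal
illustration under that assumption; the class names, the rates, and the
shared burst size are hypothetical and do not describe Tor's actual
configuration options or code.

\begin{verbatim}
```python
class TokenBucket:
    """Allow `rate` bytes per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst          # start with a full bucket

    def refill(self, elapsed_seconds):
        # Add tokens for the elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst,
                          self.tokens + self.rate * elapsed_seconds)

    def consume(self, nbytes):
        """Return how many of nbytes may be sent now and spend those tokens."""
        allowed = min(nbytes, int(self.tokens))
        self.tokens -= allowed
        return allowed


class SplitLimiter:
    """Hypothetical limiter: separate budgets for own and relayed traffic."""

    def __init__(self, own_rate, relayed_rate, burst):
        self.buckets = {
            "own": TokenBucket(own_rate, burst),
            "relayed": TokenBucket(relayed_rate, burst),
        }

    def tick(self, elapsed_seconds):
        for bucket in self.buckets.values():
            bucket.refill(elapsed_seconds)

    def send(self, kind, nbytes):
        # Spending from one bucket never drains the other.
        return self.buckets[kind].consume(nbytes)
```
\end{verbatim}

With this split, heavy relayed traffic can exhaust only its own budget, so
the operator's own connections keep their configured share.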
 193
 194 On many hosts, sockets are still in short supply, and will be until we can
 195 migrate our protocol to UDP.  We can {\bf use fewer sockets} by making our
 196 self-to-self connections happen internally to the code rather than involving
 197 the operating system's socket implementation.\plan{Optional; 1 week.}
 198
 199 \subsection{Performance: network usage}
 200 We know too little about how well our current path
 201 selection algorithms actually spread traffic around the network in practice.
 202 We should {\bf research the efficacy of our traffic allocation} and either
 203 assure ourselves that it is close enough to optimal as to need no improvement
 204 (unlikely) or {\bf identify ways to improve network usage}, and get more
 205 users' traffic delivered faster.  Performing this research will require
 206 careful thought about anonymity implications.
 207
 208 We should also {\bf examine the efficacy of our congestion control
 209   algorithm}, and see whether we can improve client performance in the
 210 presence of a congested network through dynamic `sendme' window sizes or
 211 other means.  This will have anonymity implications too if we aren't careful.
 212
 213 \plan{For both of the above: research, design and write
 214   a measurement tool in 2007: 1 month.  See if we can interest a graduate
 215   student.}
 216
 217 We should work on making Tor's cell-based protocol  perform better on
 218 networks with low bandwidth
 219 and high packet loss.\plan{Do in 2007 if we're funded to do it; 4-6 weeks.}
 220
 221 \subsection{Performance scenario: one Tor client, many users}
 222 We should {\bf improve Tor's performance when a single Tor handles many
 223   clients}.  Many organizations want to manage a single Tor client on their
 224 firewall for many users, rather than having each user install a separate
 225 Tor client.  We haven't optimized for this scenario, and it is likely that
 226 there are some code paths in the current implementation that become
 227 inefficient when a single Tor is servicing hundreds or thousands of client
 228 connections.  (Additionally, it is likely that such clients have interesting
 229 anonymity requirements that we should investigate.)  We should profile Tor
 230 under appropriate loads, identify bottlenecks, and fix them.\plan{Do in 2007
 231   if we're funded to do it; 4-8 weeks.}
 232
 233 \subsection{Tor servers on asymmetric bandwidth}
 234
 235 Tor should work better on servers that have asymmetric connections like cable
 236 or DSL.  Because Tor has separate TCP connections between each
 237 hop, if the incoming bytes are arriving just fine and the outgoing bytes are
 238 all getting dropped on the floor, the TCP push-back mechanisms don't really
 239 transmit this information back to the incoming streams.\plan{Do in 2007 since
 240   related to bandwidth limiting.  3-4 weeks.}
 241
 242 \subsection{Running Tor as both client and server}
 243
 244 This mode involves performance tradeoffs and balances that might need more attention.
 245 We first need to track and fix whatever bottlenecks emerge; but we also
 246 need to invent good algorithms for prioritizing the client's traffic
 247 without starving the server's traffic too much.\plan{No idea; try
 248 profiling and improving things in 2007.}
 249
 250 \subsection{Protocol redesign for UDP}
 251 Tor has relayed only TCP traffic since its first versions, and has used
 252 TLS-over-TCP to do so.  This approach has proved reliable and flexible, but
 253 in the long term we will need to allow UDP traffic on the network, and switch
 254 some or all of the network to using a UDP transport.  {\bf Supporting UDP
 255   traffic} will make Tor more suitable for protocols that require UDP, such
 256 as many VOIP protocols.  {\bf Using a UDP transport} could greatly reduce
 257 resource limitations on servers, and make the network far less interruptible
 258 by lossy connections.  Either of these protocol changes would require a great
 259 deal of design work, however.  We hope to be able to enlist the aid of a few
 260 talented graduate students to assist with the initial design and
 261 specification, but the actual implementation will require significant testing
 262 of different reliable transport approaches.\plan{Maybe do a design in 2007 if
 263 we find an interested academic.  Ian or Ben L might be good partners here.}
 264
 265 \section{Blocking resistance}
 266
 267 \subsection{Design for blocking resistance}
 268 We have written a design document explaining our general approach to blocking
 269 resistance.  We should workshop it with other experts in the field to get
 270 their ideas about how we can improve Tor's efficacy as an anti-censorship
 271 tool.
 272
 273 \subsection{Implementation: client-side and bridge-side}
 274
 275 Our anticensorship design calls for some nodes to act as ``bridges''
 276 that are outside a national firewall, and others inside the firewall to
 277 act as pure clients.  This part of the design is quite clear-cut; we're
 278 probably ready to begin implementing it.  To {\bf implement bridges}, we
 279 need to have servers publish themselves as limited-availability relays
 280 to a special bridge authority if they judge they'd make good servers.
 281 We will also need to help provide documentation for port forwarding,
 282 and an easy configuration tool for running as a bridge.
 283
 284 To {\bf implement clients}, we need to provide a flexible interface to
 285 learn about bridges and to act on knowledge of bridges. We also need
 286 to teach them to use bridges as their first hop, and how to
 287 fetch directory information from both classes of directory authority.
 288
 289 Clients also need to {\bf use the encrypted directory variant} added in Tor
 290 0.1.2.3-alpha.  This will let them retrieve directory information over Tor
 291 once they've got their initial bridges. We may want to get the rest of the
 292 Tor user base to begin using this encrypted directory variant too, to
 293 provide cover.
 294
 295 Bridges will want to be able to {\bf listen on multiple addresses and ports}
 296 if they can, to give the adversary more ports to block.
 297
 298 \subsection{Research: anonymity implications from becoming a bridge}
 299
 300 \subsection{Implementation: bridge authority}
 301
 302 The design here is also reasonably clear-cut: we need to run some
 303 directory authorities with a slightly modified protocol that doesn't leak
 304 the entire list of bridges. Thus users can learn up-to-date information
 305 for bridges they already know about, but they can't learn about arbitrary
 306 new bridges.
 307
 308 \subsection{Normalizing the Tor protocol on the wire}
 309 Additionally, we should {\bf resist content-based filters}.  Though an
 310 adversary can't see what users are saying, some aspects of our protocol are
 311 easy to fingerprint {\em as} Tor.  We should correct this where possible.
 312
 313 Look like Firefox; or look like nothing?
 314 Future research: investigate timing similarities with other protocols.
 315
 316 \subsection{Access control for bridges}
 317 Design/impl: password-protecting bridges, in light of above.
 318 And/or more general access control.
 319
 320 \subsection{Research: scanning-resistance}
 321
 322 \subsection{Research/Design/Impl: how users discover bridges}
 323 Our design anticipates an arms race between discovery methods and censors.
 324 We need to begin the infrastructure on our side quickly, preferably in a
 325 flexible language like Python, so we can adapt quickly to censorship.
 326
 327 phase one: personal bridges \\
 328 phase two: families of personal bridges \\
 329 phase three: more structured social network \\
 330 phase four: bag of tricks \\
 331 Research: phase five...
 332
 333 Integration with Psiphon, etc?
 334
 335 \subsection{Document best practices for users}
 336 Document best practices for various activities common among
 337 blocked users (e.g. WordPress use).
 338
 339 \subsection{Research: how to know if a bridge has been blocked?}
 340
 341 \subsection{GeoIP maintenance, and ``private'' user statistics}
 342 How to know if the whole idea is working?
 343
 344 \subsection{Research: hiding whether the user is reading or publishing?}
 345
 346 \subsection{Research: how many bridges do you need to know to maintain
 347 reachability?}
 348
 349 \subsection{Resisting censorship of the Tor website, docs, and mirrors}
 350
 351 We should take some effort to consider {\bf initial distribution of Tor and
 352   related information} in countries where the Tor website and mirrors are
 353 censored.  (Right now, most countries that block access to Tor block only the
 354 main website and leave mirrors and the network itself untouched.)  Falling
 355 back on word-of-mouth is always a good last resort, but we should also take
 356 steps to make sure it's relatively easy for users to get ahold of a copy.
 357
 358 \section{Security}
 359
 360 \subsection{Security research projects}
 361
 362 We should investigate approaches with some promise to help Tor resist
 363 end-to-end traffic correlation attacks.  It's an open research question
 364 whether (and to what extent) {\bf mixed-latency} networks, {\bf low-volume
 365   long-distance padding}, or other approaches can resist these attacks, which
 366 are currently some of the most effective against careful Tor users.  We
 367 should research these questions and perform simulations to identify
 368 opportunities for strengthening our design without dropping performance to
 369 unacceptable levels. %Cite something
 370 \plan{Start doing this in 2007; write a paper.  8-16 weeks.}
 371
 372 We've got some preliminary results suggesting that {\bf a topology-aware
 373   routing algorithm}~\cite{feamster:wpes2004} could reduce Tor users'
 374 vulnerability to local or ISP-level adversaries, by ensuring that such
 375 adversaries are never in a position to watch both ends of a connection.  We need to
 376 examine the effects of this approach in more detail and consider side-effects
 377 on anonymity against other kinds of adversaries.  If the approach still looks
 378 promising, we should investigate ways for clients to implement it (or an
 379 approximation of it) without having to download routing tables for the whole
 380 Internet. \plan{Not in 2007 unless a graduate student wants to do it.}
 381
 382 %\tmp{defenses against end-to-end correlation}  We don't expect any to work
 383 %right now, but it would be useful to learn that one did.  Alternatively,
 384 %proving that one didn't would free up researchers in the field to go work on
 385 %other things.
 386 %
 387 % See above; I think I got this.
 388
 389 We should research the efficacy of {\bf website fingerprinting} attacks,
 390 wherein an adversary tries to match the distinctive traffic and timing
 391 pattern of the resources constituting a given website to the traffic pattern
 392 of a user's client.  These attacks work great in simulations, but in
 393 practice we hear they don't work nearly as well.  We should get some actual
 394 numbers to investigate the issue, and figure out what's going on.  If we
 395 resist these attacks, or can improve our design to resist them, we should.
 396 % add cites
 397 \plan{Possibly part of end-to-end correlation paper.  Otherwise, not in 2007
 398   unless a graduate student is interested.}
 399
 400 \subsection{Implementation security}
 401 Right now, each Tor node stores its keys unencrypted.  We should {\bf encrypt
 402   more Tor keys} so that Tor authorities can require a startup password.  We
 403 should look into adding intermediary medium-term ``signing keys'' between
 404 identity keys and onion keys, so that a password could be required to replace
 405 a signing key, but not to start Tor.  This would improve Tor's long-term
 406 security, especially in its directory authority infrastructure.\plan{Design this
 407   as a part of the revised ``v2.1'' directory protocol; implement it in
 408   2007. 3-4 weeks.}
 409
 410 We should also {\bf mark RAM that holds key material as non-swappable} so
 411 that there is no risk of recovering key material from a hard disk
 412 compromise.  This would require submitting patches upstream to OpenSSL, where
 413 support for marking memory as sensitive is currently in a very preliminary
 414 state.\plan{Nice to do, but not in immediate Tor scope.}
 415
 416 There are numerous tools for identifying trouble spots in code (such as
 417 Coverity or even VS2005's code analysis tool) and we should convince somebody
 418 to run some of them against the Tor codebase.  Ideally, we could figure out a
 419 way to get our code checked periodically rather than just once.\plan{Almost
 420   no time once we talk somebody into it.}
 421
 422 We should try {\bf protocol fuzzing} to identify errors in our
 423 implementation.\plan{Not in 2007 unless we find a grad student or
 424   undergraduate who wants to try.}
 425
 426 Our guard nodes help prevent an attacker from being able to become a chosen
 427 client's entry point by having each client choose a few favorite entry points
 428 as ``guards'' and stick to them.   We should implement a {\bf directory
 429   guards} feature to keep adversaries from enumerating Tor users by acting as
 430 a directory cache.\plan{Do in 2007; 2 weeks.}
 431
 432 \subsection{Detect corrupt exits and other servers}
 433 With the success of our network, we've attracted servers in many locations,
 434 operated by many kinds of people.  Unfortunately, some of these locations
 435 have compromised or defective networks, and some of these people are
 436 untrustworthy or incompetent.  Our current design relies on authority
 437 administrators to identify bad nodes and mark them as nonfunctioning.  We
 438 should {\bf automate the process of identifying malfunctioning nodes} as
 439 follows:
 440
 441 We should create a generic {\bf feedback mechanism for add-on tools} like
 442 Mike Perry's ``Snakes on a Tor'' to report failing nodes to authorities.
 443 \plan{Do in 2006; 1-2 weeks.}
 444
 445 We should write tools to {\bf detect more kinds of innocent node failure},
 446 such as nodes whose network providers intercept SSL, nodes whose network
 447 providers censor popular websites, and so on.  We should also try to detect
 448 {\bf routers that snoop traffic}; we could do this by launching connections
 449 to throwaway accounts, and seeing which accounts get used.\plan{Do in 2007;
 450   ask Mike Perry if he's interested.  4-6 weeks.}
 451
 452 We should add {\bf an efficient way for authorities to mark a set of servers
 453   as probably collaborating} though not necessarily otherwise dishonest.
 454 This happens when an administrator starts multiple routers, but doesn't mark
 455 them as belonging to the same family.\plan{Do during v2.1 directory protocol
 456   redesign; 1-2 weeks to implement.}
 457
 458 To avoid attacks where an adversary claims good performance in order to
 459 attract traffic, we should {\bf have authorities measure node performance}
 460 (including stability and bandwidth) themselves, and not simply believe what
 461 they're told.  Measuring stability can be done by tracking MTBF.  Measuring
 462 bandwidth can be tricky, since it's hard to distinguish between a server with
 463 low capacity, and a high-capacity server with most of its capacity in
 464 use.\plan{Do ``Stable'' in 2007; 2-3 weeks.  ``Fast'' will be harder; do it
 465   if we can interest a grad student.}
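
Tracking MTBF can be as simple as dividing a router's observed uptime by the
number of failures an authority has seen.  The Python sketch below assumes
each authority keeps a list of observed runs per router; that record format
and the function name are hypothetical.

\begin{verbatim}
```python
def mean_time_between_failures(uptime_runs):
    """Estimate MTBF in seconds from observed runs of one router.

    uptime_runs is a list of (seconds_up, ended_in_failure) pairs.
    A run that is still in progress counts toward total uptime but
    not toward the failure count.
    """
    total_up = sum(seconds for seconds, _ in uptime_runs)
    failures = sum(1 for _, failed in uptime_runs if failed)
    if failures == 0:
        return float("inf")      # no failure observed yet
    return total_up / failures


if __name__ == "__main__":
    runs = [(3600, True), (7200, True), (1800, False)]
    print(mean_time_between_failures(runs))
```
\end{verbatim}

An authority could then assign the ``Stable'' flag to routers whose estimate
exceeds some threshold.  Choosing that threshold is a policy question this
sketch does not address.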
 466
 467 {\bf Operating a directory authority should be easier.}  We rely on authority
 468 operators to keep the network running well, but right now their job involves
 469 too much busywork and administrative overhead.  A better interface for them
 470 to use could free their time to work on exception cases rather than on
 471 adding named nodes to the network.\plan{Do in 2007; 4-5 weeks.}
 472
 473 \subsection{Protocol security}
 474
 475 In addition to other protocol changes discussed above,
 476 % And should we move some of them down here? -NM
 477 we should add {\bf hooks for denial-of-service resistance}; we have some
 478 preliminary designs, but we shouldn't postpone them until we really need them.
 479 If somebody tries a DDoS attack against the Tor network, we won't want to
 480 wait for all the servers and clients to upgrade to a new
 481 version.\plan{Research project; do this in 2007 if funded.}
 482
 483 \section{Development infrastructure}
 484
 485 \subsection{Build farm}
 486 We've begun to deploy a cross-platform distributed build farm of hosts
 487 that build and test the Tor source every time it changes in our development
 488 repository.
 489
 490 We need to {\bf get more participants}, so that we can test a larger variety
 491 of platforms.  (Previously, we've only found out that our code had broken on
 492 obscure platforms when somebody got around to building it there.)
 493
 494 We also need to {\bf add our dependencies} to the build farm, so that we can
 495 ensure that libraries we need (especially libevent) do not stop working on
 496 any important platform between one release and the next.
 497
 498 \plan{This is ongoing as more buildbots arrive.}
 499
 500 \subsection{Improved testing harness}
 501 Currently, our {\bf unit tests} cover only about 20\% of the code base.  This
 502 is uncomfortably low; we should write more tests and switch to a more flexible
 503 testing framework.\plan{Ongoing basis, time permitting.}
 504
 505 We should also write flexible {\bf automated single-host deployment tests} so
 506 we can more easily verify that the current codebase works with the
 507 network.\plan{Worthwhile in 2007; would save lots of time.  2-4 weeks.}
 508
 509 We should build automated {\bf stress testing} frameworks so we can see which
 510 realistic loads cause Tor to perform badly, and regularly profile Tor against
 511 these loads.  This would give us {\it in vitro} performance values to
 512 supplement our deployment experience.\plan{Worthwhile in 2007; 2-6 weeks.}
 513
 514 We should improve our memory profiling code.\plan{...}
 515
 516
 517 \subsection{Centralized build system}
 518 We currently rely on a separate packager to maintain the packaging system and
 519 to build Tor on each platform for which we distribute binaries.  Having separate
 520 package maintainers is sensible, but having separate package builders has meant
 521 long turnaround times between source releases and package releases.  We
 522 should create the necessary infrastructure for us to produce binaries for all
 523 major packages within an hour or so of source release.\plan{We should
 524   brainstorm this at least in 2007.}
 525
 526 \subsection{Improved metrics}
 527 We need a way to {\bf measure the network's health, capacity, and degree of
 528   utilization}.  Our current means for doing this are ad hoc and not
 529 completely accurate.
 530
 531 We need better ways to {\bf tell which countries our users are coming from,
 532   and how many there are}.  A good perspective of the network helps us
 533 allocate resources and identify trouble spots, but our current approaches
 534 will work less and less well as we make it harder for adversaries to
 535 enumerate users.  We'll probably want to shift to a smarter, statistical
 536 approach rather than our current ``count and extrapolate'' method.
 537
 538 \plan{All of this in 2007 if funded; 4-8 weeks}
 539
 540 % \tmp{We'd like to know how much of the network is getting used.}
 541 % I think this is covered above -NM
 542
 543 \subsection{Controller library}
 544 We've done lots of design and development on our controller interface, which
 545 allows UI applications and other tools to interact with Tor.  We could
 546 encourage the development of more such tools by releasing a {\bf
 547   general-purpose controller library}, ideally with API support for several
 548 popular programming languages.\plan{2006 or 2007; 1-2 weeks.}
 549
 550 \section{User experience}
 551
 552 \subsection{Get blocked less, get blocked less broadly}
 553 Right now, some services block connections from the Tor network because
 554 they don't have a better
 555 way to keep vandals from abusing them than blocking IP addresses associated
 556 with vandalism.  Our approach so far has been to educate them about better
 557 solutions that currently exist, but we should also {\bf create better
 558 solutions for limiting vandalism by anonymous users} like credential and
 559 blind-signature based implementations, and encourage their use. Other
 560 promising starting points include writing a patch and explanation for
 561 Wikipedia, and helping Freenode to document, maintain, and expand its
 562 current Tor-friendly position.\plan{Do a writeup here in 2007; 1-2 weeks.}
 563
 564 Those who do block Tor users also block overbroadly, sometimes blacklisting
 565 operators of Tor servers that do not permit exit to their services.  We could
 566 obviate innocent reasons for doing so by designing a {\bf narrowly-targeted Tor
 567   RBL service} so that those who wanted to overblock Tor could no longer
 568 plead incompetence.\plan{Possibly in 2007 if we decide it's a good idea; 3
 569   weeks.}
 570
 571 \subsection{All-in-one bundle}
 572 We need a well-tested, well-documented bundle of Tor and supporting
 573 applications configured to use it correctly.  We have an initial
 574 implementation well under way, but it needs significantly more work
 575 before it's ready for a general public release: identifying requisite
 576 Firefox extensions, identifying security threats, improving user
 577 experience, and so on.
 578
 579 \subsection{LiveCD Tor}
 580 We need a nice bootable LiveCD containing a minimal OS and a few applications
 581 configured to use Tor correctly.  The Anonym.OS project demonstrated that this
 582 is quite feasible, but that project is not currently maintained.
 583
 584 \subsection{A Tor client in a VM}
 585 \tmp{a.k.a.\ JanusVM} This is closely related to the firewall-level deployment
 586 section below. JanusVM is a Linux kernel running in VMware. It gets an IP
 587 address from the network, and serves as a DHCP server for its host Windows
 588 machine. It intercepts all outgoing traffic and redirects it into Privoxy,
 589 Tor, etc. This Linux-in-Windows approach may help us with scalability in
 590 the short term, and it may also be a good long-term solution rather than
 591 accepting all security risks in Windows.
 592
 593 %\subsection{Interface improvements}
 594 %\tmp{Allow controllers to manipulate server status.}
 595 % (Why is this in the User Experience section?) -RD
 596 % I think it's better left to a generic ``make controller iface better'' item.
 597
 598 \subsection{Firewall-level deployment}
 599 Another useful deployment mode for some users is using {\bf Tor in a firewall
 600   configuration}, and directing all their traffic through Tor.  This can be a
 601 little tricky to set up currently, but it's an effective way to make sure no
 602 traffic leaves the host un-anonymized.  To achieve this, we need to {\bf
 603   improve and port our new TransPort} feature, which allows Tor to be used
 604 without SOCKS support; to {\bf add an anonymizing DNS proxy} feature to Tor;
 605 and to {\bf construct a recommended set of firewall configurations} to redirect
 606 traffic to Tor.
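
A rough sketch of such a configuration on a Linux host follows.  It is an
illustration under assumptions, not a recommended setup: the port number
9040 and the account name \texttt{tor} are arbitrary choices, DNS lookups
are not covered (they depend on the DNS proxy feature above), and non-TCP
traffic would still need to be blocked separately.

\begin{verbatim}
# torrc: accept transparently redirected TCP connections.
# (9040 is an arbitrary port chosen for this sketch.)
TransPort 9040

# iptables, run as root.  Assumes Tor runs under an account
# named "tor"; adjust to the local setup.
# 1. Leave Tor's own outgoing traffic alone.
iptables -t nat -A OUTPUT -m owner --uid-owner tor -j RETURN
# 2. Leave loopback traffic alone.
iptables -t nat -A OUTPUT -o lo -j RETURN
# 3. Redirect all other new outgoing TCP connections to the TransPort.
iptables -t nat -A OUTPUT -p tcp --syn -j REDIRECT --to-ports 9040
\end{verbatim}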
 607
 608 This is an area where {\bf deployment via a LiveCD}, or an installation
 609 targeted at specialized home routing hardware, could be useful.
 610
 611 \subsection{Assess software and configurations for anonymity risks}
 612 Right now, users and packagers are more or less on their own when selecting
 613 Firefox extensions.  We should {\bf assemble a recommended list of browser
 614   extensions} through experiment, and include this in the application bundles
 615 we distribute.
 616
 617 We should also describe {\bf best practices for using Tor with each class of
 618   application}. For example, Ethan Zuckerman has written a detailed
 619 tutorial on how to use Tor, Firefox, GMail, and Wordpress to blog with
 620 improved safety. There are many other cases on the Internet where anonymity
 621 would be helpful, and there are a lot of ways to screw up using Tor.
 622
 623 The Foxtor and Torbutton extensions serve similar purposes; we should pick a
 624 favorite, and merge in the useful features of the other.
 625
 626 %\tmp{clean up our own bundled software:
 627 %E.g. Merge the good features of Foxtor into Torbutton}
 628 %
 629 % What else did you have in mind? -NM
 630
 631 \subsection{Localization}
 632 Right now, most of our user-facing code is internationalized.  We need to
 633 internationalize the last few hold-outs (like the Tor expert installer), and get
 634 more translations for the parts that are already internationalized.
 635
 636 Also, we should look into a {\bf unified translator's solution}.  Currently,
 637 since different tools have been internationalized using the
 638 framework-appropriate method, different tools require translators to localize
 639 them via different interfaces.  As far as possible, we should let
 640 translators use a single tool to translate the whole Tor suite.
 641
 642 \section{Support}
 643
 644 It would be nice to set up some {\bf user support infrastructure} and
 645 {\bf contributor support infrastructure}, especially focusing on server
 646 operators and on coordinating volunteers.
 647
 648 This includes intuitive and easy ticket systems for bug reports and
 649 feature suggestions (not just mailing lists with a half dozen people
 650 and no clear roles for who answers what), but it also includes a more
 651 personalized and efficient framework for interaction so we keep the
 652 attention and interest of the contributors, and so we make them feel
 653 helpful and wanted.
 654
 655 \section{Documentation}
 656
 657 \subsection{Unified documentation scheme}
 658
 659 We need to {\bf inventory our documentation.}  Our documentation so far has
 660 been mostly produced on an {\it ad hoc} basis, in response to particular
 661 needs and requests.  We should figure out what documentation we have, which of
 662 it (if any) should get priority, and whether we can put it all into a
 663 single format.
 664
 665 We could {\bf unify the docs} into a single book-like thing.  This will also
 666 help us identify what sections of the ``book'' are missing.
 667
 668 \subsection{Missing technical documentation}
 669
 670 We should {\bf revise our design paper} to reflect the decisions we've made
 671 and the research we've done since it was published in 2004.  This will help other
 672 researchers evaluate and suggest improvements to Tor's current design.
 673
 674 Other projects sometimes implement the client side of our protocol.  We
 675 encourage this, but we should write {\bf a document about how to avoid
 676 excessive resource use}, so we don't need to worry that they will do so
 677 without regard to the effect of their choices on server resources.
 678
 679 \subsection{Missing user documentation}
 680
 681 Our documentation falls into two broad categories: some is `discursive' and
 682 explains in detail why users should take certain actions, and other
 683 documentation is `comprehensive' and describes all of Tor's features.  Right
 684 now, we have no document that is at once deep, readable, and thorough.  We
 685 should correct this by identifying missing spots in our documentation.
 686
 687 \bibliographystyle{plain} \bibliography{tor-design}
 688
 689 \end{document}
 690