doc/HACKING/CircuitPaddingDevelopment.md

   1 # Circuit Padding Developer Documentation
   2
   3 This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.
   4
   5 Written by Mike Perry and George Kadianakis.
   6
   7 # Table of Contents
   8
   9 - [0. Background](#0-background)
  10 - [1. Introduction](#1-introduction)
  11     - [1.1. System Overview](#11-system-overview)
  12     - [1.2. Layering Model](#12-layering-model)
  13     - [1.3. Computation Model](#13-computation-model)
  14     - [1.4. Deployment Constraints](#14-other-deployment-constraints)
  15 - [2. Creating New Padding Machines](#2-creating-new-padding-machines)
  16     - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
  17     - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
  18 - [3. Specifying Padding Machines](#3-specifying-padding-machines)
  19     - [3.1. Padding Machine States](#31-padding-machine-states)
  20     - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
  21     - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
  22     - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
  23     - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
  24 - [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
  25     - [4.1. Pure Simulation](#41-pure-simulation)
  26     - [4.2. Testing in Chutney](#42-testing-in-chutney)
  27     - [4.3. Testing in Shadow](#43-testing-in-shadow)
  28     - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
  29 - [5. Example Padding Machines](#5-example-padding-machines)
  30     - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
  31     - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
  32     - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
  33     - [5.4. Other Padding Machines](#54-other-padding-machines)
  34 - [6. Framework Implementation Details](#6-framework-implementation-details)
  35     - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
  36     - [6.2. Machine Application Events](#62-machine-application-events)
  37     - [6.3. Internal Machine Events](#63-internal-machine-events)
  38 - [7. Future Features and Optimizations](#7-future-features-and-optimizations)
  39     - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
  40     - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
  41     - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
  42     - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
  43     - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
  44 - [8. Open Research Problems](#8-open-research-problems)
  45     - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
  46     - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
  47     - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
  48     - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
  49     - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
  50 - [9. Must Read Papers](#9-must-read-papers)
  51
  52
  53 ## 0. Background
  54
  55 Tor supports both connection-level and circuit-level padding, and both
  56 systems are live on the network today. The connection-level padding behavior
  57 is described in [section 2 of
  58 padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
  59 circuit-level padding behavior is described in [section 3 of
  60 padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).
  61
  62 These two systems are orthogonal and should not be confused. The
  63 connection-level padding system is only active while the TLS connection is
  64 otherwise idle. Moreover, it regards circuit-level padding as normal data
  65 traffic, and hence while the circuit-level padding system is actively padding,
  66 the connection-level padding system will not add any additional overhead.
  67
  68 While the currently deployed circuit-level padding behavior is quite simple,
  69 it is built on a flexible framework. This framework supports the description
  70 of event-driven finite state machine by filling in fields of a simple C
  71 structure, and is designed to support any delay-free statistically shaped
  72 cover traffic on individual circuits, with cover traffic flowing to and from a
  73 node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).
  74
  75 This class of system was first proposed in
  76 [Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
  77 by Shmatikov and Wang, and extended for the website traffic fingerprinting
  78 domain by Juarez et al. in
  79 [Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
  80 framework also supports fixed parameterized probability distributions, as
  81 used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
  82 Pulls, and many other features.
  83
  84 This document describes how to use Tor's circuit padding framework to
  85 implement and deploy novel delay-free cover traffic defenses.
  86
  87 ## 1. Introduction
  88
  89 The circuit padding framework is the official way to implement padding
  90 defenses in Tor. It may be used in combination with application-layer
  91 defenses, and/or obfuscation defenses, or on its own.
  92
  93 Its current design should be enough to deploy most defenses without
  94 modification, but you can extend it to [provide new
  95 features](#7-future-features-and-optimizations) as well.
  96
  97 ### 1.1. System Overview
  98
  99 Circuit-level padding can occur between Tor clients and relays at any hop of
 100 one of the client's circuits. Both parties need to support the same padding
 101 mechanisms for the system to work, and the client must enable it.
 102
 103 We added a padding negotiation relay cell to the Tor protocol that clients use
 104 to ask a relay to start padding, as well as a torrc directive for researchers
 105 to pin their clients' relay selection to the subset of Tor nodes that
 106 implement their custom defenses, to support ethical live network testing and
 107 evaluation.
 108
 109 Circuit-level padding is performed by 'padding machines'. A padding machine is
 110 a finite state machine. Every state specifies a different form of
 111 padding style, or stage of padding, in terms of inter-packet timings and total
 112 packet counts.
 113
 114 Padding state machines are specified by filling in fields of a C structure,
 115 which specifies the transitions between padding states based on various events,
 116 probability distributions of inter-packet delays, and the conditions under
 117 which padding machines should be applied to circuits.
 118
 119 This compact C structure representation is designed to function as a
 120 microlanguage, which can be compiled down into a
 121 bitstring that [can be tuned](#13-computation-model) using various
 122 optimization methods (such as gradient descent, GAs, or GANs), either in
 123 bitstring form or C struct form.
 124
 125 The event driven, self-contained nature of this framework is also designed to
 126 make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
 127 reproducible.
 128
 129 This document covers the engineering steps to write, test, and deploy a
 130 padding machine, as well as how to extend the framework to support new machine
 131 features.
 132
 133 If you prefer to learn by example, you may want to skip to either the
 134 [QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
 135 5](#5-example-padding-machines) for example machines to get you up and running
 136 quickly.
 137
 138 ### 1.2. Layering Model
 139
 140 The circuit padding framework is designed to provide one layer in a layered
 141 system of interchangeable components.
 142
 143 The circuit padding framework operates at the Tor circuit layer. It only deals
 144 with the inter-cell timings and quantity of cells sent on a circuit. It can
 145 insert cells on a circuit in arbitrary patterns, and in response to arbitrary
 146 conditions, but it cannot delay cells. It also does not deal with packet
 147 sizes, how cells are packed into TLS records, or ways that the Tor protocol
 148 might be recognized on the wire.
 149
 150 The problem of differentiating Tor traffic from non-Tor traffic based on
 151 TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
 152 domain of [pluggable
 153 transports](https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/AChildsGardenOfPluggableTransports),
 154 which may optionally be used in conjunction with this framework (or without
 155 it).
 156
 157 This document focuses primarily on the circuit padding framework's cover
 158 traffic features, and will only briefly touch on the potential obfuscation and
 159 application layer coupling points of the framework. Explicit layer coupling
 160 points can be created by adding either new [machine application
 161 events](#62-machine-application-events) or new [internal machine
 162 events](#63-internal-machine-events) to the circuit padding framework, so that
 163 your padding machines can react to events from other layers.
 164
 165 ### 1.3. Computation Model
 166
 167 The circuit padding framework is designed to support succinctly specified
 168 defenses that can be tuned through [computer-assisted
 169 optimization](#4-evaluating-padding-machines).
 170
 171 We chose to generalize the original [Adaptive Padding 2-state
 172 design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
 173 event-driven state machine because state machines are the simplest form of
 174 sequence recognition devices from [automata
 175 theory](https://en.wikipedia.org/wiki/Finite-state_machine).
 176
 177 Most importantly: this framing allows cover traffic defenses to be modeled as
 178 an optimization problem search space, expressed as fields of a C structure
 179 (which is simultaneously a compact opaque bitstring as well as a symbolic
 180 vector in an abstract feature space). This kind of space is particularly well
 181 suited to search by gradient descent, GAs, and GANs.
 182
 183 When performing this optimization search, each padding machine should have a
 184 fitness function, which will allow two padding machines to be compared for
 185 relative effectiveness. Optimization searches work best if this fitness can be
 186 represented as a single number, for example the total amount by which it
 187 reduces the [Balanced
 188 Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
 189 of an adversary's classifier, divided by an amount of traffic overhead.
 190
 191 Before you begin the optimization phase for your defense, you should
 192 also carefully consider the [features and
 193 optimizations](#7-future-features-and-optimizations) that we suspect will be
 194 useful, and also see if you can come up with any more. You should similarly be
 195 sure to restrict your search space to avoid areas of the bitstring/feature
 196 vector that you are sure you will not need. For example, some
 197 [applications](#8-open-research-problems) may not need the histogram
 198 accounting used by Adaptive Padding, but might need to add other forms of
 199 [pattern recognition](#75-more-complex-pattern-recognition) to react to
 200 sequences that resemble HTTP GET and HTTP POST.
 201
 202 ### 1.4. Other Deployment Constraints
 203
 204 The framework has some limitations that are the result of deliberate
 205 choices. We are unlikely to deploy defenses that ignore these limitations.
 206
 207 In particular, we have deliberately not provided any mechanism to delay actual
 208 user traffic, even though we are keenly aware that if we were to support
 209 additional delay, defenses would be able to have [more success with less
 210 bandwidth
 211 overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).
 212
 213 In the website traffic fingerprinting domain, [provably optimal
 214 defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
 215 achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
 216 maintained, by rate limiting traffic below the actual throughput of a circuit.
 217 For optimal results, this queue must avoid draining to empty, and yet it
 218 must also be drained fast enough to avoid tremendous queue overhead in fast
 219 Tor relays, which carry hundreds of thousands of circuits simultaneously.
 220
 221 Unfortunately, Tor's end-to-end flow control is not congestion control. Its
 222 window sizes are currently fixed. This means there is no signal when queuing
 223 occurs, and thus no ability to limit queue size through pushback. This means
 224 there is currently no way to do the fine-grained queue management necessary to
 225 create such a queue and rate limit traffic effectively enough to keep this
 226 queue from draining to empty, without also risking that aggregate queuing
 227 would cause out-of-memory conditions on fast relays.
 228
 229 It may be possible to create a congestion control algorithm that can support
 230 such fine grained queue management, but this is a [deeply unsolved area of
 231 research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
 232
 233 Even beyond these major technical hurdles, additional latency is also
 234 unappealing to the wider Internet community, for the simple reason that
 235 bandwidth [continues to increase
 236 exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
 237 where as the speed of light is fixed. Significant engineering effort has been
 238 devoted to optimizations that reduce the effect of latency on Internet
 239 protocols. To go against this trend would ensure our irrelevance to the wider
 240 conversation about traffic analysis defenses for low latency Internet protocols.
 241
 242 On the other hand, through [load
 243 balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
 244 and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
 245 believe it is possible to add significant bandwidth overhead in the form of
 246 cover traffic, without significantly impacting end-user performance.
 247
 248 For these reasons, we believe the trade-off should be in favor of adding more
 249 cover traffic, rather than imposing queuing memory overhead and queuing delay.
 250
 251 As a last resort for narrowly scoped application domains (such as
 252 shaping Tor service-side onion service traffic to look like other websites or
 253 different application-layer protocols), delay *may* be added at the
 254 [application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
 255 Any additional cover traffic required by such defenses should still be
 256 added at the circuit padding layer using this framework, to provide
 257 engineering efficiency through loose layer coupling and component re-use, as
 258 well as to provide additional gains against [low
 259 resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
 260 end-to-end traffic correlation.
 261
 262 Because such delay-based defenses will impact performance significantly more
 263 than simply adding cover traffic, they must be optional, and negotiated by
 264 only specific application layer endpoints that want them. This will have
 265 consequences for anonymity sets and base rates, if such traffic shaping and
 266 additional cover traffic is not very carefully constructed.
 267
 268 In terms of acceptable overhead, because Tor onion services
 269 [currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
 270 less than 1% of the
 271 [total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
 272 of the Tor network, and because onion services exist to provide higher
 273 security as compared to Tor Exit traffic, they are an attractive target for
 274 higher-overhead defenses. We encourage researchers to target this use case
 275 for defenses that require more overhead, and/or for the deployment of
 276 optional negotiated application-layer delays on either the server or the
 277 client side.
 278
 279 ## 2. Creating New Padding Machines
 280
 281 This section explains how to use the existing mechanisms in Tor to define a
 282 new circuit padding machine.  We assume here that you know C, and are at
 283 least somewhat familiar with Tor development.  For more information on Tor
 284 development in general, see the other files in doc/HACKING/ in a recent Tor
 285 distribution.
 286
 287 Again, if you prefer to learn by example, you may want to skip to either the
 288 [QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
 289 5](#5-example-padding-machines) for example machines to get up and running
 290 quickly.
 291
 292 To create a new padding machine, you must:
 293
 294   1. Define your machine using the fields of a heap-allocated
 295      `circpad_machine_spec_t` C structure.
 296
 297   2. Register this object in the global list of available padding machines,
 298      using `circpad_register_padding_machine()`.
 299
 300   3. Ensure that your machine is properly negotiated under your desired
 301      circuit conditions.
 302
 303 ### 2.1. Registering a New Padding Machine
 304
 305 Again, a circuit padding machine is designed to be specified entirely as a [single
 306 C structure](#13-computation-model).
 307
 308 Your machine definitions should go into their own functions in
 309 [circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
 310 details on all of the fields involved in specifying a padding machine, see
 311 [Section 3](#3-specifying-padding-machines).
 312
 313 You must register your machine in `circpad_machines_init()` in
 314 [circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
 315 add a new padding machine specification, you must allocate a
 316 `circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
 317 human readable name string, and a machine number equivalent to the number of
 318 machines in the list, and register the structure using
 319 `circpad_register_padding_machine()`.
 320
 321 Each machine must have a client instance and a relay instance. Register your
 322 client-side machine instance in the `origin_padding_machines` list, and your
 323 relay side machine instance in the `relay_padding_machines` list. Once you
 324 have registered your instance, you do not need to worry about deallocation;
 325 this is handled for you automatically.
 326
 327 Both machine lists use registration order to signal machine precedence for a
 328 given `machine_idx` slot on a circuit. This means that machines that are
 329 registered last are checked for activation *before* machines that are
 330 registered first. (This reverse precedence ordering allows us to
 331 deprecate older machines simply by adding new ones after them.)
 332
 333 ### 2.2. Machine Activation and Shutdown
 334
 335 After a machine has been successfully registered with the framework, it will
 336 be instantiated on any client-side circuits that support it. Only client-side
 337 circuits may initiate padding machines, but either clients or relays may shut
 338 down padding machines.
 339
 340 #### 2.2.1. Machine Application Conditions
 341
 342 The
 343 [circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
 344 of your `circpad_machine_spec_t` machine definition instance controls the
 345 conditions under which your machine will be attached and enabled on a Tor
 346 circuit, and when it gets shut down.
 347
 348 *All* of your explicitly specified conditions in
 349 `circpad_machine_spec_t.conditions` *must* be met for the machine to be
 350 applied to a circuit. If *any* condition ceases to be met, then the machine
 351 is shut down.  (This is checked on every event that arrives, even if the
 352 condition is unrelated to the event.)
 353 Another way to look at this is that
 354 all specified conditions must evaluate to true for the entire duration that
 355 your machine is running. If any are false, your machine does not run (or
 356 stops running and shuts down).
 357
 358 In particular, as part of the
 359 [circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
 360 the circuit padding subsystem gives the developer the option to enable a
 361 machine based on:
 362   - The
 363     [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
 364     on the circuit (via the `min_hops` field).
 365   - The
 366     [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
 367     of the circuit, such as streams, relay_early, etc. (via the
 368     `circpad_circuit_state_t state_mask` field).
 369   - The
 370     [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
 371     (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
 372     field).
 373
 374 This condition mechanism is the preferred way to determine if a machine should
 375 apply to a circuit. For information about potentially useful conditions that
 376 we have considered but have not yet implemented, see [Section
 377 7.3](#73-better-machine-negotiation). We will happily accept patches for those
 378 conditions, or any for other additional conditions that are needed for your
 379 use case.
 380
 381 #### 2.2.2. Detecting and Negotiating Machine Support
 382
 383 When a new machine specification is added to Tor (or removed from Tor), you
 384 should bump the Padding subprotocol version in `src/core/or/protover.c`, add a
 385 field to `protover_summary_flags_t` in `or.h`, and set this field in
 386 `memoize_protover_summary()` in versions.c. This new field must then be
 387 checked in `circpad_node_supports_padding()` in `circuitpadding.c`.
 388
 389 Note that this protocol version update and associated support check is not
 390 necessary if your experiments will *only* be using your own relays that
 391 support your own padding machines. This can be accomplished by using the
 392 `MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.
 393
 394 If the protocol support check passes for the circuit, then the client sends a
 395 `RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
 396 `circpad_machine_spec_t.target_hop` relay, and immediately enables the
 397 padding machine, and may begin sending padding. (The framework does not wait
 398 for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
 399 so that we can
 400 switch between machines rapidly.)
 401
 402 #### 2.2.3. Machine Shutdown Mechanisms
 403
 404 Padding machines can be shut down on a circuit in three main ways:
 405   1. During a `circpad_machine_event` callback, when
 406      `circpad_machine_spec_t.conditions` no longer applies (client side)
 407   2. After a transition to the CIRCPAD_STATE_END, if
 408      `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
 409      side)
 410   3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
 411      relay during negotiation.
 412
 413 Each of these cases causes the originating node to send a relay cell towards
 414 the other side, indicating that shutdown has occurred. The client side sends
 415 `RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
 416 `RELAY_COMMAND_PADDING_NEGOTIATED`.
 417
 418 Because padding from malicious exit nodes can be used to construct active
 419 timing-based side channels to malicious guard nodes, the client checks that
 420 padding-related cells only come from relays with active padding machines.
 421 For this reason, when a client decides to shut down a padding machine,
 422 the framework frees the mutable `circuit_t.padding_info`, but leaves the
 423 `circuit_t.padding_machine` pointer set until the
 424 `RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
 425 remaining in-flight padding packets are recognized a valid. Tor does
 426 not yet close circuits due to violation of this property, but the
 427 [vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
 428 does.
 429
 430 As an optimization, a client may replace a machine with another, by
 431 sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
 432 immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
 433 in the same index, without waiting for the response from the first negotiate
 434 cell.
 435
 436 Unfortunately, there is a known bug as a consequence of this optimization. If
 437 your machine depends on repeated shutdown and restart of the same machine
 438 number on the same circuit, please see [Bug
 439 30922](https://bugs.torproject.org/30992). Depending on your use case, we may
 440 need to fix that bug or help you find a workaround. See also [Section
 441 6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
 442 mechanism.
 443
 444
 445 ## 3. Specifying Padding Machines
 446
 447 By now, you should understand how to register, negotiate, and control the
 448 lifetime of your padding machine, but you still don't know how to make it do
 449 anything yet. This section should help you understand how to specify how your
 450 machine reacts to events and adds padding to the wire.
 451
 452 If you prefer to learn by example first instead, you may wish to skip to
 453 [Section 5](#5-example-padding-machines).
 454
 455
 456 A padding machine is specified by filling in an instance of
 457 [circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
 458 of this structure specify the precise functionality of a machine: it's
 459 what the circuit padding developer is called to write. These instances
 460 are created only at startup, and are referenced via `const` pointers during
 461 normal operation.
 462
 463 In this section we will go through the most important elements of this
 464 structure.
 465
 466 ### 3.1. Padding Machine States
 467
 468 A padding machine is a finite state machine where each state
 469 specifies a different style of padding.
 470
 471 As an example of a simple padding machine, you could have a state machine
 472 with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
 473 `[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor,
 474 and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
 475 course, padding machines can be more complicated than that, with dozens of
 476 states and non-trivial transitions.
 477
 478 Padding developers encode the machine states in the
 479 [circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
 480 machine state is described by a
 481 [circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273)
 482 and each such structure specifies the style and amount of padding to be sent,
 483 as well as the possible state transitions.
 484
 485 The function `circpad_machine_states_init()` must be used for allocating and
 486 initializing the `circpad_machine_spec_t.states` array before states and
 487 state transitions can be defined, as some of the state object has non-zero
 488 default values.
 489
 490 ### 3.2. Padding Machine State Transitions
 491
 492 As described above, padding machines can have multiple states, to
 493 support different forms of padding. Machines can transition between
 494 states based on events that occur either on the circuit level or on
 495 the machine level.
 496
 497 State transitions are specified using the
 498 [next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381)
 499 of the `circpad_state_t` structure. As a simple example, to transition
 500 from state `A` to state `B` when event `E` occurs, you would use the
 501 following code: `A.next_state[E] = B`.
 502
 503 #### 3.2.1. State Transition Events
 504
 505 Here we will go through
 506 [the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30)
 507 that can be used to transition between states:
 508
 509 * Circuit-level events
 510   * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
 511   * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent
 512   * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
 513   * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
 514 * Machine-level events
 515   * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''.
 516   * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
 517   * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)
 518
 519 ### 3.3. Specifying Per-State Padding
 520
 521 Each state of a padding machine specifies either:
 522   * A padding histogram describing inter-transmission delays between cells;
 523 d     OR
 524   * A parameterized delay probability distribution for inter-transmission
 525     delays between cells.
 526
 527 Either mechanism specifies essentially the *minimum inter-transmission time*
 528 distribution. If non-padding traffic does not get transmitted from this
 529 endpoint before the delay value sampled from this distribution expires, a
 530 padding packet is sent.
 531
 532 The choice between histograms and probability distributions can be subtle. A
 533 rule of thumb is that probability distributions are easy to specify and
 534 consume very little memory, but might not be able to describe certain types
 535 of complex padding logic. Histograms, in contrast, can support precise
 536 packet-count oriented or multimodal delay schemes, and can use token removal
 537 logic to reduce overhead and shape the total padding+non-padding inter-packet
 538 delay distribution towards an overall target distribution.
 539
 540 We suggest that you start with a probability distribution if possible, and
 541 you move to a histogram-based approach only if a probability distribution
 542 does not suit your needs.
 543
 544 #### 3.3.1. Padding Probability Distributions
 545
 546 The easiest, most compact way to schedule padding using a machine state is to
 547 use a probability distribution that specifies the possible delays. That can
 548 be done
 549 [using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339)
 550 `iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.
 551
 552 The Tor circuit padding framework
 553 [supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
 554 of probability distributions, and the developer should use the
 555 [circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
 556 to specify them as well as the required parameters.
 557
 558 #### 3.3.2. Padding Histograms
 559
 560 A more advanced way to schedule padding is to use a ''padding
 561 histogram''. The main advantages of a histogram are that it allows you to
 562 specify distributions that are not easily parameterized in closed form, or
 563 require specific packet counts at particular time intervals. Histograms also
 564 allow you to make use of an optional traffic minimization and shaping
 565 optimization called *token removal*, which is central to the original
 566 [Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
 567 concept.
 568
 569 If a histogram is used by a state (as opposed to a fixed parameterized
 570 distribution), then the developer must use the
 571 [histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
 572 of the `circpad_state_t` structure.
 573
 574 The width of a histogram bin specifies the range of inter-packet delay times,
 575 whereas its height specifies the amount of tokens in each bin. To sample a
 576 padding delay from a histogram, we first randomly pick a bin (weighted by the
 577 amount of tokens in each bin) and then sample a delay from within that bin by
 578 picking a uniformly random delay using the width of the bin as the range.
 579
 580 Each histogram also has an ''infinity bin'' as its final bin.  If the
 581 ''infinity bin'' is chosen,
 582 we don't schedule any padding (i.e., we schedule padding with
 583 infinite delay). If the developer does not want infinite delay, they
 584 should not give any tokens to the ''infinity bin''.
 585
 586 If a token removal strategy is specified (via the
 587 `circpad_state_t.token_removal` field), each time padding is sent using a
 588 histogram, the padding machine will remove a token from the appropriate
 589 histogram bin whenever this endpoint sends *either a padding packet or a
 590 non-padding packet*. The different removal strategies govern what to do when
 591 the bin corresponding to the current inter-packet delay is empty.
 592
 593 Token removal is optional. It is useful if you want to do things like specify
 594 a burst should be at least N packets long, and you only want to add padding
 595 packets if there are not enough non-padding packets. The cost of doing token
 596 removal is additional memory allocations for making per-circuit copies of
 597 your histogram that can be modified.
 598
 599 ### 3.4. Specifying Precise Cell Counts
 600
 601 Padding machines should be able to specify the exact amount of padding they
 602 send. For histogram-based machines this can be done using a specific amount
 603 of tokens, but another (and perhaps easier) way to do this, is to use the
 604 [length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362)
 605 of the `circpad_state_t` structure.
 606
 607 The `length_dist` field is basically a probability distribution similar to the
 608 padding probability distributions, which applies to a specific machine state
 609 and specifies the amount of padding we are willing to send during that state.
 610 This value gets sampled when we transition to that state (TODO document this
 611 in the code).
 612
 613 ### 3.5. Specifying Overhead Limits
 614
 615 Separately from the length counts, it is possible to rate limit the overhead
 616 percentage of padding at both the global level across all machines, and on a
 617 per-machine basis.
 618
 619 At the global level, the overhead percentage of all circuit padding machines
 620 as compared to total traffic can be limited through the Tor consensus
 621 parameter `circpad_global_max_padding_pct`. This overhead is defined as the
 622 percentage of padding cells *sent* out of the sum of non padding and padding
 623 cells *sent*, and is applied *only after* at least
 624 `circpad_global_allowed_cells` padding cells are sent by that relay or client
 625 (to allow for small bursts of pure padding on otherwise idle or freshly
 626 restarted relays). When both of these limits are hit by a relay or client, no
 627 further padding cells will be sent, until sufficient non-padding traffic is
 628 sent to cause the percentage of padding traffic to fall back below the
 629 threshold.
 630
 631 Additionally, each individual padding machine can rate limit itself by
 632 filling in the fields `circpad_machine_spec_t.max_padding_percent` and
 633 `circpad_machine_spec_t.allowed_padding_count`, which behave identically to
 634 the consensus parameters, but only apply to that specific machine.
 635
 636 ## 4. Evaluating Padding Machines
 637
 638 One of the goals of the circuit padding framework is to provide improved
 639 evaluation and scientific reproducibility for lower cost. This includes both
 640 the [choice](#13-computation-model) of the compact C structure representation
 641 (which has an easy-to-produce bitstring representation for optimization by
 642 gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.
 643
 644 So far, whenever evaluation cost has been a barrier, each research group has
 645 developed their own ad-hoc packet-level simulators of various padding
 646 mechanisms for evaluating website fingerprinting attacks and defenses. The
 647 process typically involves doing a crawl of Alexa top sites over Tor, and
 648 recording the Tor cell count and timing information for each page in the
 649 trace. These traces are then fed to simulations of defenses, which output
 650 modified trace files.
 651
 652 Because no standardized simulation and evaluation mechanism exists, it is
 653 often hard to tell if independent implementations of various attacks and
 654 defenses are in fact true-to-form or even properly calibrated for direct
 655 comparison, and discrepancies in results across the literature suggests
 656 this is not always so.
 657
 658 Our preferred outcome with this framework is that machines are tuned
 659 and optimized on a tracing simulator, but that the final results come from
 660 an actual live network test of the defense. The traces from this final crawl
 661 should be preserved as artifacts to be run on the simulator and reproduced
 662 on the live network by future papers, ideally in journal venues that have an
 663 artifact preservation policy.
 664
 665 ### 4.1. Pure Simulation
 666
 667 When doing initial tuning of padding machines, especially in adversarial
 668 settings, variations of a padding machine defense may need to be applied to
 669 network activity hundreds or even millions of times. The wall-clock time
 670 required to do this kind of tuning using live testing or even Shadow network
 671 emulation may often be prohibitive.
 672
 673 To help address this, and to better standardize results, Tobias Pulls has
 674 implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim),
 675 which uses Tor's unit test framework to simulate applying padding machines to
 676 circuit packet traces via a combination of Tor patches and python scripts. This
 677 simulator can be used to record traces from clients, Guards, Middles, Exits,
 678 and any other hop in the path, only for circuits that are run by the
 679 researcher. This makes it possible to safely record baseline traces and
 680 ultimately even mount passive attacks on the live network, without impacting
 681 or recording any normal user traffic.
 682
 683 In this way, a live crawl of the Alexa top sites could be performed once, to
 684 produce a standard "undefended" corpus. Padding machines can be then quickly
 685 evaluated and tuned on these simulated traces in a standardized way, and then
 686 the results can then be [reproduced on the live Tor
 687 network](#44-Testing-on-the-Live-Network) with the machines running on your own relays.
 688
 689 Please be mindful of the Limitations section of the simulator documentation,
 690 however, to ensure that you are aware of the edge cases and timing
 691 approximations that are introduced by this approach.
 692
 693 ### 4.2. Testing in Chutney
 694
 695 The Tor Project provides a tool called
 696 [Chutney](https://github.com/torproject/chutney/) which makes it very easy to
 697 setup private Tor networks. While getting it work for the first time might
 698 take you some time of doc reading, the final result is well worth it for the
 699 following reasons:
 700
 701 - You control all the relays and hence you have greater control and debugging
 702   capabilities.
 703 - You control all the relays and hence you can toggle padding support on/off
 704   at will.
 705 - You don't need to be cautious about overhead or damaging the real Tor
 706   network during testing.
 707 - You don't even need to be online; you can do all your testing offline over
 708   localhost.
 709
 710 A final word of warning here is that since Chutney runs over localhost, the
 711 packet latencies and delays are completely different from the real Tor
 712 network, so if your padding machines rely on real network timings you will
 713 get different results on Chutney. You can work around this by using a
 714 different set of delays if Chutney is used, or by moving your padding
 715 machines to the real network when you want to do latency-related testing.
 716
 717 ### 4.3. Testing in Shadow
 718
 719 [Shadow](https://shadow.github.io/) is an environment for running entire Tor
 720 network simulations, similar to Chutney, but designed to be both more memory
 721 efficient, as well as provide an accurate Tor network topology and latency
 722 model.
 723
 724 While Shadow is significantly more memory efficient than Chutney, and can make
 725 use of extremely accurate Tor network capacity and latency models, it will not
 726 be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim),
 727 if you need to do many many iterations of an experiment to tune your defense.
 728
 729 ### 4.4. Testing on the Live Network
 730
 731 Live network testing is the gold standard for verifying that any attack or
 732 defense is behaving as expected, to minimize the influence of simplifying
 733 assumptions.
 734
 735 However, it is not ethical, or necessarily possible, to run high-resolution
 736 traffic analysis attacks on the entire Tor network. But it is both ethical
 737 and possible to run small scale experiments that target only your own
 738 clients, who will only use your own Tor relays that support your new padding
 739 machines.
 740
 741 We provide the `MiddleNodes` torrc directive to enable this, which will allow
 742 you to specify the fingerprints and/or IP netmasks of relays to be used in
 743 the second hop position. Options to restrict other hops also exist, if your
 744 padding system is padding to a different hop. The `HSLayer2Nodes` option
 745 overrides the `MiddleNodes` option for onion service circuits, if both are
 746 set. (The
 747 [vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md)
 748 will set `HSLayer2Nodes`.)
 749
 750 When you run your own clients, and use MiddleNodes to restrict your clients
 751 to use your relays, you can perform live network evaluations of a defense
 752 applied to whatever traffic crawl or activity your clients do.
 753
 754 ## 5. Example Padding Machines
 755
 756 ### 5.1. Deployed Circuit Setup Machines
 757
 758 Tor currently has two padding machines enabled by default, which aim to hide
 759 certain features of the client-side onion service circuit setup protocol. For
 760 more details on their precise goal and function, please see
 761 [proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt)
 762 . In this section we will go over the code of those machines to clarify some
 763 of the engineering parts:
 764
 765 #### 5.1.1. Overview
 766
 767 The codebase of proposal 302 can be found in
 768 [circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c)
 769 and specifies four padding machines:
 770
 771 - The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
 772 - The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
 773 - The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine
 774 - The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.
 775
 776 Each of those machines has its own setup function, and
 777 [they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718)
 778 by the circuit padding framework. To understand more about them, please
 779 carefully read the individual setup function for each machine which are
 780 fairly well documented. Each function goes through the following steps:
 781 - Machine initialization
 782   - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
 783   - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
 784   - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
 785 - Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
 786 - Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
 787 - Sets [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
 788 - Setup [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
 789    - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
 790 - Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list
 791
 792 ### 5.2. Adaptive Padding Early
 793
 794 [Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/)
 795 is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token
 796 removal to shift padding distributions, but instead uses fixed parameterized
 797 distributions to specify inter-packet timing thresholds for burst and gap
 798 inter-packet delays.
 799
 800 Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how
 801 to get this machine up and running, and has links to branches with a working
 802 implementation.
 803
 804 ### 5.3. Sketch of Tamaraw
 805
 806 The [Tamaraw defense
 807 paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the
 808 only defense to date that provides a proof of optimality for the finite-length
 809 website traffic fingerprinting domain. These bounds assume that a defense is
 810 able to perform a full, arbitrary transform of a trace that is under a fixed
 811 number of packets in length.
 812
 813 The key insight to understand Tamaraw's optimality is that it achieves one
 814 such optimal transform by delaying traffic below a circuit's throughput. By
 815 doing this, it creates a queue that is rarely empty, allowing it to produce
 816 a provably optimal transform with minimal overhead. As [Section
 817 1.4](#14-other-deployment-constraints) explains, this queue
 818 cannot be maintained on the live Tor network without risk of out-of-memory
 819 conditions at relays.
 820
 821 However, if the queue is not maintained in the Tor network, but instead by the
 822 application layer, it could be deployed by websites that opt in to using it.
 823
 824 In this case, the application layer component would do *optional* constant
 825 rate shaping, negotiated between a web browser and a website. The Circuit
 826 Padding Framework can then easily fill in any missing gaps of cover traffic
 827 packets, and also ensure that only a fixed length number of packets are sent
 828 in total.
 829
 830 However, for such a defense to be safe, additional care must be taken to
 831 ensure that the resulting traffic pattern still has a large
 832 anonymity/confusion set with other traces on the live network.
 833
 834 Accomplishing this is an unsolved problem.
 835
 836 ### 5.4. Other Padding Machines
 837
 838 Our partners in this project at RIT have produced a couple prototypes, based
 839 on their published research designs
 840 [REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).
 841
 842 As [their writeup
 843 explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB
 844 uses delay, the circuit padding machine that they made is a no-delay version.
 845
 846 They also ran into an issue with the 0-delay timing workaround for [bug
 847 31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates
 848 with improved workarounds/fixes.
 849
 850 Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).
 851
 852
 853 ## 6. Framework Implementation Details
 854
 855 If you need to add additional events, conditions, or other features to the
 856 circuit padding framework, then this section is for you.
 857
 858 ### 6.1. Memory Allocation Conventions
 859
 860 If the existing circuit padding features are sufficient for your needs, then
 861 you do not need to worry about memory management or pointer lifespans. The
 862 circuit padding framework should take care of this for you automatically.
 863
 864 However, if you need to add new padding machine features to support your
 865 padding machines, then it will be helpful to understand how circuits
 866 correspond to the global machine definitions, and how mutable padding machine
 867 memory is managed.
 868
 869 #### 6.1.1. Circuits and Padding Machines
 870
 871 In Tor, the
 872 [circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h)
 873 is the superclass structure for circuit-related state that is used on both
 874 clients and relays. On clients, the actual datatype of the object pointed to
 875 by `circuit_t *` is the subclass structure
 876 [origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The
 877 macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine
 878 if a circuit is a client-side (origin) circuit and to cast the pointer safely
 879 to `origin_circuit_t *`.
 880
 881 Because circuit padding machines can be present at both clients and relays,
 882 the circuit padding fields are stored in the `circuit_t *` superclass
 883 structure. Notice that there are actually two sets of circuit padding fields:
 884 a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *`
 885 array. Each of these arrays holds at most two elements, as there can be at
 886 most two padding machines on each circuit.
 887
 888 The `const circpad_machine_spec_t *` points to a globally allocated machine
 889 specification. These machine specifications are
 890 allocated and set up during Tor program startup, in `circpad_machines_init()`
 891 in
 892 [circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because
 893 the machine specification object is shared by all circuits, it must not be
 894 modified or freed until program exit (by `circpad_machines_free()`). The
 895 `const` qualifier should enforce this at compile time.
 896
 897 The `circpad_machine_runtime_t *` array member points to the mutable runtime
 898 information for machine specification at that same array index. This runtime
 899 structure keeps track of the current machine state, packet counts, and other
 900 information that must be updated as the machine operates. When a padding
 901 machine is successfully negotiated `circpad_setup_machine_on_circ()` allocates
 902 the associated runtime information.
 903
 904 #### 6.1.2. Histogram Management
 905
 906 If a `circpad_state_t` of a machine specifies a `token_removal` strategy
 907 other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time
 908 there is a state transition into this state, `circpad_machine_setup_tokens()`
 909 will copy the read-only `circpad_state_t.histogram` array into a mutable
 910 version at `circpad_machine_runtime_t.histogram`. This mutable copy is used
 911 to decrement the histogram bin accounts as packets are sent, as per the
 912 specified token removal strategy.
 913
 914 When the state machine transitions out of this state, the mutable histogram copy is freed
 915 by this same `circpad_machine_setup_tokens()` function.
 916
 917 #### 6.1.3. Deallocation and Shutdown
 918
 919 As an optimization, padding machines can be swapped in and out by the client
 920 without waiting a full round trip for the relay machine to shut down.
 921
 922 Internally, this is accomplished by immediately freeing the heap-allocated
 923 `circuit_t.padding_info` field corresponding to that machine, but still preserving the
 924 `circuit_t.padding_machine` pointer to the global padding machine
 925 specification until the response comes back from the relay. Once the response
 926 comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the
 927 response machine number matches the current machine present.
 928
 929 Because of this partial shutdown condition, we have two macros for iterating
 930 over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate
 931 over machines that have both a `circuit_t.padding_info` slot and a
 932 `circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()`
 933 is used when we need to iterate over all machines that are either active or
 934 are simply waiting for a response to a shutdown request.
 935
 936 If the machine is replaced instead of just shut down, then the client frees
 937 the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine`
 938 and `circuit_t.padding_info` fields for this next machine immediately. This is
 939 done in `circpad_add_matching_machines()`. In this case, since the new machine
 940 should have a different machine number, the shut down response from the relay
 941 is silently discarded, since it will not match the new machine number.
 942
 943 If this sequence of machine teardown and spin-up happens rapidly enough for
 944 the same machine number (as opposed to different machines), then a race
 945 condition can happen. This is
 946 [known bug #30992](https://bugs.torproject.org/30992).
 947
 948 When the relay side decides to shut down a machine, it sends a
 949 `RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the
 950 current machine number on the client, that machine is torn down, by freeing
 951 the `circuit_t.padding_info` slot and immediately setting
 952 `circuit_t.padding_machine` slot to NULL.
 953
 954 Additionally, if Tor decides to close a circuit forcibly due to error before
 955 the padding machine is shut down, then `circuit_t.padding_info` is still
 956 properly freed by the call to `circpad_circuit_free_all_machineinfos()`
 957 in `circuit_free_()`.
 958
 959 ### 6.2. Machine Application Events
 960
 961 The framework checks client-side origin circuits to see if padding machines
 962 should be activated or terminated during specific event callbacks in
 963 `circuitpadding.c`. We list these event callbacks here only for reference. You
 964 should not modify any of these callbacks to get your machine to run; instead,
 965 you should use the `circpad_machine_spec_t.conditions` field.
 966
 967 However, you may add new event callbacks if you need other activation events,
 968 for example to provide obfuscation-layer or application-layer signaling. Any
 969 new event callbacks should behave exactly like the existing callbacks.
 970
 971 During each of these event callbacks, the framework checks to see if any
 972 current running padding machines have conditions that no longer apply as a
 973 result of the event, and shuts those machines down. Then, it checks to see if
 974 any new padding machines should be activated as a result of the event, based
 975 on their circuit application conditions. **Remember: Machines are checked in
 976 reverse order in the machine list. This means that later, more recently added
 977 machines take precedence over older, earlier entries in each list.**
 978
 979 Both of these checks are performed using the machine application conditions
 980 that you specify in your machine's `circpad_machine_spec_t.conditions` field.
 981
 982 The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are:
 983
 984   - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is
 985     added to a circuit.
 986   - `circpad_machine_event_circ_built()`: Called when a circuit has completed
 987     construction and is
 988     opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
 989   - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit
 990     changes purpose.
 991   - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit
 992     runs out of `RELAY_EARLY` cells.
 993   - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a
 994     stream attached.
 995   - `circpad_machine_event_circ_has_no_streams()`: Called when the last
 996     stream is detached from a circuit.
 997
 998 ### 6.3. Internal Machine Events
 999
1000 To provide for some additional capabilities beyond simple finite state machine
1001 behavior, the circuit padding machines also have internal events that they
1002 emit to themselves when packet count length limits are hit, when the Infinity
1003 bin is sampled, and when the histogram bins are emptied of all tokens.
1004
1005 These events are implemented as `circpad_internal_event_*` functions in
1006 `circuitpadding.c`, which are called from various areas that determine when
1007 the events should occur.
1008
1009 While the conditions that trigger these internal events to be called may be
1010 complex, they are processed by the state machine definitions in a nearly
1011 identical manner as the cell processing events, with the exception that they
1012 are sent to the current machine only, rather than all machines on the circuit.
1013
1014
1015 ## 7. Future Features and Optimizations
1016
1017 While implementing the circuit padding framework, our goal was to deploy a
1018 system that obscured client-side onion service circuit setup and supported
1019 deployment of WTF-PAD and/or APE. Along the way, we noticed several features
1020 that might prove useful to others, but were not essential to implement
1021 immediately. We do not have immediate plans to implement these ideas, but we
1022 would gladly accept patches that do so.
1023
1024 The following list gives an overview of these improvements, but as this
1025 document ages, it may become stale. The canonical list of improvements that
1026 researchers may find useful is labeled in our bugtracker with
1027 [Padding Research](https://gitlab.torproject.org/tpo/core/tor/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Padding%20Research),
1028 and the list of improvements that are known to be necessary for some research
1029 areas are labeled with
1030 [Padding Research Requires](https://gitlab.torproject.org/tpo/core/tor/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Padding%20Research%20Requires).
1031
1032 Please consult those lists for the latest status of these issues. Note that
1033 not all fixes will be backported to all Tor versions, so be mindful of which
1034 Tor releases receive which fixes as you conduct your experiments.
1035
1036 ### 7.1. Load Balancing and Flow Control
1037
1038 Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
1039 capacity, and we anticipate that if we inform our relay operator community of
1040 the need for non-Exit bandwidth to satisfy padding overhead requirements,
1041 they will be able to provide that with relative ease.
1042
1043 Unfortunately, padding machines that have large quantities of overhead will
1044 require changes to our load balancing system to account for this
1045 overhead. The necessary changes are documented in
1046 [Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).
1047
1048 Additionally, padding cells are not currently subjected to flow control. For
1049 high amounts of padding, we may want to change this. See [ticket
1050 31782](https://bugs.torproject.org/31782) for details.
1051
1052 ### 7.2. Timing and Queuing Optimizations
1053
1054 The circuitpadding framework has some timing related issues that may impact
1055 results. If high-resolution timestamps are fed to opaque deep learning
1056 trainers, those training models may end up able to differentiate padding
1057 traffic from non-padding traffic due to these timing bugs.
1058
1059 The circuit padding cell event callbacks come from post-decryption points in
1060 the cell processing codepath, and from the pre-queue points in the cell send
1061 codepath. This means that our cell events will not reflect the actual time
1062 when packets are read or sent on the wire. This is much worse in the send
1063 direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
1064 impose significant additional delay between when we currently report that a
1065 packet was sent, and when it actually hits the wire.
1066
1067 [Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
1068 description of this problem, and an experimental branch that changes the cell
1069 event callback locations to be from circuitmux post-queue, which with KIST,
1070 should be an accurate reflection of when they are actually sent on the wire.
1071
1072 If your padding machine and problem space depends on very accurate notions of
1073 relay-side packet timing, please try that branch and let us know on the
1074 ticket if you need any further assistance fixing it up.
1075
1076 Additionally, with these changes, it will be possible to provide further
1077 overhead reducing optimizations by letting machines specify flags to indicate
1078 that padding should not be sent if there are any cells pending in the cell
1079 queue, for doing things like extending cell bursts more accurately and with
1080 less overhead.
1081
1082 However, even if we solve the queuing issues, Tor's current timers are not as
1083 precise as some padding designs may require. We will still have issues of
1084 timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653)
1085 describes an issue the circuit padding system has with sending 0-delay padding
1086 cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
1087 libevent timer accuracy issue, which causes callbacks to vary up to 10ms from
1088 their scheduled time, even in absence of load.
1089
1090 All of these issues strongly suggest that you either truncate the resolution
1091 of any timers you feed to your classifier, or that you omit timestamps
1092 entirely from the classification problem until these issues are addressed.
1093
1094 ### 7.3. Better Machine Negotiation
1095
1096 Circuit padding is applied to circuits through machine conditions.
1097
1098 The following machine conditions may be useful for some use cases, but have
1099 not been implemented yet:
1100   * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
1101   * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
1102   * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
1103   * [More flexible purpose handling](https://bugs.torproject.org/32040)
1104
1105 Additionally, the following features may help to obscure that padding is being
1106 negotiated, and/or streamline that negotiation process:
1107   * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
1108   * [Better shutdown handling](https://bugs.torproject.org/30992)
1109   * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)
1110
1111 ### 7.4. Probabilistic State Transitions
1112
1113 Right now, the state machine transitions are fully deterministic. However,
1114 one could imagine a state machine that uses probabilistic transitions between
1115 states to simulate a random walk or Hidden Markov Model traversal across
1116 several pages.
1117
1118 The simplest way to implement this is to make  the `circpad_state_t.next_state` array
1119 into an array of structs that have a next state field, and a probability to
1120 transition to that state.
1121
1122 If you need this feature, please see [ticket
1123 31787](https://bugs.torproject.org/31787) for more details.
1124
1125 ### 7.5. More Complex Pattern Recognition
1126
1127 State machines are extremely efficient sequence recognition devices. But they
1128 are not great pattern recognition devices. This is one of the reasons why
1129 [Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
1130 used state machines in combination with histograms, to model the target
1131 distribution of interpacket delays for transmitted packets.
1132
1133 However, there currently is no such optimization for reaction to patterns of
1134 *received* traffic. There may also be cases where defenses must react to more
1135 complex patterns of sent traffic than can be expressed by our current
1136 histogram and length count events.
1137
1138 For example: if you wish your machine to react to a certain count of incoming
1139 cells in a row, right now you have to have a state for each cell, and use the
1140 infinity bin to timeout of the sequence in each state. We could make this more
1141 compact if each state had an arrival cell counter and inter-cell timeout. Or
1142 we could make more complex mechanisms to recognize certain patterns of arrival
1143 traffic in a state.
1144
1145 The best way to build recognition primitives like this into the framework is
1146 to add additional [Internal Machine Events](#63-internal-machine-events) for
1147 the pattern in question.
1148
1149 As another simple example, a useful additional event might be to transition
1150 whenever any of your histogram bins are empty, rather than all of them. To do
1151 this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
1152 `circpad_event_t` in `circuitpadding.h`. You would then create a function
1153 `circuitpadding_internal_event_any_bin_empty()`, which would work just like
1154 `circuitpadding_internal_event_bin_empty()`, and also be called from
1155 `check_machine_token_supply()` in `circuitpadding.c` but with the check for
1156 each bin being zero instead of the total. With this change, new machines could
1157 react to this new event in the same way as any other.
1158
1159 If you have any other ideas that may be useful, please comment on [ticket
1160 32680](https://bugs.torproject.org/32680).
1161
1162
1163 ## 8. Open Research Problems
1164
1165 ### 8.1. Onion Service Circuit Setup
1166
1167 Our circuit setup padding does not address timing-based features, only
1168 packet counts. Deep learning can probably see this.
1169
1170 However, before going too deep down the timing rabithole, we may need to make
1171 [some improvements to Tor](#72-timing-and-queuing-optimizations). Please
1172 comment on those tickets if you need this.
1173
1174 ### 8.2. Onion Service Fingerprinting
1175
1176 We have done nothing to obscure the service side of onion service circuit
1177 setup. Because service-side onion services will have the reverse traffic byte
1178 counts as normal clients, they will likely need some kind of [hybrid
1179 application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
1180 simple circuit setup obfuscation.
1181
1182 Fingerprinting in
1183 [combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
1184 with
1185 [vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
1186 ia also an open issue.
1187
1188 ### 8.3. Open World Fingerprinting
1189
1190 Similarly, Open World/clearweb website fingerprinting defenses remain
1191 an unsolved problem from the practicality point of view. The original WTF-PAD
1192 defense was never tuned, and it is showing accuracy issues against deep
1193 learning attacks.
1194
1195 ### 8.4. Protocol Usage Fingerprinting
1196
1197 Traffic Fingerprinting to determine the protocol in use by a client has not
1198 been studied, either from the attack or defense point of view.
1199
1200 ### 8.5. Datagram Transport Side Channels
1201
1202 Padding can reduce the accuracy of dropped-cell side channels in such
1203 transports, but we don't know [how to measure
1204 this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
1205
1206 ## 9. Must Read Papers
1207
1208 These are by far the most important papers in the space, to date:
1209
1210  - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
1211  - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
1212  - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
1213  - [WTF-PAD](http://arxiv.org/pdf/1512.00524)
1214
1215 Except for WTF-PAD, these papers were selected because they point towards
1216 optimality bounds that can be benchmarked against.
1217
1218 We cite them even though we are skeptical that provably optimal defenses can
1219 be constructed, at least not without trivial or impractical transforms (such as
1220 those that can be created with unbounded queue capacity, or stored knowledge
1221 of traces for every possible HTTP trace on the Internet).
1222
1223 We also are not demanding an optimality or security proof for every defense.
1224
1225 Instead, we cite the above as benchmarks. We believe the space, especially the
1226 open-world case, to be more akin to an optimization problem, where a
1227 WTF-PAD-like defense must be tuned through an optimizer to produce results
1228 comparable to provably optimal but practically unrealizable defenses, through
1229 rigorous adversarial evaluation.
1230
1231 ## A. Acknowledgments
1232
1233 This research was supported in part by NSF grants CNS-1619454 and CNS-1526306.