   1 Filename: 159-exit-scanning.txt
   2 Title: Exit Scanning
   3 Author: Mike Perry
   4 Created: 13-Feb-2009
   5 Status: Open
   6
   7 Overview:
   8
   9 This proposal describes the implementation and integration of an
  10 automated exit node scanner for scanning the Tor network for malicious,
  11 misconfigured, firewalled or filtered nodes.
  12
  13 Motivation:
  14
  15 Tor exit nodes can be run by anyone with an Internet connection. Often,
  16 these users aren't fully aware of the limitations of their networking
  17 setup.  Content filters, antivirus software, advertisements injected by
  18 their service providers, malicious upstream providers, and the resource
  19 limitations of their computer or networking equipment have all been
  20 observed on the current Tor network.
  21
  22 It is also possible that some nodes exist purely for malicious
  23 purposes.  In the past, there have been intermittent instances of
  24 nodes spoofing SSH keys, as well as nodes being used for purposes of
  25 plaintext surveillance.
  26
  27 While it is not realistic to expect to catch extremely targeted or
  28 completely passive malicious adversaries, the goal is to prevent
  29 malicious adversaries from deploying dragnet attacks against large
  30 segments of the Tor userbase.
  31
  32
  33 Scanning methodology:
  34
  35 The first scans to be implemented are HTTP, HTML, Javascript, and
  36 SSL scans.
  37
  38 The HTTP scan scrapes Google for URLs of common file types such as exe,
  39 msi, doc, and dmg. It then fetches each URL both directly (Non-Tor) and
  40 through Tor, and compares the SHA1 hashes of the resulting content.
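The comparison step of the HTTP scan can be sketched as follows. This
is a minimal illustration, not the scanner's actual code: the function
names are hypothetical, and the two byte strings are assumed to have
already been fetched, one directly and one through the exit under test.

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Return the hex SHA1 digest of a fetched body."""
    return hashlib.sha1(content).hexdigest()


def http_scan_result(direct_content: bytes, tor_content: bytes) -> str:
    """Compare a direct (Non-Tor) fetch against a fetch made through Tor.

    Hypothetical helper: returns "match" when both bodies hash to the
    same SHA1 digest and "mismatch" otherwise.
    """
    if sha1_hex(direct_content) == sha1_hex(tor_content):
        return "match"
    return "mismatch"
```

A "mismatch" here is only a candidate failure; it would still pass
through the false positive filtering described in this section.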
  41
  42 The SSL scan downloads certificates for all IPs a domain resolves to
  43 locally and compares these certificates to those seen over Tor. The
  44 results for each scan note whether the domain rotated certificates
  45 locally.
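A rough sketch of the SSL comparison, under stated assumptions:
certificates are handled as raw DER bytes, fingerprints are SHA1
digests, and "rotated locally" is approximated as seeing more than one
distinct certificate across the locally resolved IPs. The function
names and the shape of the result are hypothetical.

```python
import hashlib


def cert_fingerprint(der_cert: bytes) -> str:
    """Fingerprint a DER-encoded certificate (SHA1 chosen for this sketch)."""
    return hashlib.sha1(der_cert).hexdigest()


def ssl_scan_result(local_certs: dict, tor_cert: bytes) -> dict:
    """Compare the certificate seen over Tor with those seen locally.

    local_certs maps each locally resolved IP (str) to the DER bytes of
    the certificate that IP served on a direct connection.
    """
    local_fps = {cert_fingerprint(c) for c in local_certs.values()}
    return {
        # True if the exit showed us a certificate we also saw directly.
        "tor_cert_seen_locally": cert_fingerprint(tor_cert) in local_fps,
        # Assumption: several distinct local certs count as rotation.
        "rotated_locally": len(local_fps) > 1,
    }
```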
  46
  47 The HTML scan checks HTML, Javascript, and plugin content for
  48 modifications. Because most of the web is dynamic, the scanner has a
  49 number of built-in mechanisms for filtering out false positives. These
  50 are applied whenever a change is noticed between the Tor and Non-Tor
  51 fetches.
  52
  53 All tests also share a URL-based false positive filter that
  54 automatically removes results retroactively if the number of failures
  55 exceeds a certain percentage of nodes tested with the URL.
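The URL-based filter could look roughly like the sketch below. The
proposal does not fix the percentage, so the 10% default is purely an
assumption, as are the function name and the (url, node, failed) tuple
layout.

```python
from collections import defaultdict


def filter_false_positive_urls(results, max_failure_fraction=0.1):
    """Drop every result for URLs that fail on too many nodes.

    results: iterable of (url, node_id, failed) tuples.
    max_failure_fraction: assumed threshold; not taken from the proposal.
    Returns (kept_results, discarded_urls).
    """
    results = list(results)
    tested = defaultdict(set)   # url -> nodes tested with that URL
    failed = defaultdict(set)   # url -> nodes that failed that URL
    for url, node, did_fail in results:
        tested[url].add(node)
        if did_fail:
            failed[url].add(node)

    discarded = {
        url for url in tested
        if len(failed[url]) / len(tested[url]) > max_failure_fraction
    }
    # Retroactive removal: earlier failures for a discarded URL go too.
    kept = [r for r in results if r[0] not in discarded]
    return kept, discarded
```

Because removal is retroactive, a failure recorded early in a scan is
dropped if its URL later crosses the threshold.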
  56
  57
  58 Deployment Stages:
  59
  60 To avoid instances where bugs cause us to mark exit nodes as BadExit
  61 improperly, it is proposed that we begin use of the scanner in stages.
  62
  63 1. Manual Review:
  64
  65   In the first stage, basic scans will be run by a small number of
  66   people while we stabilize the scanner. The scanner has the ability
  67   to resume crashed scans, and to rescan nodes that fail various
  68   tests.
  69
  70 2. Human Review:
  71
  72   In the second stage, results will be automatically mailed to
  73   an email list of interested parties for review. We will also begin
  74   classifying failure types into three to four different severity
  75   levels, based on both the reliability of the test and the nature of
  76   the failure.
  77
  78 3. Automatic BadExit Marking:
  79
  80   In the final stage, the scanner will begin marking exits depending
  81   on the failure severity level in one of three different ways: by
  82   node idhex, by node IP, or by node IP mask. A potential fourth, less
  83   severe category of results may still be delivered via email only for
  84   review.
  85
  86   BadExit markings will be delivered in batches upon completion
  87   of whole-network scans, so that the final false positive
  88   filter has an opportunity to filter out URLs that exhibit
  89   dynamic content beyond what we can filter.
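The batching in stage 3 can be sketched as below. The proposal does
not say which severity level maps to which marking, so the mapping,
the numeric levels, and all names here are hypothetical placeholders.

```python
from enum import Enum


class Action(Enum):
    MARK_IDHEX = "idhex"      # mark by node idhex
    MARK_IP = "ip"            # mark by node IP
    MARK_IP_MASK = "ipmask"   # mark by node IP mask
    EMAIL_ONLY = "email"      # potential fourth category: review only


# Hypothetical mapping; the real assignment of levels is undecided.
SEVERITY_TO_ACTION = {
    1: Action.MARK_IP_MASK,
    2: Action.MARK_IP,
    3: Action.MARK_IDHEX,
    4: Action.EMAIL_ONLY,
}


def plan_batch(failures, discarded_urls):
    """Build one batch of actions after a whole-network scan completes.

    failures: list of dicts with "idhex", "url" and "severity" keys.
    discarded_urls: URLs removed by the final false positive filter.
    """
    batch = []
    for failure in failures:
        if failure["url"] in discarded_urls:
            continue  # the URL proved too dynamic to trust
        batch.append((failure["idhex"], SEVERITY_TO_ACTION[failure["severity"]]))
    return batch
```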
  90
  91
  92 Specification of Exit Marking:
  93
  94 Technically, BadExit could be marked via SETCONF AuthDirBadExit over
  95 the control port, but this would allow full access to the directory
  96 authority configuration and operation.
  97
  98 The approved-routers file could also be used, but currently it only
  99 supports fingerprints, and it also contains other data unrelated to
 100 exit scanning that would be difficult to coordinate.
 101
 102 Instead, we propose a new badexit-routers file with the following
 103 keywords:
 104
 105   BadExitNet 1*[exitpattern from 2.3 in dir-spec.txt]
 106   BadExitFP 1*[hexdigest from 2.3 in dir-spec.txt]
 107
 108 BadExitNet lines would follow the codepaths used by AuthDirBadExit to
 109 set authdir_badexit_policy, and BadExitFP lines would follow the
 110 codepaths used for !badexit lines in approved-routers.
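To make the file format concrete, here is a sketch of how a scanner
might render these lines. It assumes one pattern or fingerprint per
line and a hexdigest of 40 hex characters; the function name is
hypothetical, and exit patterns are passed through unvalidated.

```python
import re

# Assumption: a hexdigest is 40 hexadecimal characters.
HEXDIGEST_RE = re.compile(r"^[0-9A-Fa-f]{40}$")


def format_badexit_routers(net_patterns, fingerprints):
    """Render badexit-routers file contents as a single string.

    net_patterns: exit patterns, written out as BadExitNet lines.
    fingerprints: relay identity digests, written as BadExitFP lines.
    """
    lines = []
    for pattern in net_patterns:
        lines.append(f"BadExitNet {pattern}")
    for fp in fingerprints:
        if not HEXDIGEST_RE.match(fp):
            raise ValueError(f"not a 40-character hex digest: {fp!r}")
        lines.append(f"BadExitFP {fp.upper()}")
    return "".join(line + "\n" for line in lines)
```

For example, format_badexit_routers(["10.0.0.0/8:*"], ["ab" * 20])
yields one BadExitNet line followed by one BadExitFP line. The pattern
shown is illustrative only.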
 111
 112 The scanner would have the exclusive ability to write to, append to,
 113 and rewrite this file. Prior to building a new consensus vote, a
 114 participating Tor authority would read in a fresh copy.
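Since the authority may read the file at any time, the scanner would
probably want to replace it atomically so that a partial write is
never observed. The proposal does not require this; the sketch below
is one possible approach, with a hypothetical function name.

```python
import os
import tempfile


def atomic_write(path, text):
    """Replace the file at path with text in a single rename step.

    The data goes to a temporary file in the same directory, is
    flushed to disk, and is then renamed over the target, so a reader
    sees either the old contents or the new ones.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".badexit-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```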
 115
 116
 117 Security Implications:
 118
 119 Aside from evading the scanner's detection, there are two additional
 120 high-level security considerations:
 121
 122 1. Ensure nodes cannot be marked BadExit by an adversary at will
 123
 124 It is possible that individual website owners will be able to target
 125 certain Tor nodes, but once they attempt to fail more than the URL
 126 filter percentage of the exits, their sites will be automatically
 127 discarded.
 128
 129 Failing specific nodes is possible, but scanned results are fully
 130 reproducible, and BadExits should be rare enough that humans are never
 131 fully removed from the loop.
 132
 133 State (cookies, cache, etc.) does not persist in the scanner between
 134 exit nodes, so one exit node cannot bias the results of a later
 135 one.
 136
 137 2. Ensure that scanner compromise does not yield authority compromise
 138
 139 Having a separate file that is under the exclusive control of the
 140 scanner allows us to heavily isolate the scanner from the Tor
 141 authority, potentially even running them on separate machines.
 142