docs/image-fuzzer.txt

   1 # Specification for the fuzz testing tool
   2 #
   3 # Copyright (C) 2014 Maria Kustova <maria.k@catit.be>
   4 #
   5 # This program is free software: you can redistribute it and/or modify
   6 # it under the terms of the GNU General Public License as published by
   7 # the Free Software Foundation, either version 2 of the License, or
   8 # (at your option) any later version.
   9 #
  10 # This program is distributed in the hope that it will be useful,
  11 # but WITHOUT ANY WARRANTY; without even the implied warranty of
  12 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  13 # GNU General Public License for more details.
  14 #
  15 # You should have received a copy of the GNU General Public License
  16 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
  17
  18
  19 Image fuzzer
  20 ============
  21
  22 Description
  23 -----------
  24
  25 The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img
  26 by feeding them randomly corrupted images.
  27 Test images are generated from scratch and have a valid inner structure in
  28 which some elements, e.g. L1/L2 tables, hold random invalid values.
  29
  30
  31 Test runner
  32 -----------
  33
  34 The test runner generates test images, executes tests using the generated
  35 images, reports their results and collects all test-related artifacts (logs,
  36 core dumps, test images, backing files).
  37 A test is the execution of all available commands under test against the same
  38 generated test image.
  39 By default, the test runner generates and executes new tests until it is
  40 interrupted from the keyboard. If a test seed is specified via the '--seed'
  41 runner parameter, only the one test with this seed is executed, and the
  42 runner exits when that test finishes.
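
The seed handling described above could look roughly like the sketch below.
It is not the runner's actual code: 'run_tests', 'run_one_test', 'max_tests'
and the seed range are hypothetical names and values chosen for illustration,
and the sketch is written for a current Python interpreter.

```python
import random
import sys


def run_tests(run_one_test, seed=None, max_tests=None):
    """Run one test per seed and return the list of seeds that were used.

    run_one_test -- hypothetical callable executing a single test for a seed
    seed         -- if given, exactly one test with this seed is executed
    max_tests    -- illustration-only limit so the loop can end without Ctrl-C
    """
    if seed is not None:
        run_one_test(seed)
        return [seed]
    used = []
    try:
        while max_tests is None or len(used) < max_tests:
            # Draw a fresh seed for every new test so that each test can be
            # reproduced later by passing the seed back explicitly.
            current = random.randint(0, sys.maxsize)
            used.append(current)
            run_one_test(current)
    except KeyboardInterrupt:
        # Keyboard interruption ends the otherwise endless default loop.
        pass
    return used
```

Calling 'run_tests(test, seed=42)' runs a single test, while
'run_tests(test)' keeps going until it is interrupted.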
  43
  44 The runner uses an external image fuzzer to generate test images. An image
  45 generator should be specified as a mandatory parameter of the test runner.
  46 For details about the interaction between the runner and fuzzers, see
  47 "Module interfaces".
  48
  49 The runner enables core dump generation during test execution, but it
  50 assumes that core dumps are written to the current working directory.
  51 To get complete test results, configure the test environment so that core
  52 dumps actually end up there.
  53
  54 Paths to the binaries under test (SUTs), qemu-img and qemu-io, are retrieved
  55 from environment variables. If the environment check fails, the runner
  56 uses the SUTs installed in the system paths.
  57 qemu-img is required for creating backing files, so the related environment
  58 variable must be set if qemu-img is not installed in the system path.
  59 For details about the environment variables, see qemu-iotests/check.
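
A minimal sketch of such a lookup with a fallback to the system path is shown
below. The helper name 'sut_path' is hypothetical, and the variable name
'QEMU_IMG' in the usage note is only a placeholder: the names actually honored
are the ones documented in qemu-iotests/check.

```python
import os


def sut_path(env_var, default_name, environ=None):
    """Return the binary path stored in env_var, or default_name if unset."""
    if environ is None:
        environ = os.environ
    value = environ.get(env_var, '').strip()
    # An unset or empty variable falls back to the bare program name, which
    # is then resolved through the system PATH when the command is executed.
    return value or default_name
```

For example, 'sut_path("QEMU_IMG", "qemu-img")' yields the configured path
when the (assumed) variable is set and plain 'qemu-img' otherwise.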
  60
  61 The runner accepts a JSON array of fields expected to be fuzzed via the
  62 '--config' argument, e.g.
  63
  64        '[["feature_name_table"], ["header", "l1_table_offset"]]'
  65
  66 Each sublist can have one or two strings defining image structure elements.
  67 In the latter case, the parent element is placed in the first position
  68 and the field name in the second.
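
The shape of the '--config' value can be checked with a few lines of Python.
This is a sketch under the rules stated above, not the runner's own parser;
the function name 'parse_fuzz_config' is an assumption, and the code targets a
current Python interpreter.

```python
import json


def parse_fuzz_config(text):
    """Parse the '--config' JSON string into a list of 1- or 2-item lists."""
    config = json.loads(text)
    if not isinstance(config, list):
        raise ValueError('configuration must be a JSON array')
    for entry in config:
        # Each sublist names a parent element, optionally followed by a field.
        if not isinstance(entry, list) or len(entry) not in (1, 2):
            raise ValueError('each entry must hold one or two strings: %r'
                             % (entry,))
        if not all(isinstance(item, str) for item in entry):
            raise ValueError('entry items must be strings: %r' % (entry,))
    return config
```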
  69
  70 The runner accepts a list of commands under test as a JSON array via
  71 the '--command' argument. Each command is a list containing a SUT and all its
  72 arguments, e.g.
  73
  74        runner.py -c '[["qemu-io", "$test_img", "-c", "write $off $len"]]'
  75      /tmp/test ../qcow2
  76
  77 For variable arguments, the following aliases can be used:
  78     - $test_img for a fuzzed image
  79     - $off for an offset in the fuzzed image
  80     - $len for a data size
  81
  82 Values for the last two aliases are generated based on the size of the
  83 virtual disk of the generated image.
  84 If no commands are specified, the runner executes the commands from
  85 the default list:
  86     - qemu-img check
  87     - qemu-img info
  88     - qemu-img convert
  89     - qemu-io -c read
  90     - qemu-io -c write
  91     - qemu-io -c aio_read
  92     - qemu-io -c aio_write
  93     - qemu-io -c flush
  94     - qemu-io -c discard
  95     - qemu-io -c truncate
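
Alias expansion for a command could be sketched with the standard
'string.Template' class, as below. How the real runner derives $off and $len
from the virtual disk size is not specified here, so the range rule (the
access stays inside the disk) and the name 'expand_command' are assumptions.

```python
import random
from string import Template


def expand_command(command, test_img, disk_size, rng=random):
    """Replace $test_img, $off and $len in every argument of one command."""
    # Assumption: offset and length are chosen so that the accessed range
    # stays inside the virtual disk; the real runner may pick them otherwise.
    off = rng.randint(0, disk_size - 1)
    length = rng.randint(1, disk_size - off)
    values = {'test_img': test_img, 'off': off, 'len': length}
    # safe_substitute leaves unknown '$name' placeholders untouched.
    return [Template(arg).safe_substitute(values) for arg in command]
```

With the command from the example above, the fourth argument becomes a string
of the form 'write <offset> <length>' with concrete numbers.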
  96
  97
  98 Qcow2 image generator
  99 ---------------------
 100
 101 The 'qcow2' generator is a Python package providing the 'create_image' method
 102 as its single public API. See details in the 'Test runner/image fuzzer' chapter
 103 of 'Module interfaces'.
 104
 105 The qcow2 package contains two submodules: fuzz.py and layout.py.
 106
 107 'fuzz.py' contains all fuzzing functions, one per image field. It's assumed
 108 that after code analysis every field will have its own constraints on its
 109 value. For now only universal, potentially dangerous values are used, e.g. type
 110 limits for integers or unsafe symbols such as '%s' for strings. For bitmasks, a
 111 random number of bits is set to one. Every fuzzed value is checked against the
 112 current valid value of the field and is regenerated if the two are
 113 equal.
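
The value selection described above might look like the following sketch. It
does not reproduce 'fuzz.py': the function names, the 32-bit width and the
particular list of limits are assumptions made for illustration.

```python
import random

# Boundary values of unsigned 32-bit integers: a generic "dangerous" set.
UINT32_LIMITS = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]


def fuzz_uint32(current, rng=random):
    """Return a boundary value that differs from the current valid value."""
    value = rng.choice(UINT32_LIMITS)
    while value == current:
        # Regenerate: a "fuzzed" value equal to the valid one tests nothing.
        value = rng.choice(UINT32_LIMITS)
    return value


def fuzz_bitmask(current, width=32, rng=random):
    """Return a mask with a random number of randomly placed bits set."""
    while True:
        count = rng.randint(0, width)
        value = 0
        for bit in rng.sample(range(width), count):
            value |= 1 << bit
        if value != current:
            return value
```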
 114
 115 'layout.py' creates a random valid image, fuzzes a random subset of the image
 116 fields using the 'fuzz.py' module and writes the fuzzed image to the specified
 117 file. If a fuzzer configuration is specified, it is interpreted as follows:
 118
 119     1. If a list contains a parent image element only, then a random portion
 120     of the fields of this element is fuzzed in every test.
 121     The same behavior applies to the entire image if no configuration is
 122     used. This case is useful for test specialization.
 123
 124     2. If a list contains a parent element and a field name, then that field
 125     is always fuzzed in every test. This case is useful for regression
 126     testing.
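
The two cases above can be sketched as a field selection step. This is an
illustration only, not the logic of 'layout.py': 'select_fields' and the
'image_fields' mapping are hypothetical, and "a random portion" is assumed to
mean at least one field.

```python
import random


def select_fields(image_fields, fuzz_config=None, rng=random):
    """Return a sorted list of (element, field) pairs to fuzz in one test.

    image_fields -- hypothetical dict: element name -> list of field names
    fuzz_config  -- list of [element] or [element, field] entries, or None
    """
    def random_portion(pairs):
        return rng.sample(pairs, rng.randint(1, len(pairs)))

    def pairs_of(element):
        return [(element, name) for name in image_fields[element]]

    if not fuzz_config:
        # No configuration: a random portion of the fields of the whole image.
        everything = [pair for element in sorted(image_fields)
                      for pair in pairs_of(element)]
        return sorted(random_portion(everything))
    selected = set()
    for entry in fuzz_config:
        if len(entry) == 1:
            # Case 1: parent element only, fuzz a random portion of its fields.
            selected.update(random_portion(pairs_of(entry[0])))
        else:
            # Case 2: parent element and field name, always fuzz that field.
            selected.add((entry[0], entry[1]))
    return sorted(selected)
```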
 127
 128 For now only header fields, header extensions and L1/L2 tables are generated.
 129
 130 Module interfaces
 131 -----------------
 132
 133 * Test runner/image fuzzer
 134
 135 The runner calls an image generator, specifying the path to a test image file,
 136 the path to a backing file and its format, and a fuzzer configuration.
 137 An image generator is expected to provide a
 138
 139    'create_image(test_img_path, backing_file_path=None,
 140                  backing_file_format=None, fuzz_config=None)'
 141
 142 method that creates a test image, writes it to the specified file and returns
 143 the size of the virtual disk.
 144 The file should be created if it doesn't exist or overwritten otherwise.
 145 fuzz_config has the form of a list of lists. Every sublist can have one
 146 or two elements: the first is the name of a parent image element, the second,
 147 if present, is the name of a field in this element.
 148 Example:
 149         [['header', 'l1_table_offset'],
 150          ['header', 'nb_snapshots'],
 151          ['feature_name_table']]
 152
 153 The random seed is set by the runner at every test execution for regression
 154 purposes, so an image generator should not modify it internally.
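
The contract between the runner and a generator can be illustrated with a
stand-in generator. Only the 'create_image' signature comes from this
document; the body below writes meaningless random bytes instead of a qcow2
image, and 'generate' and 'reproduce' are hypothetical helpers.

```python
import os
import random
import tempfile


def create_image(test_img_path, backing_file_path=None,
                 backing_file_format=None, fuzz_config=None):
    """Stand-in generator: writes random bytes, returns a made-up disk size."""
    size = random.choice([1 << 20, 1 << 24, 1 << 30])
    with open(test_img_path, 'wb') as img:      # creates or overwrites
        img.write(bytes(bytearray(random.getrandbits(8) for _ in range(16))))
    return size


def generate(seed, test_img_path, fuzz_config=None):
    """Seed the shared generator once, then let the image generator use it."""
    random.seed(seed)
    return create_image(test_img_path, fuzz_config=fuzz_config)


def reproduce(seed):
    """Generate the image for a seed twice and return both byte strings."""
    work_dir = tempfile.mkdtemp()
    results = []
    for name in ('first.img', 'second.img'):
        path = os.path.join(work_dir, name)
        generate(seed, path)
        with open(path, 'rb') as img:
            results.append(img.read())
    return results
```

Because the generator never reseeds 'random' itself, the two images returned
by 'reproduce' are identical, which is what makes regression runs possible.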
 155
 156
 157 Overall fuzzer requirements
 158 ===========================
 159
 160 Input data:
 161 ----------
 162
 163  - image template (generator)
 164  - work directory
 165  - action vector (optional)
 166  - seed (optional)
 167  - SUT and its arguments (optional)
 168
 169
 170 Fuzzer requirements:
 171 -------------------
 172
 173 1.  Should be able to inject random data
 174 2.  Should be able to select a random value from the manually pregenerated
 175     vector (boundary values, e.g. max/min cluster size)
 176 3.  Image template should describe a general structure invariant for all
 177     test images (image format description)
 178 4.  Image template should be autonomous and other fuzzer parts should not
 179     rely on it
 180 5.  Image template should contain reference rules (not only block+size
 181     description)
 182 6.  Should generate the test image with the correct structure based on an image
 183     template
 184 7.  Should accept a seed as an argument (for regression purposes)
 185 8.  Should generate a seed if it is not specified as an input parameter.
 186 9.  The same seed should generate the same image for the same action vector,
 187     specified or generated.
 188 10. Should accept a vector of actions as an argument (for reproducing tests
 189     and for test case specification, e.g. a group of tests for the header
 190     structure, a group of tests for snapshots, etc.)
 191 11. Action vector should be randomly generated from the pool of available
 192     actions, if it is not specified as an input parameter
 193 12. Pool of actions should be defined automatically based on an image template
 194 13. Should accept a SUT and its call parameters as an argument or select them
 195     randomly otherwise. Since the list of all possible test commands is
 196     expected to change rarely, it can be kept inside the test runner
 197     itself.
 198 14. Should support an external cancellation of a test run
 199 15. Seed should be logged (for regression purposes)
 200 16. All files related to a test result should be collected: a test image,
 201     SUT logs, fuzzer logs and crash dumps
 202 17. Should be compatible with Python versions 2.4-2.7
 203 18. Usage of external libraries should be limited as much as possible.
 204
 205
 206 Image formats:
 207 -------------
 208
 209 Main target image format is qcow2, but support of image templates should
 210 provide an ability to add any other image format.
 211
 212
 213 Effectiveness:
 214 -------------
 215
 216 The fuzzer is controlled via a template, a seed and an action vector,
 217 which makes the fuzzer itself independent of the image format and test logic.
 218 It should be able to perform rather complex and precise tests that are
 219 specified via an action vector. When no action vector is given, knowledge of
 220 the image structure lets the fuzzer generate the pool of all areas that can be
 221 fuzzed, randomly select some of them and so compose its own action vector.
 222 The complexity of a template also defines the complexity of the fuzzer, so its
 223 functionality can range from simple model-independent fuzzing to smart
 224 model-based fuzzing.
 225
 226
 227 Glossary:
 228 --------
 229
 230 Action vector is a sequence of structure elements retrieved from an image
 231 format, each of which is fuzzed for the test image. It's a subset of the
 232 elements of the action pool. Example: header, refcount table, etc.
 233 Action pool is the set of all available elements of an image structure,
 234 generated automatically from an image template.
 235 Image template is a formal description of an image structure and relations
 236 between image blocks.
 237 Test image is an output image of the fuzzer defined by the current seed and
 238 action vector.
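
The relation between seed, action pool and action vector can be sketched as
follows. This is one possible reading of the requirements, not an existing
implementation: 'build_action_vector' is a hypothetical name, and the pool is
passed in directly instead of being derived from an image template.

```python
import random


def build_action_vector(action_pool, seed, action_vector=None):
    """Return the action vector for a test: the given one or a random one."""
    if action_vector is not None:
        # A specified vector must be a subset of the action pool.
        unknown = [a for a in action_vector if a not in action_pool]
        if unknown:
            raise ValueError('actions not in the pool: %r' % (unknown,))
        return list(action_vector)
    # A private Random instance keeps the result a pure function of the seed.
    rng = random.Random(seed)
    count = rng.randint(1, len(action_pool))
    return rng.sample(list(action_pool), count)
```

The same seed therefore always produces the same generated action vector,
while an explicitly specified vector is used unchanged.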