testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/README

Introduction:

The Flexible Filesystem Benchmark (FFSB) is a filesystem performance
measurement tool.  It is a multi-threaded application (using
pthreads), written entirely in C with cross-platform portability in
mind.  It differs from other filesystem benchmarks in that the user
may supply a profile to create custom workloads, while most other
filesystem benchmarks use a fixed set of workloads.

As of version 5.1, it supports seven different basic operations,
multiple groups of threads with different operation mixtures,
operation across multiple filesystems, and filesystem aging prior to
benchmarking.

Differences from version 4.0 and older:

Version 5.0 and above represent almost a total rewrite, and many
things have changed.  In version 5.0 and above, FFSB moved to a
time-regulated run instead of doing a set number of operations and
timing the whole thing.  This is primarily to better deal with the
use of multiple threadgroups, which would otherwise not be
synchronized at termination time.

Additionally, the FFSB configuration file format changed in version
5.0, although we still support old-style configuration files when a
run time is passed on the command line.  In this mode, version 5.0
and above ignores the iterations parameter and simply uses the time
specified on the command line.

Behaviorally, most of the old operations are the same: sequential
reads and sequential writes work as they did before.  One change in
version 5.0 is that the skip-read behavior (read, seek forward a fixed
amount, then read again) has been removed; we now support fully
randomized reads and writes from random offsets within the file.

Version 4.0 didn't support overwrites (only appends), so we interpret
writes in old config files as append operations.

On Linux, CPU utilization information will only be accurate on
systems using NPTL.  Older LinuxThreads systems will probably only see
zeros for CPU utilization because LinuxThreads is not POSIX-compliant.
Version 4.0 and older could be recompiled to work on LinuxThreads, but
version 5.0 and later no longer support this.

We no longer support the "outputfile" command-line option.  Simply
use tee or a similar tool to capture the output.  FFSB unbuffers
standard output for this purpose, and errors are sent to standard
error.

Global options:

There are eight valid global options placed at the beginning of the
profile.  Three of them are required: num_filesystems (number of
filesystems), num_threadgroups (number of threadgroups), and time
(running time of the benchmark, in seconds).  The other five options are:

directio   - each call to open() will be made using O_DIRECT
alignio    - aligns all block operations for random reads and writes
             on 4k boundaries
bufferedio - currently ignored; it is intended to use the libc
             fread()/fwrite() calls instead of plain Unix read()/write()
verbose    - currently ignored

callout    - calls an external command and waits for it to terminate
             before FFSB begins the benchmark phase.
             This is useful for synchronizing distributed clients,
             starting profilers, etc.

They must be specified in the above order (num_filesystems,
num_threadgroups, time, directio, alignio, bufferedio, verbose,
callout).
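Both alignio and the random read/write modes come down to choosing where each block starts inside a file.  The C sketch below shows one way to do that; the helper pick_block_offset and the ALIGN_BYTES constant are hypothetical names, and FFSB's own random-number handling may differ.  It keeps every block inside the file and, when alignio is on, rounds the offset down to a 4096-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment applied when the global alignio option is on. */
#define ALIGN_BYTES 4096u

/* Hypothetical helper: choose where one random block of block_size bytes
 * starts inside a file of file_size bytes.  rnd is any uniformly random
 * 64-bit value supplied by the caller.  The result always satisfies
 * offset + block_size <= file_size, so the block never runs past EOF. */
uint64_t pick_block_offset(uint64_t file_size, uint64_t block_size,
                           uint64_t rnd, int alignio)
{
    assert(block_size > 0 && block_size <= file_size);

    uint64_t max_start = file_size - block_size; /* last legal start offset */
    uint64_t offset = rnd % (max_start + 1);     /* byte-granular offset */

    if (alignio)
        offset -= offset % ALIGN_BYTES;          /* round down to 4k */

    return offset;
}
```

Rounding down (rather than up) is the design choice that guarantees an aligned offset never pushes the block past the end of the file.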

Filesystems:

Filesystems are specified to FFSB in the form of a directory.  FFSB
assumes that the filesystem is mounted at this directory and will not
do any verification of this fact beyond ensuring it can read/write to
the location.  So be careful to ensure that a filesystem with enough
space to hold the dataset is in fact mounted at the specified location.

In the filesystem clause of the profile, one may set the starting
number of files and directories as well as a minimum and maximum
file size for the filesystem.  One may also specify, in the filesystem
clause, the blocksize used when creating the files.

Also, if a filesystem is to be aged, a special threadgroup clause may
be embedded in the filesystem clause to specify the operation mixture
and number of threads used to age the filesystem.  This threadgroup is
run until filesystem utilization reaches the specified amount.

Inheritance -- if you are using multiple filesystems, all attributes
except the location are inherited from the previous filesystem when
they are not specified.  This makes it easier to add groups of similar
filesystems; in this case, only the location is required in the
filesystem clause.

As of version 5.1, a filesystem can be reused if it hasn't been
modified beyond its original specifications (the number of files and
directories is correct, and file sizes are within the specified
range).  This can be a huge time saver if one wishes to do multiple
runs on the same data set without altering it during a run, because
the fileset doesn't need to be recreated before each run.

To do this, specify "reuse=1" in the filesystem clause.  FFSB will
verify the fileset first and, if it checks out, use it.  Otherwise, it
will remove everything and re-create the fileset for that filesystem.
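The reuse check amounts to comparing what is on disk against the filesystem clause.  The following C sketch illustrates that comparison under stated assumptions: the struct fileset_scan and the function fileset_reusable are hypothetical, the directory walk that would fill in the scan is not shown, and FFSB's actual verification code may check more or less than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a fileset found on disk (e.g. after a directory
 * walk, which is not shown here). */
struct fileset_scan {
    size_t num_files;        /* regular files found */
    size_t num_dirs;         /* directories found */
    const uint64_t *sizes;   /* size in bytes of each file (num_files entries) */
};

/* Returns 1 if the existing fileset still matches the filesystem clause
 * (same file and directory counts, every size within
 * [min_filesize, max_filesize]); otherwise 0, meaning the fileset must be
 * removed and re-created. */
int fileset_reusable(const struct fileset_scan *scan,
                     size_t num_files, size_t num_dirs,
                     uint64_t min_filesize, uint64_t max_filesize)
{
    assert(scan != NULL && (scan->sizes != NULL || scan->num_files == 0));

    if (scan->num_files != num_files || scan->num_dirs != num_dirs)
        return 0;

    for (size_t i = 0; i < scan->num_files; i++) {
        if (scan->sizes[i] < min_filesize || scan->sizes[i] > max_filesize)
            return 0;
    }
    return 1;
}
```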

Threadgroups:

An arbitrary number of threadgroups with differing numbers of threads
and operation mixes can be specified.  The operations are specified
using a weighting for each operation; if an operation isn't specified,
its weighting is assumed to be zero (not used).

"Think time" for a threadgroup may also be specified in milliseconds
using the "op_delay" parameter: every thread will wait for the
specified amount of time between operations.
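Operation weights are naturally turned into a choice with the cumulative-weight ("roulette wheel") technique, and op_delay can be turned into a sleep interval.  The C sketch below shows both; pick_weighted_op and op_delay_to_timespec are hypothetical names, using nanosleep() for the think time is an assumption, and FFSB's own selection code may work differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Pick an operation index from per-operation weights (e.g. read_weight,
 * write_weight, ...).  r must be uniformly random in [0, total weight).
 * An operation with weight 0 can never be selected. */
size_t pick_weighted_op(const unsigned *weights, size_t nops, unsigned r)
{
    for (size_t i = 0; i < nops; i++) {
        if (r < weights[i])
            return i;
        r -= weights[i];
    }
    assert(!"r must be smaller than the total weight");
    return nops;
}

/* Convert an op_delay value in milliseconds into a timespec suitable for
 * nanosleep(&ts, NULL) between two operations. */
struct timespec op_delay_to_timespec(unsigned ms)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(ms / 1000);
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    return ts;
}
```

For example, with weights {4, 0, 1} (read_weight=4, write_weight=0, append_weight=1) the total is 5: r values 0 to 3 select reads and r = 4 selects appends, so reads make up about 80% of the operations.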

Operations:

All operations begin by randomly selecting a filesystem from the list
of filesystems specified in the profile.  The distribution aims to be
uniform across all filesystems.

The seven operations are:

reads   - read() calls with an overall amount and a blocksize;
          operates on existing files.  Care must be taken to ensure
          that the read amount does not exceed the size of the smallest
          possible file.

          If read_random is specified, then each individual block
          will be read starting from a random point within the file, and
          this will continue until the entire amount specified has been
          read.  The offset of each random block will be totally
          random to the byte level, unless the "alignio" global parameter
          is on, in which case the reads will be 4096-byte aligned.  This
          is generally recommended.

readall - Very similar to reads above, except it doesn't take an
          amount; it simply reads the entire file sequentially using the
          read_blocksize.  This is useful for situations where
          different filesystems have differently sized files, and sequential
          read patterns across all filesystems are desired.

writes  - write() calls with an overall amount and a blocksize.
          This is an overwrite operation and will not enlarge an existing
          file; again, one must be careful not to specify a write amount
          that is larger than any possible file in the data set.

          If write_random is specified, then each individual block
          will be written starting from a random point within the file, and
          this will continue until the entire amount specified has been
          written out.  The offset of each random block will be totally
          random to the byte level, unless the "alignio" global parameter
          is on, in which case the writes will be 4096-byte aligned.  This
          is generally recommended.

          If the fsync_file parameter for the threadgroup is non-zero,
          then after all of the write calls are finished, fsync() will
          be called on the file descriptor before the file is closed.

creates - creates a file using the open() call and chooses its size
          randomly within the constraints (min_filesize and
          max_filesize) of the selected filesystem.  Writes will
          be done using the same blocksize as is specified for the
          write operation.
deletes - calls unlink() on a filename and removes it from the
          internal data structures.  One must be careful to ensure
          there are enough files to delete at all times, or else the
          benchmark will terminate.
appends - calls write() using the append flag with an overall amount
          and a blocksize to be appended onto a randomly chosen file.
metas   - this is actually a mix of several different directory
          operations.  Each "meta" operation consists of two directory
          creates, one directory remove, and a directory rename.
          These operations are all carried out separately from the
          other six operations.
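The writes operation is an overwrite in blocksize chunks with an optional fsync() at the end.  The C sketch below illustrates those semantics; overwrite_region is a hypothetical name, using pwrite() (rather than lseek() plus write()) is an assumption, and choosing the offsets, sequential or random, is left to the caller.  Because FFSB terminates on any failed call, a caller would treat a -1 return as fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite `amount` bytes of an open file starting at `offset`, issuing
 * one pwrite() of at most `blocksize` bytes per step.  `buf` must hold at
 * least `blocksize` bytes.  The write is refused if it would enlarge the
 * file, mirroring the overwrite-only semantics of the writes operation.
 * If fsync_file is non-zero, fsync() is called once after the last write.
 * Returns the number of write calls issued, or -1 on error. */
long overwrite_region(int fd, off_t offset, size_t amount, size_t blocksize,
                      const char *buf, int fsync_file)
{
    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return -1;
    }
    if (blocksize == 0 || offset < 0 || offset + (off_t)amount > st.st_size) {
        fprintf(stderr, "refusing write: bad blocksize or would enlarge file\n");
        return -1;
    }

    long calls = 0;
    while (amount > 0) {
        size_t n = amount < blocksize ? amount : blocksize;
        ssize_t w = pwrite(fd, buf, n, offset);
        if (w < 0) {
            perror("pwrite");
            return -1;
        }
        if ((size_t)w != n) {
            fprintf(stderr, "short write\n");
            return -1;
        }
        offset += (off_t)n;
        amount -= n;
        calls++;
    }

    if (fsync_file && fsync(fd) != 0) {
        perror("fsync");
        return -1;
    }
    return calls;
}
```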

Operation accounting:

Each operation that uses a blocksize (reads, writes, creates, and
appends) counts each read or write of one blocksize as an operation,
whereas deletes and metas are each counted as a single operation.
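As a worked illustration of this accounting, the number of counted operations for a blocksize-based operation is the number of read()/write() calls it issues.  The C sketch below assumes that a final partial block still counts as one call; counted_block_ops is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of operations credited for moving `amount` bytes in chunks of
 * `blocksize` bytes: one per read()/write() call, with a final partial
 * block (an assumption) counted as one call.  Deletes and metas would
 * simply count as 1 each. */
uint64_t counted_block_ops(uint64_t amount, uint64_t blocksize)
{
    assert(blocksize > 0);
    /* Ceiling division written to avoid overflow for very large amounts. */
    return amount / blocksize + (amount % blocksize != 0);
}
```

For instance, with write_size=40960 and write_blocksize=4096 (the aging threadgroup used in the profile examples), one append would be credited as 10 operations.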

Running the benchmark:

There are three phases to running the benchmark: aging, fileset
creation, and the benchmark phase.

The create phase is carried out across all filesystems simultaneously,
with one dedicated thread per filesystem.

After the create phase, sync() is called to ensure all dirty data gets
written out before the benchmark phase begins, and sync() is called
again at the end of the benchmark phase.  The time spent in sync() at
the end of the benchmark phase is counted as part of the benchmark phase.

Caveats/Holes/Bugs:

Aging, and aging across multiple filesystems simultaneously, hasn't
been tested very much.

If *any* I/O operation or system/libc call fails, the benchmark
will terminate immediately.

The parser doesn't handle malformed or incorrect profiles very well
(or at all).

The parser doesn't check to make sure all of the appropriate options
have been specified.  For example, if writes are specified in a
threadgroup but write_blocksize isn't, the parser won't catch
it, and the benchmark run will fail later on.

Configuration Files (new style):

New-style configuration files allow arbitrary newlines between lines,
and comments using '#' at the start of a line.  They also allow tabs
and other whitespace before and after configuration parameters.
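The whitespace and comment rules above can be illustrated with a small line parser.  The C sketch below is hypothetical (parse_profile_line is not FFSB's parser); it handles only whole-line '#' comments and key=value lines, leaves clause markers such as [filesystem0] to the caller, and does not strip trailing "# ..." comments, which a real parser would have to deal with given the examples that follow.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical line parser for a new-style profile.
 * - blank lines and lines whose first non-blank character is '#'
 *   are skipped (returns 0);
 * - "key=value" lines have surrounding spaces/tabs trimmed from both the
 *   key and the value; *key and *val then point into `line`, which is
 *   modified in place (returns 1);
 * - anything else, such as a "[filesystem0]" clause marker, returns -1
 *   and is left for the caller to handle.
 * Trailing "# ..." comments are NOT stripped here. */
int parse_profile_line(char *line, char **key, char **val)
{
    assert(line && key && val);

    char *p = line;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;

    char *eq = strchr(p, '=');
    if (eq == NULL)
        return -1;

    /* Trim whitespace between the key and '=' (this also cuts at '='). */
    char *kend = eq;
    while (kend > p && isspace((unsigned char)kend[-1]))
        kend--;
    *kend = '\0';

    /* Trim whitespace around the value, including a trailing newline. */
    char *v = eq + 1;
    while (isspace((unsigned char)*v))
        v++;
    char *vend = v + strlen(v);
    while (vend > v && isspace((unsigned char)vend[-1]))
        vend--;
    *vend = '\0';

    *key = p;
    *val = v;
    return 1;
}
```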

The new-style configuration file is broken up into three main parts:
global parameters, filesystems, and threadgroups.

The sections must be in the above order.

Global parameters:

Global parameters are described above; the first three are always
required.  Example:

----------

num_filesystems=1
num_threadgroups=1
time=30                 # time is in seconds

directio=0              # don't use direct io
alignio=1               # align random IOs to 4k
bufferedio=0            # this does nothing right now
verbose=0               # this does nothing right now

                        # calls an external command and waits;
                        # everything until the newline is taken,
                        # so you can have arbitrary parameters
callout=synchronize.sh myhostname

---------

All of these must appear in this order, though you can leave out the
optional ones.

Filesystems:

Filesystems describe different logical sets of files residing in
different directories.  There is no strict requirement that they
actually be on different filesystems, only that the directory
specified already exists.

Filesystems are specified by a clause with a filesystem number like
this:

[filesystem0]
        location=/mnt/testing/
        num_files=10
        num_dirs=1
        max_filesize=4096
        min_filesize=4096
[end0]

The clause must always begin with [filesystemX] and end with [endX],
where X is the number of that filesystem.

You should start with X = 0 and increment by one for each following
filesystem.  If they are out of order, things will likely break.

The required information for each filesystem is: location, num_files,
num_dirs, max_filesize, and min_filesize.  Beyond those, the following
four options are supported:

reuse=1 # check the filesystem to see if it is reusable

        # filesystem aging, three components required:
        # agefs=1 to turn it on,
        # then a valid threadgroup specification,
        # then a desired utilization fraction

agefs=1 # age the filesystem according to the following threadgroup
        [threadgroup0]
                num_threads=10
                write_size=40960
                write_blocksize=4096
                create_weight=10
                append_weight=10
                delete_weight=1
        [end0]
desired_util=0.20       # In this case, age until the fs is 20% full

create_blocksize=4096   # specify the blocksize to write()
                        # for creating the fileset, defaults to 4096

age_blocksize=4096      # specify the blocksize to write() for aging

Also, to make it easy to use lots of filesystems, we support
filesystem inheritance, which simply copies all options but the
location from the previous filesystem clause if nothing is specified.
Obviously, this doesn't work for filesystem0.  (This may not work for
aging either.)

Full-blown filesystem clause example:

----

[filesystem0]

        # required parts

        location=/home/sonny/tmp
        num_files=100
        num_dirs=100
        max_filesize=65536
        min_filesize=4096

        # aging part
        agefs=0
        [threadgroup0]
                num_threads=10
                write_size=40960
                write_blocksize=4096
                create_weight=10
                append_weight=10
                delete_weight=1
        [end0]
        desired_util=0.02       # age until 2% full

        # other optional commands

        create_blocksize=1024           # use a small create blocksize
        age_blocksize=1024              # and smaller age create blocksize
        reuse=0                         # don't reuse it
[end0]

--

Threadgroups:

Threadgroups are very similar to filesystems in that any number of
them can be specified in clauses, and they must be in order starting
with threadgroup0.

Example:

---

[threadgroup0]
        num_threads=32
        read_weight=4
        append_weight=1

        write_size=4096
        write_blocksize=4096

        read_size=4096
        read_blocksize=4096
[end0]

---

In a threadgroup clause, num_threads is required and must be at least
1.  Then, at least one operation must be given a weight greater than 0
for the threadgroup to be valid.  Operations can be given a weighting
of 0, in which case they are ignored.

Certain operations also require other parameters.  For example, if
read_weight is greater than zero, then one must also include a
read_size and a read_blocksize.  Here is the table of requirements and
options:

Operation               Requirements                            Options
--                      --                                      --
read_weight             read_size, read_blocksize               read_random
readall_weight          read_blocksize                          none
write_weight            write_size, write_blocksize             write_random, fsync_file
create_weight           write_blocksize or create_blocksize     none
append_weight           write_blocksize, write_size             none
delete_weight           none                                    none
meta_weight             none                                    none
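Since the parser does not enforce these requirements (see Caveats/Holes/Bugs), a check mirroring the table above might look like the following C sketch.  The struct tg_cfg, the REQUIRE macro, and tg_check are hypothetical, and using 0 to mean "not specified" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of threadgroup settings; 0 means "not specified". */
struct tg_cfg {
    unsigned num_threads;
    unsigned read_weight, readall_weight, write_weight;
    unsigned create_weight, append_weight, delete_weight, meta_weight;
    unsigned long read_size, read_blocksize;
    unsigned long write_size, write_blocksize;
    unsigned long create_blocksize;   /* from the filesystem clause */
};

/* Report one violated rule on stderr and count it. */
#define REQUIRE(cond, msg)                                   \
    do {                                                     \
        if (!(cond)) {                                       \
            fprintf(stderr, "profile error: %s\n", msg);     \
            problems++;                                      \
        }                                                    \
    } while (0)

/* Returns the number of problems found; 0 means the threadgroup satisfies
 * the requirements table. */
int tg_check(const struct tg_cfg *tg)
{
    int problems = 0;
    assert(tg != NULL);

    REQUIRE(tg->num_threads >= 1, "num_threads must be at least 1");
    REQUIRE(tg->read_weight + tg->readall_weight + tg->write_weight +
            tg->create_weight + tg->append_weight + tg->delete_weight +
            tg->meta_weight > 0,
            "at least one operation needs a weight greater than 0");

    if (tg->read_weight)
        REQUIRE(tg->read_size && tg->read_blocksize,
                "read_weight needs read_size and read_blocksize");
    if (tg->readall_weight)
        REQUIRE(tg->read_blocksize, "readall_weight needs read_blocksize");
    if (tg->write_weight)
        REQUIRE(tg->write_size && tg->write_blocksize,
                "write_weight needs write_size and write_blocksize");
    if (tg->create_weight)
        REQUIRE(tg->write_blocksize || tg->create_blocksize,
                "create_weight needs write_blocksize or create_blocksize");
    if (tg->append_weight)
        REQUIRE(tg->write_size && tg->write_blocksize,
                "append_weight needs write_blocksize and write_size");
    return problems;
}
```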

Other threadgroup options:

op_delay=10  # specify a wait between operations in milliseconds

bindfs=3     # This allows you to restrict a threadgroup's operations
             # to a specific filesystem number.  Currently, only
             # binding to one specific filesystem is supported.
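Combining bindfs with the uniform filesystem selection described under Operations, the per-operation choice of filesystem could be sketched in C as follows.  The function choose_fs is hypothetical, and using a negative bindfs to mean "not bound" is an assumption for illustration.

```c
#include <assert.h>

/* Choose the filesystem for a thread's next operation.
 * bindfs < 0 means the threadgroup has no bindfs setting, so a
 * filesystem is picked uniformly from 0..num_filesystems-1 using rnd
 * (a uniformly random value from the caller).  Otherwise every
 * operation goes to filesystem number bindfs. */
int choose_fs(int num_filesystems, int bindfs, unsigned rnd)
{
    assert(num_filesystems > 0);
    if (bindfs >= 0) {
        assert(bindfs < num_filesystems);
        return bindfs;
    }
    /* rnd % n is very slightly biased unless n divides the range of rnd;
     * this is negligible for a handful of filesystems. */
    return (int)(rnd % (unsigned)num_filesystems);
}
```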