make-ext4-readme-tables-readable

   1 docs: make ext4 readme tables readable
   2
   3 From: Darrick J. Wong <darrick.wong@oracle.com>
   4
   5 The tables in the ext4 readme are not particularly space efficient in
   6 the text or html outputs, and they're totally broken in the pdf output.
   7 Convert them into titled paragraphs so that they render more nicely.
   8
   9 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  10 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  11 ---
  12  Documentation/filesystems/ext4/ext4.rst |  821 +++++++++++++++----------------
  13  1 file changed, 391 insertions(+), 430 deletions(-)
  14
  15
  16 diff --git a/Documentation/filesystems/ext4/ext4.rst b/Documentation/filesystems/ext4/ext4.rst
  17 index 9d4368d591fa..e2b6bb7c2730 100644
  18 --- a/Documentation/filesystems/ext4/ext4.rst
  19 +++ b/Documentation/filesystems/ext4/ext4.rst
  20 @@ -101,269 +101,256 @@ Options
  21  When mounting an ext4 filesystem, the following option are accepted:
  22  (*) == default
  23
  24 -======================= =======================================================
  25 -Mount Option            Description
  26 -======================= =======================================================
  27 -ro                     Mount filesystem read only. Note that ext4 will
  28 -                       replay the journal (and thus write to the
  29 -                       partition) even when mounted "read only". The
  30 -                       mount options "ro,noload" can be used to prevent
  31 -                       writes to the filesystem.
  32 -
  33 -journal_checksum       Enable checksumming of the journal transactions.
  34 -                       This will allow the recovery code in e2fsck and the
  35 -                       kernel to detect corruption in the kernel.  It is a
  36 -                       compatible change and will be ignored by older kernels.
  37 -
  38 -journal_async_commit   Commit block can be written to disk without waiting
  39 -                       for descriptor blocks. If enabled older kernels cannot
  40 -                       mount the device. This will enable 'journal_checksum'
  41 -                       internally.
  42 -
  43 -journal_path=path
  44 -journal_dev=devnum     When the external journal device's major/minor numbers
  45 -                       have changed, these options allow the user to specify
  46 -                       the new journal location.  The journal device is
  47 -                       identified through either its new major/minor numbers
  48 -                       encoded in devnum, or via a path to the device.
  49 -
  50 -norecovery             Don't load the journal on mounting.  Note that
  51 -noload                 if the filesystem was not unmounted cleanly,
  52 -                       skipping the journal replay will lead to the
  53 -                       filesystem containing inconsistencies that can
  54 -                       lead to any number of problems.
  55 -
  56 -data=journal           All data are committed into the journal prior to being
  57 -                       written into the main file system.  Enabling
  58 -                       this mode will disable delayed allocation and
  59 -                       O_DIRECT support.
  60 -
  61 -data=ordered   (*)     All data are forced directly out to the main file
  62 -                       system prior to its metadata being committed to the
  63 -                       journal.
  64 -
  65 -data=writeback         Data ordering is not preserved, data may be written
  66 -                       into the main file system after its metadata has been
  67 -                       committed to the journal.
  68 -
  69 -commit=nrsec   (*)     Ext4 can be told to sync all its data and metadata
  70 -                       every 'nrsec' seconds. The default value is 5 seconds.
  71 -                       This means that if you lose your power, you will lose
  72 -                       as much as the latest 5 seconds of work (your
  73 -                       filesystem will not be damaged though, thanks to the
  74 -                       journaling).  This default value (or any low value)
  75 -                       will hurt performance, but it's good for data-safety.
  76 -                       Setting it to 0 will have the same effect as leaving
  77 -                       it at the default (5 seconds).
  78 -                       Setting it to very large values will improve
  79 -                       performance.
  80 -
  81 -barrier=<0|1(*)>       This enables/disables the use of write barriers in
  82 -barrier(*)             the jbd code.  barrier=0 disables, barrier=1 enables.
  83 -nobarrier              This also requires an IO stack which can support
  84 -                       barriers, and if jbd gets an error on a barrier
  85 -                       write, it will disable again with a warning.
  86 -                       Write barriers enforce proper on-disk ordering
  87 -                       of journal commits, making volatile disk write caches
  88 -                       safe to use, at some performance penalty.  If
  89 -                       your disks are battery-backed in one way or another,
  90 -                       disabling barriers may safely improve performance.
  91 -                       The mount options "barrier" and "nobarrier" can
  92 -                       also be used to enable or disable barriers, for
  93 -                       consistency with other ext4 mount options.
  94 -
  95 -inode_readahead_blks=n This tuning parameter controls the maximum
  96 -                       number of inode table blocks that ext4's inode
  97 -                       table readahead algorithm will pre-read into
  98 -                       the buffer cache.  The default value is 32 blocks.
  99 -
 100 -nouser_xattr           Disables Extended User Attributes.  See the
 101 -                       attr(5) manual page for more information about
 102 -                       extended attributes.
 103 -
 104 -noacl                  This option disables POSIX Access Control List
 105 -                       support. If ACL support is enabled in the kernel
 106 -                       configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL is
 107 -                       enabled by default on mount. See the acl(5) manual
 108 -                       page for more information about acl.
 109 -
 110 -bsddf          (*)     Make 'df' act like BSD.
 111 -minixdf                        Make 'df' act like Minix.
 112 -
 113 -debug                  Extra debugging information is sent to syslog.
 114 -
 115 -abort                  Simulate the effects of calling ext4_abort() for
 116 -                       debugging purposes.  This is normally used while
 117 -                       remounting a filesystem which is already mounted.
 118 -
 119 -errors=remount-ro      Remount the filesystem read-only on an error.
 120 -errors=continue                Keep going on a filesystem error.
 121 -errors=panic           Panic and halt the machine if an error occurs.
 122 -                        (These mount options override the errors behavior
 123 -                        specified in the superblock, which can be configured
 124 -                        using tune2fs)
 125 -
 126 -data_err=ignore(*)     Just print an error message if an error occurs
 127 -                       in a file data buffer in ordered mode.
 128 -data_err=abort         Abort the journal if an error occurs in a file
 129 -                       data buffer in ordered mode.
 130 -
 131 -grpid                  New objects have the group ID of their parent.
 132 -bsdgroups
 133 -
 134 -nogrpid                (*)     New objects have the group ID of their creator.
 135 -sysvgroups
 136 -
 137 -resgid=n               The group ID which may use the reserved blocks.
 138 -
 139 -resuid=n               The user ID which may use the reserved blocks.
 140 -
 141 -sb=n                   Use alternate superblock at this location.
 142 -
 143 -quota                  These options are ignored by the filesystem. They
 144 -noquota                        are used only by quota tools to recognize volumes
 145 -grpquota               where quota should be turned on. See documentation
 146 -usrquota               in the quota-tools package for more details
 147 -                       (http://sourceforge.net/projects/linuxquota).
 148 -
 149 -jqfmt=<quota type>     These options tell filesystem details about quota
 150 -usrjquota=<file>       so that quota information can be properly updated
 151 -grpjquota=<file>       during journal replay. They replace the above
 152 -                       quota options. See documentation in the quota-tools
 153 -                       package for more details
 154 -                       (http://sourceforge.net/projects/linuxquota).
 155 -
 156 -stripe=n               Number of filesystem blocks that mballoc will try
 157 -                       to use for allocation size and alignment. For RAID5/6
 158 -                       systems this should be the number of data
 159 -                       disks *  RAID chunk size in file system blocks.
 160 -
 161 -delalloc       (*)     Defer block allocation until just before ext4
 162 -                       writes out the block(s) in question.  This
 163 -                       allows ext4 to better allocation decisions
 164 -                       more efficiently.
 165 -nodelalloc             Disable delayed allocation.  Blocks are allocated
 166 -                       when the data is copied from userspace to the
 167 -                       page cache, either via the write(2) system call
 168 -                       or when an mmap'ed page which was previously
 169 -                       unallocated is written for the first time.
 170 -
 171 -max_batch_time=usec    Maximum amount of time ext4 should wait for
 172 -                       additional filesystem operations to be batch
 173 -                       together with a synchronous write operation.
 174 -                       Since a synchronous write operation is going to
 175 -                       force a commit and then a wait for the I/O
 176 -                       complete, it doesn't cost much, and can be a
 177 -                       huge throughput win, we wait for a small amount
 178 -                       of time to see if any other transactions can
 179 -                       piggyback on the synchronous write.   The
 180 -                       algorithm used is designed to automatically tune
 181 -                       for the speed of the disk, by measuring the
 182 -                       amount of time (on average) that it takes to
 183 -                       finish committing a transaction.  Call this time
 184 -                       the "commit time".  If the time that the
 185 -                       transaction has been running is less than the
 186 -                       commit time, ext4 will try sleeping for the
 187 -                       commit time to see if other operations will join
 188 -                       the transaction.   The commit time is capped by
 189 -                       the max_batch_time, which defaults to 15000us
 190 -                       (15ms).   This optimization can be turned off
 191 -                       entirely by setting max_batch_time to 0.
 192 -
 193 -min_batch_time=usec    This parameter sets the commit time (as
 194 -                       described above) to be at least min_batch_time.
 195 -                       It defaults to zero microseconds.  Increasing
 196 -                       this parameter may improve the throughput of
 197 -                       multi-threaded, synchronous workloads on very
 198 -                       fast disks, at the cost of increasing latency.
 199 -
 200 -journal_ioprio=prio    The I/O priority (from 0 to 7, where 0 is the
 201 -                       highest priority) which should be used for I/O
 202 -                       operations submitted by kjournald2 during a
 203 -                       commit operation.  This defaults to 3, which is
 204 -                       a slightly higher priority than the default I/O
 205 -                       priority.
 206 -
 207 -auto_da_alloc(*)       Many broken applications don't use fsync() when
 208 -noauto_da_alloc                replacing existing files via patterns such as
 209 -                       fd = open("foo.new")/write(fd,..)/close(fd)/
 210 -                       rename("foo.new", "foo"), or worse yet,
 211 -                       fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
 212 -                       If auto_da_alloc is enabled, ext4 will detect
 213 -                       the replace-via-rename and replace-via-truncate
 214 -                       patterns and force that any delayed allocation
 215 -                       blocks are allocated such that at the next
 216 -                       journal commit, in the default data=ordered
 217 -                       mode, the data blocks of the new file are forced
 218 -                       to disk before the rename() operation is
 219 -                       committed.  This provides roughly the same level
 220 -                       of guarantees as ext3, and avoids the
 221 -                       "zero-length" problem that can happen when a
 222 -                       system crashes before the delayed allocation
 223 -                       blocks are forced to disk.
 224 -
 225 -noinit_itable          Do not initialize any uninitialized inode table
 226 -                       blocks in the background.  This feature may be
 227 -                       used by installation CD's so that the install
 228 -                       process can complete as quickly as possible; the
 229 -                       inode table initialization process would then be
 230 -                       deferred until the next time the  file system
 231 -                       is unmounted.
 232 -
 233 -init_itable=n          The lazy itable init code will wait n times the
 234 -                       number of milliseconds it took to zero out the
 235 -                       previous block group's inode table.  This
 236 -                       minimizes the impact on the system performance
 237 -                       while file system's inode table is being initialized.
 238 -
 239 -discard                        Controls whether ext4 should issue discard/TRIM
 240 -nodiscard(*)           commands to the underlying block device when
 241 -                       blocks are freed.  This is useful for SSD devices
 242 -                       and sparse/thinly-provisioned LUNs, but it is off
 243 -                       by default until sufficient testing has been done.
 244 -
 245 -nouid32                        Disables 32-bit UIDs and GIDs.  This is for
 246 -                       interoperability  with  older kernels which only
 247 -                       store and expect 16-bit values.
 248 -
 249 -block_validity(*)      These options enable or disable the in-kernel
 250 -noblock_validity       facility for tracking filesystem metadata blocks
 251 -                       within internal data structures.  This allows multi-
 252 -                       block allocator and other routines to notice
 253 -                       bugs or corrupted allocation bitmaps which cause
 254 -                       blocks to be allocated which overlap with
 255 -                       filesystem metadata blocks.
 256 -
 257 -dioread_lock           Controls whether or not ext4 should use the DIO read
 258 -dioread_nolock         locking. If the dioread_nolock option is specified
 259 -                       ext4 will allocate uninitialized extent before buffer
 260 -                       write and convert the extent to initialized after IO
 261 -                       completes. This approach allows ext4 code to avoid
 262 -                       using inode mutex, which improves scalability on high
 263 -                       speed storages. However this does not work with
 264 -                       data journaling and dioread_nolock option will be
 265 -                       ignored with kernel warning. Note that dioread_nolock
 266 -                       code path is only used for extent-based files.
 267 -                       Because of the restrictions this options comprises
 268 -                       it is off by default (e.g. dioread_lock).
 269 -
 270 -max_dir_size_kb=n      This limits the size of directories so that any
 271 -                       attempt to expand them beyond the specified
 272 -                       limit in kilobytes will cause an ENOSPC error.
 273 -                       This is useful in memory constrained
 274 -                       environments, where a very large directory can
 275 -                       cause severe performance problems or even
 276 -                       provoke the Out Of Memory killer.  (For example,
 277 -                       if there is only 512mb memory available, a 176mb
 278 -                       directory may seriously cramp the system's style.)
 279 -
 280 -i_version              Enable 64-bit inode version support. This option is
 281 -                       off by default.
 282 -
 283 -dax                    Use direct access (no page cache).  See
 284 -                       Documentation/filesystems/dax.txt.  Note that
 285 -                       this option is incompatible with data=journal.
 286 -======================= =======================================================
 287 +  ro
 288 +        Mount filesystem read only. Note that ext4 will replay the journal (and
 289 +        thus write to the partition) even when mounted "read only". The mount
 290 +        options "ro,noload" can be used to prevent writes to the filesystem.
 291 +
 292 +  journal_checksum
 293 +        Enable checksumming of the journal transactions.  This will allow the
 294 +        recovery code in e2fsck and the kernel to detect corruption in the
 295 +        journal.  It is a compatible change and will be ignored by older
 296 +        kernels.
 297 +
 298 +  journal_async_commit
 299 +        Commit block can be written to disk without waiting for descriptor
 300 +        blocks. If enabled, older kernels cannot mount the device. This will
 301 +        enable 'journal_checksum' internally.
 302 +
 303 +  journal_path=path, journal_dev=devnum
 304 +        When the external journal device's major/minor numbers have changed,
 305 +        these options allow the user to specify the new journal location.  The
 306 +        journal device is identified through either its new major/minor numbers
 307 +        encoded in devnum, or via a path to the device.
 308 +
 309 +  norecovery, noload
 310 +        Don't load the journal on mounting.  Note that if the filesystem was
 311 +        not unmounted cleanly, skipping the journal replay will lead to the
 312 +        filesystem containing inconsistencies that can lead to any number of
 313 +        problems.
 314 +
 315 +  data=journal
 316 +        All data are committed into the journal prior to being written into the
 317 +        main file system.  Enabling this mode will disable delayed allocation
 318 +        and O_DIRECT support.
 319 +
 320 +  data=ordered (*)
 321 +        All data are forced directly out to the main file system prior to its
 322 +        metadata being committed to the journal.
 323 +
 324 +  data=writeback
 325 +        Data ordering is not preserved; data may be written into the main file
 326 +        system after its metadata has been committed to the journal.
 327 +
 328 +  commit=nrsec (*)
 329 +        Ext4 can be told to sync all its data and metadata every 'nrsec'
 330 +        seconds. The default value is 5 seconds.  This means that if you lose
 331 +        your power, you will lose as much as the latest 5 seconds of work (your
 332 +        filesystem will not be damaged though, thanks to the journaling).  This
 333 +        default value (or any low value) will hurt performance, but it's good
 334 +        for data-safety.  Setting it to 0 will have the same effect as leaving
 335 +        it at the default (5 seconds).  Setting it to very large values will
 336 +        improve performance.
 337 +
 338 +  barrier=<0|1(*)>, barrier(*), nobarrier
 339 +        This enables/disables the use of write barriers in the jbd code.
 340 +        barrier=0 disables, barrier=1 enables.  This also requires an IO stack
 341 +        which can support barriers, and if jbd gets an error on a barrier
 342 +        write, it will disable again with a warning.  Write barriers enforce
 343 +        proper on-disk ordering of journal commits, making volatile disk write
 344 +        caches safe to use, at some performance penalty.  If your disks are
 345 +        battery-backed in one way or another, disabling barriers may safely
 346 +        improve performance.  The mount options "barrier" and "nobarrier" can
 347 +        also be used to enable or disable barriers, for consistency with other
 348 +        ext4 mount options.
 349 +
 350 +  inode_readahead_blks=n
 351 +        This tuning parameter controls the maximum number of inode table blocks
 352 +        that ext4's inode table readahead algorithm will pre-read into the
 353 +        buffer cache.  The default value is 32 blocks.
 354 +
 355 +  nouser_xattr
 356 +        Disables Extended User Attributes.  See the attr(5) manual page for
 357 +        more information about extended attributes.
 358 +
 359 +  noacl
 360 +        This option disables POSIX Access Control List support. If ACL support
 361 +        is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
 362 +        is enabled by default on mount. See the acl(5) manual page for more
 363 +        information about acl.
 364 +
 365 +  bsddf (*)
 366 +        Make 'df' act like BSD.
 367 +
 368 +  minixdf
 369 +        Make 'df' act like Minix.
 370 +
 371 +  debug
 372 +        Extra debugging information is sent to syslog.
 373 +
 374 +  abort
 375 +        Simulate the effects of calling ext4_abort() for debugging purposes.
 376 +        This is normally used while remounting a filesystem which is already
 377 +        mounted.
 378 +
 379 +  errors=remount-ro
 380 +        Remount the filesystem read-only on an error.
 381 +
 382 +  errors=continue
 383 +        Keep going on a filesystem error.
 384 +
 385 +  errors=panic
 386 +        Panic and halt the machine if an error occurs.  (These mount options
 387 +        override the errors behavior specified in the superblock, which can be
 388 +        configured using tune2fs.)
 389 +
 390 +  data_err=ignore(*)
 391 +        Just print an error message if an error occurs in a file data buffer in
 392 +        ordered mode.
 393 +  data_err=abort
 394 +        Abort the journal if an error occurs in a file data buffer in ordered
 395 +        mode.
 396 +
 397 +  grpid | bsdgroups
 398 +        New objects have the group ID of their parent.
 399 +
 400 +  nogrpid (*) | sysvgroups
 401 +        New objects have the group ID of their creator.
 402 +
 403 +  resgid=n
 404 +        The group ID which may use the reserved blocks.
 405 +
 406 +  resuid=n
 407 +        The user ID which may use the reserved blocks.
 408 +
 409 +  sb=n
 410 +        Use alternate superblock at this location.
 411 +
 412 +  quota, noquota, grpquota, usrquota
 413 +        These options are ignored by the filesystem. They are used only by
 414 +        quota tools to recognize volumes where quota should be turned on. See
 415 +        documentation in the quota-tools package for more details
 416 +        (http://sourceforge.net/projects/linuxquota).
 417 +
 418 +  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
 419 +        These options tell the filesystem details about quota so that quota
 420 +        information can be properly updated during journal replay. They replace
 421 +        the above quota options. See documentation in the quota-tools package
 422 +        for more details (http://sourceforge.net/projects/linuxquota).
 423 +
 424 +  stripe=n
 425 +        Number of filesystem blocks that mballoc will try to use for allocation
 426 +        size and alignment. For RAID5/6 systems this should be the number of
 427 +        data disks * RAID chunk size in file system blocks.
 428 +
 429 +  delalloc (*)
 430 +        Defer block allocation until just before ext4 writes out the block(s)
 431 +        in question.  This allows ext4 to make better allocation decisions more
 432 +        efficiently.
 433 +
 434 +  nodelalloc
 435 +        Disable delayed allocation.  Blocks are allocated when the data is
 436 +        copied from userspace to the page cache, either via the write(2) system
 437 +        call or when an mmap'ed page which was previously unallocated is
 438 +        written for the first time.
 439 +
 440 +  max_batch_time=usec
 441 +        Maximum amount of time ext4 should wait for additional filesystem
 442 +        operations to be batched together with a synchronous write operation.
 443 +        Since a synchronous write operation is going to force a commit and then
 444 +        a wait for the I/O to complete, it doesn't cost much, and can be a huge
 445 +        throughput win, we wait for a small amount of time to see if any other
 446 +        transactions can piggyback on the synchronous write.  The algorithm
 447 +        used is designed to automatically tune for the speed of the disk, by
 448 +        measuring the amount of time (on average) that it takes to finish
 449 +        committing a transaction.  Call this time the "commit time".  If the
 450 +        time that the transaction has been running is less than the commit
 451 +        time, ext4 will try sleeping for the commit time to see if other
 452 +        operations will join the transaction.  The commit time is capped by
 453 +        the max_batch_time, which defaults to 15000us (15ms).  This
 454 +        optimization can be turned off entirely by setting max_batch_time to 0.
 455 +
 456 +  min_batch_time=usec
 457 +        This parameter sets the commit time (as described above) to be at least
 458 +        min_batch_time.  It defaults to zero microseconds.  Increasing this
 459 +        parameter may improve the throughput of multi-threaded, synchronous
 460 +        workloads on very fast disks, at the cost of increasing latency.
 461 +
 462 +  journal_ioprio=prio
 463 +        The I/O priority (from 0 to 7, where 0 is the highest priority) which
 464 +        should be used for I/O operations submitted by kjournald2 during a
 465 +        commit operation.  This defaults to 3, which is a slightly higher
 466 +        priority than the default I/O priority.
 467 +
 468 +  auto_da_alloc(*), noauto_da_alloc
 469 +        Many broken applications don't use fsync() when replacing existing
 470 +        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
 471 +        rename("foo.new", "foo"), or worse yet, fd = open("foo",
 472 +        O_TRUNC)/write(fd,..)/close(fd).  If auto_da_alloc is enabled, ext4
 473 +        will detect the replace-via-rename and replace-via-truncate patterns
 474 +        and force that any delayed allocation blocks are allocated such that at
 475 +        the next journal commit, in the default data=ordered mode, the data
 476 +        blocks of the new file are forced to disk before the rename() operation
 477 +        is committed.  This provides roughly the same level of guarantees as
 478 +        ext3, and avoids the "zero-length" problem that can happen when a
 479 +        system crashes before the delayed allocation blocks are forced to disk.
 480 +
 481 +  noinit_itable
 482 +        Do not initialize any uninitialized inode table blocks in the
 483 +        background.  This feature may be used by installation CDs so that the
 484 +        install process can complete as quickly as possible; the inode table
 485 +        initialization process would then be deferred until the next time the
 486 +        file system is unmounted.
 487 +
 488 +  init_itable=n
 489 +        The lazy itable init code will wait n times the number of milliseconds
 490 +        it took to zero out the previous block group's inode table.  This
 491 +        minimizes the impact on system performance while the file system's
 492 +        inode table is being initialized.
 493 +
 494 +  discard, nodiscard(*)
 495 +        Controls whether ext4 should issue discard/TRIM commands to the
 496 +        underlying block device when blocks are freed.  This is useful for SSD
 497 +        devices and sparse/thinly-provisioned LUNs, but it is off by default
 498 +        until sufficient testing has been done.
 499 +
 500 +  nouid32
 501 +        Disables 32-bit UIDs and GIDs.  This is for interoperability with
 502 +        older kernels which only store and expect 16-bit values.
 503 +
 504 +  block_validity(*), noblock_validity
 505 +        These options enable or disable the in-kernel facility for tracking
 506 +        filesystem metadata blocks within internal data structures.  This
 507 +        allows the multi-block allocator and other routines to notice bugs or
 508 +        corrupted allocation bitmaps which cause blocks to be allocated which
 509 +        overlap with filesystem metadata blocks.
 510 +
 511 +  dioread_lock, dioread_nolock
 512 +        Controls whether or not ext4 should use the DIO read locking. If the
 513 +        dioread_nolock option is specified ext4 will allocate uninitialized
 514 +        extent before buffer write and convert the extent to initialized after
 515 +        IO completes. This approach allows ext4 code to avoid using inode
 516 +        mutex, which improves scalability on high-speed storage. However, this
 517 +        does not work with data journaling and dioread_nolock option will be
 518 +        ignored with kernel warning. Note that dioread_nolock code path is only
 519 +        used for extent-based files.  Because of the restrictions this option
 520 +        comprises, it is off by default (i.e. dioread_lock).
 521 +
 522 +  max_dir_size_kb=n
 523 +        This limits the size of directories so that any attempt to expand them
 524 +        beyond the specified limit in kilobytes will cause an ENOSPC error.
 525 +        This is useful in memory constrained environments, where a very large
 526 +        directory can cause severe performance problems or even provoke the Out
 527 +        Of Memory killer.  (For example, if there is only 512mb memory
 528 +        available, a 176mb directory may seriously cramp the system's style.)
 529 +
 530 +  i_version
 531 +        Enable 64-bit inode version support. This option is off by default.
 532 +
 533 +  dax
 534 +        Use direct access (no page cache).  See
 535 +        Documentation/filesystems/dax.txt.  Note that this option is
 536 +        incompatible with data=journal.
 537
 538  Data Mode
 539  =========
 540 @@ -407,11 +394,8 @@ in table below.
 541
 542  Files in /proc/fs/ext4/<devname>
 543
 544 -================ =======
 545 - File            Content
 546 -================ =======
 547 - mb_groups       details of multiblock allocator buddy cache of free blocks
 548 -================ =======
 549 +  mb_groups
 550 +        details of multiblock allocator buddy cache of free blocks
 551
 552  /sys entries
 553  ============
 554 @@ -426,74 +410,71 @@ Files in /sys/fs/ext4/<devname>:
 555
 556  (see also Documentation/ABI/testing/sysfs-fs-ext4)
 557
 558 -============================= =================================================
 559 -File                          Content
 560 -============================= =================================================
 561 - delayed_allocation_blocks    This file is read-only and shows the number of
 562 -                              blocks that are dirty in the page cache, but
 563 -                              which do not have their location in the
 564 -                              filesystem allocated yet.
 565 -
 566 -inode_goal                    Tuning parameter which (if non-zero) controls
 567 -                              the goal inode used by the inode allocator in
 568 -                              preference to all other allocation heuristics.
 569 -                              This is intended for debugging use only, and
 570 -                              should be 0 on production systems.
 571 -
 572 -inode_readahead_blks          Tuning parameter which controls the maximum
 573 -                              number of inode table blocks that ext4's inode
 574 -                              table readahead algorithm will pre-read into
 575 -                              the buffer cache
 576 -
 577 -lifetime_write_kbytes         This file is read-only and shows the number of
 578 -                              kilobytes of data that have been written to this
 579 -                              filesystem since it was created.
 580 -
 581 - max_writeback_mb_bump        The maximum number of megabytes the writeback
 582 -                              code will try to write out before move on to
 583 -                              another inode.
 584 -
 585 - mb_group_prealloc            The multiblock allocator will round up allocation
 586 -                              requests to a multiple of this tuning parameter if
 587 -                              the stripe size is not set in the ext4 superblock
 588 -
 589 - mb_max_to_scan               The maximum number of extents the multiblock
 590 -                              allocator will search to find the best extent
 591 -
 592 - mb_min_to_scan               The minimum number of extents the multiblock
 593 -                              allocator will search to find the best extent
 594 -
 595 - mb_order2_req                Tuning parameter which controls the minimum size
 596 -                              for requests (as a power of 2) where the buddy
 597 -                              cache is used
 598 -
 599 - mb_stats                     Controls whether the multiblock allocator should
 600 -                              collect statistics, which are shown during the
 601 -                              unmount. 1 means to collect statistics, 0 means
 602 -                              not to collect statistics
 603 -
 604 - mb_stream_req                Files which have fewer blocks than this tunable
 605 -                              parameter will have their blocks allocated out
 606 -                              of a block group specific preallocation pool, so
 607 -                              that small files are packed closely together.
 608 -                              Each large file will have its blocks allocated
 609 -                              out of its own unique preallocation pool.
 610 -
 611 - session_write_kbytes         This file is read-only and shows the number of
 612 -                              kilobytes of data that have been written to this
 613 -                              filesystem since it was mounted.
 614 -
 615 - reserved_clusters            This is RW file and contains number of reserved
 616 -                              clusters in the file system which will be used
 617 -                              in the specific situations to avoid costly
 618 -                              zeroout, unexpected ENOSPC, or possible data
 619 -                              loss. The default is 2% or 4096 clusters,
 620 -                              whichever is smaller and this can be changed
 621 -                              however it can never exceed number of clusters
 622 -                              in the file system. If there is not enough space
 623 -                              for the reserved space when mounting the file
 624 -                              mount will _not_ fail.
 625 -============================= =================================================
 626 +  delayed_allocation_blocks
 627 +        This file is read-only and shows the number of blocks that are dirty in
 628 +        the page cache, but which do not have their location in the filesystem
 629 +        allocated yet.
 630 +
 631 +  inode_goal
 632 +        Tuning parameter which (if non-zero) controls the goal inode used by
 633 +        the inode allocator in preference to all other allocation heuristics.
 634 +        This is intended for debugging use only, and should be 0 on production
 635 +        systems.
 636 +
 637 +  inode_readahead_blks
 638 +        Tuning parameter which controls the maximum number of inode table
 639 +        blocks that ext4's inode table readahead algorithm will pre-read into
 640 +        the buffer cache.
 641 +
 642 +  lifetime_write_kbytes
 643 +        This file is read-only and shows the number of kilobytes of data that
 644 +        have been written to this filesystem since it was created.
 645 +
 646 +  max_writeback_mb_bump
 647 +        The maximum number of megabytes the writeback code will try to write
 648 +        out before moving on to another inode.
 649 +
 650 +  mb_group_prealloc
 651 +        The multiblock allocator will round up allocation requests to a
 652 +        multiple of this tuning parameter if the stripe size is not set in the
 653 +        ext4 superblock.
 654 +
 655 +  mb_max_to_scan
 656 +        The maximum number of extents the multiblock allocator will search to
 657 +        find the best extent.
 658 +
 659 +  mb_min_to_scan
 660 +        The minimum number of extents the multiblock allocator will search to
 661 +        find the best extent.
 662 +
 663 +  mb_order2_req
 664 +        Tuning parameter which controls the minimum size for requests (as a
 665 +        power of 2) where the buddy cache is used.
 666 +
 667 +  mb_stats
 668 +        Controls whether the multiblock allocator should collect statistics,
 669 +        which are shown during the unmount. 1 means to collect statistics, 0
 670 +        means not to collect statistics.
 671 +
 672 +  mb_stream_req
 673 +        Files which have fewer blocks than this tunable parameter will have
 674 +        their blocks allocated out of a block group specific preallocation
 675 +        pool, so that small files are packed closely together.  Each large file
 676 +        will have its blocks allocated out of its own unique preallocation
 677 +        pool.
 678 +
 679 +  session_write_kbytes
 680 +        This file is read-only and shows the number of kilobytes of data that
 681 +        have been written to this filesystem since it was mounted.
 682 +
 683 +  reserved_clusters
 684 +        This file is read-write and contains the number of reserved clusters in
 685 +        the file system, which are used in specific situations to avoid costly
 686 +        zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or
 687 +        4096 clusters, whichever is smaller. This can be changed, but it can
 688 +        never exceed the number of clusters in the file system. If there is not
 689 +        enough space for the reserved clusters when mounting the file system,
 690 +        the mount will _not_ fail.
 691
 692  Ioctls
 693  ======
 694 @@ -504,100 +485,80 @@ shown in the table below.
 695
 696  Table of Ext4 specific ioctls
 697
 698 -============================= =================================================
 699 -Ioctl                        Description
 700 -============================= =================================================
 701 - EXT4_IOC_GETFLAGS           Get additional attributes associated with inode.
 702 -                             The ioctl argument is an integer bitfield, with
 703 -                             bit values described in ext4.h. This ioctl is an
 704 -                             alias for FS_IOC_GETFLAGS.
 705 -
 706 - EXT4_IOC_SETFLAGS           Set additional attributes associated with inode.
 707 -                             The ioctl argument is an integer bitfield, with
 708 -                             bit values described in ext4.h. This ioctl is an
 709 -                             alias for FS_IOC_SETFLAGS.
 710 -
 711 - EXT4_IOC_GETVERSION
 712 - EXT4_IOC_GETVERSION_OLD
 713 -                             Get the inode i_generation number stored for
 714 -                             each inode. The i_generation number is normally
 715 -                             changed only when new inode is created and it is
 716 -                             particularly useful for network filesystems. The
 717 -                             '_OLD' version of this ioctl is an alias for
 718 -                             FS_IOC_GETVERSION.
 719 -
 720 - EXT4_IOC_SETVERSION
 721 - EXT4_IOC_SETVERSION_OLD
 722 -                             Set the inode i_generation number stored for
 723 -                             each inode. The '_OLD' version of this ioctl
 724 -                             is an alias for FS_IOC_SETVERSION.
 725 -
 726 - EXT4_IOC_GROUP_EXTEND       This ioctl has the same purpose as the resize
 727 -                             mount option. It allows to resize filesystem
 728 -                             to the end of the last existing block group,
 729 -                             further resize has to be done with resize2fs,
 730 -                             either online, or offline. The argument points
 731 -                             to the unsigned logn number representing the
 732 -                             filesystem new block count.
 733 -
 734 - EXT4_IOC_MOVE_EXT           Move the block extents from orig_fd (the one
 735 -                             this ioctl is pointing to) to the donor_fd (the
 736 -                             one specified in move_extent structure passed
 737 -                             as an argument to this ioctl). Then, exchange
 738 -                             inode metadata between orig_fd and donor_fd.
 739 -                             This is especially useful for online
 740 -                             defragmentation, because the allocator has the
 741 -                             opportunity to allocate moved blocks better,
 742 -                             ideally into one contiguous extent.
 743 -
 744 - EXT4_IOC_GROUP_ADD          Add a new group descriptor to an existing or
 745 -                             new group descriptor block. The new group
 746 -                             descriptor is described by ext4_new_group_input
 747 -                             structure, which is passed as an argument to
 748 -                             this ioctl. This is especially useful in
 749 -                             conjunction with EXT4_IOC_GROUP_EXTEND,
 750 -                             which allows online resize of the filesystem
 751 -                             to the end of the last existing block group.
 752 -                             Those two ioctls combined is used in userspace
 753 -                             online resize tool (e.g. resize2fs).
 754 -
 755 - EXT4_IOC_MIGRATE            This ioctl operates on the filesystem itself.
 756 -                             It converts (migrates) ext3 indirect block mapped
 757 -                             inode to ext4 extent mapped inode by walking
 758 -                             through indirect block mapping of the original
 759 -                             inode and converting contiguous block ranges
 760 -                             into ext4 extents of the temporary inode. Then,
 761 -                             inodes are swapped. This ioctl might help, when
 762 -                             migrating from ext3 to ext4 filesystem, however
 763 -                             suggestion is to create fresh ext4 filesystem
 764 -                             and copy data from the backup. Note, that
 765 -                             filesystem has to support extents for this ioctl
 766 -                             to work.
 767 -
 768 - EXT4_IOC_ALLOC_DA_BLKS              Force all of the delay allocated blocks to be
 769 -                             allocated to preserve application-expected ext3
 770 -                             behaviour. Note that this will also start
 771 -                             triggering a write of the data blocks, but this
 772 -                             behaviour may change in the future as it is
 773 -                             not necessary and has been done this way only
 774 -                             for sake of simplicity.
 775 -
 776 - EXT4_IOC_RESIZE_FS          Resize the filesystem to a new size.  The number
 777 -                             of blocks of resized filesystem is passed in via
 778 -                             64 bit integer argument.  The kernel allocates
 779 -                             bitmaps and inode table, the userspace tool thus
 780 -                             just passes the new number of blocks.
 781 -
 782 - EXT4_IOC_SWAP_BOOT          Swap i_blocks and associated attributes
 783 -                             (like i_blocks, i_size, i_flags, ...) from
 784 -                             the specified inode with inode
 785 -                             EXT4_BOOT_LOADER_INO (#5). This is typically
 786 -                             used to store a boot loader in a secure part of
 787 -                             the filesystem, where it can't be changed by a
 788 -                             normal user by accident.
 789 -                             The data blocks of the previous boot loader
 790 -                             will be associated with the given inode.
 791 -============================= =================================================
 792 +  EXT4_IOC_GETFLAGS
 793 +        Get additional attributes associated with inode.  The ioctl argument is
 794 +        an integer bitfield, with bit values described in ext4.h. This ioctl is
 795 +        an alias for FS_IOC_GETFLAGS.
 796 +
 797 +  EXT4_IOC_SETFLAGS
 798 +        Set additional attributes associated with inode.  The ioctl argument is
 799 +        an integer bitfield, with bit values described in ext4.h. This ioctl is
 800 +        an alias for FS_IOC_SETFLAGS.
 801 +
 802 +  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
 803 +        Get the inode i_generation number stored for each inode. The
 804 +        i_generation number is normally changed only when a new inode is
 805 +        created, and it is particularly useful for network filesystems. The
 806 +        '_OLD' version of this ioctl is an alias for FS_IOC_GETVERSION.
 807 +
 808 +  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
 809 +        Set the inode i_generation number stored for each inode. The '_OLD'
 810 +        version of this ioctl is an alias for FS_IOC_SETVERSION.
 811 +
 812 +  EXT4_IOC_GROUP_EXTEND
 813 +        This ioctl has the same purpose as the resize mount option. It allows
 814 +        resizing the filesystem to the end of the last existing block group;
 815 +        further resizing has to be done with resize2fs, either online or
 816 +        offline. The argument points to the unsigned long number representing
 817 +        the filesystem's new block count.
 818 +
 819 +  EXT4_IOC_MOVE_EXT
 820 +        Move the block extents from orig_fd (the one this ioctl is pointing to)
 821 +        to the donor_fd (the one specified in move_extent structure passed as
 822 +        an argument to this ioctl). Then, exchange inode metadata between
 823 +        orig_fd and donor_fd.  This is especially useful for online
 824 +        defragmentation, because the allocator has the opportunity to allocate
 825 +        moved blocks better, ideally into one contiguous extent.
 826 +
 827 +  EXT4_IOC_GROUP_ADD
 828 +        Add a new group descriptor to an existing or new group descriptor
 829 +        block. The new group descriptor is described by ext4_new_group_input
 830 +        structure, which is passed as an argument to this ioctl. This is
 831 +        especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
 832 +        allows online resize of the filesystem to the end of the last existing
 833 +        block group.  Those two ioctls combined are used by userspace online
 834 +        resize tools (e.g. resize2fs).
 835 +
 836 +  EXT4_IOC_MIGRATE
 837 +        This ioctl operates on the filesystem itself.  It converts (migrates)
 838 +        an ext3 indirect block mapped inode to an ext4 extent mapped inode by
 839 +        walking through the indirect block mapping of the original inode and
 840 +        converting contiguous block ranges into ext4 extents of a temporary
 841 +        inode. Then, the inodes are swapped. This ioctl might help when
 842 +        migrating from ext3 to ext4; however, the suggestion is to create a
 843 +        fresh ext4 filesystem and copy the data from a backup. Note that the
 844 +        filesystem has to support extents for this ioctl to work.
 845 +
 846 +  EXT4_IOC_ALLOC_DA_BLKS
 847 +        Force all of the delay allocated blocks to be allocated to preserve
 848 +        application-expected ext3 behaviour. Note that this will also start
 849 +        triggering a write of the data blocks, but this behaviour may change in
 850 +        the future as it is not necessary and has been done this way only for
 851 +        the sake of simplicity.
 852 +
 853 +  EXT4_IOC_RESIZE_FS
 854 +        Resize the filesystem to a new size.  The number of blocks of the
 855 +        resized filesystem is passed in via a 64-bit integer argument.  The
 856 +        kernel allocates the bitmaps and inode table, so the userspace tool
 857 +        just passes the new number of blocks.
 858 +
 859 +  EXT4_IOC_SWAP_BOOT
 860 +        Swap i_blocks and associated attributes (like i_blocks, i_size,
 861 +        i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
 862 +        (#5). This is typically used to store a boot loader in a secure part of
 863 +        the filesystem, where it can't be changed by a normal user by accident.
 864 +        The data blocks of the previous boot loader will be associated with the
 865 +        given inode.
 866
 867  References
 868  ==========
 869
 870