1 ext4: import superblocks chapter from wiki page
3 From: Darrick J. Wong <darrick.wong@oracle.com>
5 Import the chapter about superblocks from the on-disk format wiki
6 page into the kernel documentation.
8 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
11 Documentation/filesystems/ext4/ondisk/globals.rst | 9
12 Documentation/filesystems/ext4/ondisk/index.rst | 1
13 Documentation/filesystems/ext4/ondisk/super.rst | 773 +++++++++++++++++++++
14 3 files changed, 783 insertions(+)
15 create mode 100644 Documentation/filesystems/ext4/ondisk/globals.rst
16 create mode 100644 Documentation/filesystems/ext4/ondisk/super.rst
19 diff --git a/Documentation/filesystems/ext4/ondisk/globals.rst b/Documentation/filesystems/ext4/ondisk/globals.rst
21 index 000000000000..4a33d0571bf2
23 +++ b/Documentation/filesystems/ext4/ondisk/globals.rst
25 +.. SPDX-License-Identifier: GPL-2.0
30 +The filesystem is sharded into a number of block groups, each of which
31 +has static metadata at fixed locations.
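As a small illustration of this sharding, the C sketch below maps a block number to its block group and its index within that group. It assumes the ``s_first_data_block`` and ``s_blocks_per_group`` superblock fields. It also ignores bigalloc, where groups are measured in clusters. The helper name ``block_to_group`` is hypothetical.

```c
#include <stdint.h>

/* Position of a block inside the group layout. */
struct group_pos {
    uint32_t group;  /* block group number */
    uint32_t offset; /* block index within that group */
};

/* Hypothetical helper: group 0 starts at s_first_data_block (1 on
 * 1k-block filesystems, usually 0 otherwise) and every group spans
 * s_blocks_per_group blocks. Bigalloc is not handled here. */
static struct group_pos block_to_group(uint64_t blk,
                                       uint32_t first_data_block,
                                       uint32_t blocks_per_group)
{
    uint64_t rel = blk - first_data_block;
    struct group_pos pos = {
        .group = (uint32_t)(rel / blocks_per_group),
        .offset = (uint32_t)(rel % blocks_per_group),
    };
    return pos;
}
```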
33 +.. include:: super.rst
34 diff --git a/Documentation/filesystems/ext4/ondisk/index.rst b/Documentation/filesystems/ext4/ondisk/index.rst
35 index 282ba197b6b2..dbb259f83976 100644
36 --- a/Documentation/filesystems/ext4/ondisk/index.rst
37 +++ b/Documentation/filesystems/ext4/ondisk/index.rst
38 @@ -5,3 +5,4 @@ Data Structures and Algorithms
39 ==============================
40 .. include:: about.rst
41 .. include:: overview.rst
42 +.. include:: globals.rst
43 diff --git a/Documentation/filesystems/ext4/ondisk/super.rst b/Documentation/filesystems/ext4/ondisk/super.rst
45 index 000000000000..1f5ac9ab6f0c
47 +++ b/Documentation/filesystems/ext4/ondisk/super.rst
49 +.. SPDX-License-Identifier: GPL-2.0
54 +The superblock records various information about the enclosing
55 +filesystem, such as block counts, inode counts, supported features,
56 +maintenance information, and more.
58 +If the sparse\_super feature flag is set, redundant copies of the
59 +superblock and group descriptors are kept only in the groups whose group
60 +number is either 0 or a power of 3, 5, or 7. If the flag is not set,
61 +redundant copies are kept in all groups.
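A minimal C sketch of the placement rule above follows. The helper names are hypothetical. Group 1 qualifies because 1 is 3^0, and when sparse\_super is off, every group qualifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n == base^k for some k >= 0 (so 1 counts as a power). */
static bool is_power_of(uint32_t n, uint32_t base)
{
    if (n == 0)
        return false;
    while (n % base == 0)
        n /= base;
    return n == 1;
}

/* Does this group carry a copy of the superblock and group descriptors? */
static bool group_has_super(uint32_t group, bool sparse_super)
{
    if (!sparse_super || group == 0)
        return true;
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```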
63 +The superblock checksum is calculated against the superblock structure,
64 +which includes the FS UUID.
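The C sketch below shows one way to compute that checksum. It assumes the kernel's convention: a CRC32C seeded with ~0, with no final inversion, over the bytes that precede ``s_checksum``. That field is assumed to be the last four bytes of the 1024-byte superblock, at offset 0x3FC. The ``s_uuid`` offset, 0x68, is likewise taken from ``struct ext4_super_block`` rather than from this text. The bitwise CRC loop trades speed for clarity, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_SIZE         1024
#define SB_UUID_OFF     0x68  /* s_uuid[16] (assumed offset) */
#define SB_CHECKSUM_OFF 0x3FC /* s_checksum (assumed offset) */

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * The caller supplies the seed; no final inversion is applied. */
static uint32_t crc32c_le(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc;
}

/* Checksum over every superblock byte that precedes s_checksum. */
static uint32_t sb_checksum(const uint8_t sb[SB_SIZE])
{
    return crc32c_le(~0u, sb, SB_CHECKSUM_OFF);
}

/* Default metadata_csum seed, crc32c(~0, s_uuid), used when the
 * seed is not stored in s_checksum_seed. */
static uint32_t sb_default_csum_seed(const uint8_t sb[SB_SIZE])
{
    return crc32c_le(~0u, sb + SB_UUID_OFF, 16);
}
```

Because the UUID lies inside the checksummed range, changing it changes the superblock checksum. Changing the UUID also changes the default seed that other metadata checksums build on.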
66 +The ext4 superblock is laid out as follows in
67 +``struct ext4_super_block``:
80 + - Total inode count.
83 + - s\_blocks\_count\_lo
84 + - Total block count.
87 + - s\_r\_blocks\_count\_lo
88 + - This number of blocks can only be allocated by the super-user.
91 + - s\_free\_blocks\_count\_lo
95 + - s\_free\_inodes\_count
99 + - s\_first\_data\_block
100 + - First data block. This must be at least 1 for 1k-block filesystems and
101 + is typically 0 for all other block sizes.
104 + - s\_log\_block\_size
105 + - Block size is 2 ^ (10 + s\_log\_block\_size).
108 + - s\_log\_cluster\_size
109 + - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) bytes if bigalloc is
110 + enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size.
113 + - s\_blocks\_per\_group
114 + - Blocks per group.
117 + - s\_clusters\_per\_group
118 + - Clusters per group, if bigalloc is enabled. Otherwise
119 + s\_clusters\_per\_group must equal s\_blocks\_per\_group.
122 + - s\_inodes\_per\_group
123 + - Inodes per group.
127 + - Mount time, in seconds since the epoch.
131 + - Write time, in seconds since the epoch.
135 + - Number of mounts since the last fsck.
138 + - s\_max\_mnt\_count
139 + - Number of mounts beyond which a fsck is needed.
143 + - Magic signature, 0xEF53.
147 + - File system state. See super_state_ for more info.
151 + - Behaviour when detecting errors. See super_errors_ for more info.
154 + - s\_minor\_rev\_level
155 + - Minor revision level.
159 + - Time of last check, in seconds since the epoch.
163 + - Maximum time between checks, in seconds.
167 + - Creator OS. See the table super_creator_ for more info.
171 + - Revision level. See the table super_revision_ for more info.
175 + - Default uid for reserved blocks.
179 + - Default gid for reserved blocks.
183 + - These fields are for EXT4_DYNAMIC_REV superblocks only.
185 + Note: the difference between the compatible feature set and the
186 + incompatible feature set is that if there is a bit set in the
187 + incompatible feature set that the kernel doesn't know about, it should
188 + refuse to mount the filesystem.
190 + e2fsck's requirements are more strict; if it doesn't know
191 + about a feature in either the compatible or incompatible feature set, it
192 + must abort and not try to meddle with things it doesn't understand...
196 + - First non-reserved inode.
200 + - Size of inode structure, in bytes.
203 + - s\_block\_group\_nr
204 + - Block group # of this superblock.
207 + - s\_feature\_compat
208 + - Compatible feature set flags. The kernel can still read/write this fs
209 + even if it doesn't understand a flag; e2fsck should not do that. See the
210 + super_compat_ table for more info.
213 + - s\_feature\_incompat
214 + - Incompatible feature set. If the kernel or fsck doesn't understand one
215 + of these bits, it should stop. See the super_incompat_ table for more info.
219 + - s\_feature\_ro\_compat
220 + - Readonly-compatible feature set. If the kernel doesn't understand one of
221 + these bits, it can still mount read-only. See the super_rocompat_ table for more info.
226 + - 128-bit UUID for volume.
229 + - s\_volume\_name[16]
233 + - s\_last\_mounted[64]
234 + - Directory where filesystem was last mounted.
237 + - s\_algorithm\_usage\_bitmap
238 + - For compression (Not used in e2fsprogs/Linux)
242 + - Performance hints. Directory preallocation should only happen if the
243 + EXT4_FEATURE_COMPAT_DIR_PREALLOC flag is on.
246 + - s\_prealloc\_blocks
247 + - #. of blocks to try to preallocate for ... files? (Not used in e2fsprogs/Linux)
251 + - s\_prealloc\_dir\_blocks
252 + - #. of blocks to preallocate for directories. (Not used in e2fsprogs/Linux)
256 + - s\_reserved\_gdt\_blocks
257 + - Number of reserved GDT entries for future filesystem expansion.
261 + - Journalling support is valid only if EXT4_FEATURE_COMPAT_HAS_JOURNAL is set.
265 + - s\_journal\_uuid[16]
266 + - UUID of journal superblock.
270 + - Inode number of journal file.
274 + - Device number of journal file, if the external journal feature flag is set.
279 + - Start of list of orphaned inodes to delete.
286 + - s\_def\_hash\_version
287 + - Default hash algorithm to use for directory hashes. See super_def_hash_ for more info.
291 + - s\_jnl\_backup\_type
292 + - If this value is 0 or EXT3\_JNL\_BACKUP\_BLOCKS (1), then the
293 + ``s_jnl_blocks`` field contains a duplicate copy of the inode's
294 + ``i_block[]`` array and ``i_size``.
298 + - Size of group descriptors, in bytes, if the 64bit incompat feature flag is set.
302 + - s\_default\_mount\_opts
303 + - Default mount options. See the super_mountopts_ table for more info.
306 + - s\_first\_meta\_bg
307 + - First metablock block group, if the meta\_bg feature is enabled.
311 + - When the filesystem was created, in seconds since the epoch.
314 + - s\_jnl\_blocks[17]
315 + - Backup copy of the journal inode's ``i_block[]`` array in the first 15
316 + elements and i\_size\_high and i\_size in the 16th and 17th elements, respectively.
321 + - 64bit support is valid only if EXT4_FEATURE_INCOMPAT_64BIT is set.
324 + - s\_blocks\_count\_hi
325 + - High 32-bits of the block count.
328 + - s\_r\_blocks\_count\_hi
329 + - High 32-bits of the reserved block count.
332 + - s\_free\_blocks\_count\_hi
333 + - High 32-bits of the free block count.
336 + - s\_min\_extra\_isize
337 + - All inodes have at least # bytes.
340 + - s\_want\_extra\_isize
341 + - New inodes should reserve # bytes.
345 + - Miscellaneous flags. See the super_flags_ table for more info.
349 + - RAID stride. This is the number of logical blocks read from or written
350 + to the disk before moving to the next disk. This affects the placement
351 + of filesystem metadata, which will hopefully make RAID storage faster.
355 + - #. seconds to wait in multi-mount prevention (MMP) checking. MMP is a
356 + mechanism to record, in a dedicated MMP block, which host and device
357 + have mounted the filesystem, in order to prevent multiple simultaneous
358 + mounts.
362 + - Block # for multi-mount protection data.
365 + - s\_raid\_stripe\_width
366 + - RAID stripe width. This is the number of logical blocks read from or
367 + written to the disk before coming back to the current disk. This is used
368 + by the block allocator to try to reduce the number of read-modify-write
369 + operations in a RAID5/6.
372 + - s\_log\_groups\_per\_flex
373 + - Size of a flexible block group is 2 ^ ``s_log_groups_per_flex``.
376 + - s\_checksum\_type
377 + - Metadata checksum algorithm type. The only valid value is 1 (crc32c).
384 + - s\_kbytes\_written
385 + - Number of KiB written to this filesystem over its lifetime.
388 + - s\_snapshot\_inum
389 + - Inode number of active snapshot. (Not used in e2fsprogs/Linux.)
393 + - Sequential ID of active snapshot. (Not used in e2fsprogs/Linux.)
396 + - s\_snapshot\_r\_blocks\_count
397 + - Number of blocks reserved for active snapshot's future use. (Not used in e2fsprogs/Linux.)
401 + - s\_snapshot\_list
402 + - Inode number of the head of the on-disk snapshot list. (Not used in e2fsprogs/Linux.)
407 + - Number of errors seen.
410 + - s\_first\_error\_time
411 + - First time an error happened, in seconds since the epoch.
414 + - s\_first\_error\_ino
415 + - Inode involved in first error.
418 + - s\_first\_error\_block
419 + - Number of the block involved in the first error.
422 + - s\_first\_error\_func[32]
423 + - Name of function where the first error happened.
426 + - s\_first\_error\_line
427 + - Line number where the first error happened.
430 + - s\_last\_error\_time
431 + - Time of most recent error, in seconds since the epoch.
434 + - s\_last\_error\_ino
435 + - Inode involved in most recent error.
438 + - s\_last\_error\_line
439 + - Line number where the most recent error happened.
442 + - s\_last\_error\_block
443 + - Number of the block involved in the most recent error.
446 + - s\_last\_error\_func[32]
447 + - Name of function where the most recent error happened.
450 + - s\_mount\_opts[64]
451 + - ASCIIZ string of mount options.
454 + - s\_usr\_quota\_inum
455 + - Inode number of user `quota <quota>`__ file.
458 + - s\_grp\_quota\_inum
459 + - Inode number of group `quota <quota>`__ file.
462 + - s\_overhead\_blocks
463 + - Overhead blocks/clusters in fs. If this field is zero, the kernel
464 + calculates the overhead dynamically.
467 + - s\_backup\_bgs[2]
468 + - Block groups containing superblock backups (if sparse\_super2).
471 + - s\_encrypt\_algos[4]
472 + - Encryption algorithms in use. There can be up to four algorithms in use
473 + at any time; valid algorithm codes are given in the super_encrypt_ table below.
477 + - s\_encrypt\_pw\_salt[16]
478 + - Salt for the string2key algorithm for encryption.
482 + - Inode number of lost+found.
485 + - s\_prj\_quota\_inum
486 + - Inode that tracks project quotas.
489 + - s\_checksum\_seed
490 + - Checksum seed used for metadata\_csum calculations. This value is
491 + crc32c(~0, $orig\_fs\_uuid).
495 + - Padding to the end of the block.
499 + - Superblock checksum.
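The C sketch below ties several of the fields above together by decoding basic geometry from a raw 1024-byte superblock. The field offsets and the INCOMPAT\_64BIT value (0x80) are assumptions taken from ``struct ext4_super_block`` and the kernel headers, since this table does not list them. The helper names are hypothetical. The code follows the rules stated above:

* the block and cluster sizes are 2 ^ (10 + log);
* the high 32 bits of the block count only apply when 64bit is set;
* the descriptor size falls back to 32 bytes without 64bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field offsets assumed from struct ext4_super_block. */
#define SB_BLOCKS_COUNT_LO  0x04
#define SB_FIRST_DATA_BLOCK 0x14
#define SB_LOG_BLOCK_SIZE   0x18
#define SB_LOG_CLUSTER_SIZE 0x1C
#define SB_BLOCKS_PER_GROUP 0x20
#define SB_MAGIC            0x38
#define SB_FEATURE_INCOMPAT 0x60
#define SB_DESC_SIZE        0xFE
#define SB_BLOCKS_COUNT_HI  0x150

#define EXT4_SUPER_MAGIC            0xEF53
#define EXT4_FEATURE_INCOMPAT_64BIT 0x0080

/* All multi-byte superblock fields are little-endian. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct sb_geometry {
    uint32_t block_size;   /* bytes */
    uint32_t cluster_size; /* bytes; equals block_size without bigalloc */
    uint64_t blocks_count; /* includes the hi half when 64bit is set */
    uint32_t group_count;
    uint32_t desc_size;    /* group descriptor size in bytes */
};

/* Returns false on a bad magic number or implausible values. */
static bool sb_decode_geometry(const uint8_t *sb, struct sb_geometry *g)
{
    if (get_le16(sb + SB_MAGIC) != EXT4_SUPER_MAGIC)
        return false;

    uint32_t log_bs = get_le32(sb + SB_LOG_BLOCK_SIZE);
    uint32_t log_cs = get_le32(sb + SB_LOG_CLUSTER_SIZE);
    uint32_t per_group = get_le32(sb + SB_BLOCKS_PER_GROUP);
    uint32_t first = get_le32(sb + SB_FIRST_DATA_BLOCK);
    bool is64 = get_le32(sb + SB_FEATURE_INCOMPAT) & EXT4_FEATURE_INCOMPAT_64BIT;

    if (log_bs > 21 || log_cs > 21 || per_group == 0)
        return false; /* avoid shift overflow and division by zero */

    g->block_size = 1024u << log_bs;   /* 2 ^ (10 + s_log_block_size) */
    g->cluster_size = 1024u << log_cs; /* 2 ^ (10 + s_log_cluster_size) */

    g->blocks_count = get_le32(sb + SB_BLOCKS_COUNT_LO);
    if (is64)
        g->blocks_count |= (uint64_t)get_le32(sb + SB_BLOCKS_COUNT_HI) << 32;
    if (g->blocks_count <= first)
        return false;

    /* Groups cover blocks first..blocks_count-1; the last may be partial. */
    g->group_count = (uint32_t)((g->blocks_count - first + per_group - 1) / per_group);

    g->desc_size = is64 ? get_le16(sb + SB_DESC_SIZE) : 32;
    return true;
}
```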
503 +The superblock state is some combination of the following:
516 + - Orphans being recovered
520 +The superblock error policy is one of the following:
531 + - Remount read-only
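The C sketch below interprets the state and error-policy fields. The numeric values are not shown in the tables above. They are assumed from the kernel's ``fs/ext4/ext4.h``: 0x1 cleanly unmounted, 0x2 errors detected, 0x4 orphans being recovered, and error policies 1 to 3. The ``state_wants_fsck`` rule is similar in spirit to e2fsck's check for a filesystem that was not cleanly unmounted or that has recorded errors. It is a hypothetical helper, not e2fsck's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* s_state bits (assumed values). */
#define EXT4_VALID_FS  0x0001 /* cleanly unmounted */
#define EXT4_ERROR_FS  0x0002 /* errors detected */
#define EXT4_ORPHAN_FS 0x0004 /* orphans being recovered */

/* s_errors values (assumed). */
#define EXT4_ERRORS_CONTINUE 1
#define EXT4_ERRORS_RO       2
#define EXT4_ERRORS_PANIC    3

/* Not cleanly unmounted, or errors recorded: a full check is warranted. */
static bool state_wants_fsck(uint16_t s_state)
{
    return !(s_state & EXT4_VALID_FS) || (s_state & EXT4_ERROR_FS);
}

static const char *errors_policy_name(uint16_t s_errors)
{
    switch (s_errors) {
    case EXT4_ERRORS_CONTINUE: return "continue";
    case EXT4_ERRORS_RO:       return "remount-ro";
    case EXT4_ERRORS_PANIC:    return "panic";
    default:                   return "unknown";
    }
}
```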
537 +The filesystem creator is one of the following:
558 +The superblock revision is one of the following:
569 + - v2 format w/ dynamic inode sizes
571 +Note that ``EXT4_DYNAMIC_REV`` refers to a revision 1 or newer filesystem.
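Because ``s_first_ino`` and ``s_inode_size`` are only meaningful on ``EXT4_DYNAMIC_REV`` filesystems, a reader has to fall back to fixed values for revision 0. The C sketch below does this. It assumes the kernel's constants: 128-byte inodes and a first non-reserved inode of 11 on revision 0. The helper names are hypothetical.

```c
#include <stdint.h>

#define EXT4_GOOD_OLD_REV        0   /* original format */
#define EXT4_GOOD_OLD_INODE_SIZE 128 /* assumed fixed inode size, rev 0 */
#define EXT4_GOOD_OLD_FIRST_INO  11  /* assumed first usable inode, rev 0 */

/* Inode size: fixed on revision 0, s_inode_size otherwise. */
static uint32_t sb_inode_size(uint32_t rev_level, uint16_t s_inode_size)
{
    return rev_level == EXT4_GOOD_OLD_REV ? EXT4_GOOD_OLD_INODE_SIZE
                                          : s_inode_size;
}

/* First non-reserved inode: fixed on revision 0, s_first_ino otherwise. */
static uint32_t sb_first_ino(uint32_t rev_level, uint32_t s_first_ino)
{
    return rev_level == EXT4_GOOD_OLD_REV ? EXT4_GOOD_OLD_FIRST_INO
                                          : s_first_ino;
}
```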
575 +The superblock compatible features field is a combination of any of the following:
585 + - Directory preallocation (COMPAT\_DIR\_PREALLOC).
587 + - “imagic inodes”. Not clear from the code what this does
588 + (COMPAT\_IMAGIC\_INODES).
590 + - Has a journal (COMPAT\_HAS\_JOURNAL).
592 + - Supports extended attributes (COMPAT\_EXT\_ATTR).
594 + - Has reserved GDT blocks for filesystem expansion
595 + (COMPAT\_RESIZE\_INODE). Requires RO\_COMPAT\_SPARSE\_SUPER.
597 + - Has directory indices (COMPAT\_DIR\_INDEX).
599 + - “Lazy BG”. Not in Linux kernel, seems to have been for uninitialized
600 + block groups? (COMPAT\_LAZY\_BG)
602 + - “Exclude inode”. Not used. (COMPAT\_EXCLUDE\_INODE).
604 + - “Exclude bitmap”. Seems to be used to indicate the presence of
605 + snapshot-related exclude bitmaps? Not defined in kernel or used in
606 + e2fsprogs (COMPAT\_EXCLUDE\_BITMAP).
608 + - Sparse Super Block, v2. If this flag is set, the SB field s\_backup\_bgs
609 + points to the two block groups that contain backup superblocks
610 + (COMPAT\_SPARSE\_SUPER2).
614 +The superblock incompatible features field is a combination of any of the following:
624 + - Compression (INCOMPAT\_COMPRESSION).
626 + - Directory entries record the file type. See ext4\_dir\_entry\_2 below
627 + (INCOMPAT\_FILETYPE).
629 + - Filesystem needs recovery (INCOMPAT\_RECOVER).
631 + - Filesystem has a separate journal device (INCOMPAT\_JOURNAL\_DEV).
633 + - Meta block groups. See the earlier discussion of this feature
634 + (INCOMPAT\_META\_BG).
636 + - Files in this filesystem use extents (INCOMPAT\_EXTENTS).
638 + - Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT).
640 + - Multiple mount protection (INCOMPAT\_MMP).
642 + - Flexible block groups. See the earlier discussion of this feature
643 + (INCOMPAT\_FLEX\_BG).
645 + - Inodes can be used to store large extended attribute values
646 + (INCOMPAT\_EA\_INODE).
648 + - Data in directory entry (INCOMPAT\_DIRDATA). (Not implemented?)
650 + - Metadata checksum seed is stored in the superblock. This feature enables
651 + the administrator to change the UUID of a metadata\_csum filesystem
652 + while the filesystem is mounted; without it, the checksum definition
653 + requires all metadata blocks to be rewritten (INCOMPAT\_CSUM\_SEED).
655 + - Large directory >4GiB or 3-level htree (INCOMPAT\_LARGEDIR). Prior to
656 + this feature, directories could not be larger than 4GiB and could not
657 + have an htree more than 2 levels deep. If this feature is enabled,
658 + directories can be larger than 4GiB and have a maximum htree depth of 3.
660 + - Data in inode (INCOMPAT\_INLINE\_DATA).
662 + - Encrypted inodes are present on the filesystem. (INCOMPAT\_ENCRYPT).
666 +The superblock read-only compatible features field is a combination of any of the following:
676 + - Sparse superblocks. See the earlier discussion of this feature
677 + (RO\_COMPAT\_SPARSE\_SUPER).
679 + - This filesystem has been used to store a file greater than 2GiB
680 + (RO\_COMPAT\_LARGE\_FILE).
682 + - Not used in kernel or e2fsprogs (RO\_COMPAT\_BTREE\_DIR).
684 + - This filesystem has files whose block counts (i\_blocks) are represented
685 + in units of logical blocks, not 512-byte sectors. This implies a very
686 + large file indeed! (RO\_COMPAT\_HUGE\_FILE)
688 + - Group descriptors have checksums. In addition to detecting corruption,
689 + this is useful for lazy formatting with uninitialized groups
690 + (RO\_COMPAT\_GDT\_CSUM).
692 + - Indicates that the old ext3 32,000 subdirectory limit no longer applies
693 + (RO\_COMPAT\_DIR\_NLINK). A directory's i\_links\_count will be set to 1
694 + if it is incremented past 64,999.
696 + - Indicates that large inodes exist on this filesystem
697 + (RO\_COMPAT\_EXTRA\_ISIZE).
699 + - This filesystem has a snapshot (RO\_COMPAT\_HAS\_SNAPSHOT).
701 + - `Quota <Quota>`__ (RO\_COMPAT\_QUOTA).
703 + - This filesystem supports “bigalloc”, which means that file extents are
704 + tracked in units of clusters (of blocks) instead of blocks
705 + (RO\_COMPAT\_BIGALLOC).
707 + - This filesystem supports metadata checksumming
708 + (RO\_COMPAT\_METADATA\_CSUM; implies RO\_COMPAT\_GDT\_CSUM, though
709 + GDT\_CSUM must not be set).
711 + - Filesystem supports replicas. This feature is neither in the kernel nor
712 + in e2fsprogs. (RO\_COMPAT\_REPLICA)
714 + - Read-only filesystem image; the kernel will not mount this image
715 + read-write and most tools will refuse to write to the image.
716 + (RO\_COMPAT\_READONLY)
718 + - Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
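The compatibility rules stated for ``s_feature_compat``, ``s_feature_incompat`` and ``s_feature_ro_compat`` amount to a small decision procedure. The C sketch below applies them for a mount. The ``known_*`` masks are hypothetical parameters describing what an implementation understands. The RO\_COMPAT\_READONLY value (0x1000) is assumed from the kernel headers. A checker such as e2fsck would additionally reject unknown compat bits, which this mount-oriented sketch ignores.

```c
#include <stdint.h>

#define EXT4_FEATURE_RO_COMPAT_READONLY 0x1000 /* assumed value */

enum mount_verdict { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSE };

/* known_incompat / known_ro_compat: feature bits this (hypothetical)
 * implementation understands. Compat bits never block a mount. */
static enum mount_verdict check_features(uint32_t incompat, uint32_t ro_compat,
                                         uint32_t known_incompat,
                                         uint32_t known_ro_compat)
{
    if (incompat & ~known_incompat)
        return MOUNT_REFUSE;  /* unknown incompat bit: refuse to mount */
    if (ro_compat & ~known_ro_compat)
        return MOUNT_RO_ONLY; /* unknown ro_compat bit: read-only is safe */
    if (ro_compat & EXT4_FEATURE_RO_COMPAT_READONLY)
        return MOUNT_RO_ONLY; /* image explicitly marked read-only */
    return MOUNT_RW;
}
```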
722 +The ``s_def_hash_version`` field is one of the following:
737 + - Legacy, unsigned.
739 + - Half MD4, unsigned.
743 +.. _super_mountopts:
745 +The ``s_default_mount_opts`` field is any combination of the following:
754 + - Print debugging info upon (re)mount. (EXT4\_DEFM\_DEBUG)
756 + - New files take the gid of the containing directory (instead of the fsgid
757 + of the current process). (EXT4\_DEFM\_BSDGROUPS)
759 + - Support userspace-provided extended attributes. (EXT4\_DEFM\_XATTR\_USER)
761 + - Support POSIX access control lists (ACLs). (EXT4\_DEFM\_ACL)
763 + - Do not support 32-bit UIDs. (EXT4\_DEFM\_UID16)
765 + - All data and metadata are committed to the journal.
766 + (EXT4\_DEFM\_JMODE\_DATA)
768 + - All data are flushed to the disk before metadata are committed to the
769 + journal. (EXT4\_DEFM\_JMODE\_ORDERED)
771 + - Data ordering is not preserved; data may be written after the metadata
772 + has been written. (EXT4\_DEFM\_JMODE\_WBACK)
774 + - Disable write flushes. (EXT4\_DEFM\_NOBARRIER)
776 + - Track which blocks in a filesystem are metadata and therefore should not
777 + be used as data blocks. This option was expected to be enabled by default
778 + in Linux 3.18. (EXT4\_DEFM\_BLOCK\_VALIDITY)
780 + - Enable DISCARD support, where the storage device is told about blocks
781 + becoming unused. (EXT4\_DEFM\_DISCARD)
783 + - Disable delayed allocation. (EXT4\_DEFM\_NODELALLOC)
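The three journalling entries are not independent flags. They share a two-bit field, so "writeback" is the value with both bits set, and they must be compared as a unit rather than tested bit by bit. The C sketch below decodes that field. The mask and values (0x0060, 0x0020, 0x0040, 0x0060) are assumed from the kernel's ``EXT4_DEFM_JMODE*`` definitions. The returned strings match the ``data=`` mount option names, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_DEFM_JMODE         0x0060 /* two-bit journal mode field */
#define EXT4_DEFM_JMODE_DATA    0x0020
#define EXT4_DEFM_JMODE_ORDERED 0x0040
#define EXT4_DEFM_JMODE_WBACK   0x0060

/* Returns the default data= mode, or NULL if none is recorded
 * (the mount code then chooses its own default). */
static const char *default_journal_mode(uint32_t s_default_mount_opts)
{
    switch (s_default_mount_opts & EXT4_DEFM_JMODE) {
    case EXT4_DEFM_JMODE_DATA:    return "journal";
    case EXT4_DEFM_JMODE_ORDERED: return "ordered";
    case EXT4_DEFM_JMODE_WBACK:   return "writeback";
    default:                      return NULL;
    }
}
```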
787 +The ``s_flags`` field is any combination of the following:
796 + - Signed directory hash in use.
798 + - Unsigned directory hash in use.
800 + - To test development code.
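The signed/unsigned hash flags above interact with the hash version. When the unsigned flag is set, a legacy, half-MD4 or TEA version is promoted to its unsigned variant. The C sketch below shows that adjustment. The constants are assumed from the kernel's ``DX_HASH_*`` and ``EXT2_FLAGS_*`` definitions (legacy 0, half MD4 1, TEA 2, with the unsigned variants at +3). The helper name is hypothetical.

```c
#include <stdint.h>

/* Hash versions (assumed values). */
#define DX_HASH_LEGACY            0
#define DX_HASH_HALF_MD4          1
#define DX_HASH_TEA               2
#define DX_HASH_LEGACY_UNSIGNED   3
#define DX_HASH_HALF_MD4_UNSIGNED 4
#define DX_HASH_TEA_UNSIGNED      5

/* s_flags bits (assumed values). */
#define EXT2_FLAGS_SIGNED_HASH   0x0001
#define EXT2_FLAGS_UNSIGNED_HASH 0x0002
#define EXT2_FLAGS_TEST_FILESYS  0x0004

/* Map a stored hash version to the one actually used for lookups. */
static uint8_t effective_hash_version(uint8_t version, uint32_t s_flags)
{
    if (version <= DX_HASH_TEA && (s_flags & EXT2_FLAGS_UNSIGNED_HASH))
        return (uint8_t)(version + 3); /* signed variant -> unsigned variant */
    return version;
}
```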
804 +The ``s_encrypt_algos`` list can contain any of the following:
813 + - Invalid algorithm (ENCRYPTION\_MODE\_INVALID).
815 + - 256-bit AES in XTS mode (ENCRYPTION\_MODE\_AES\_256\_XTS).
817 + - 256-bit AES in GCM mode (ENCRYPTION\_MODE\_AES\_256\_GCM).
819 + - 256-bit AES in CBC mode (ENCRYPTION\_MODE\_AES\_256\_CBC).
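The C sketch below reads the four ``s_encrypt_algos`` slots. It assumes the numeric codes 0 (invalid), 1 (AES-256-XTS), 2 (AES-256-GCM) and 3 (AES-256-CBC), which follow the kernel's ``EXT4_ENCRYPTION_MODE_*`` definitions but are not shown in the table above. Both helper names are hypothetical.

```c
#include <stdint.h>

#define ENCRYPTION_MODE_INVALID     0
#define ENCRYPTION_MODE_AES_256_XTS 1
#define ENCRYPTION_MODE_AES_256_GCM 2
#define ENCRYPTION_MODE_AES_256_CBC 3

/* Name a single s_encrypt_algos entry. */
static const char *encrypt_mode_name(uint8_t mode)
{
    switch (mode) {
    case ENCRYPTION_MODE_AES_256_XTS: return "AES-256-XTS";
    case ENCRYPTION_MODE_AES_256_GCM: return "AES-256-GCM";
    case ENCRYPTION_MODE_AES_256_CBC: return "AES-256-CBC";
    default:                          return "invalid";
    }
}

/* Count how many of the four slots hold a known, valid mode. */
static int count_encrypt_algos(const uint8_t algos[4])
{
    int n = 0;
    for (int i = 0; i < 4; i++)
        if (algos[i] >= ENCRYPTION_MODE_AES_256_XTS &&
            algos[i] <= ENCRYPTION_MODE_AES_256_CBC)
            n++;
    return n;
}
```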
821 +Total size of the superblock is 1024 bytes.