document-case-insensitive-directories

   1 docs: ext4.rst: document case-insensitive directories
   2
   3 From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
   4
   5 Introduce the case-insensitive feature of ext4 to system
   6 administrators.  Explain the minimum set of design decisions that are
   7 important for sysadmins who want to enable this feature.
   8
   9 Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
  10 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  11 ---
  12  Documentation/admin-guide/ext4.rst | 38 ++++++++++++++++++++++++++++++
  13  1 file changed, 38 insertions(+)
  14
  15 diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
  16 index e506d3dae510..9e8b35ed7cd9 100644
  17 --- a/Documentation/admin-guide/ext4.rst
  18 +++ b/Documentation/admin-guide/ext4.rst
  19 @@ -91,10 +91,48 @@ Currently Available
  20  * large block (up to pagesize) support
  21  * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  22    the ordering)
  23 +* Case-insensitive file name lookups
  24
  25  [1] Filesystems with a block size of 1k may see a limit imposed by the
  26  directory hash tree having a maximum depth of two.
  27
  28 +Case-insensitive file name lookups
  29 +======================================================
  30 +
  31 +The case-insensitive file name lookup feature is supported on a
  32 +per-directory basis, allowing the user to mix case-insensitive and
  33 +case-sensitive directories in the same filesystem.  It is enabled by
  34 +setting the +F inode attribute on an empty directory.  The
  35 +case-insensitive string match operation is only defined when we know how
  36 +text is encoded in a byte sequence.  For that reason, in order to enable
  37 +case-insensitive directories, the filesystem must have the
  38 +casefold feature, which stores the filesystem-wide encoding
  39 +model used.  By default, the charset adopted is the latest version of
  40 +Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
  41 +form.  The comparison algorithm is implemented by normalizing the
  42 +strings to the Canonical decomposition form, as defined by Unicode,
  43 +followed by a byte-per-byte comparison.
  44 +
  45 +The case-awareness is name-preserving on the disk, meaning that the file
  46 +name provided by userspace is a byte-per-byte match to what is actually
  47 +written to the disk.  The Unicode normalization format used by the
  48 +kernel is thus an internal representation, exposed neither to
  49 +userspace nor to the disk, with the important exception of disk hashes,
  50 +used on large case-insensitive directories with the DX feature.  On DX
  51 +directories, the hash must be calculated using the casefolded version of
  52 +the filename, meaning that the normalization format used actually has an
  53 +impact on where the directory entry is stored.
  54 +
  55 +When we change from viewing filenames as opaque byte sequences to seeing
  56 +them as encoded strings, we need to address what happens when a program
  57 +tries to create a file with an invalid name.  The Unicode subsystem
  58 +within the kernel leaves the decision of what to do in this case to the
  59 +filesystem, which selects its preferred behavior by enabling/disabling
  60 +the strict mode.  When ext4 encounters one of those strings and the
  61 +filesystem does not require strict mode, it falls back to considering the
  62 +entire string as an opaque byte sequence, which still allows the user to
  63 +operate on that file, but the case-insensitive lookups won't work.
  64 +
  65  Options
  66  =======
  67
  68 --
  69 2.20.1
  70
  71