1 .\" Copyright (c) 2021 by Christian Brauner <christian.brauner@ubuntu.com>
3 .\" %%%LICENSE_START(VERBATIM)
4 .\" Permission is granted to make and distribute verbatim copies of this
5 .\" manual provided the copyright notice and this permission notice are
6 .\" preserved on all copies.
8 .\" Permission is granted to copy and distribute modified versions of this
9 .\" manual under the conditions for verbatim copying, provided that the
10 .\" entire resulting derived work is distributed under the terms of a
11 .\" permission notice identical to this one.
13 .\" Since the Linux kernel and libraries are constantly changing, this
14 .\" manual page may be incorrect or out-of-date. The author(s) assume no
15 .\" responsibility for errors or omissions, or for damages resulting from
16 .\" the use of the information contained herein. The author(s) may not
17 .\" have taken the same level of care in the production of this manual,
18 .\" which is licensed free of charge, as they might when working
21 .\" Formatted or processed versions of this manual, if unaccompanied by
22 .\" the source, must acknowledge the copyright and authors of this work.
25 .TH MOUNT_SETATTR 2 2021-03-22 "Linux" "Linux Programmer's Manual"
27 mount_setattr \- change properties of a mount or mount tree
32 .BR "#include <linux/fcntl.h>" " /* Definition of " AT_* " constants */"
33 .BR "#include <linux/mount.h>" " /* Definition of " MOUNT_ATTR_* " constants */"
34 .BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
35 .B #include <unistd.h>
37 .BI "int syscall(SYS_mount_setattr, int " dirfd ", const char *" pathname ,
38 .BI " unsigned int " flags ", struct mount_attr *" attr \
43 glibc provides no wrapper for
45 necessitating the use of
50 system call changes the mount properties of a mount or an entire mount tree.
53 is a relative pathname,
54 then it is interpreted relative to
55 the directory referred to by the file descriptor
63 is interpreted relative to
64 the current working directory of the calling process.
67 is the empty string and
71 then the mount properties of the mount identified by
76 for an explanation of why the
82 system call uses an extensible structure
83 .RI ( "struct mount_attr" )
84 to allow for future extensions.
85 Any non-flag extensions to
will be implemented as new fields appended to this structure,
88 with a zero value in a new field resulting in the kernel behaving
89 as though that extension field was not present.
93 zero-fill this structure on initialization.
94 See the "Extensibility" subsection under
100 argument should usually be specified as
101 .IR "sizeof(struct mount_attr)" .
102 However, if the caller is using a kernel that supports an extended
103 .IR "struct mount_attr" ,
104 but the caller does not intend to make use of these features,
105 it is possible to pass the size of an earlier
106 version of the structure together with the extended structure.
107 This allows the kernel to not copy later parts of the structure
108 that aren't used anyway.
109 With each extension that changes the size of
110 .IR "struct mount_attr" ,
111 the kernel will expose a definition of the form
112 .BI MOUNT_ATTR_SIZE_VER number\c
114 For example, the macro for the size of the initial version of
117 .BR MOUNT_ATTR_SIZE_VER0 .
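.PP
As a minimal sketch of the size argument described above,
the following helper asks the kernel to read only the initial version of the structure.
The helper name
make_rdonly_ver0
is hypothetical, and the sketch assumes kernel headers from Linux 5.12 or later.

```c
#define _GNU_SOURCE
#include <fcntl.h>          /* AT_EMPTY_PATH */
#include <linux/mount.h>    /* struct mount_attr, MOUNT_ATTR_* */
#include <sys/syscall.h>    /* SYS_mount_setattr */
#include <unistd.h>         /* syscall() */

/*
 * Hypothetical helper: request that the mount referred to by 'fd'
 * become read-only. Passing MOUNT_ATTR_SIZE_VER0 instead of
 * sizeof(struct mount_attr) tells the kernel to copy only the
 * fields of the initial structure version.
 */
int
make_rdonly_ver0(int fd)
{
    struct mount_attr attr = {
        .attr_set = MOUNT_ATTR_RDONLY,
    };

    return syscall(SYS_mount_setattr, fd, "", AT_EMPTY_PATH,
                   &attr, MOUNT_ATTR_SIZE_VER0);
}
```

On failure the helper returns \-1 and leaves the error code in errno,
as with any other use of syscall(2).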
121 argument can be used to alter the pathname resolution behavior.
122 The supported values are:
128 change the mount properties on
133 Change the mount properties of the entire mount tree.
135 .B AT_SYMLINK_NOFOLLOW
136 Don't follow trailing symbolic links.
139 Don't trigger automounts.
145 is a structure of the following form:
150 __u64 attr_set; /* Mount properties to set */
151 __u64 attr_clr; /* Mount properties to clear */
152 __u64 propagation; /* Mount propagation type */
153 __u64 userns_fd; /* User namespace file descriptor */
162 members are used to specify the mount properties that
163 are supposed to be set or cleared for a mount or mount tree.
166 enable a property on a mount or mount tree,
169 remove a property from a mount or mount tree.
171 When changing mount properties,
172 the kernel will first clear the flags specified
176 and then set the flags specified in the
179 For example, these settings:
183 struct mount_attr attr = {
184 .attr_clr = MOUNT_ATTR_NOEXEC | MOUNT_ATTR_NODEV,
185 .attr_set = MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID,
190 are equivalent to the following steps:
194 unsigned int current_mnt_flags = mnt->mnt_flags;
197 * Clear all flags set in .attr_clr,
198 * clearing MOUNT_ATTR_NOEXEC and MOUNT_ATTR_NODEV.
200 current_mnt_flags &= ~attr->attr_clr;
203 * Now set all flags set in .attr_set,
204 * applying MOUNT_ATTR_RDONLY and MOUNT_ATTR_NOSUID.
206 current_mnt_flags |= attr->attr_set;
208 mnt->mnt_flags = current_mnt_flags;
212 As a result of this change, the mount or mount tree (a) is read-only;
213 (b) blocks the execution of set-user-ID and set-group-ID programs;
214 (c) allows execution of programs; and (d) allows access to devices.
216 Multiple changes with the same set of flags requested
221 are guaranteed to be idempotent after the changes have been applied.
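.PP
The clear-then-set order and the idempotence guarantee can be modeled in user space.
The following is only a sketch: the function name
apply_attr
is hypothetical, and the flags value stands in for the mount's current attributes
expressed as MOUNT_ATTR_* bits, whereas the kernel keeps its own internal representation.

```c
#include <linux/mount.h>    /* struct mount_attr, MOUNT_ATTR_* */
#include <stdint.h>

/*
 * Hypothetical model of the update order: first clear every bit
 * named in attr_clr, then set every bit named in attr_set.
 * Returns the resulting attribute bits.
 */
static uint64_t
apply_attr(uint64_t flags, const struct mount_attr *attr)
{
    flags &= ~attr->attr_clr;
    flags |= attr->attr_set;
    return flags;
}
```

Because clearing happens before setting,
applying the same request to its own result leaves the bits unchanged.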
223 The following mount attributes can be specified in the
232 makes the mount read-only.
235 removes the read-only setting if set on the mount.
240 causes the mount not to honor the set-user-ID and set-group-ID mode bits and
241 file capabilities when executing programs.
244 clears the set-user-ID, set-group-ID,
245 and file capability restriction if set on this mount.
250 prevents access to devices on this mount.
253 removes the restriction that prevented accessing devices on this mount.
258 prevents executing programs on this mount.
261 removes the restriction that prevented executing programs on this mount.
263 .B MOUNT_ATTR_NOSYMFOLLOW
266 prevents following symbolic links on this mount.
269 removes the restriction that prevented following symbolic links on this mount.
271 .B MOUNT_ATTR_NODIRATIME
274 prevents updating access time for directories on this mount.
277 removes the restriction that prevented updating access time for directories.
279 .B MOUNT_ATTR_NODIRATIME
280 can be combined with other access-time settings
281 and is implied by the noatime setting.
282 All other access-time settings are mutually exclusive.
284 .BR MOUNT_ATTR__ATIME " - changing access-time settings"
285 The access-time values listed below are an enumeration that
286 includes the value zero, expressed in the bits defined by the mask
287 .BR MOUNT_ATTR__ATIME .
288 Even though these bits are an enumeration
289 (in contrast to the other mount flags such as
290 .BR MOUNT_ATTR_NOEXEC ),
291 they are nonetheless passed in
297 which introduced this behavior.
300 since the access-time values are an enumeration rather than bit values,
301 a caller wanting to transition to a different access-time setting
302 cannot simply specify the access-time setting in
304 but must also include
309 The kernel will verify that
311 isn't partially set in
(i.e., all bits in the
315 bit field are either set or clear), and that
317 doesn't have any access-time bits set if
323 .B MOUNT_ATTR_RELATIME
324 When a file is accessed via this mount,
325 update the file's last access time (atime)
326 only if the current value of atime is less than or equal to
327 the file's last modification time (mtime) or last status change time (ctime).
329 To enable this access-time setting on a mount or mount tree,
330 .B MOUNT_ATTR_RELATIME
339 .B MOUNT_ATTR_NOATIME
340 Do not update access times for (all types of) files on this mount.
342 To enable this access-time setting on a mount or mount tree,
343 .B MOUNT_ATTR_NOATIME
352 .B MOUNT_ATTR_STRICTATIME
353 Always update the last access time (atime)
354 when files are accessed on this mount.
356 To enable this access-time setting on a mount or mount tree,
357 .B MOUNT_ATTR_STRICTATIME
370 creates an ID-mapped mount.
371 The ID mapping is taken from the user namespace specified in
373 and attached to the mount.
375 Since it is not supported to
376 change the ID mapping of a mount after it has been ID mapped,
377 it is invalid to specify
382 For further details, see the subsection "ID-mapped mounts" under NOTES.
386 field is used to specify the propagation type of the mount or mount tree.
387 This field either has the value zero,
388 meaning leave the propagation type unchanged, or it has one of
389 the following values:
392 Turn all mounts into private mounts.
395 Turn all mounts into shared mounts.
398 Turn all mounts into dependent mounts.
401 Turn all mounts into unbindable mounts.
403 For further details on the above propagation types, see
404 .BR mount_namespaces (7).
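.PP
The access-time checks described earlier in this section
(the MOUNT_ATTR__ATIME mask must be either fully set or fully clear in attr_clr,
and attr_set may carry access-time bits only when the mask is set in attr_clr)
can be sketched as a user-space pre-check.
The function name
check_atime
is hypothetical, and the sketch is a model of the documented rules
rather than the kernel's own code.

```c
#include <errno.h>
#include <linux/mount.h>    /* struct mount_attr, MOUNT_ATTR_* */

/*
 * Hypothetical pre-check mirroring the documented access-time rules.
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 */
static int
check_atime(const struct mount_attr *attr)
{
    __u64 clr = attr->attr_clr & MOUNT_ATTR__ATIME;

    /* The mask must not be partially set in attr_clr. */
    if (clr != 0 && clr != MOUNT_ATTR__ATIME)
        return -EINVAL;

    /* Access-time bits in attr_set require the mask in attr_clr. */
    if (clr == 0 && (attr->attr_set & MOUNT_ATTR__ATIME) != 0)
        return -EINVAL;

    return 0;
}
```

A request that sets MOUNT_ATTR_NOATIME therefore also needs
MOUNT_ATTR__ATIME in attr_clr to pass this check.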
412 is set to indicate the cause of the error.
421 nor a valid file descriptor.
425 is not a valid file descriptor.
428 The caller tried to change the mount to
429 .BR MOUNT_ATTR_RDONLY ,
430 but the mount still holds files open for writing.
433 The pathname specified via the
442 An unsupported value was set in
446 An unsupported value was specified in the
452 An unsupported value was specified in the
458 An unsupported value was specified in the
476 An access-time setting was specified in the
490 A file descriptor value was specified in
496 A valid file descriptor value was specified in
498 but the file descriptor did not refer to a user namespace.
501 The underlying filesystem does not support ID-mapped mounts.
The mount that is to be ID mapped is not a detached mount;
that is, the mount has already been visible in a mount namespace.
508 A partial access-time setting was specified in
515 The mount is located outside the caller's mount namespace.
518 The underlying filesystem has been mounted in a mount namespace that is
519 owned by a noninitial user namespace
522 A pathname was empty or had a nonexistent component.
525 When changing mount propagation to
527 a new peer group ID needs to be allocated for all mounts without a peer group
529 This allocation failed because there was not
530 enough memory to allocate the relevant internal structures.
533 When changing mount propagation to
535 a new peer group ID needs to be allocated for all mounts without a peer group
537 This allocation failed because
538 the kernel has run out of IDs.
.\" Christian Brauner: i.e. someone has somehow managed to
.\" allocate so many peer groups and managed to keep the kernel running
.\" (???) that the ida has run out of ids
542 .\" Note that technically further error codes are possible that are
543 .\" specific to the ID allocation implementation used.
546 One of the mounts had at least one of
547 .BR MOUNT_ATTR_NOATIME ,
548 .BR MOUNT_ATTR_NODEV ,
549 .BR MOUNT_ATTR_NODIRATIME ,
550 .BR MOUNT_ATTR_NOEXEC ,
551 .BR MOUNT_ATTR_NOSUID ,
554 set and the flag is locked.
555 Mount attributes become locked on a mount if:
558 A new mount or mount tree is created causing mount propagation across user
560 (i.e., propagation to a mount namespace owned by a different user namespace).
561 The kernel will lock the aforementioned flags to prevent these sensitive
562 properties from being altered.
564 A new mount and user namespace pair is created.
565 This happens for example when specifying
566 .B CLONE_NEWUSER | CLONE_NEWNS
572 The aforementioned flags become locked in the new mount namespace
573 to prevent sensitive mount properties from being altered.
574 Since the newly created mount namespace will be owned by the
575 newly created user namespace,
576 a calling process that is privileged in the new
577 user namespace would\(emin the absence of such locking\(embe
578 able to alter sensitive mount properties (e.g., to remount a mount
579 that was marked read-only as read-write in the new mount namespace).
583 A valid file descriptor value was specified in
585 but the file descriptor refers to the initial user namespace.
588 An attempt was made to add an ID mapping to a mount that is already ID mapped.
591 The caller does not have
593 in the initial user namespace.
596 first appeared in Linux 5.12.
597 .\" commit 7d6beb71da3cc033649d641e1e608713b8220290
598 .\" commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
599 .\" commit 9caccd41541a6f7d6279928d9f971f6642c361af
605 Creating an ID-mapped mount makes it possible to
606 change the ownership of all files located under a mount.
607 Thus, ID-mapped mounts make it possible to
608 change ownership in a temporary and localized way.
609 It is a localized change because the ownership changes are
610 visible only via a specific mount.
611 All other users and locations where the filesystem is exposed are unaffected.
612 It is a temporary change because
613 the ownership changes are tied to the lifetime of the mount.
615 Whenever callers interact with the filesystem through an ID-mapped mount,
616 the ID mapping of the mount will be applied to
617 user and group IDs associated with filesystem objects.
618 This encompasses the user and group IDs associated with inodes
619 and also the following
623 .IR security.capability ,
624 whenever filesystem capabilities
625 are stored or returned in the
626 .B VFS_CAP_REVISION_3
628 which stores a root user ID alongside the capabilities
630 .BR capabilities (7)).
632 .I system.posix_acl_access
634 .IR system.posix_acl_default ,
635 whenever user IDs or group IDs are stored in
641 The following conditions must be met in order to create an ID-mapped mount:
643 The caller must have the
645 capability in the initial user namespace.
647 The filesystem must be mounted in a mount namespace
648 that is owned by the initial user namespace.
650 The underlying filesystem must support ID-mapped mounts.
656 filesystems support ID-mapped mounts
657 with more filesystems being actively worked on.
659 The mount must not already be ID-mapped.
660 This also implies that the ID mapping of a mount cannot be altered.
662 The mount must be a detached mount;
664 it must have been created by calling
668 flag and it must not already have been visible in a mount namespace.
669 (To put things another way:
670 the mount must not have been attached to the filesystem hierarchy
671 with a system call such as
674 ID mappings can be created for user IDs, group IDs, and project IDs.
675 An ID mapping is essentially a mapping of a range of user or group IDs into
676 another or the same range of user or group IDs.
677 ID mappings are written to map files as three numbers
678 separated by white space.
679 The first two numbers specify the starting user or group ID
680 in each of the two user namespaces.
681 The third number specifies the range of the ID mapping.
683 a mapping for user IDs such as "1000\ 1001\ 1" would indicate that
684 user ID 1000 in the caller's user namespace is mapped to
685 user ID 1001 in its ancestor user namespace.
686 Since the map range is 1,
687 only user ID 1000 is mapped.
689 It is possible to specify up to 340 ID mappings for each ID mapping type.
690 If any user IDs or group IDs are not mapped,
691 all files owned by that unmapped user or group ID will appear as
692 being owned by the overflow user ID or overflow group ID respectively.
694 Further details on setting up ID mappings can be found in
695 .BR user_namespaces (7).
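.PP
To make the three-number format concrete,
the following sketch translates an ID through a list of mapping ranges.
The structure and function names are hypothetical,
and the fallback value 65534 is only the usual default overflow ID
(the actual value is configurable, for example via /proc/sys/kernel/overflowuid).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of one map line: "first lower_first count". */
struct id_extent {
    uint32_t first;         /* Start of range in the caller's namespace */
    uint32_t lower_first;   /* Start of range in the ancestor namespace */
    uint32_t count;         /* Length of the range */
};

#define OVERFLOW_ID 65534u  /* Assumed default overflow ID */

/*
 * Translate 'id' from the caller's namespace into the ancestor
 * namespace. IDs outside every range yield OVERFLOW_ID.
 */
static uint32_t
map_id_down(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].first && id - map[i].first < map[i].count)
            return map[i].lower_first + (id - map[i].first);
    }
    return OVERFLOW_ID;
}
```

With the single range "1000 1001 1" from the example above,
ID 1000 translates to 1001 and every other ID falls back to the overflow ID.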
697 In the common case, the user namespace passed in
703 to create an ID-mapped mount will be the user namespace of a container.
704 In other scenarios it will be a dedicated user namespace associated with
705 a user's login session as is the case for portable home directories in
.BR systemd-homed.service (8).
707 It is also perfectly fine to create a dedicated user namespace
708 for the sake of ID mapping a mount.
710 ID-mapped mounts can be useful in the following
711 and a variety of other scenarios:
713 Sharing files or filesystems
714 between multiple users or multiple machines,
715 especially in complex scenarios.
717 ID-mapped mounts are used to implement portable home directories in
718 .BR systemd-homed.service (8),
719 where they allow users to move their home directory
720 to an external storage device
721 and use it on multiple computers
722 where they are assigned different user IDs and group IDs.
723 This effectively makes it possible to
724 assign random user IDs and group IDs at login time.
726 Sharing files or filesystems
727 from the host with unprivileged containers.
728 This allows a user to avoid having to change ownership permanently through
731 ID mapping a container's root filesystem.
732 Users don't need to change ownership permanently through
734 Especially for large root filesystems, using
736 can be prohibitively expensive.
738 Sharing files or filesystems
739 between containers with non-overlapping ID mappings.
Implementing discretionary access control (DAC) permission checking
742 for filesystems lacking a concept of ownership.
744 Efficiently changing ownership on a per-mount basis.
747 changing ownership of large sets of files is instantaneous with
749 This is especially useful when ownership of
750 an entire root filesystem of a virtual machine or container
751 is to be changed as mentioned above.
752 With ID-mapped mounts,
755 system call will be sufficient to change the ownership of all files.
757 Taking the current ownership into account.
758 ID mappings specify precisely
759 what a user or group ID is supposed to be mapped to.
760 This contrasts with the
762 system call which cannot by itself
763 take the current ownership of the files it changes into account.
764 It simply changes the ownership to the specified user ID and group ID.
766 Locally and temporarily restricted ownership changes.
767 ID-mapped mounts make it possible to change ownership locally,
768 restricting the ownership changes to specific mounts,
769 and temporarily as the ownership changes only apply as long as the mount exists.
771 changing ownership via the
773 system call changes the ownership globally and permanently.
776 In order to allow for future extensibility,
778 requires the user-space application to specify the size of the
780 structure that it is passing.
781 By providing this information, it is possible for
783 to provide both forwards- and backwards-compatibility, with
785 acting as an implicit version number.
786 (Because new extension fields will always
787 be appended, the structure size will always increase.)
788 This extensibility design is very similar to other system calls such as
.BR sched_setattr (2),
790 .BR perf_event_open (2),
797 be the size of the structure as specified by the user-space application,
800 be the size of the structure which the kernel supports,
801 then there are three cases to consider:
807 then there is no version mismatch and
809 can be used verbatim.
815 then there are some extension fields that the kernel supports
816 which the user-space application is unaware of.
817 Because a zero value in any added extension field signifies a no-op,
818 the kernel treats all of the extension fields
819 not provided by the user-space application
820 as having zero values.
821 This provides backwards-compatibility.
827 then there are some extension fields which the user-space application is aware
828 of but which the kernel does not support.
829 Because any extension field must have its zero values signify a no-op,
830 the kernel can safely ignore the unsupported extension fields
831 if they are all zero.
832 If any unsupported extension fields are non-zero,
833 then \-1 is returned and
837 This provides forwards-compatibility.
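.PP
The three cases above can be modeled in user space as follows.
This is a sketch only: the function name
copy_versioned_struct
is hypothetical, and returning E2BIG for nonzero unknown fields
is an assumption based on the error that this system call reports in that situation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the size handling.
 * dst/ksize: the structure version the "kernel" knows about.
 * src/usize: the structure and size passed by the application.
 * Returns 0 on success, -E2BIG if an unknown field is nonzero.
 */
static int
copy_versioned_struct(void *dst, size_t ksize,
                      const void *src, size_t usize)
{
    const unsigned char *p = src;
    size_t size = usize < ksize ? usize : ksize;

    /* usize > ksize: unsupported trailing fields must all be zero. */
    for (size_t i = ksize; i < usize; i++)
        if (p[i] != 0)
            return -E2BIG;

    /* usize < ksize: fields the caller did not supply read as zero. */
    memset(dst, 0, ksize);
    memcpy(dst, src, size);
    return 0;
}
```

When both sizes are equal, the loop body never runs and the structure is copied verbatim.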
839 Because the definition of
841 may change in the future
842 (with new fields being added when system headers are updated),
843 user-space applications should zero-fill
845 to ensure that recompiling the program with new headers will not result in
846 spurious errors at runtime.
847 The simplest way is to use a designated initializer:
851 struct mount_attr attr = {
852 .attr_set = MOUNT_ATTR_RDONLY,
853 .attr_clr = MOUNT_ATTR_NODEV
858 Alternatively, the structure can be zero-filled using
860 or similar functions:
864 struct mount_attr attr;
865 memset(&attr, 0, sizeof(attr));
866 attr.attr_set = MOUNT_ATTR_RDONLY;
867 attr.attr_clr = MOUNT_ATTR_NODEV;
871 A user-space application that wishes to determine which extensions the running
872 kernel supports can do so by conducting a binary search on
874 with a structure which has every byte nonzero
875 (to find the largest value which doesn't produce an error of
880 * This program allows the caller to create a new detached mount
881 * and set various properties on it.
887 #include <linux/mount.h>
888 #include <linux/types.h>
893 #include <sys/syscall.h>
897 mount_setattr(int dirfd, const char *pathname, unsigned int flags,
898 struct mount_attr *attr, size_t size)
900 return syscall(SYS_mount_setattr, dirfd, pathname, flags,
905 open_tree(int dirfd, const char *filename, unsigned int flags)
907 return syscall(SYS_open_tree, dirfd, filename, flags);
911 move_mount(int from_dirfd, const char *from_pathname,
912 int to_dirfd, const char *to_pathname, unsigned int flags)
914 return syscall(SYS_move_mount, from_dirfd, from_pathname,
915 to_dirfd, to_pathname, flags);
918 static const struct option longopts[] = {
919 {"map\-mount", required_argument, NULL, 'a'},
920 {"recursive", no_argument, NULL, 'b'},
921 {"read\-only", no_argument, NULL, 'c'},
922 {"block\-setid", no_argument, NULL, 'd'},
923 {"block\-devices", no_argument, NULL, 'e'},
924 {"block\-exec", no_argument, NULL, 'f'},
925 {"no\-access\-time", no_argument, NULL, 'g'},
926 { NULL, 0, NULL, 0 },
929 #define exit_log(format, ...) do \e
931 fprintf(stderr, format, ##__VA_ARGS__); \e
932 exit(EXIT_FAILURE); \e
936 main(int argc, char *argv[])
938 struct mount_attr *attr = &(struct mount_attr){};
940 bool recursive = false;
944 while ((ret = getopt_long_only(argc, argv, "",
945 longopts, &index)) != \-1) {
948 fd_userns = open(optarg, O_RDONLY | O_CLOEXEC);
949 if (fd_userns == \-1)
                exit_log("%m \- Failed to open %s\en", optarg);
956 attr\->attr_set |= MOUNT_ATTR_RDONLY;
959 attr\->attr_set |= MOUNT_ATTR_NOSUID;
962 attr\->attr_set |= MOUNT_ATTR_NODEV;
965 attr\->attr_set |= MOUNT_ATTR_NOEXEC;
968 attr\->attr_set |= MOUNT_ATTR_NOATIME;
969 attr\->attr_clr |= MOUNT_ATTR__ATIME;
            exit_log("Invalid argument specified\en");
976 if ((argc \- optind) < 2)
977 exit_log("Missing source or target mount point\en");
979 const char *source = argv[optind];
980 const char *target = argv[optind + 1];
982 /* In the following, \-1 as the \(aqdirfd\(aq argument ensures that
983 open_tree() fails if \(aqsource\(aq is not an absolute pathname. */
984 .\" Christian Brauner
.\" When writing programs I like to never use relative paths with AT_FDCWD
.\" because making assumptions about the current working directory
987 .\" of the calling process is just too easy to get wrong; especially when
988 .\" pivot_root() or chroot() are in play.
989 .\" My absolut preference (joke intended) is to open a well-known starting
990 .\" point with an absolute path to get a dirfd and then scope all future
991 .\" operations beneath that dirfd. This already works with old-style
992 .\" openat() and _very_ cautious programming but openat2() and its
993 .\" resolve-flag space have made this **chef's kiss**.
994 .\" If I can't operate based on a well-known dirfd I use absolute paths
995 .\" with a -EBADF dirfd passed to *at() functions.
997 int fd_tree = open_tree(\-1, source,
998 OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC |
999 AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0));
1001 exit_log("%m \- Failed to open %s\en", source);
1003 if (fd_userns >= 0) {
1004 attr\->attr_set |= MOUNT_ATTR_IDMAP;
1005 attr\->userns_fd = fd_userns;
1008 ret = mount_setattr(fd_tree, "",
1009 AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0),
1010 attr, sizeof(struct mount_attr));
1012 exit_log("%m \- Failed to change mount attributes\en");
1016 /* In the following, \-1 as the \(aqto_dirfd\(aq argument ensures that
       move_mount() fails if \(aqtarget\(aq is not an absolute pathname. */
1019 ret = move_mount(fd_tree, "", \-1, target,
1020 MOVE_MOUNT_F_EMPTY_PATH);
1022 exit_log("%m \- Failed to attach mount to %s\en", target);
1036 .BR mount_namespaces (7),
1037 .BR capabilities (7),
1038 .BR user_namespaces (7),