'\" t
.\" Copyright (c) 2016, 2019, 2021 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH mount_namespaces 7 (date) "Linux man-pages (unreleased)"
.SH NAME
mount_namespaces \- overview of Linux mount namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.P
Mount namespaces provide isolation of the list of mounts seen
by the processes in each namespace instance.
Thus, the processes in each of the mount namespace instances
will see distinct single-directory hierarchies.
.P
The views provided by the
.IR /proc/ pid /mounts ,
.IR /proc/ pid /mountinfo ,
and
.IR /proc/ pid /mountstats
files (all described in
.BR proc (5))
correspond to the mount namespace in which the process with the PID
.I pid
resides.
(All of the processes that reside in the same mount namespace
will see the same view in these files.)
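.P
One way to check from a shell whether two processes share a mount namespace
is to compare their namespace identifiers.
The following is a minimal sketch, not a definitive tool;
it assumes the
.IR /proc/ pid /ns/mnt
symbolic links described in
.BR namespaces (7),
and
.I same_mntns
is a hypothetical helper name.
.P
.in +4n
.EX
```sh
# same_mntns PID1 PID2
# Succeed if both processes are in the same mount namespace.
# Each /proc/PID/ns/mnt symbolic link reads as "mnt:[inode-number]".
same_mntns() {
    a=$(readlink "/proc/$1/ns/mnt") || return 2
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```
.EE
.in
.P
For example,
.I same_mntns\~$$\~1
compares the current shell with PID 1;
reading the link of a process owned by another user
may require additional privilege.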
.P
A new mount namespace is created using either
.BR clone (2)
or
.BR unshare (2)
with the
.B CLONE_NEWNS
flag.
When a new mount namespace is created,
its mount list is initialized as follows:
.IP \[bu] 3
If the namespace is created using
.BR clone (2),
the mount list of the child's namespace is a copy
of the mount list in the parent process's mount namespace.
.IP \[bu]
If the namespace is created using
.BR unshare (2),
the mount list of the new namespace is a copy of
the mount list in the caller's previous mount namespace.
.P
Subsequent modifications to the mount list
.RB ( mount (2)
and
.BR umount (2))
in either mount namespace will not (by default) affect the
mount list seen in the other namespace
(but see the following discussion of shared subtrees).
.\"
.SS Shared subtrees
After the implementation of mount namespaces was completed,
experience showed that the isolation that they provided was,
in some cases, too great.
For example, in order to make a newly loaded optical disk
available in all mount namespaces,
a mount operation was required in each namespace.
For this use case, and others,
the shared subtree feature was introduced in Linux 2.6.15.
This feature allows for automatic, controlled propagation of
.BR mount (2)
and
.BR umount (2)
.I events
between namespaces
(or, more precisely, between the mounts that are members of a
.I peer group
that are propagating events to one another).
.P
Each mount is marked (via
.BR mount (2))
as having one of the following
.IR "propagation types" :
.TP
.B MS_SHARED
This mount shares events with members of a peer group.
.BR mount (2)
and
.BR umount (2)
events immediately under this mount will propagate
to the other mounts that are members of the peer group.
.I Propagation
here means that the same
.BR mount (2)
or
.BR umount (2)
will automatically occur
under all of the other mounts in the peer group.
Conversely,
.BR mount (2)
and
.BR umount (2)
events that take place under
peer mounts will propagate to this mount.
.TP
.B MS_PRIVATE
This mount is private; it does not have a peer group.
.BR mount (2)
and
.BR umount (2)
events do not propagate into or out of this mount.
.TP
.B MS_SLAVE
.BR mount (2)
and
.BR umount (2)
events propagate into this mount from
a (master) shared peer group.
.BR mount (2)
and
.BR umount (2)
events under this mount do not propagate to any peer.
.IP
Note that a mount can be the slave of another peer group
while at the same time sharing
.BR mount (2)
and
.BR umount (2)
events
with a peer group of which it is a member.
(More precisely, one peer group can be the slave of another peer group.)
.TP
.B MS_UNBINDABLE
This is like a private mount,
and in addition this mount can't be bind mounted.
Attempts to bind mount this mount
.RB ( mount (2)
with the
.B MS_BIND
flag) will fail.
.IP
When a recursive bind mount
.RB ( mount (2)
with the
.B MS_BIND
and
.B MS_REC
flags) is performed on a directory subtree,
any unbindable mounts within the subtree are automatically pruned
(i.e., not replicated)
when replicating that subtree to produce the target subtree.
.P
For a discussion of the propagation type assigned to a new mount,
see NOTES.
.P
The propagation type is a per-mount-point setting;
some mounts may be marked as shared
(with each shared mount being a member of a distinct peer group),
while others are private
(or slaved or unbindable).
.P
Note that a mount's propagation type determines whether
.BR mount (2)
and
.BR umount (2)
of mounts
.I immediately under
the mount are propagated.
Thus, the propagation type does not affect propagation of events for
grandchildren and further removed descendant mounts.
What happens if the mount itself is unmounted is determined by
the propagation type that is in effect for the
.I parent
of the mount.
.\"
.SS Peer groups
Members are added to a
.I peer group
when a mount is marked as shared and either:
.IP (a) 5
the mount is replicated during the creation of a new mount namespace; or
.IP (b)
a new bind mount is created from the mount.
.P
In both of these cases, the new mount joins the peer group
of which the existing mount is a member.
.P
A new peer group is also created when a child mount is created under
an existing mount that is marked as shared.
In this case, the new child mount is also marked as shared and
the resulting peer group consists of all the mounts
that are replicated under the peers of the parent mount.
.P
A mount ceases to be a member of a peer group when either
the mount is explicitly unmounted,
or when the mount is implicitly unmounted because a mount namespace is removed
(because it has no more member processes).
.P
The propagation type of the mounts in a mount namespace
can be discovered via the "optional fields" exposed in
.IR /proc/ pid /mountinfo .
(See
.BR proc (5)
for details of this file.)
The following tags can appear in the optional fields
for a record in that file:
.TP
.I shared:X
This mount is shared in peer group
.IR X .
Each peer group has a unique ID that is automatically
generated by the kernel,
and all mounts in the same peer group will show the same ID.
(These IDs are assigned starting from the value 1,
and may be recycled when a peer group ceases to have any members.)
.TP
.I master:X
This mount is a slave to shared peer group
.IR X .
.TP
.IR propagate_from:X " (since Linux 2.6.26)"
.\" commit 97e7e0f71d6d948c25f11f0a33878d9356d9579e
This mount is a slave and receives propagation from shared peer group
.IR X .
This tag will always appear in conjunction with a
.I master:X
tag.
Here,
.I X
is the closest dominant peer group under the process's root directory.
If
.I X
is the immediate master of the mount,
or if there is no dominant peer group under the same root,
then only the
.I master:X
field is present and not the
.I propagate_from:X
field.
For further details, see below.
.TP
.I unbindable
This is an unbindable mount.
.P
If none of the above tags is present, then this is a private mount.
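.P
To illustrate how these tags map to propagation types,
the following shell function is a minimal sketch that classifies
each mountinfo record read from standard input.
The function name
.I mnt_propagation
is hypothetical,
and the sketch assumes the record layout described in
.BR proc (5):
the mount point is the fifth field,
and the optional fields begin at the seventh field
and end at a field consisting of a single hyphen.
.P
.in +4n
.EX
```sh
# mnt_propagation: read mountinfo records on standard input and print,
# for each record, the mount point and the propagation type implied
# by the tags in its optional fields.
mnt_propagation() {
    awk '{
        shared = 0; master = 0; unbind = 0
        # Scan the optional fields, stopping at the "-" separator
        # (or at the end of a record that has been trimmed).
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) shared = 1
            else if ($i ~ /^master:/) master = 1
            else if ($i == "unbindable") unbind = 1
        }
        if (unbind) type = "unbindable"
        else if (shared && master) type = "slave+shared"
        else if (shared) type = "shared"
        else if (master) type = "slave"
        else type = "private"
        print $5, type
    }'
}
```
.EE
.in
.P
The function can be applied to a live system with
.IR "mnt_propagation </proc/self/mountinfo" .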
.\"
.SS MS_SHARED and MS_PRIVATE example
Suppose that on a terminal in the initial mount namespace,
we mark one mount as shared and another as private,
and then view the mounts in
.IR /proc/self/mountinfo :
.P
.in +4n
.EX
sh1# \fBmount \-\-make\-shared /mntS\fP
sh1# \fBmount \-\-make\-private /mntP\fP
sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
77 61 8:17 / /mntS rw,relatime shared:1
83 61 8:15 / /mntP rw,relatime
.EE
.in
.P
From the
.I /proc/self/mountinfo
output, we see that
.I /mntS
is a shared mount in peer group 1, and that
.I /mntP
has no optional tags, indicating that it is a private mount.
The first two fields in each record in this file are the unique
ID for this mount, and the mount ID of the parent mount.
We can further inspect this file to see that the parent mount of
.I /mntS
and
.I /mntP
is the root directory,
.IR / ,
which is mounted as private:
.P
.in +4n
.EX
sh1# \fBcat /proc/self/mountinfo | awk \[aq]$1 == 61\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
61 0 8:2 / / rw,relatime
.EE
.in
.P
On a second terminal,
we create a new mount namespace where we run a second shell
and inspect the mounts:
.P
.in +4n
.EX
$ \fBPS1=\[aq]sh2# \[aq] sudo unshare \-m \-\-propagation unchanged sh\fP
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
222 145 8:17 / /mntS rw,relatime shared:1
225 145 8:15 / /mntP rw,relatime
.EE
.in
.P
The new mount namespace received a copy of the initial mount namespace's
mounts.
These new mounts maintain the same propagation types,
but have unique mount IDs.
(The
.I \-\-propagation\~unchanged
option prevents
.BR unshare (1)
from marking all mounts as private when creating a new mount namespace,
.\" Since util-linux 2.27
which it does by default.)
.P
In the second terminal, we then create submounts under each of
.I /mntS
and
.IR /mntP ,
and inspect the set-up:
.P
.in +4n
.EX
sh2# \fBmkdir /mntS/a\fP
sh2# \fBmount /dev/sdb6 /mntS/a\fP
sh2# \fBmkdir /mntP/b\fP
sh2# \fBmount /dev/sdb7 /mntP/b\fP
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
222 145 8:17 / /mntS rw,relatime shared:1
225 145 8:15 / /mntP rw,relatime
178 222 8:22 / /mntS/a rw,relatime shared:2
230 225 8:23 / /mntP/b rw,relatime
.EE
.in
.P
From the above, it can be seen that
.I /mntS/a
was created as shared (inheriting this setting from its parent mount) and
.I /mntP/b
was created as a private mount.
.P
Returning to the first terminal and inspecting the set-up,
we see that the new mount created under the shared mount
.I /mntS
propagated to its peer mount (in the initial mount namespace),
but the new mount created under the private mount
.I /mntP
did not propagate:
.P
.in +4n
.EX
sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
77 61 8:17 / /mntS rw,relatime shared:1
83 61 8:15 / /mntP rw,relatime
179 77 8:22 / /mntS/a rw,relatime shared:2
.EE
.in
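.P
Because all members of a peer group show the same ID,
records collected in different mount namespaces can be combined
to see which mounts propagate events to one another.
The following shell function is a minimal sketch of that idea;
the name
.I peer_groups
is hypothetical, and the sketch assumes that the input consists of
mountinfo records (trimmed or complete) gathered from one or more namespaces.
.P
.in +4n
.EX
```sh
# peer_groups: read mountinfo records on standard input and print
# each peer group ID followed by the IDs of its member mounts.
peer_groups() {
    awk '{
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i ~ /^shared:/) {
                id = substr($i, 8)      # text after "shared:"
                if (id in members)
                    members[id] = members[id] " " $1
                else
                    members[id] = $1
            }
    }
    END {
        for (id in members)
            print id ": " members[id]
    }' | sort -n
}
```
.EE
.in
.P
Fed the records shown for the two shells above,
the function reports that mounts 77 and 222 are in peer group 1,
and that mounts 179 and 178 are in peer group 2.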
.\"
.SS MS_SLAVE example
Making a mount a slave allows it to receive propagated
.BR mount (2)
and
.BR umount (2)
events from a master shared peer group,
while preventing it from propagating events to that master.
This is useful if we want to (say) receive a mount event when
an optical disk is mounted in the master shared peer group
(in another mount namespace),
but want to prevent
.BR mount (2)
and
.BR umount (2)
events under the slave mount
from having side effects in other namespaces.
.P
We can demonstrate the effect of slaving by first marking
two mounts as shared in the initial mount namespace:
.P
.in +4n
.EX
sh1# \fBmount \-\-make\-shared /mntX\fP
sh1# \fBmount \-\-make\-shared /mntY\fP
sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
.EE
.in
.P
On a second terminal,
we create a new mount namespace and inspect the mounts:
.P
.in +4n
.EX
sh2# \fBunshare \-m \-\-propagation unchanged sh\fP
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime shared:2
.EE
.in
.P
In the new mount namespace, we then mark one of the mounts as a slave:
.P
.in +4n
.EX
sh2# \fBmount \-\-make\-slave /mntY\fP
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
.EE
.in
.P
From the above output, we see that
.I /mntY
is now a slave mount that is receiving propagation events from
the shared peer group with the ID 2.
.P
Continuing in the new namespace, we create submounts under each of
.I /mntX
and
.IR /mntY :
.P
.in +4n
.EX
sh2# \fBmkdir /mntX/a\fP
sh2# \fBmount /dev/sda3 /mntX/a\fP
sh2# \fBmkdir /mntY/b\fP
sh2# \fBmount /dev/sda5 /mntY/b\fP
.EE
.in
.P
When we inspect the state of the mounts in the new mount namespace,
we see that
.I /mntX/a
was created as a new shared mount
(inheriting the "shared" setting from its parent mount) and
.I /mntY/b
was created as a private mount:
.P
.in +4n
.EX
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
173 168 8:3 / /mntX/a rw,relatime shared:3
175 169 8:5 / /mntY/b rw,relatime
.EE
.in
.P
Returning to the first terminal (in the initial mount namespace),
we see that the mount
.I /mntX/a
propagated to the peer (the shared
.I /mntX
mount), but the mount
.I /mntY/b
was not propagated:
.P
.in +4n
.EX
sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
174 132 8:3 / /mntX/a rw,relatime shared:3
.EE
.in
.P
Now we create a new mount under
.I /mntY
in the first shell:
.P
.in +4n
.EX
sh1# \fBmkdir /mntY/c\fP
sh1# \fBmount /dev/sda1 /mntY/c\fP
sh1# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
174 132 8:3 / /mntX/a rw,relatime shared:3
178 133 8:1 / /mntY/c rw,relatime shared:4
.EE
.in
.P
When we examine the mounts in the second mount namespace,
we see that in this case the new mount has been propagated
to the slave mount,
and that the new mount is itself a slave mount (to peer group 4):
.P
.in +4n
.EX
sh2# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
173 168 8:3 / /mntX/a rw,relatime shared:3
175 169 8:5 / /mntY/b rw,relatime
179 169 8:1 / /mntY/c rw,relatime master:4
.EE
.in
.\"
.SS MS_UNBINDABLE example
One of the primary purposes of unbindable mounts is to avoid
the "mount explosion" problem when repeatedly performing bind mounts
of a higher-level subtree at a lower-level mount.
The problem is illustrated by the following shell session.
.P
Suppose we have a system with the following mounts:
.P
.in +4n
.EX
# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
.EE
.in
.P
Suppose furthermore that we wish to recursively bind mount
the root directory under several users' home directories.
We do this for the first user, and inspect the mounts:
.P
.in +4n
.EX
# \fBmount \-\-rbind / /home/cecilia/\fP
# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
.EE
.in
.P
When we repeat this operation for the second user,
we start to see the explosion problem:
.P
.in +4n
.EX
# \fBmount \-\-rbind / /home/henry\fP
# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/henry/home/cecilia
/dev/sdb6 on /home/henry/home/cecilia/mntX
/dev/sdb7 on /home/henry/home/cecilia/mntY
.EE
.in
.P
Under
.IR /home/henry ,
we have not only recursively added the
.I /mntX
and
.I /mntY
mounts, but also the recursive mounts of those directories under
.I /home/cecilia
that were created in the previous step.
Upon repeating the step for a third user,
it becomes obvious that the explosion is exponential in nature:
.P
.in +4n
.EX
# \fBmount \-\-rbind / /home/otto\fP
# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/henry/home/cecilia
/dev/sdb6 on /home/henry/home/cecilia/mntX
/dev/sdb7 on /home/henry/home/cecilia/mntY
/dev/sda1 on /home/otto
/dev/sdb6 on /home/otto/mntX
/dev/sdb7 on /home/otto/mntY
/dev/sda1 on /home/otto/home/cecilia
/dev/sdb6 on /home/otto/home/cecilia/mntX
/dev/sdb7 on /home/otto/home/cecilia/mntY
/dev/sda1 on /home/otto/home/henry
/dev/sdb6 on /home/otto/home/henry/mntX
/dev/sdb7 on /home/otto/home/henry/mntY
/dev/sda1 on /home/otto/home/henry/home/cecilia
/dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
.EE
.in
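.P
The growth seen above can be stated as simple arithmetic:
each recursive bind mount of the root directory replicates every existing mount,
so the number of mounts doubles at each step.
The following shell functions are a rough sketch of that calculation,
assuming (as in this scenario) that no propagation between peer groups
adds further mounts;
the function names are hypothetical.
The second function gives, for comparison,
the count when each new subtree is marked unbindable
and is therefore never replicated by a later recursive bind mount.
.P
.in +4n
.EX
```sh
# mounts_after_rbind BASE N
# Number of mounts after N recursive bind mounts of the root
# directory, starting from BASE mounts: BASE * 2^N.
mounts_after_rbind() {
    echo $(( $1 * (1 << $2) ))
}

# mounts_after_unbindable_rbind BASE N
# As above, but each new subtree is unbindable, so every step
# replicates only the BASE original mounts: BASE * (N + 1).
mounts_after_unbindable_rbind() {
    echo $(( $1 * ($2 + 1) ))
}
```
.EE
.in
.P
With the three original mounts of this example,
three users yield 24 mounts in the first case and 12 in the second.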
.P
The mount explosion problem in the above scenario can be avoided
by making each of the new mounts unbindable.
The effect of doing this is that recursive mounts of the root
directory will not replicate the unbindable mounts.
We make such a mount for the first user:
.P
.in +4n
.EX
# \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP
.EE
.in
.P
Before going further, we show that unbindable mounts are indeed unbindable:
.P
.in +4n
.EX
# \fBmkdir /mntZ\fP
# \fBmount \-\-bind /home/cecilia /mntZ\fP
mount: wrong fs type, bad option, bad superblock on /home/cecilia,
       missing codepage or helper program, or other error
.P
       In some cases useful info is found in syslog \- try
       dmesg | tail or so.
.EE
.in
.P
Now we create unbindable recursive bind mounts for the other two users:
.P
.in +4n
.EX
# \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP
# \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP
.EE
.in
.P
Upon examining the list of mounts,
we see there has been no explosion of mounts,
because the unbindable mounts were not replicated
under each user's directory:
.P
.in +4n
.EX
# \fBmount | awk \[aq]{print $1, $2, $3}\[aq]\fP
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/otto
/dev/sdb6 on /home/otto/mntX
/dev/sdb7 on /home/otto/mntY
.EE
.in
.\"
.SS Propagation type transitions
The following table shows the effect that applying a new propagation type
(i.e.,
.IR mount\~\-\-make\-xxxx )
has on the existing propagation type of a mount.
The rows correspond to existing propagation types,
and the columns are the new propagation settings.
For reasons of space, "private" is abbreviated as "priv" and
"unbindable" as "unbind".
.P
.TS
tab(;);
lb2 lb2 lb2 lb2 lb1
lb l l l l.
;make-shared;make-slave;make-priv;make-unbind
_
shared;shared;slave/priv [1];priv;unbind
slave;slave+shared;slave [2];priv;unbind
slave+shared;slave+shared;slave;priv;unbind
private;shared;priv [2];priv;unbind
unbindable;shared;unbind [2];priv;unbind
.TE
.P
Note the following details about the table:
.IP [1] 5
If a shared mount is the only mount in its peer group,
making it a slave automatically makes it private.
.IP [2]
Slaving a nonshared mount has no effect on the mount.
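.P
The transition table and its two notes can be expressed as a lookup.
The following shell function is a minimal sketch of the table,
not a description of how the kernel implements it;
the function name
.I next_propagation
and the optional third argument (the number of other mounts in the peer group)
are assumptions made for the purpose of illustration.
.P
.in +4n
.EX
```sh
# next_propagation CURRENT OPERATION [PEERS]
#   CURRENT:   shared | slave | slave+shared | private | unbindable
#   OPERATION: make-shared | make-slave | make-private | make-unbindable
#   PEERS:     number of other mounts in the peer group (default 0);
#              consulted only when a shared mount is made a slave
next_propagation() {
    case "$2" in
    make-private)    echo private ;;
    make-unbindable) echo unbindable ;;
    make-shared)
        case "$1" in
        slave|slave+shared) echo slave+shared ;;
        *)                  echo shared ;;
        esac ;;
    make-slave)
        case "$1" in
        shared)
            # Note [1]: a lone shared mount becomes private.
            if [ "${3:-0}" -gt 0 ]; then echo slave; else echo private; fi ;;
        slave+shared) echo slave ;;
        # Note [2]: slaving a nonshared mount has no effect.
        *)            echo "$1" ;;
        esac ;;
    *)  return 1 ;;
    esac
}
```
.EE
.in
.P
For example,
.I next_propagation\~shared\~make\-slave\~1
prints "slave", whereas
.I next_propagation\~shared\~make\-slave
prints "private".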
.\"
.SS Bind (MS_BIND) semantics
Suppose that the following command is performed:
.P
.in +4n
.EX
mount \-\-bind A/a B/b
.EE
.in
.P
Here,
.I A
is the source mount,
.I B
is the destination mount,
.I a
is a subdirectory path under the mount point
.IR A ,
and
.I b
is a subdirectory path under the mount point
.IR B .
The propagation type of the resulting mount,
.IR B/b ,
depends on the propagation types of the mounts
.I A
and
.IR B ,
and is summarized in the following table.
.P
.TS
tab(;);
lb2 lb1 lb2 lb2 lb2 lb0
lb2 lb1 lb2 lb2 lb2 lb0
lb lb l l l l.
;;source(A)
;;shared;private;slave;unbind
_
dest(B);shared;shared;shared;slave+shared;invalid
;nonshared;shared;private;slave;invalid
.TE
.P
Note that a recursive bind of a subtree follows the same semantics
as for a bind operation on each mount in the subtree.
(Unbindable mounts are automatically pruned at the target mount point.)
.P
For further details, see
.I Documentation/filesystems/sharedsubtree.rst
in the kernel source tree.
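.P
The bind table above can be written as a small lookup function.
This is a minimal sketch of the table only;
the name
.I bind_propagation
and its argument spellings are hypothetical.
.P
.in +4n
.EX
```sh
# bind_propagation DEST SOURCE
#   DEST:   propagation of destination mount B: shared | nonshared
#   SOURCE: propagation of source mount A:
#           shared | private | slave | unbindable
# Prints the propagation type of the resulting mount B/b.
bind_propagation() {
    case "$1/$2" in
    shared/unbindable|nonshared/unbindable)
        echo invalid; return 1 ;;
    shared/shared|shared/private|nonshared/shared)
        echo shared ;;
    shared/slave)      echo slave+shared ;;
    nonshared/private) echo private ;;
    nonshared/slave)   echo slave ;;
    *)                 return 2 ;;
    esac
}
```
.EE
.in
.P
For example,
.I bind_propagation\~shared\~slave
prints "slave+shared".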
.\"
.SS Move (MS_MOVE) semantics
Suppose that the following command is performed:
.P
.in +4n
.EX
mount \-\-move A B/b
.EE
.in
.P
Here,
.I A
is the source mount,
.I B
is the destination mount, and
.I b
is a subdirectory path under the mount point
.IR B .
The propagation type of the resulting mount,
.IR B/b ,
depends on the propagation types of the mounts
.I A
and
.IR B ,
and is summarized in the following table.
.P
.TS
tab(;);
lb2 lb1 lb2 lb2 lb2 lb0
lb2 lb1 lb2 lb2 lb2 lb0
lb lb l l l l.
;;source(A)
;;shared;private;slave;unbind
_
dest(B);shared;shared;shared;slave+shared;invalid
;nonshared;shared;private;slave;unbindable
.TE
.P
Note: moving a mount that resides under a shared mount is invalid.
.P
For further details, see
.I Documentation/filesystems/sharedsubtree.rst
in the kernel source tree.
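.P
The move table, together with the note about mounts that reside under
a shared mount, can be sketched as follows.
The function name
.I move_propagation
and the optional third argument
(the propagation type of the mount under which the source currently resides)
are hypothetical and serve only to illustrate the table.
.P
.in +4n
.EX
```sh
# move_propagation DEST SOURCE [PARENT]
#   DEST:   propagation of destination mount B: shared | nonshared
#   SOURCE: propagation of source mount A:
#           shared | private | slave | unbindable
#   PARENT: propagation of the mount that A currently resides under:
#           shared | nonshared (default nonshared)
# Prints the propagation type of the resulting mount B/b.
move_propagation() {
    # A mount that resides under a shared mount cannot be moved.
    if [ "${3:-nonshared}" = shared ]; then
        echo invalid
        return 1
    fi
    case "$1/$2" in
    shared/unbindable)    echo invalid; return 1 ;;
    shared/shared|shared/private|nonshared/shared)
                          echo shared ;;
    shared/slave)         echo slave+shared ;;
    nonshared/private)    echo private ;;
    nonshared/slave)      echo slave ;;
    nonshared/unbindable) echo unbindable ;;
    *)                    return 2 ;;
    esac
}
```
.EE
.in
.P
Unlike a bind mount, moving an unbindable mount to a nonshared destination
is permitted, and the mount remains unbindable.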
.\"
.SS Mount semantics
Suppose that we use the following command to create a mount:
.P
.in +4n
.EX
mount device B/b
.EE
.in
.P
Here,
.I B
is the destination mount, and
.I b
is a subdirectory path under the mount point
.IR B .
The propagation type of the resulting mount,
.IR B/b ,
follows the same rules as for a bind mount,
where the propagation type of the source mount
is considered always to be private.
.\"
.SS Unmount semantics
Suppose that we use the following command to tear down a mount:
.P
.in +4n
.EX
umount A
.EE
.in
.P
Here,
.I A
is a mount on
.IR B/b ,
where
.I B
is the parent mount and
.I b
is a subdirectory path under the mount point
.IR B .
If
.I B
is shared, then all most-recently-mounted mounts at
.I b
on mounts that receive propagation from mount
.I B
and do not have submounts under them are unmounted.
.\"
.SS The /proc/ pid /mountinfo "propagate_from" tag
The
.I propagate_from:X
tag is shown in the optional fields of a
.IR /proc/ pid /mountinfo
record in cases where a process can't see a slave's immediate master
(i.e., the pathname of the master is not reachable from
the filesystem root directory)
and so cannot determine the
chain of propagation between the mounts it can see.
.P
In the following example, we first create a two-link master-slave chain
between the mounts
.IR /mnt ,
.IR /tmp/etc ,
and
.IR /mnt/tmp/etc .
Then the
.BR chroot (1)
command is used to make the
.I /tmp/etc
mount point unreachable from the root directory,
creating a situation where the master of
.I /mnt/tmp/etc
is not reachable from the (new) root directory of the process.
.P
First, we bind mount the root directory onto
.I /mnt
and then bind mount
.I /proc
at
.I /mnt/proc
so that after the later
.BR chroot (1)
the
.BR proc (5)
filesystem remains visible at the correct location
in the chroot-ed environment.
.P
.in +4n
.EX
# \fBmkdir \-p /mnt/proc\fP
# \fBmount \-\-bind / /mnt\fP
# \fBmount \-\-bind /proc /mnt/proc\fP
.EE
.in
.P
Next, we ensure that the
.I /mnt
mount is a shared mount in a new peer group (with no peers):
.P
.in +4n
.EX
# \fBmount \-\-make\-private /mnt\fP  # Isolate from any previous peer group
# \fBmount \-\-make\-shared /mnt\fP
# \fBcat /proc/self/mountinfo | grep \[aq]/mnt\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
.EE
.in
.P
Next, we bind mount
.I /mnt/etc
onto
.IR /tmp/etc :
.P
.in +4n
.EX
# \fBmkdir \-p /tmp/etc\fP
# \fBmount \-\-bind /mnt/etc /tmp/etc\fP
# \fBcat /proc/self/mountinfo | grep \-E \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:102
.EE
.in
.P
Initially, these two mounts are in the same peer group,
but we then make
.I /tmp/etc
a slave of
.IR /mnt/etc ,
and then make
.I /tmp/etc
shared as well,
so that it can propagate events to the next slave in the chain:
.P
.in +4n
.EX
# \fBmount \-\-make\-slave /tmp/etc\fP
# \fBmount \-\-make\-shared /tmp/etc\fP
# \fBcat /proc/self/mountinfo | grep \-E \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
.EE
.in
.P
Then we bind mount
.I /tmp/etc
onto
.IR /mnt/tmp/etc .
Again, the two mounts are initially in the same peer group,
but we then make
.I /mnt/tmp/etc
a slave of
.IR /tmp/etc :
.P
.in +4n
.EX
# \fBmkdir \-p /mnt/tmp/etc\fP
# \fBmount \-\-bind /tmp/etc /mnt/tmp/etc\fP
# \fBmount \-\-make\-slave /mnt/tmp/etc\fP
# \fBcat /proc/self/mountinfo | grep \-E \[aq]/mnt|/tmp/\[aq] | sed \[aq]s/ \- .*//\[aq]\fP
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
273 239 8:2 /etc /mnt/tmp/etc ... master:105
.EE
.in
.P
From the above, we see that
.I /mnt
is the master of the slave
.IR /tmp/etc ,
which in turn is the master of the slave
.IR /mnt/tmp/etc .
.P
We then
.BR chroot (1)
to the
.I /mnt
directory, which renders the mount with ID 267 unreachable
from the (new) root directory:
.P
.in +4n
.EX
# \fBchroot /mnt\fP
.EE
.in
.P
When we examine the state of the mounts inside the chroot-ed environment,
we see the following:
.P
.in +4n
.EX
# \fBcat /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
239 61 8:2 / / ... shared:102
248 239 0:4 / /proc ... shared:5
273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
.EE
.in
.P
Above, we see that the mount with ID 273
is a slave whose master is the peer group 105.
The mount point for that master is unreachable, and so a
.I propagate_from:102
tag is displayed, indicating that the closest dominant peer group
(i.e., the nearest reachable mount in the slave chain)
is the peer group with the ID 102 (corresponding to the
.I /mnt
mount point before the
.BR chroot (1)
was performed).
.\"
.SH VERSIONS
Mount namespaces first appeared in Linux 2.4.19.
.SH STANDARDS
Namespaces are a Linux-specific feature.
.\"
.SH NOTES
The propagation type assigned to a new mount depends
on the propagation type of the parent mount.
If the mount has a parent (i.e., it is a non-root mount
point) and the propagation type of the parent is
.BR MS_SHARED ,
then the propagation type of the new mount is also
.BR MS_SHARED .
Otherwise, the propagation type of the new mount is
.BR MS_PRIVATE .
.P
Notwithstanding the fact that the default propagation type
for new mounts is in many cases
.BR MS_PRIVATE ,
.B MS_SHARED
is typically more useful.
For this reason,
.BR systemd (1)
automatically remounts all mounts as
.B MS_SHARED
on system startup.
Thus, on most modern systems, the default propagation type is in practice
.BR MS_SHARED .
.P
Since, when one uses
.BR unshare (1)
to create a mount namespace,
the goal is commonly to provide full isolation of the mounts
in the new namespace,
.BR unshare (1)
(since
.I util\-linux
2.27) in turn reverses the step performed by
.BR systemd (1),
by making all mounts private in the new namespace.
That is,
.BR unshare (1)
performs the equivalent of the following in the new mount namespace:
.P
.in +4n
.EX
mount \-\-make\-rprivate /
.EE
.in
.P
To prevent this, one can use the
.I \-\-propagation\~unchanged
option to
.BR unshare (1).
.P
An application that creates a new mount namespace directly using
.BR clone (2)
or
.BR unshare (2)
may desire to prevent propagation of mount events to other mount namespaces
(as is done by
.BR unshare (1)).
This can be done by changing the propagation type of
mounts in the new namespace to either
.B MS_SLAVE
or
.BR MS_PRIVATE ,
using a call such as the following:
.P
.in +4n
.EX
mount(NULL, "/", MS_SLAVE | MS_REC, NULL);
.EE
.in
.P
For a discussion of propagation types when moving mounts
.RB ( MS_MOVE )
and creating bind mounts
.RB ( MS_BIND ),
see
.IR Documentation/filesystems/sharedsubtree.rst .
.\"
.\" ============================================================
.\"
.SS Restrictions on mount namespaces
Note the following points with respect to mount namespaces:
.IP [1] 5
Each mount namespace has an owner user namespace.
As explained above, when a new mount namespace is created,
its mount list is initialized as a copy of the mount list
of another mount namespace.
If the new namespace and the namespace from which the mount list
was copied are owned by different user namespaces,
then the new mount namespace is considered
.IR "less privileged" .
.IP [2]
When creating a less privileged mount namespace,
shared mounts are reduced to slave mounts.
This ensures that mount and unmount events in less
privileged mount namespaces will not propagate to more privileged
mount namespaces.
.IP [3]
Mounts that come as a single unit from a more privileged mount namespace are
locked together and may not be separated in a less privileged mount
namespace.
(The
.BR unshare (2)
.B CLONE_NEWNS
operation brings across all of the mounts from the original
mount namespace as a single unit,
and recursive mounts that propagate between
mount namespaces propagate as a single unit.)
.IP
In this context, "may not be separated" means that the mounts
are locked so that they may not be individually unmounted.
Consider the following example:
.IP
.in +4n
.EX
$ \fBsudo sh\fP
# \fBmount \-\-bind /dev/null /etc/shadow\fP
# \fBcat /etc/shadow\fP       # Produces no output
.EE
.in
.IP
The above steps, performed in a more privileged mount namespace,
have created a bind mount that
obscures the contents of the shadow password file,
.IR /etc/shadow .
For security reasons, it should not be possible to
.BR umount (2)
that mount in a less privileged mount namespace,
since that would reveal the contents of
.IR /etc/shadow .
.IP
Suppose we now create a new mount namespace
owned by a new user namespace.
The new mount namespace will inherit copies of all of the mounts
from the previous mount namespace.
However, those mounts will be locked because the new mount namespace
is less privileged.
Consequently, an attempt to
.BR umount (2)
the mount fails as shown
in the following step:
.IP
.in +4n
.EX
# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
               \fBstrace \-o /tmp/log \e\fP
               \fBumount /etc/shadow\fP
umount: /etc/shadow: not mounted.
# \fBgrep \[aq]\[ha]umount\[aq] /tmp/log\fP
umount2("/etc/shadow", 0)     = \-1 EINVAL (Invalid argument)
.EE
.in
.IP
The error message from
.BR umount (8)
is a little confusing, but the
.BR strace (1)
output reveals that the underlying
.BR umount2 (2)
system call failed with the error
.BR EINVAL ,
which is the error that the kernel returns to indicate that
the mount is locked.
.IP
Note, however, that it is possible to stack (and unstack) a
mount on top of one of the inherited locked mounts in a
less privileged mount namespace:
.IP
.in +4n
.EX
# \fBecho \[aq]aaaaa\[aq] > /tmp/a\fP    # File to mount onto /etc/shadow
# \fBunshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
    \fBsh \-c \[aq]mount \-\-bind /tmp/a /etc/shadow; cat /etc/shadow\[aq]\fP
aaaaa
# \fBumount /etc/shadow\fP
.EE
.in
.IP
The
.BR umount (8)
command above, which is performed in the initial mount namespace,
makes the original
.I /etc/shadow
file once more visible in that namespace.
.IP [4]
Following on from point [3],
note that it is possible to
.BR umount (2)
an entire subtree of mounts that
propagated as a unit into a less privileged mount namespace,
as illustrated in the following example.
.IP
First, we create new user and mount namespaces using
.BR unshare (1).
In the new mount namespace,
the propagation type of all mounts is set to private.
We then create a shared bind mount at
.I /mnt
and a small hierarchy of mounts underneath that mount.
.IP
.in +4n
.EX
$ \fBPS1=\[aq]ns1# \[aq] sudo unshare \-\-user \-\-map\-root\-user \e\fP
                       \fB\-\-mount \-\-propagation private bash\fP
ns1# \fBecho $$\fP        # We need the PID of this shell later
778501
ns1# \fBmount \-\-make\-shared \-\-bind /mnt /mnt\fP
ns1# \fBmkdir /mnt/x\fP
ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x\fP
ns1# \fBmkdir /mnt/x/y\fP
ns1# \fBmount \-\-make\-private \-t tmpfs none /mnt/x/y\fP
ns1# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
986 83 8:5 /mnt /mnt rw,relatime shared:344
989 986 0:56 / /mnt/x rw,relatime
990 989 0:57 / /mnt/x/y rw,relatime
.EE
.in
.IP
Continuing in the same shell session,
we then create a second shell in a new user namespace and a new
(less privileged) mount namespace and
check the state of the propagated mounts rooted at
.IR /mnt :
.IP
.in +4n
.EX
ns1# \fBPS1=\[aq]ns2# \[aq] unshare \-\-user \-\-map\-root\-user \e\fP
                       \fB\-\-mount \-\-propagation unchanged bash\fP
ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
.EE
.in
.IP
Of note in the above output is that the propagation type of the mount
.I /mnt
has been reduced to slave, as explained in point [2].
This means that submount events will propagate from the master
.I /mnt
in "ns1", but propagation will not occur in the opposite direction.
.IP
From a separate terminal window, we then use
.BR nsenter (1)
to enter the mount and user namespaces corresponding to "ns1".
In that terminal window, we then recursively bind mount
.I /mnt/x
at the location
.IR /mnt/ppp .
.IP
.in +4n
.EX
$ \fBPS1=\[aq]ns3# \[aq] sudo nsenter \-t 778501 \-\-user \-\-mount\fP
ns3# \fBmount \-\-rbind \-\-make\-private /mnt/x /mnt/ppp\fP
ns3# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
986 83 8:5 /mnt /mnt rw,relatime shared:344
989 986 0:56 / /mnt/x rw,relatime
990 989 0:57 / /mnt/x/y rw,relatime
1242 986 0:56 / /mnt/ppp rw,relatime
1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518
.EE
.in
.IP
Because the propagation type of the parent mount,
.IR /mnt ,
was shared, the recursive bind mount propagated a small subtree of
mounts under the slave mount
.I /mnt
in "ns2",
as can be verified by executing the following command in that shell session:
.IP
.in +4n
.EX
ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
1244 1239 0:56 / /mnt/ppp rw,relatime
1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518
.EE
.in
.IP
While it is not possible to
.BR umount (2)
a part of the propagated subtree
.RI ( /mnt/ppp/y )
in "ns2",
it is possible to
.BR umount (2)
the entire subtree,
as shown by the following commands:
.IP
.in +4n
.EX
ns2# \fBumount /mnt/ppp/y\fP
umount: /mnt/ppp/y: not mounted.
ns2# \fBumount \-l /mnt/ppp\fP      # Succeeds...
ns2# \fBgrep /mnt /proc/self/mountinfo | sed \[aq]s/ \- .*//\[aq]\fP
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
.EE
.in
.IP [5]
The
.BR mount (2)
flags
.BR MS_RDONLY ,
.BR MS_NOSUID ,
.BR MS_NOEXEC ,
and the "atime" flags
.RB ( MS_NOATIME ,
.BR MS_NODIRATIME ,
.BR MS_RELATIME )
settings become locked
.\" commit 9566d6742852c527bf5af38af5cbb878dad75705
.\" Author: Eric W. Biederman <ebiederm@xmission.com>
.\" Date:   Mon Jul 28 17:26:07 2014 -0700
.\"
.\"      mnt: Correct permission checks in do_remount
.\"
when propagated from a more privileged to
a less privileged mount namespace,
and may not be changed in the less privileged mount namespace.
.IP
This point is illustrated in the following example where,
in a more privileged mount namespace,
we create a bind mount that is marked as read-only.
For security reasons,
it should not be possible to make the mount writable in
a less privileged mount namespace, and indeed the kernel prevents this:
.IP
.in +4n
.EX
$ \fBsudo mkdir /mnt/dir\fP
$ \fBsudo mount \-\-bind \-o ro /some/path /mnt/dir\fP
$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
               \fBmount \-o remount,rw /mnt/dir\fP
mount: /mnt/dir: permission denied.
.EE
.in
.IP [6]
.\" (As of 3.18-rc1 (in Al Viro's 2014-08-30 vfs.git#for-next tree))
A file or directory that is a mount point in one namespace
but not in another namespace may be renamed, unlinked, or removed
.RB ( rmdir (2))
in the mount namespace in which it is not a mount point
(subject to the usual permission checks).
Consequently, the mount point is removed in the mount namespace
where it was a mount point.
.IP
Previously (before Linux 3.18),
.\" mtk: The change was in Linux 3.18, I think, with this commit:
.\"     commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe
.\"     Author: Eric W. Biederman <ebiederman@twitter.com>
.\"     Date:   Tue Oct 1 18:33:48 2013 -0700
.\"
.\"         vfs: Lazily remove mounts on unlinked files and directories.
attempting to unlink, rename, or remove a file or directory
that was a mount point in another mount namespace would result in the error
.BR EBUSY .
That behavior had technical problems of enforcement (e.g., for NFS)
and permitted denial-of-service attacks against more privileged users
(i.e., preventing individual files from being updated
by bind mounting on top of them).
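.P
Regarding the flag locking described in point [5] above,
the per-mount options that correspond to the lockable flags
can be read from the sixth field of a mountinfo record.
The following shell function is a minimal sketch that lists them;
the name
.I lockable_flags
is hypothetical,
and the sketch assumes the usual option spellings
("ro", "nosuid", "noexec", "noatime", "nodiratime", "relatime").
It reports only which of these options are set on a mount,
not whether the kernel currently treats them as locked.
.P
.in +4n
.EX
```sh
# lockable_flags: read mountinfo records on standard input and print,
# for each record, the mount point followed by those per-mount
# options (field 6) that correspond to the flags listed in point [5].
lockable_flags() {
    awk '{
        n = split($6, opt, ",")
        out = $5
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^(ro|nosuid|noexec|noatime|nodiratime|relatime)$/)
                out = out " " opt[i]
        print out
    }'
}
```
.EE
.in
.P
For example, a record whose sixth field is "ro,relatime" for the mount point
.I /mnt/dir
produces the line "/mnt/dir ro relatime".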
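.SH EXAMPLES
See
.BR pivot_root (2).
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR mount (2),
.BR mount_setattr (2),
.BR pivot_root (2),
.BR setns (2),
.BR umount (2),
.BR unshare (2),
.BR proc (5),
.BR namespaces (7),
.BR user_namespaces (7),
.BR findmnt (8),
.BR mount (8),
.BR pam_namespace (8),
.BR pivot_root (8),
.BR umount (8)
.P
.I Documentation/filesystems/sharedsubtree.rst
in the kernel source tree.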