.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH pid_namespaces 7 (date) "Linux man-pages (unreleased)"
.SH NAME
pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.P
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to provide functionality
such as suspending/resuming the set of processes in the container and
migrating the container to a new host
while the processes inside the container maintain the same PIDs.
.P
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
.P
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.\"
.\" ============================================================
.\"
.SS The namespace "init" process
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.B CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.B CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
This process becomes the parent of any child processes that are orphaned
because a process that resides in this PID namespace terminated
(see below for further details).
.P
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.B SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace fails with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
Such scenarios can occur when, for example,
a process uses an open file descriptor for a
.IR /proc/ pid /ns/pid
file corresponding to a process that was in a namespace to
.BR setns (2)
into that namespace after the "init" process has terminated.
Another possible scenario can occur after a call to
.BR unshare (2):
if the first child subsequently created by a
.BR fork (2)
terminates, then subsequent calls to
.BR fork (2)
fail with
.BR ENOMEM .
.P
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
.P
Likewise, a process in an ancestor namespace
can\[em]subject to the usual permission checks described in
.BR kill (2)\[em]send
signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
and
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
.P
Starting with Linux 3.4, the
.BR reboot (2)
system call causes a signal to be sent to the namespace "init" process.
See
.BR reboot (2)
for more details.
.\"
.\" ============================================================
.\"
.SS Nesting PID namespaces
PID namespaces can be nested:
each PID namespace has a parent,
except for the initial ("root") PID namespace.
The parent of a PID namespace is the PID namespace of the process that
created the namespace using
.BR clone (2)
or
.BR unshare (2).
PID namespaces thus form a tree,
with all namespaces ultimately tracing their ancestry to the root namespace.
Since Linux 3.7,
.\" commit f2302505775fd13ba93f034206f1e2a587017929
.\" The kernel constant MAX_PID_NS_LEVEL
the kernel limits the maximum nesting depth for PID namespaces to 32.
.P
A process is visible to other processes in its PID namespace,
and to the processes in each direct ancestor PID namespace
going back to the root PID namespace.
In this context, "visible" means that one process
can be the target of operations by another process using
system calls that specify a process ID.
Conversely, the processes in a child PID namespace can't see
processes in the parent and further removed ancestor namespaces.
More succinctly: a process can see (e.g., send signals with
.BR kill (2),
set nice values with
.BR setpriority (2),
etc.) only processes contained in its own PID namespace
and in descendants of that namespace.
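.P
The visibility rule above can be probed from user space.
The following is a minimal sketch rather than a definitive implementation:
the helper name
.I pid_is_visible
is hypothetical, and the sketch relies only on the behavior of
.BR kill (2)
with signal 0, which performs the existence and permission checks
without delivering a signal.
.P
```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report whether `pid` names a process that is
   visible from the caller's PID namespace.
   Returns 1 if visible, 0 if no such PID exists in this namespace,
   and -1 for invalid arguments or unexpected errors. */
int
pid_is_visible(pid_t pid)
{
    if (pid <= 0)           /* 0 and negative values address process groups */
        return -1;
    if (kill(pid, 0) == 0)  /* signal 0: checks only, nothing is delivered */
        return 1;
    if (errno == EPERM)     /* the process exists, but may not be signaled */
        return 1;
    if (errno == ESRCH)     /* no process with this PID is visible here */
        return 0;
    return -1;
}
```
.P
Called with the caller's own PID, the helper returns 1.
A number that is not assigned to any process in the caller's PID namespace
yields 0, whether or not the same number is in use in some other namespace.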
.P
A process has one process ID in each of the layers of the PID
namespace hierarchy in which it is visible,
starting from its own namespace
and walking back through each direct ancestor namespace
to the root PID namespace.
System calls that operate on process IDs always
operate using the process ID that is visible in the
PID namespace of the caller.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process was created.
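.P
One way to observe the per-layer process IDs is the
.I NSpid
field of
.IR /proc/ pid /status ,
which is documented in
.BR proc (5)
(since Linux 4.1) and is not otherwise discussed on this page.
The following sketch assumes that field is present;
the function names
.I parse_nspid
and
.I read_own_nspids
are hypothetical.
.P
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a line of the form "NSpid:<TAB>1234<TAB>1" into out[].
   The leftmost value is the PID in the PID namespace of the procfs
   mount; later values belong to successively nested namespaces.
   Returns the number of PIDs stored (at most max),
   or 0 if the line is not an NSpid line. */
int
parse_nspid(const char *line, long *out, int max)
{
    int n = 0;
    const char *p;

    if (strncmp(line, "NSpid:", 6) != 0)
        return 0;
    p = line + 6;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips leading white space */
        if (end == p)                   /* no further number on the line */
            break;
        out[n++] = v;
        p = end;
    }
    return n;
}

/* Read the caller's own NSpid line from /proc/self/status.
   Returns the number of PIDs stored, or 0 if the field is unavailable. */
int
read_own_nspids(long *out, int max)
{
    char line[256];
    int n = 0;
    FILE *fp = fopen("/proc/self/status", "r");

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        n = parse_nspid(line, out, max);
        if (n > 0)
            break;
    }
    fclose(fp);
    return n;
}
```
.P
For a process that is two levels below the namespace of the procfs mount,
the array would hold three values,
the last of which is the value that the process itself obtains from
.BR getpid (2).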
.P
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
.P
While processes may freely descend into child PID namespaces
(e.g., using
.BR setns (2)
with a PID namespace file descriptor),
they may not move in the other direction.
That is to say, processes may not enter any ancestor namespaces
(parent, grandparent, etc.).
Changing PID namespaces is a one-way operation.
.P
The
.B NS_GET_PARENT
.BR ioctl (2)
operation can be used to discover the parental relationship
between PID namespaces; see
.BR ioctl_ns (2).
.\"
.\" ============================================================
.\"
.SS setns(2) and unshare(2) semantics
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.B CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
(Since Linux 4.12, that PID namespace is shown via the
.IR /proc/ pid /ns/pid_for_children
file, as described in
.BR namespaces (7).)
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid ()),
which would break many applications and libraries.
.P
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
Among other things, this means that the parental relationship
between processes mirrors the parental relationship between PID namespaces:
the parent of a process is either in the same namespace
or resides in the immediate parent PID namespace.
.P
A process may call
.BR unshare (2)
with the
.B CLONE_NEWPID
flag only once.
After it has performed this operation, its
.IR /proc/ pid /ns/pid_for_children
symbolic link will be empty until the first child is created in the namespace.
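.P
The semantics described in this section can be sketched as follows.
This is an illustration under stated assumptions, not a reference program:
the function name
.I first_child_is_pid1
is hypothetical, and the call to
.BR unshare (2)
is made in a helper process so that the caller of the function
is not itself affected.
Creating the namespace normally requires privilege,
so the sketch reports failure rather than assuming success.
.P
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
   Returns 1 if the first child created after unshare(CLONE_NEWPID)
   saw itself as PID 1 while the PID of the process that called
   unshare() stayed the same, 0 if either check failed, and -1 if the
   namespace could not be created (for example, because of EPERM). */
int
first_child_is_pid1(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;

    if (helper == 0) {              /* helper: the process that unshares */
        pid_t before = getpid();
        pid_t child;

        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);               /* namespace could not be created */
        if (getpid() != before)
            _exit(1);               /* the caller's own PID must not change */

        child = fork();             /* first child: "init" of the namespace */
        if (child == -1)
            _exit(2);
        if (child == 0)
            _exit(getpid() == 1 ? 0 : 1);

        if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
            _exit(2);
        _exit(WEXITSTATUS(status));
    }

    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```
.P
Because the first child is the "init" process of the new namespace,
its termination also means that the helper could not create
further children in that namespace.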
.\"
.\" ============================================================
.\"
.SS Adoption of orphaned children
When a child process becomes orphaned, it is reparented to the "init"
process in the PID namespace of its parent
(unless one of the nearer ancestors of the parent employed the
.BR prctl (2)
.B PR_SET_CHILD_SUBREAPER
command to mark itself as the reaper of orphaned descendant processes).
Note that because of the
.BR setns (2)
and
.BR unshare (2)
semantics described above, this may be the "init" process in the PID
namespace that is the
.I parent
of the child's PID namespace,
rather than the "init" process in the child's own PID namespace.
.\" Furthermore, by definition, the parent of the "init" process
.\" of a PID namespace resides in the parent PID namespace.
.\"
.\" ============================================================
.\"
.SS Compatibility of CLONE_NEWPID with other CLONE_* flags
In current versions of Linux,
.B CLONE_NEWPID
can't be combined with
.BR CLONE_THREAD .
Threads are required to be in the same PID namespace such that
the threads in a process can send signals to each other.
Similarly, it must be possible to see all of the threads
of a process in the
.BR proc (5)
filesystem.
Additionally, if two threads were in different PID
namespaces, the process ID of the process sending a signal
could not be meaningfully encoded when a signal is sent
(see the description of the
.I siginfo_t
type in
.BR sigaction (2)).
Since this is computed when a signal is enqueued,
a signal queue shared by processes in multiple PID namespaces
would defeat that.
.P
.\" Note these restrictions were all introduced in
.\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0
.\" when CLONE_NEWPID|CLONE_VM was disallowed
In earlier versions of Linux,
.B CLONE_NEWPID
was additionally disallowed (failing with the error
.BR EINVAL )
in combination with
.B CLONE_SIGHAND
.\" (restriction lifted in faf00da544045fdc1454f3b9e6d7f65c841de302)
(before Linux 4.3) as well as
.\" (restriction lifted in e79f525e99b04390ca4d2366309545a836c03bf1)
.B CLONE_VM
(before Linux 3.12).
The changes that lifted these restrictions have also been ported to
earlier stable kernels.
.\"
.\" ============================================================
.\"
.SS /proc and PID namespaces
A
.I /proc
filesystem shows (in the
.IR /proc/ pid
directories) only processes visible in the PID namespace
of the process that performed the mount, even if the
.I /proc
filesystem is viewed from processes in other namespaces.
.P
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
If a new mount namespace is simultaneously created by including
.B CLONE_NEWNS
in the
.I flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .
.P
From a shell, the command to mount
.I /proc
is:
.P
.in +4n
.EX
$ mount \-t proc proc /proc
.EE
.in
.P
Calling
.BR readlink (2)
on the path
.I /proc/self
yields the process ID of the caller in the PID namespace of the procfs mount
(i.e., the PID namespace of the process that mounted the procfs).
This can be useful for introspection purposes,
when a process wants to discover its PID in other namespaces.
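.P
The
.I /proc/self
lookup just described might be wrapped as follows.
This is a minimal sketch;
the function name
.I pid_in_procfs_namespace
is hypothetical, and the sketch assumes that a procfs instance is mounted at
.IR /proc .
.P
```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the caller's PID as seen by the procfs
   instance mounted at /proc, or -1 if the link cannot be read or does
   not contain a positive decimal number. */
pid_t
pid_in_procfs_namespace(void)
{
    char buf[32];
    char *end;
    long v;
    ssize_t len = readlink("/proc/self", buf, sizeof(buf) - 1);

    if (len <= 0)
        return -1;
    buf[len] = 0;               /* readlink() does not add a terminator */

    v = strtol(buf, &end, 10);
    if (end == buf || *end != 0 || v <= 0)
        return -1;
    return (pid_t) v;
}
```
.P
If the procfs instance was mounted from the caller's own PID namespace,
the result equals the value returned by
.BR getpid (2);
if it was mounted from an ancestor namespace,
the result is the caller's PID in that ancestor.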
.\"
.\" ============================================================
.\"
.SS /proc files
.TP
.BR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)"
.\" commit b8f566b04d3cddd192cfd2418ae6d54ac6353792
This file
(which is virtualized per PID namespace)
displays the last PID that was allocated in this PID namespace.
When the next PID is allocated,
the kernel will search for the lowest unallocated PID
that is greater than this value,
and when this file is subsequently read it will show that PID.
.IP
This file is writable by a process that has the
.B CAP_SYS_ADMIN
or (since Linux 5.9)
.B CAP_CHECKPOINT_RESTORE
capability inside the user namespace that owns the PID namespace.
.\" This ability is necessary to support checkpoint restore in user-space
This makes it possible to determine the PID that is allocated
to the next process that is created inside this PID namespace.
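.IP
Reading the file needs no special handling.
The following sketch shows one way to do it;
the function name
.I read_ns_last_pid
is hypothetical, and the sketch treats a missing file as an ordinary failure,
since the file may be absent on some kernel configurations.
.IP
```c
#include <stdio.h>

/* Hypothetical helper: return the last PID allocated in the caller's
   PID namespace, or -1 if the file is missing or cannot be parsed. */
long
read_ns_last_pid(void)
{
    long pid = -1;
    FILE *fp = fopen("/proc/sys/kernel/ns_last_pid", "r");

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1)
        pid = -1;
    fclose(fp);
    return pid;
}
```
.IP
The value is only a snapshot:
another process in the same PID namespace may allocate a PID
between the read and any later
.BR fork (2).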
.\"
.\" ============================================================
.\"
.SS Miscellaneous
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
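.SH STANDARDS
Linux.
.SH EXAMPLES
See
.BR user_namespaces (7).
.SH SEE ALSO
.BR clone (2),
.BR reboot (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR credentials (7),
.BR mount_namespaces (7),
.BR namespaces (7),
.BR user_namespaces (7),
.BR switch_root (8)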