1 .\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
3 .\" %%%LICENSE_START(VERBATIM)
4 .\" Permission is granted to make and distribute verbatim copies of this
5 .\" manual provided the copyright notice and this permission notice are
6 .\" preserved on all copies.
8 .\" Permission is granted to copy and distribute modified versions of this
9 .\" manual under the conditions for verbatim copying, provided that the
10 .\" entire resulting derived work is distributed under the terms of a
11 .\" permission notice identical to this one.
13 .\" Since the Linux kernel and libraries are constantly changing, this
14 .\" manual page may be incorrect or out-of-date. The author(s) assume no
15 .\" responsibility for errors or omissions, or for damages resulting from
16 .\" the use of the information contained herein. The author(s) may not
17 .\" have taken the same level of care in the production of this manual,
18 .\" which is licensed free of charge, as they might when working
21 .\" Formatted or processed versions of this manual, if unaccompanied by
22 .\" the source, must acknowledge the copyright and authors of this work.
25 .\" 6 Aug 2002 - Initial Creation
26 .\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
27 .\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
28 .\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
29 .\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
30 .\" 2008-07-15, Serge Hallyn <serue@us.bbm.com>
31 .\" Document file capabilities, per-process capability
32 .\" bounding set, changed semantics for CAP_SETPCAP,
33 .\" and other changes in 2.6.2[45].
34 .\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
36 .\" Add text describing circumstances in which CAP_SETPCAP
37 .\" (theoretically) permits a thread to change the
38 .\" capability sets of another thread.
39 .\" Add section describing rules for programmatically
40 .\" adjusting thread capability sets.
41 .\" Describe rationale for capability bounding set.
42 .\" Document "securebits" flags.
43 .\" Add text noting that if we set the effective flag for one file
44 .\" capability, then we must also set the effective flag for all
45 .\" other capabilities where the permitted or inheritable bit is set.
46 .\" 2011-09-07, mtk/Serge hallyn: Add CAP_SYSLOG
48 .TH CAPABILITIES 7 2021-03-22 "Linux" "Linux Programmer's Manual"
50 capabilities \- overview of Linux capabilities
52 For the purpose of performing permission checks,
53 traditional UNIX implementations distinguish two categories of processes:
55 processes (whose effective user ID is 0, referred to as superuser or root),
58 processes (whose effective UID is nonzero).
59 Privileged processes bypass all kernel permission checks,
60 while unprivileged processes are subject to full permission
61 checking based on the process's credentials
62 (usually: effective UID, effective GID, and supplementary group list).
64 Starting with kernel 2.2, Linux divides the privileges traditionally
65 associated with superuser into distinct units, known as
67 which can be independently enabled and disabled.
68 Capabilities are a per-thread attribute.
71 The following list shows the capabilities implemented on Linux,
72 and the operations or behaviors that each capability permits:
74 .BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
75 Enable and disable kernel auditing; change auditing filter rules;
76 retrieve auditing status and filtering rules.
78 .BR CAP_AUDIT_READ " (since Linux 3.16)"
79 .\" commit a29b694aa1739f9d76538e34ae25524f9c549d59
80 .\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48
81 Allow reading the audit log via a multicast netlink socket.
83 .BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
84 Write records to kernel auditing log.
85 .\" FIXME Add FAN_ENABLE_AUDIT
87 .BR CAP_BLOCK_SUSPEND " (since Linux 3.5)"
88 Employ features that can block system suspend
91 .IR /proc/sys/wake_lock ).
93 .BR CAP_BPF " (since Linux 5.8)"
94 Employ privileged BPF operations; see
99 This capability was added in Linux 5.8 to separate out
100 BPF functionality from the overloaded
104 .BR CAP_CHECKPOINT_RESTORE " (since Linux 5.9)"
105 .\" commit 124ea650d3072b005457faed69909221c2905a1f
110 .I /proc/sys/kernel/ns_last_pid
112 .BR pid_namespaces (7));
118 .\" FIXME There is also some use case relating to
119 .\" prctl_set_mm_exe_file(); in the 5.9 sources, see
120 .\" prctl_set_mm_map().
122 read the contents of the symbolic links in
123 .IR /proc/[pid]/map_files
128 This capability was added in Linux 5.9 to separate out
129 checkpoint/restore functionality from the overloaded
134 Make arbitrary changes to file UIDs and GIDs (see
138 Bypass file read, write, and execute permission checks.
139 (DAC is an abbreviation of "discretionary access control".)
141 .B CAP_DAC_READ_SEARCH
145 Bypass file read permission checks and
146 directory read and execute permission checks;
149 .BR open_by_handle_at (2);
154 flag to create a link to a file referred to by a file descriptor.
162 Bypass permission checks on operations that normally
163 require the filesystem UID of the process to match the UID of
167 excluding those operations covered by
170 .BR CAP_DAC_READ_SEARCH ;
173 .BR ioctl_iflags (2))
176 set Access Control Lists (ACLs) on arbitrary files;
178 ignore directory sticky bit on file deletion;
182 extended attributes on sticky directory owned by any user;
186 for arbitrary files in
197 Don't clear set-user-ID and set-group-ID mode
198 bits when a file is modified;
200 set the set-group-ID bit for a file whose GID does not match
201 the filesystem or any of the supplementary GIDs of the calling process.
206 .\" FIXME . As at Linux 3.2, there are some strange uses of this capability
207 .\" in other places; they probably should be replaced with something else.
217 Allocate memory using huge pages
218 .RB ( memfd_create (2)
225 Bypass permission checks for operations on System V IPC objects.
228 Bypass permission checks for sending signals (see
230 This includes use of the
234 .\" FIXME . CAP_KILL also has an effect for threads + setting child
235 .\" termination signal to other than SIGCHLD: without this
236 .\" capability, the termination signal reverts to SIGCHLD
237 .\" if the child does an exec(). What is the rationale
240 .BR CAP_LEASE " (since Linux 2.4)"
241 Establish leases on arbitrary files (see
244 .B CAP_LINUX_IMMUTABLE
250 .BR ioctl_iflags (2)).
252 .BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
253 Allow MAC configuration or state changes.
254 Implemented for the Smack Linux Security Module (LSM).
256 .BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
257 Override Mandatory Access Control (MAC).
258 Implemented for the Smack LSM.
260 .BR CAP_MKNOD " (since Linux 2.4)"
261 Create special files using
265 Perform various network-related operations:
269 interface configuration;
271 administration of IP firewall, masquerading, and accounting;
273 modify routing tables;
275 bind to any address for transparent proxying;
277 set type-of-service (TOS);
279 clear driver statistics;
281 set promiscuous mode;
enable multicasting;
287 to set the following socket options:
291 (for a priority outside the range 0 to 6),
298 .B CAP_NET_BIND_SERVICE
299 Bind a socket to Internet domain privileged ports
300 (port numbers less than 1024).
303 (Unused) Make socket broadcasts, and listen to multicasts.
304 .\" FIXME Since Linux 4.2, there are use cases for netlink sockets
305 .\" commit 59324cf35aba5336b611074028777838a963d03b
311 Use RAW and PACKET sockets;
313 bind to any address for transparent proxying.
316 .\" Also various IP options and setsockopt(SO_BINDTODEVICE)
318 .BR CAP_PERFMON " (since Linux 5.8)"
319 Employ various performance-monitoring mechanisms, including:
324 .BR perf_event_open (2);
326 employ various BPF operations that have performance implications.
330 This capability was added in Linux 5.8 to separate out
331 performance monitoring functionality from the overloaded
334 See also the kernel source file
335 .IR Documentation/admin\-guide/perf\-security.rst .
341 Make arbitrary manipulations of process GIDs and supplementary GID list;
343 forge GID when passing socket credentials via UNIX domain sockets;
345 write a group ID mapping in a user namespace (see
346 .BR user_namespaces (7)).
350 .BR CAP_SETFCAP " (since Linux 2.6.24)"
351 Set arbitrary capabilities on a file.
354 If file capabilities are supported (i.e., since Linux 2.6.24):
355 add any capability from the calling thread's bounding set
356 to its inheritable set;
357 drop capabilities from the bounding set (via
359 .BR PR_CAPBSET_DROP );
364 If file capabilities are not supported (i.e., kernels before Linux 2.6.24):
365 grant or remove any capability in the
366 caller's permitted capability set to or from any other process.
369 is not available when the kernel is configured to support
370 file capabilities, since
372 has entirely different semantics for such kernels.)
378 Make arbitrary manipulations of process UIDs
384 forge UID when passing socket credentials via UNIX domain sockets;
386 write a user ID mapping in a user namespace (see
387 .BR user_namespaces (7)).
390 .\" FIXME CAP_SETUID also an effect in exec(); document this.
394 this capability is overloaded; see
395 .IR "Notes to kernel developers" ,
401 Perform a range of system administration operations including:
410 .BR setdomainname (2);
414 operations (since Linux 2.6.37,
416 should be used to permit such operations);
423 access the same checkpoint/restore functionality that is governed by
424 .BR CAP_CHECKPOINT_RESTORE
425 (but the latter, weaker capability is preferred for accessing
428 perform the same BPF operations as are governed by
430 (but the latter, weaker capability is preferred for accessing
433 employ the same performance monitoring mechanisms as are governed by
435 (but the latter, weaker capability is preferred for accessing
442 operations on arbitrary System V IPC objects;
448 perform operations on
452 extended attributes (see
456 .BR lookup_dcookie (2);
462 and (before Linux 2.6.25)
464 I/O scheduling classes;
466 forge PID when passing socket credentials via UNIX domain sockets;
469 .IR /proc/sys/fs/file\-max ,
470 the system-wide limit on the number of open files,
471 in system calls that open files (e.g.,
479 flags that create new namespaces with
483 (but, since Linux 3.8,
484 creating user namespaces does not require any capability);
499 .BR fanotify_init (2);
516 to insert characters into the input queue of a terminal other than
517 the caller's controlling terminal;
527 perform various privileged block-device
531 perform various privileged filesystem
544 filter without first having to set the
548 modify allow/deny rules for device control groups;
552 .B PTRACE_SECCOMP_GET_FILTER
553 operation to dump tracee's seccomp filters;
558 operation to suspend the tracee's seccomp protections (i.e., the
559 .B PTRACE_O_SUSPEND_SECCOMP
562 perform administrative operations on many device drivers;
564 modify autogroup nice values by writing to
565 .IR /proc/[pid]/autogroup
584 change mount namespaces using
593 Load and unload kernel modules
597 .BR delete_module (2));
599 in kernels before 2.6.25:
600 drop capabilities from the system-wide capability bounding set.
608 Lower the process nice value
611 and change the nice value for arbitrary processes;
613 set real-time scheduling policies for calling process,
614 and set scheduling policies and priorities for arbitrary processes
615 .RB ( sched_setscheduler (2),
616 .BR sched_setparam (2),
617 .BR sched_setattr (2));
619 set CPU affinity for arbitrary processes
620 .RB ( sched_setaffinity (2));
622 set I/O scheduling class and priority for arbitrary processes
623 .RB ( ioprio_set (2));
626 .BR migrate_pages (2)
627 to arbitrary processes and allow processes
628 to be migrated to arbitrary nodes;
629 .\" FIXME CAP_SYS_NICE also has the following effect for
630 .\" migrate_pages(2):
631 .\" do_migrate_pages(mm, &old, &new,
632 .\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
638 to arbitrary processes;
657 Trace arbitrary processes using
661 .BR get_robust_list (2)
662 to arbitrary processes;
664 transfer data to or from the memory of arbitrary processes using
665 .BR process_vm_readv (2)
667 .BR process_vm_writev (2);
669 inspect processes using
678 Perform I/O port operations
691 open devices for accessing x86 model-specific registers (MSRs, see
695 .IR /proc/sys/vm/mmap_min_addr ;
697 create memory mappings at addresses below the value specified by
698 .IR /proc/sys/vm/mmap_min_addr ;
708 perform various SCSI device commands;
710 perform certain operations on
716 perform a range of device-specific operations on other devices.
724 Use reserved space on ext2 filesystems;
728 calls controlling ext3 journaling;
730 override disk quota limits;
732 increase resource limits (see
739 override maximum number of consoles on console allocation;
741 override maximum number of keymaps;
allow more than 64 Hz interrupts from the real-time clock;
747 limit for a System V message queue above the limit in
748 .I /proc/sys/kernel/msgmnb
756 resource limit on the number of "in-flight" file descriptors
757 to be bypassed when passing file descriptors to another process
758 via a UNIX domain socket (see
.I /proc/sys/fs/pipe\-max\-size
763 limit when setting the capacity of a pipe using the
770 to increase the capacity of a pipe above the limit specified by
771 .IR /proc/sys/fs/pipe\-max\-size ;
774 .IR /proc/sys/fs/mqueue/queues_max ,
775 .IR /proc/sys/fs/mqueue/msg_max ,
777 .I /proc/sys/fs/mqueue/msgsize_max
778 limits when creating POSIX message queues (see
779 .BR mq_overview (7));
787 .IR /proc/[pid]/oom_score_adj
788 to a value lower than the value last set by a process with
789 .BR CAP_SYS_RESOURCE .
795 .RB ( settimeofday (2),
798 set real-time (hardware) clock.
800 .B CAP_SYS_TTY_CONFIG
803 employ various privileged
805 operations on virtual terminals.
807 .BR CAP_SYSLOG " (since Linux 2.6.37)"
816 for information on which operations require privilege.
818 View kernel addresses exposed via
820 and other interfaces when
821 .IR /proc/sys/kernel/kptr_restrict
823 (See the discussion of the
830 .BR CAP_WAKE_ALARM " (since Linux 3.0)"
831 Trigger something that will wake up the system (set
832 .B CLOCK_REALTIME_ALARM
834 .B CLOCK_BOOTTIME_ALARM
837 .SS Past and current implementation
838 A full implementation of capabilities requires that:
840 For all privileged operations,
841 the kernel must check whether the thread has the required
842 capability in its effective set.
844 The kernel must provide system calls allowing a thread's capability sets to
845 be changed and retrieved.
847 The filesystem must support attaching capabilities to an executable file,
848 so that a process gains those capabilities when the file is executed.
Before kernel 2.6.24, only the first two of these requirements were met;
since kernel 2.6.24, all three requirements are met.
853 .SS Notes to kernel developers
854 When adding a new kernel feature that should be governed by a capability,
855 consider the following points.
The goal of capabilities is to divide the power of superuser into pieces,
858 such that if a program that has one or more capabilities is compromised,
859 its power to do damage to the system would be less than the same program
860 running with root privilege.
862 You have the choice of either creating a new capability for your new feature,
863 or associating the feature with one of the existing capabilities.
864 In order to keep the set of capabilities to a manageable size,
865 the latter option is preferable,
866 unless there are compelling reasons to take the former option.
867 (There is also a technical limit:
868 the size of capability sets is currently limited to 64 bits.)
870 To determine which existing capability might best be associated
871 with your new feature, review the list of capabilities above in order
872 to find a "silo" into which your new feature best fits.
873 One approach to take is to determine if there are other features
874 requiring capabilities that will always be used along with the new feature.
875 If the new feature is useless without these other features,
876 you should use the same capability as the other features.
881 if you can possibly avoid it!
882 A vast proportion of existing capability checks are associated
883 with this capability (see the partial list above).
884 It can plausibly be called "the new root",
885 since on the one hand, it confers a wide range of powers,
886 and on the other hand,
887 its broad scope means that this is the capability
888 that is required by many privileged programs.
889 Don't make the problem worse.
890 The only new features that should be associated with
894 match existing uses in that silo.
896 If you have determined that it really is necessary to create
897 a new capability for your feature,
898 don't make or name it as a "single-use" capability.
899 Thus, for example, the addition of the highly specific
901 was probably a mistake.
902 Instead, try to identify and name your new capability as a broader
903 silo into which other related future use cases might fit.
905 .SS Thread capability sets
906 Each thread has the following capability sets containing zero or more
907 of the above capabilities:
910 This is a limiting superset for the effective
911 capabilities that the thread may assume.
912 It is also a limiting superset for the capabilities that
913 may be added to the inheritable set by a thread that does not have the
915 capability in its effective set.
917 If a thread drops a capability from its permitted set,
918 it can never reacquire that capability (unless it
920 either a set-user-ID-root program, or
921 a program whose associated file capabilities grant that capability).
924 This is a set of capabilities preserved across an
926 Inheritable capabilities remain inheritable when executing any program,
927 and inheritable capabilities are added to the permitted set when executing
928 a program that has the corresponding bits set in the file inheritable set.
930 Because inheritable capabilities are not generally preserved across
932 when running as a non-root user, applications that wish to run helper
933 programs with elevated capabilities should consider using
934 ambient capabilities, described below.
937 This is the set of capabilities used by the kernel to
938 perform permission checks for the thread.
940 .IR Bounding " (per-thread since Linux 2.6.25)"
941 The capability bounding set is a mechanism that can be used
942 to limit the capabilities that are gained during
945 Since Linux 2.6.25, this is a per-thread capability set.
In older kernels, the capability bounding set was a system-wide attribute
947 shared by all threads on the system.
949 For more details on the capability bounding set, see below.
951 .IR Ambient " (since Linux 4.3)"
952 .\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08
953 This is a set of capabilities that are preserved across an
955 of a program that is not privileged.
956 The ambient capability set obeys the invariant that no capability
957 can ever be ambient if it is not both permitted and inheritable.
959 The ambient capability set can be directly modified using
961 Ambient capabilities are automatically lowered if either of
962 the corresponding permitted or inheritable capabilities is lowered.
964 Executing a program that changes UID or GID due to the
965 set-user-ID or set-group-ID bits or executing a program that has
966 any file capabilities set will clear the ambient set.
967 Ambient capabilities are added to the permitted set and
968 assigned to the effective set when
971 If ambient capabilities cause a process's permitted and effective
972 capabilities to increase during an
974 this does not trigger the secure-execution mode described in
979 inherits copies of its parent's capability sets.
980 See below for a discussion of the treatment of capabilities during
985 a thread may manipulate its own capability sets (see below).
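.PP
The invariant stated above for the ambient set (a capability can be
ambient only if it is both permitted and inheritable) can be pictured
with plain bit masks.
The following is a minimal sketch, not kernel code:
the function names are hypothetical, and each set is modeled as an
integer in which bit N stands for capability number N.

```python
def raise_ambient(permitted: int, inheritable: int, ambient: int, cap: int) -> int:
    """Return the ambient mask after raising capability number `cap`.

    Hypothetical helper: refuses the request unless the capability is
    present in both the permitted and the inheritable mask.
    """
    bit = 1 << cap
    if not (permitted & bit and inheritable & bit):
        raise PermissionError("capability must be permitted and inheritable")
    return ambient | bit


def clamp_ambient(permitted: int, inheritable: int, ambient: int) -> int:
    """Drop ambient bits whose permitted or inheritable bit has been lowered."""
    return ambient & permitted & inheritable


if __name__ == "__main__":
    cap = 10  # illustrative capability number
    amb = raise_ambient(1 << cap, 1 << cap, 0, cap)
    print(hex(amb))
    # Lowering the inheritable bit also lowers the ambient bit.
    print(hex(clamp_ambient(1 << cap, 0, amb)))
```

The second helper mirrors the automatic lowering described above:
once either the permitted or the inheritable bit is gone,
the ambient bit cannot survive.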
987 Since Linux 3.2, the file
988 .I /proc/sys/kernel/cap_last_cap
989 .\" commit 73efc0394e148d0e15583e13712637831f926720
990 exposes the numerical value of the highest capability
991 supported by the running kernel;
992 this can be used to determine the highest bit
993 that may be set in a capability set.
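.PP
As an illustration of how the value in
.I /proc/sys/kernel/cap_last_cap
might be used, the sketch below decodes a capability mask into names,
ignoring bits above the highest supported capability.
The name table and its numbering are an assumption taken from the
kernel header linux/capability.h, not from this page,
and the helper names are hypothetical.

```python
# Capability numbers as assumed from <linux/capability.h>; index == number.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def read_last_cap(path: str = "/proc/sys/kernel/cap_last_cap") -> int:
    """Read the highest capability number supported by the running kernel."""
    with open(path) as f:
        return int(f.read().strip())


def decode_caps(mask: int, last_cap: int) -> list:
    """List the capabilities set in `mask`, up to and including `last_cap`."""
    names = []
    for n in range(last_cap + 1):
        if mask & (1 << n):
            # Fall back to a generic label for numbers newer than the table.
            names.append(CAP_NAMES[n] if n < len(CAP_NAMES) else f"cap_{n}")
    return names


if __name__ == "__main__":
    print(decode_caps((1 << 0) | (1 << 21), read_last_cap()))
```

Reading the limit at run time, rather than hard-coding it,
keeps such a tool usable on kernels that add capabilities later.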
995 .SS File capabilities
996 Since kernel 2.6.24, the kernel supports
997 associating capability sets with an executable file using
999 The file capability sets are stored in an extended attribute (see
1004 .IR "security.capability" .
1005 Writing to this extended attribute requires the
1008 The file capability sets,
1009 in conjunction with the capability sets of the thread,
1010 determine the capabilities of a thread after an
1013 The three file capability sets are:
1015 .IR Permitted " (formerly known as " forced ):
1016 These capabilities are automatically permitted to the thread,
1017 regardless of the thread's inheritable capabilities.
1019 .IR Inheritable " (formerly known as " allowed ):
1020 This set is ANDed with the thread's inheritable set to determine which
1021 inheritable capabilities are enabled in the permitted set of
1022 the thread after the
1026 This is not a set, but rather just a single bit.
1027 If this bit is set, then during an
1029 all of the new permitted capabilities for the thread are
1030 also raised in the effective set.
1031 If this bit is not set, then after an
1033 none of the new permitted capabilities is in the new effective set.
1035 Enabling the file effective capability bit implies
1036 that any file permitted or inheritable capability that causes a
1037 thread to acquire the corresponding permitted capability during an
1039 (see the transformation rules described below) will also acquire that
1040 capability in its effective set.
1041 Therefore, when assigning capabilities to a file
1043 .BR cap_set_file (3),
1044 .BR cap_set_fd (3)),
1045 if we specify the effective flag as being enabled for any capability,
1046 then the effective flag must also be specified as enabled
1047 for all other capabilities for which the corresponding permitted or
inheritable flag is enabled.
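.PP
The rule above about the effective flag can be expressed as a simple
consistency check over per-capability flag masks.
This is a sketch under the assumption that the flags are held as three
integers (one bit per capability);
the function name is hypothetical and is not part of any library.

```python
def effective_flags_consistent(permitted: int, inheritable: int, effective: int) -> bool:
    """Check the file effective-flag rule.

    If the effective flag is enabled for any capability, it must be
    enabled for every capability whose permitted or inheritable flag
    is enabled. An all-clear effective mask is always acceptable.
    """
    if effective == 0:
        return True
    required = permitted | inheritable
    return (required & ~effective) == 0


if __name__ == "__main__":
    p = (1 << 10) | (1 << 13)
    print(effective_flags_consistent(p, 0, p))        # every flagged bit is effective
    print(effective_flags_consistent(p, 0, 1 << 10))  # bit 13 lacks the effective flag
```

The check follows from the fact that the stored file effective "set"
is a single bit, so a partially effective file cannot be represented.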
1050 .SS File capability extended attribute versioning
1051 To allow extensibility,
1052 the kernel supports a scheme to encode a version number inside the
1053 .I security.capability
1054 extended attribute that is used to implement file capabilities.
1055 These version numbers are internal to the implementation,
1056 and not directly visible to user-space applications.
1057 To date, the following versions are supported:
1059 .BR VFS_CAP_REVISION_1
1060 This was the original file capability implementation,
1061 which supported 32-bit masks for file capabilities.
1063 .BR VFS_CAP_REVISION_2 " (since Linux 2.6.25)"
1064 .\" commit e338d263a76af78fe8f38a72131188b58fceb591
1065 This version allows for file capability masks that are 64 bits in size,
1066 and was necessary as the number of supported capabilities grew beyond 32.
1067 The kernel transparently continues to support the execution of files
1068 that have 32-bit version 1 capability masks,
1069 but when adding capabilities to files that did not previously
1070 have capabilities, or modifying the capabilities of existing files,
1071 it automatically uses the version 2 scheme
1072 (or possibly the version 3 scheme, as described below).
1074 .BR VFS_CAP_REVISION_3 " (since Linux 4.14)"
1075 .\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340
1076 Version 3 file capabilities are provided
1077 to support namespaced file capabilities (described below).
1079 As with version 2 file capabilities,
1080 version 3 capability masks are 64 bits in size.
But in addition, the root user ID of the namespace is encoded in the
1082 .I security.capability
1084 (A namespace's root user ID is the value that user ID 0
1085 inside that namespace maps to in the initial user namespace.)
1087 Version 3 file capabilities are designed to coexist
1088 with version 2 capabilities;
1089 that is, on a modern Linux system,
1090 there may be some files with version 2 capabilities
1091 while others have version 3 capabilities.
1094 the only kind of file capability extended attribute
1095 that could be attached to a file was a
1096 .B VFS_CAP_REVISION_2
1100 .I security.capability
1101 extended attribute that is attached to a file
1102 depends on the circumstances in which the attribute was created.
1104 Starting with Linux 4.14, a
1105 .I security.capability
1106 extended attribute is automatically created as (or converted to)
1108 .RB ( VFS_CAP_REVISION_3 )
1109 attribute if both of the following are true:
1111 The thread writing the attribute resides in a noninitial user namespace.
1112 (More precisely: the thread resides in a user namespace other
1113 than the one from which the underlying filesystem was mounted.)
1117 capability over the file inode,
1118 meaning that (a) the thread has the
1120 capability in its own user namespace;
1121 and (b) the UID and GID of the file inode have mappings in
1122 the writer's user namespace.
1125 .BR VFS_CAP_REVISION_3
1126 .I security.capability
1127 extended attribute is created, the root user ID of the creating thread's
1128 user namespace is saved in the extended attribute.
1130 By contrast, creating or modifying a
1131 .I security.capability
1132 extended attribute from a privileged
1134 thread that resides in the
1135 namespace where the underlying filesystem was mounted
1136 (this normally means the initial user namespace)
1137 automatically results in the creation of a version 2
1138 .RB ( VFS_CAP_REVISION_2 )
1141 Note that the creation of a version 3
1142 .I security.capability
1143 extended attribute is automatic.
1144 That is to say, when a user-space application writes
1147 .I security.capability
1148 attribute in the version 2 format,
1149 the kernel will automatically create a version 3 attribute
1150 if the attribute is created in the circumstances described above.
1151 Correspondingly, when a version 3
1152 .I security.capability
1153 attribute is retrieved
1155 by a process that resides inside a user namespace that was created by the
1156 root user ID (or a descendant of that user namespace),
1157 the returned attribute is (automatically)
1158 simplified to appear as a version 2 attribute
1159 (i.e., the returned value is the size of a version 2 attribute and does
1160 not include the root user ID).
1161 These automatic translations mean that no changes are required to
1162 user-space tools (e.g.,
1166 in order for those tools to be used to create and retrieve version 3
1167 .I security.capability
1170 Note that a file can have either a version 2 or a version 3
1171 .I security.capability
1172 extended attribute associated with it, but not both:
1173 creation or modification of the
1174 .I security.capability
1175 extended attribute will automatically modify the version
1176 according to the circumstances in which the extended attribute is
1177 created or modified.
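.PP
To make the three attribute versions more concrete,
the following sketch decodes a raw
.I security.capability
value.
The byte layout used here (a little-endian 32-bit header carrying the
revision and the effective bit, followed by 32-bit permitted/inheritable
pairs and, for version 3, a 32-bit root user ID) and the constant values
are assumptions based on the kernel header linux/capability.h;
they are not specified on this page, and the function name is hypothetical.

```python
import struct

# Constants assumed from <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw: bytes) -> dict:
    """Decode a raw security.capability value into its fields."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = REVISIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if version is None:
        raise ValueError("unknown revision")
    words = 1 if version == 1 else 2          # 32-bit or 64-bit masks
    expected = 4 + 8 * words + (4 if version == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")
    permitted = inheritable = 0
    for i in range(words):
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if version == 3:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }


if __name__ == "__main__":
    sample = struct.pack("<5I", 0x02000001, 1 << 10, 0, 0, 0)
    print(parse_security_capability(sample))
```

On Linux, the raw bytes could be obtained with
os.getxattr(path, "security.capability").
Because of the automatic translation described above,
a reader may be handed a version 2 value even when a
version 3 attribute is stored on disk.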
1179 .SS Transformation of capabilities during execve()
1182 the kernel calculates the new capabilities of
1183 the process using the following algorithm:
P'(ambient)     = (file is privileged) ? 0 : P(ambient)

P'(permitted)   = (P(inheritable) & F(inheritable)) |
                  (F(permitted) & P(bounding)) | P'(ambient)

P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)

P'(inheritable) = P(inheritable)    [i.e., unchanged]

P'(bounding)    = P(bounding)       [i.e., unchanged]
1203 denotes the value of a thread capability set before the
1206 denotes the value of a thread capability set after the
1209 denotes a file capability set
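.PP
The algorithm above translates directly into bit-mask arithmetic.
The sketch below is only an illustration of those formulas:
the function and parameter names are hypothetical,
each set is an integer with one bit per capability,
and the special cases discussed later for root and for
set-user-ID programs are deliberately not modeled.

```python
def execve_caps(p_inheritable: int, p_bounding: int, p_ambient: int,
                f_permitted: int, f_inheritable: int, f_effective: bool,
                file_is_privileged: bool) -> dict:
    """Apply the capability transformation rules for an exec.

    P(...) values are the thread sets before the exec, F(...) values
    are the file sets; the result holds the P'(...) sets.
    """
    new_ambient = 0 if file_is_privileged else p_ambient
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & p_bounding)
                     | new_ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return {
        "ambient": new_ambient,
        "permitted": new_permitted,
        "effective": new_effective,
        "inheritable": p_inheritable,  # unchanged
        "bounding": p_bounding,        # unchanged
    }


if __name__ == "__main__":
    cap = 1 << 10                  # illustrative capability bit
    everything = (1 << 64) - 1     # 64-bit "all ones" bounding set
    # A file carrying one permitted capability with the effective bit set.
    print(execve_caps(0, everything, 0, cap, 0, True, True))
    # The same file executed with that capability dropped from the bounding set.
    print(execve_caps(0, everything & ~cap, 0, cap, 0, True, True))
```

In the second call the bounding set masks the file permitted capability,
so the new permitted and effective sets come out empty.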
1212 Note the following details relating to the above capability
1213 transformation rules:
1215 The ambient capability set is present only since Linux 4.3.
1216 When determining the transformation of the ambient set during
1218 a privileged file is one that has capabilities or
1219 has the set-user-ID or set-group-ID bit set.
1221 Prior to Linux 2.6.25,
1222 the bounding set was a system-wide attribute shared by all threads.
1223 That system-wide value was employed to calculate the new permitted set during
1225 in the same manner as shown above for
1229 during the capability transitions described above,
1230 file capabilities may be ignored (treated as empty) for the same reasons
1231 that the set-user-ID and set-group-ID bits are ignored; see
1233 File capabilities are similarly ignored if the kernel was booted with the
1238 according to the rules above,
1239 if a process with nonzero user IDs performs an
1241 then any capabilities that are present in
1242 its permitted and effective sets will be cleared.
1243 For the treatment of capabilities when a process with a
1244 user ID of zero performs an
1247 .IR "Capabilities and execution of programs by root" .
1249 .SS Safety checking for capability-dumb binaries
1250 A capability-dumb binary is an application that has been
1251 marked to have file capabilities, but has not been converted to use the
1253 API to manipulate its capabilities.
1254 (In other words, this is a traditional set-user-ID-root program
1255 that has been switched to use file capabilities,
1256 but whose code has not been modified to understand capabilities.)
1257 For such applications,
1258 the effective capability bit is set on the file,
1259 so that the file permitted capabilities are automatically
1260 enabled in the process effective set when executing the file.
1261 The kernel recognizes a file which has the effective capability bit set
1262 as capability-dumb for the purpose of the check described here.
1264 When executing a capability-dumb binary,
1265 the kernel checks if the process obtained all permitted capabilities
1266 that were specified in the file permitted set,
1267 after the capability transformations described above have been performed.
1268 (The typical reason why this might
1270 occur is that the capability bounding set masked out some
1271 of the capabilities in the file permitted set.)
1272 If the process did not obtain the full set of
1273 file permitted capabilities, then
1275 fails with the error
1277 This prevents possible security risks that could arise when
a capability-dumb application is executed with less privilege than it needs.
1279 Note that, by definition,
1280 the application could not itself recognize this problem,
1281 since it does not employ the
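.PP
The check described in this subsection amounts to a subset test between
the file permitted set and the permitted set that results from the
transformation rules.
The following is a hedged sketch of that comparison only;
the function name is hypothetical and the real kernel logic may
consider further conditions.

```python
def capability_dumb_exec_allowed(f_permitted: int, f_effective: bool,
                                 new_permitted: int) -> bool:
    """Return True if the exec may proceed under the safety check.

    A file counts as capability-dumb when its effective bit is set.
    For such a file, every capability in the file permitted set must
    have made it into the new permitted set of the process.
    """
    if not f_effective:
        return True  # not capability-dumb; the check does not apply
    missing = f_permitted & ~new_permitted
    return missing == 0


if __name__ == "__main__":
    wanted = (1 << 10) | (1 << 13)
    print(capability_dumb_exec_allowed(wanted, True, wanted))
    # One capability was masked out (for example, by the bounding set).
    print(capability_dumb_exec_allowed(wanted, True, 1 << 10))
```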
1285 .SS Capabilities and execution of programs by root
1286 .\" See cap_bprm_set_creds(), bprm_caps_from_vfs_cap() and
1287 .\" handle_privileged_root() in security/commoncap.c (Linux 5.0 source)
1288 In order to mirror traditional UNIX semantics,
1289 the kernel performs special treatment of file capabilities when
1290 a process with UID 0 (root) executes a program and
1291 when a set-user-ID-root program is executed.
1293 After having performed any changes to the process effective ID that
1294 were triggered by the set-user-ID mode bit of the binary\(eme.g.,
1295 switching the effective user ID to 0 (root) because
1296 a set-user-ID-root program was executed\(emthe
1297 kernel calculates the file capability sets as follows:
1299 If the real or effective user ID of the process is 0 (root),
1300 then the file inheritable and permitted sets are ignored;
1301 instead they are notionally considered to be all ones
1302 (i.e., all capabilities enabled).
1303 (There is one exception to this behavior, described below in
1304 .IR "Set-user-ID-root programs that have file capabilities" .)
1306 If the effective user ID of the process is 0 (root) or
1307 the file effective bit is in fact enabled,
1308 then the file effective bit is notionally defined to be one (enabled).
1310 These notional values for the file's capability sets are then used
1311 as described above to calculate the transformation of the process's
1315 Thus, when a process with nonzero UIDs
1316 .BR execve (2)s
1317 a set-user-ID-root program that does not have capabilities attached,
1318 or when a process whose real and effective UIDs are zero
1319 .BR execve (2)s
1320 a program, the calculation of the process's new
1321 permitted capabilities simplifies to:
1325 P'(permitted) = P(inheritable) | P(bounding)
1327 P'(effective) = P'(permitted)
1331 Consequently, the process gains all capabilities in its permitted and
1332 effective capability sets,
1333 except those masked out by the capability bounding set.
1334 (In the calculation of P'(permitted),
1335 the P'(ambient) term can be simplified away because it is by
1336 definition a subset of P(inheritable).)
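.PP
The simplified calculation can be sketched as a pure bit-mask function.
This is only an illustrative model, not kernel code:
the struct and function names are hypothetical, and
capability sets are assumed to fit in a 64-bit mask.

```c
#include <stdint.h>

/* Hypothetical container for a thread's sets; not a kernel type. */
struct capsets {
    uint64_t permitted;
    uint64_t effective;
    uint64_t inheritable;
};

/*
 * Model of the root case described above, where the file sets are
 * notionally all ones:
 *   P'(permitted) = P(inheritable) | P(bounding)
 *   P'(effective) = P'(permitted)
 */
struct capsets root_exec_caps(struct capsets old, uint64_t bounding)
{
    struct capsets result;

    result.inheritable = old.inheritable;   /* left as it was in this model */
    result.permitted = old.inheritable | bounding;
    result.effective = result.permitted;
    return result;
}
```

.PP
With a full bounding set the result is a full permitted and effective set;
clearing a bit in the bounding set removes it from the result unless the
same bit is present in the inheritable set.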
1338 The special treatments of user ID 0 (root) described in this subsection
1339 can be disabled using the securebits mechanism described below.
1342 .SS Set-user-ID-root programs that have file capabilities
1343 There is one exception to the behavior described under
1344 .IR "Capabilities and execution of programs by root" .
1345 If (a) the binary that is being executed has capabilities attached and
1346 (b) the real user ID of the process is
1347 .I not
1348 0 (root) and
1349 (c) the effective user ID of the process
1350 .I is
1351 0 (root), then the file capability bits are honored
1352 (i.e., they are not notionally considered to be all ones).
1353 The usual way in which this situation can arise is when executing
1354 a set-UID-root program that also has file capabilities.
1355 When such a program is executed,
1356 the process gains just the capabilities granted by the program
1357 (i.e., not all capabilities,
1358 as would occur when executing a set-user-ID-root program
1359 that does not have any associated file capabilities).
1361 Note that one can assign empty capability sets to a program file,
1362 and thus it is possible to create a set-user-ID-root program that
1363 changes the effective and saved set-user-ID of the process
1364 that executes the program to 0,
1365 but confers no capabilities to that process.
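.PP
The decision described in this subsection and the previous one
can be summarized as a small predicate.
This is a sketch under stated assumptions: the function name is hypothetical,
the exception is taken to apply when the real UID is nonzero and
the effective UID is 0 (the usual set-user-ID-root case),
and the securebits flags are assumed to be at their defaults.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Returns true if the file capability sets are notionally treated as
 * all ones for this execve(), false if the file's own capability bits
 * (possibly empty) are used instead.
 */
bool file_caps_treated_as_all_ones(bool has_file_caps, uid_t ruid, uid_t euid)
{
    /* Root special-casing applies only if real or effective UID is 0. */
    bool root_involved = (ruid == 0 || euid == 0);

    /* The exception: set-user-ID-root binary that has file capabilities. */
    bool setuid_root_with_fcaps = has_file_caps && ruid != 0 && euid == 0;

    return root_involved && !setuid_root_with_fcaps;
}
```

.PP
For example, a user with UID 1000 executing a set-user-ID-root binary
yields true without file capabilities and false with them.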
1367 .SS Capability bounding set
1368 The capability bounding set is a security mechanism that can be used
1369 to limit the capabilities that can be gained during an
1370 .BR execve (2).
1371 The bounding set is used in the following ways:
1375 the capability bounding set is ANDed with the file permitted
1376 capability set, and the result of this operation is assigned to the
1377 thread's permitted capability set.
1378 The capability bounding set thus places a limit on the permitted
1379 capabilities that may be granted by an executable file.
1381 (Since Linux 2.6.25)
1382 The capability bounding set acts as a limiting superset for
1383 the capabilities that a thread can add to its inheritable set using
1385 This means that if a capability is not in the bounding set,
1386 then a thread can't add this capability to its
1387 inheritable set, even if it was in its permitted capabilities,
1388 and thereby cannot have this capability preserved in its
1389 permitted set when it
1390 .BR execve (2)s
1391 a file that has the capability in its inheritable set.
1393 Note that the bounding set masks the file permitted capabilities,
1394 but not the inheritable capabilities.
1395 If a thread maintains a capability in its inheritable set
1396 that is not in its bounding set,
1397 then it can still gain that capability in its permitted set
1398 by executing a file that has the capability in its inheritable set.
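.PP
The asymmetry noted above follows from the general rule for the new
permitted set, in which only the file permitted term is ANDed with the
bounding set.
The following is a minimal model of that rule;
the function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * P'(permitted) = (P(inheritable) & F(inheritable)) |
 *                 (F(permitted) & P(bounding)) | P'(ambient)
 *
 * The bounding set masks only the file permitted term.
 */
uint64_t exec_permitted(uint64_t p_inheritable, uint64_t f_inheritable,
                        uint64_t f_permitted, uint64_t p_bounding,
                        uint64_t p_ambient_new)
{
    return (p_inheritable & f_inheritable) |
           (f_permitted & p_bounding) |
           p_ambient_new;
}
```

.PP
If a capability is absent from the bounding set,
a file that lists it only in its permitted set does not grant it,
whereas a capability present in both the thread and file inheritable
sets is still gained.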
1400 Depending on the kernel version, the capability bounding set is either
1401 a system-wide attribute, or a per-process attribute.
1403 .B "Capability bounding set from Linux 2.6.25 onward"
1405 From Linux 2.6.25, the
1406 .I "capability bounding set"
1407 is a per-thread attribute.
1408 (The system-wide capability bounding set described below no longer exists.)
1410 The bounding set is inherited at
1411 .BR fork (2)
1412 from the thread's parent, and is preserved across an
1413 .BR execve (2).
1415 A thread may remove capabilities from its capability bounding set using the
1418 operation, provided it has the
1421 Once a capability has been dropped from the bounding set,
1422 it cannot be restored to that set.
1423 A thread can determine if a capability is in its bounding set using the
1428 Removing capabilities from the bounding set is supported only if file
1429 capabilities are compiled into the kernel.
1430 In kernels before Linux 2.6.33,
1431 file capabilities were an optional feature configurable via the
1432 .B CONFIG_SECURITY_FILE_CAPABILITIES
1435 .\" commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1
1436 the configuration option has been removed
1437 and file capabilities are always part of the kernel.
1438 When file capabilities are compiled into the kernel, the
1440 process (the ancestor of all processes) begins with a full bounding set.
1441 If file capabilities are not compiled into the kernel, then
1443 begins with a full bounding set minus
1445 because this capability has a different meaning when there are
1446 no file capabilities.
1448 Removing a capability from the bounding set does not remove it
1449 from the thread's inheritable set.
1450 However, it does prevent the capability from being added
1451 back into the thread's inheritable set in the future.
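.PP
As an illustration, a thread could query its bounding set with the
PR_CAPBSET_READ operation of prctl(2).
The wrapper name below is hypothetical;
the sketch assumes a Linux system with the kernel headers installed.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Returns 1 if 'cap' is in the calling thread's bounding set,
 * 0 if it is not, and -1 (with errno set) if 'cap' is not a
 * valid capability number.
 */
int cap_in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0, 0, 0);
}
```

.PP
A caller might test cap_in_bounding_set(CAP_NET_RAW) before relying on a
file that grants that capability through its permitted set.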
1453 .B "Capability bounding set prior to Linux 2.6.25"
1455 In kernels before 2.6.25, the capability bounding set is a system-wide
1456 attribute that affects all threads on the system.
1457 The bounding set is accessible via the file
1458 .IR /proc/sys/kernel/cap\-bound .
1459 (Confusingly, this bit mask parameter is expressed as a
1460 signed decimal number in
1461 .IR /proc/sys/kernel/cap\-bound .)
1465 process may set capabilities in the capability bounding set;
1466 other than that, the superuser (more precisely: a process with the
1468 capability) may only clear capabilities from this set.
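.PP
Because the value in this file is a signed decimal number,
it has to be reinterpreted as an unsigned 32-bit quantity to be read
as a bit mask.
The helpers below are a hypothetical sketch of that conversion;
the input value used in the usage note is only an illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Convert the signed decimal text form into a 32-bit capability mask. */
uint32_t cap_bound_to_mask(const char *text)
{
    long value = strtol(text, NULL, 10);

    /* Conversion to uint32_t is modulo 2^32, recovering the bit pattern. */
    return (uint32_t)value;
}

/* Return 1 if capability number 'cap' is set in 'mask', else 0. */
int mask_has_cap(uint32_t mask, unsigned int cap)
{
    return cap < 32 && ((mask >> cap) & 1u);
}
```

.PP
For instance, the text "-257" converts to 0xFFFFFEFF,
that is, all bits set except bit 8.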
1470 On a standard system the capability bounding set always masks out the
1473 To remove this restriction (dangerous!), modify the definition of
1476 .I include/linux/capability.h
1477 and rebuild the kernel.
1479 The system-wide capability bounding set feature was added
1480 to Linux starting with kernel version 2.2.11.
1484 .SS Effect of user ID changes on capabilities
1485 To preserve the traditional semantics for transitions between
1486 0 and nonzero user IDs,
1487 the kernel makes the following changes to a thread's capability
1488 sets on changes to the thread's real, effective, saved set,
1489 and filesystem user IDs (using
1494 If one or more of the real, effective, or saved set user IDs
1495 was previously 0, and as a result of the UID changes all of these IDs
1496 have a nonzero value,
1497 then all capabilities are cleared from the permitted, effective, and ambient
1500 If the effective user ID is changed from 0 to nonzero,
1501 then all capabilities are cleared from the effective set.
1503 If the effective user ID is changed from nonzero to 0,
1504 then the permitted set is copied to the effective set.
1506 If the filesystem user ID is changed from 0 to nonzero (see
1508 then the following capabilities are cleared from the effective set:
1510 .BR CAP_DAC_OVERRIDE ,
1511 .BR CAP_DAC_READ_SEARCH ,
1514 .B CAP_LINUX_IMMUTABLE
1515 (since Linux 2.6.30),
1516 .BR CAP_MAC_OVERRIDE ,
1519 (since Linux 2.6.30).
1520 If the filesystem UID is changed from nonzero to 0,
1521 then any of these capabilities that are enabled in the permitted set
1522 are enabled in the effective set.
1524 If a thread that has a 0 value for one or more of its user IDs wants
1525 to prevent its permitted capability set being cleared when it resets
1526 all of its user IDs to nonzero values, it can do so using the
1528 securebits flag described below.
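.PP
The first three rules above can be modeled as follows.
This is a simplified sketch, not the kernel implementation:
the type and function names are hypothetical,
the filesystem UID rule is omitted,
and all securebits flags are assumed to be clear.

```c
#include <stdint.h>
#include <sys/types.h>

struct uid_triple { uid_t ruid, euid, suid; };
struct cap_state  { uint64_t permitted, effective, ambient; };

static int any_root(struct uid_triple ids)
{
    return ids.ruid == 0 || ids.euid == 0 || ids.suid == 0;
}

/* Apply the UID-transition rules to capability state 'c'. */
struct cap_state caps_after_uid_change(struct cap_state c,
                                       struct uid_triple old_ids,
                                       struct uid_triple new_ids)
{
    /* Rule 1: some ID was 0 before, and none is 0 afterward. */
    if (any_root(old_ids) && !any_root(new_ids)) {
        c.permitted = 0;
        c.effective = 0;
        c.ambient = 0;
    }
    /* Rule 2: effective UID goes from 0 to nonzero. */
    if (old_ids.euid == 0 && new_ids.euid != 0)
        c.effective = 0;
    /* Rule 3: effective UID goes from nonzero to 0. */
    if (old_ids.euid != 0 && new_ids.euid == 0)
        c.effective = c.permitted;
    return c;
}
```

.PP
In this model, temporarily switching only the effective UID away from 0
clears the effective set but keeps the permitted set,
so switching back restores the effective capabilities.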
1530 .SS Programmatically adjusting capability sets
1531 A thread can retrieve and change its permitted, effective, and inheritable
1532 capability sets using the
1538 .BR cap_get_proc (3)
1540 .BR cap_set_proc (3),
1541 both provided in the
1544 is preferred for this purpose.
1545 The following rules govern changes to the thread capability sets:
1547 If the caller does not have the
1550 the new inheritable set must be a subset of the combination
1551 of the existing inheritable and permitted sets.
1553 (Since Linux 2.6.25)
1554 The new inheritable set must be a subset of the combination of the
1555 existing inheritable set and the capability bounding set.
1557 The new permitted set must be a subset of the existing permitted set
1558 (i.e., it is not possible to acquire permitted capabilities
1559 that the thread does not currently have).
1561 The new effective set must be a subset of the new permitted set.
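.PP
The four rules above amount to a series of subset checks.
The following sketch expresses them as a predicate;
the names are hypothetical, and the has_setpcap parameter stands for the
capability named in the first rule, which is assumed here to be CAP_SETPCAP.

```c
#include <stdbool.h>
#include <stdint.h>

struct thread_caps { uint64_t permitted, effective, inheritable; };

/* True if every bit of 'a' is also set in 'b'. */
static bool is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* Would a change from 'cur' to 'want' be accepted under the rules above? */
bool capset_allowed(struct thread_caps cur, struct thread_caps want,
                    uint64_t bounding, bool has_setpcap)
{
    /* Rule 1: without the capability, inheritable is limited by permitted. */
    if (!has_setpcap &&
        !is_subset(want.inheritable, cur.inheritable | cur.permitted))
        return false;
    /* Rule 2: inheritable is limited by the bounding set. */
    if (!is_subset(want.inheritable, cur.inheritable | bounding))
        return false;
    /* Rule 3: permitted can only shrink. */
    if (!is_subset(want.permitted, cur.permitted))
        return false;
    /* Rule 4: effective must fit within the new permitted set. */
    return is_subset(want.effective, want.permitted);
}
```

.PP
Note that the effective set is checked against the new permitted set,
so dropping a permitted capability also requires dropping it from the
effective set in the same call.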
1562 .SS The securebits flags: establishing a capabilities-only environment
1563 .\" For some background:
1564 .\" see http://lwn.net/Articles/280279/ and
1565 .\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/
1566 Starting with kernel 2.6.26,
1567 and with a kernel in which file capabilities are enabled,
1568 Linux implements a set of per-thread
1570 flags that can be used to disable special handling of capabilities for UID 0
1572 These flags are as follows:
1575 Setting this flag allows a thread that has one or more 0 UIDs to retain
1576 capabilities in its permitted set
1577 when it switches all of its UIDs to nonzero values.
1578 If this flag is not set,
1579 then such a UID switch causes the thread to lose all permitted capabilities.
1580 This flag is always cleared on an
1581 .BR execve (2).
1583 Note that even with the
1585 flag set, the effective capabilities of a thread are cleared when it
1586 switches its effective UID to a nonzero value.
1588 if the thread has set this flag and its effective UID is already nonzero,
1589 and the thread subsequently switches all other UIDs to nonzero values,
1590 then the effective capabilities will not be cleared.
1594 flag is ignored if the
1595 .B SECBIT_NO_SETUID_FIXUP
1597 (The latter flag provides a superset of the effect of the former flag.)
1599 This flag provides the same functionality as the older
1604 .B SECBIT_NO_SETUID_FIXUP
1605 Setting this flag stops the kernel from adjusting the process's
1606 permitted, effective, and ambient capability sets when
1607 the thread's effective and filesystem UIDs are switched between
1608 zero and nonzero values.
1609 (See the subsection
1610 .IR "Effect of user ID changes on capabilities" .)
1613 If this bit is set, then the kernel does not grant capabilities
1614 when a set-user-ID-root program is executed, or when a process with
1615 an effective or real UID of 0 calls
1616 .BR execve (2).
1617 (See the subsection
1618 .IR "Capabilities and execution of programs by root" .)
1620 .B SECBIT_NO_CAP_AMBIENT_RAISE
1621 Setting this flag disallows raising ambient capabilities via the
1623 .BR PR_CAP_AMBIENT_RAISE
1626 Each of the above "base" flags has a companion "locked" flag.
1627 Setting any of the "locked" flags is irreversible,
1628 and has the effect of preventing further changes to the
1629 corresponding "base" flag.
1630 The locked flags are:
1631 .BR SECBIT_KEEP_CAPS_LOCKED ,
1632 .BR SECBIT_NO_SETUID_FIXUP_LOCKED ,
1633 .BR SECBIT_NOROOT_LOCKED ,
1635 .BR SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED .
1639 flags can be modified and retrieved using the
1641 .B PR_SET_SECUREBITS
1643 .B PR_GET_SECUREBITS
1647 capability is required to modify the flags.
1650 constants are available only after including the
1651 .I <linux/securebits.h>
1656 flags are inherited by child processes.
1659 all of the flags are preserved, except
1661 which is always cleared.
1663 An application can use the following call to lock itself,
1664 and all of its descendants,
1665 into an environment where the only way of gaining capabilities
1666 is by executing a program with associated file capabilities:
1670 prctl(PR_SET_SECUREBITS,
1671         /* SECBIT_KEEP_CAPS off */
1672         SECBIT_KEEP_CAPS_LOCKED |
1673         SECBIT_NO_SETUID_FIXUP |
1674         SECBIT_NO_SETUID_FIXUP_LOCKED |
1675         SECBIT_NOROOT |
1676         SECBIT_NOROOT_LOCKED);
1677         /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
1678            is not required */
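.PP
A program could later check whether it is running in such an environment
by reading the flags back with PR_GET_SECUREBITS.
The helper names below are hypothetical,
and the sketch assumes that the locked-down environment also has
SECBIT_NOROOT set alongside SECBIT_NOROOT_LOCKED.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Returns the calling thread's securebits flags, or -1 on error. */
int get_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
}

/*
 * True if 'bits' describes the capabilities-only environment:
 * root special-casing disabled and locked, and SECBIT_KEEP_CAPS
 * off and locked.
 */
bool capabilities_only_env(int bits)
{
    const int wanted = SECBIT_KEEP_CAPS_LOCKED |
                       SECBIT_NO_SETUID_FIXUP |
                       SECBIT_NO_SETUID_FIXUP_LOCKED |
                       SECBIT_NOROOT |
                       SECBIT_NOROOT_LOCKED;

    return bits >= 0 &&
           (bits & wanted) == wanted &&
           !(bits & SECBIT_KEEP_CAPS);
}
```

.PP
A typical use would be capabilities_only_env(get_securebits()).
Reading the flags requires no privilege.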
1683 .SS Per-user-namespace """set-user-ID-root""" programs
1684 A set-user-ID program whose UID matches the UID that
1685 created a user namespace will confer capabilities
1686 in the process's permitted and effective sets
1687 when executed by any process inside that namespace
1688 or any descendant user namespace.
1690 The rules about the transformation of the process's capabilities during the
1691 .BR execve (2)
1692 are exactly as described in the subsections
1693 .IR "Transformation of capabilities during execve()"
1694 and
1695 .IR "Capabilities and execution of programs by root" ,
1696 with the difference that, in the latter subsection, "root"
1697 is the UID of the creator of the user namespace.
1700 .SS Namespaced file capabilities
1701 .\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340
1702 Traditional (i.e., version 2) file capabilities associate
1703 only a set of capability masks with a binary executable file.
1704 When a process executes a binary with such capabilities,
1705 it gains the associated capabilities (within its user namespace)
1706 as per the rules described above in
1707 "Transformation of capabilities during execve()".
1709 Because version 2 file capabilities confer capabilities to
1710 the executing process regardless of which user namespace it resides in,
1711 only privileged processes are permitted to associate capabilities with a file.
1712 Here, "privileged" means a process that has the
1714 capability in the user namespace where the filesystem was mounted
1715 (normally the initial user namespace).
1716 This limitation renders file capabilities useless for certain use cases.
1717 For example, in user-namespaced containers,
1718 it can be desirable to be able to create a binary that
1719 confers capabilities only to processes executed inside that container,
1720 but not to processes that are executed outside the container.
1722 Linux 4.14 added so-called namespaced file capabilities
1723 to support such use cases.
1724 Namespaced file capabilities are recorded as version 3 (i.e.,
1725 .BR VFS_CAP_REVISION_3 )
1726 .I security.capability
1727 extended attributes.
1728 Such an attribute is automatically created in the circumstances described
1729 above under "File capability extended attribute versioning".
1731 .I security.capability
1732 extended attribute is created,
1733 the kernel records not just the capability masks in the extended attribute,
1734 but also the namespace root user ID.
1736 As with a binary that has
1737 .BR VFS_CAP_REVISION_2
1738 file capabilities, a binary with
1739 .BR VFS_CAP_REVISION_3
1740 file capabilities confers capabilities to a process during
1741 .BR execve (2).
1742 However, capabilities are conferred only if the binary is executed by
1743 a process that resides in a user namespace whose
1744 UID 0 maps to the root user ID that is saved in the extended attribute,
1745 or when executed by a process that resides in a descendant of such a namespace.
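.PP
To make the difference between the two versions concrete,
the following sketch decodes the header of a raw
security.capability attribute value.
It assumes the layout declared in <linux/capability.h>
(a little-endian magic word, two pairs of 32-bit masks,
and, for version 3, a trailing 32-bit root user ID);
the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>

struct cap_xattr_info {
    int revision;       /* 2 or 3 */
    int effective;      /* nonzero if the file effective bit is set */
    uint32_t rootid;    /* namespace root user ID; 0 for version 2 */
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32_at(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the value is not a version 2 or 3 attribute. */
int parse_cap_xattr(const unsigned char *buf, size_t len,
                    struct cap_xattr_info *out)
{
    uint32_t magic, rev;

    if (len < 4)
        return -1;
    magic = le32_at(buf);
    rev = magic & VFS_CAP_REVISION_MASK;
    out->effective = (magic & VFS_CAP_FLAGS_EFFECTIVE) != 0;
    out->rootid = 0;

    if (rev == VFS_CAP_REVISION_2 && len == XATTR_CAPS_SZ_2) {
        out->revision = 2;
        return 0;
    }
    if (rev == VFS_CAP_REVISION_3 && len == XATTR_CAPS_SZ_3) {
        out->revision = 3;
        out->rootid = le32_at(buf + XATTR_CAPS_SZ_2);  /* follows the masks */
        return 0;
    }
    return -1;
}
```

.PP
The buffer would typically come from getxattr(2) on the binary.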
1748 .SS Interaction with user namespaces
1749 For further information on the interaction of
1750 capabilities and user namespaces, see
1751 .BR user_namespaces (7).
1753 No standards govern capabilities, but the Linux capability implementation
1754 is based on the withdrawn POSIX.1e draft standard; see
1755 .UR https://archive.org\:/details\:/posix_1003.1e-990310
1760 binaries that have capabilities (or set-user-ID-root binaries),
1768 $ \fBsudo strace \-o trace.log \-u ceci ./myprivprog\fP
1772 From kernel 2.5.27 to kernel 2.6.26,
1773 .\" commit 5915eb53861c5776cfec33ca4fcc1fd20d66dd27 removed
1774 .\" CONFIG_SECURITY_CAPABILITIES
1775 capabilities were an optional kernel component,
1776 and could be enabled/disabled via the
1777 .B CONFIG_SECURITY_CAPABILITIES
1778 kernel configuration option.
1781 .I /proc/[pid]/task/TID/status
1782 file can be used to view the capability sets of a thread.
1784 .I /proc/[pid]/status
1785 file shows the capability sets of a process's main thread.
1786 Before Linux 3.8, nonexistent capabilities were shown as being
1787 enabled (1) in these sets.
1789 .\" 7b9a7ec565505699f503b4fcf61500dceb36e744
1790 all nonexistent capabilities (above
1792 are shown as disabled (0).
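.PP
The capability sets appear in these files as hexadecimal bit masks in
fields such as CapInh, CapPrm, CapEff, CapBnd, and CapAmb.
The following hypothetical helper sketches how one such line might be
parsed; it assumes the usual "Name:<whitespace>hex" layout.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * If 'line' starts with "<field>:", parse the hexadecimal value that
 * follows into '*mask' and return 0; otherwise return -1.
 */
int parse_cap_status_line(const char *line, const char *field,
                          uint64_t *mask)
{
    size_t n = strlen(field);

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return -1;
    /* The conversion skips the whitespace that follows the colon. */
    return sscanf(line + n + 1, "%" SCNx64, mask) == 1 ? 0 : -1;
}
```

.PP
A caller would read the status file line by line with fgets(3) and test
individual capabilities by shifting the resulting mask.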
1796 package provides a suite of routines for setting and
1797 getting capabilities that is more comfortable and less likely
1798 to change than the interface provided by
1802 This package also provides the
1809 .UR https://git.kernel.org\:/pub\:/scm\:/libs\:/libcap\:/libcap.git\:/refs/
1812 Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if
1813 file capabilities are not enabled, a thread with the
1815 capability can manipulate the capabilities of threads other than itself.
1816 However, this is only theoretically possible,
1817 since no thread ever has
1819 in either of these cases:
1821 In the pre-2.6.25 implementation the system-wide capability bounding set,
1822 .IR /proc/sys/kernel/cap\-bound ,
1823 always masks out the
1825 capability, and this cannot be changed
1826 without modifying the kernel source and rebuilding the kernel.
1828 If file capabilities are disabled (i.e., the kernel
1829 .B CONFIG_SECURITY_FILE_CAPABILITIES
1830 option is disabled), then
1834 capability removed from its per-process bounding
1835 set, and that bounding set is inherited by all other processes
1836 created on the system.
1843 .BR cap_copy_ext (3),
1844 .BR cap_from_text (3),
1845 .BR cap_get_file (3),
1846 .BR cap_get_proc (3),
1852 .BR credentials (7),
1854 .BR user_namespaces (7),
1855 .BR captest (8), \" from libcap-ng
1856 .BR filecap (8), \" from libcap-ng
1859 .BR netcap (8), \" from libcap-ng
1860 .BR pscap (8), \" from libcap-ng
1863 .I include/linux/capability.h
1864 in the Linux kernel source tree