.\" Copyright (c) 2021, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.ibm.com>
.\" Based on memfd_create(2) man page
.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\" SPDX-License-Identifier: GPL-2.0-or-later
.TH memfd_secret 2 (date) "Linux man-pages (unreleased)"
.SH NAME
memfd_secret \- create an anonymous RAM-based file
to access secret memory regions
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR memfd_secret (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
.BR memfd_secret ()
creates an anonymous RAM-based file and returns a file descriptor
that refers to it.
The file provides a way to create and access memory regions
with stronger protection than usual RAM-based files and
anonymous memory mappings.
Once all open references to the file are closed,
it is automatically released.
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
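.PP
What follows is a minimal sketch of that sequence: create the file, size it with ftruncate(2), then map it with mmap(2).
The helper names secret_memfd(), secret_alloc(), secret_free() and round_up_to_page() are hypothetical illustrations, not a library API.
The sketch assumes headers that define SYS_memfd_secret (otherwise the wrapper reports ENOSYS).
It maps with MAP_SHARED because, as far as we know, the kernel refuses private mappings of these files.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: glibc provides none, so go through syscall(2). */
static int secret_memfd(unsigned int flags)
{
#ifdef SYS_memfd_secret
    return (int) syscall(SYS_memfd_secret, flags);
#else
    (void) flags;
    errno = ENOSYS;   /* headers too old to know the system call */
    return -1;
#endif
}

/* Round len up to a whole number of pages. */
static size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/*
 * Create a secret-memory file, size it with ftruncate(2), and map it.
 * On success returns the mapping and stores the descriptor in *fd_out;
 * on failure returns NULL with errno describing the failed step.
 */
static void *secret_alloc(size_t len, int *fd_out)
{
    if (len == 0) {
        errno = EINVAL;
        return NULL;
    }
    size_t size = round_up_to_page(len, (size_t) sysconf(_SC_PAGESIZE));

    int fd = secret_memfd(0);
    if (fd == -1)
        return NULL;

    void *p = MAP_FAILED;
    if (ftruncate(fd, (off_t) size) == 0)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        int saved = errno;   /* close() must not clobber the real error */
        close(fd);
        errno = saved;
        return NULL;
    }

    *fd_out = fd;
    return p;
}

/* Scrub, unmap, and close; the file is released with its last reference. */
static void secret_free(void *p, size_t len, int fd)
{
    size_t size = round_up_to_page(len, (size_t) sysconf(_SC_PAGESIZE));
    memset(p, 0, size);
    munmap(p, size);
    close(fd);
}
```

A caller would check secret_alloc() for NULL before use.
ENOSYS is an expected outcome on kernels where the feature is absent or disabled.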
.PP
The memory areas backing the file created with
.BR memfd_secret ()
are visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables
and only the page tables of the processes holding the file descriptor
map the corresponding physical memory.
(Thus, the pages in the region can't be accessed by the kernel itself,
so that, for example, pointers to the region can't be passed to
system calls.)
.PP
The following values may be bitwise ORed in
.I flags
to control the behavior of
.BR memfd_secret ():
.TP
.B O_CLOEXEC
Set the close-on-exec flag on the new file descriptor,
which causes the region to be removed from the process on
.BR execve (2).
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
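.PP
This sketch passes O_CLOEXEC in flags and then reads the descriptor flags back with the fcntl(2) F_GETFD operation.
It assumes headers that define SYS_memfd_secret.
It also assumes, based on the kernel's flag check as we understand it, that the bit accepted in flags is O_CLOEXEC, while FD_CLOEXEC is the bit that F_GETFD reports.
The names has_cloexec() and secret_memfd_cloexec() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if fd has close-on-exec set, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}

/* Create a secret-memory fd that is closed automatically on execve(2). */
static int secret_memfd_cloexec(void)
{
#ifdef SYS_memfd_secret
    return (int) syscall(SYS_memfd_secret, (unsigned int) O_CLOEXEC);
#else
    errno = ENOSYS;   /* headers do not know the system call */
    return -1;
#endif
}
```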
.PP
As its return value,
.BR memfd_secret ()
returns a new file descriptor that refers to an anonymous file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
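.PP
The following short sketch checks the access mode of a new descriptor with the fcntl(2) F_GETFL operation.
It again assumes headers that define SYS_memfd_secret.
The helpers is_read_write() and secret_fd_access_mode() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if fd's open file description is O_RDWR, 0 if not, -1 on error. */
static int is_read_write(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    return (fl & O_ACCMODE) == O_RDWR;
}

/* Create a secret-memory fd, report whether it is O_RDWR, then close it. */
static int secret_fd_access_mode(void)
{
#ifdef SYS_memfd_secret
    int fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return -1;
    int rw = is_read_write(fd);
    close(fd);
    return rw;
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

The sketch deliberately does not test O_LARGEFILE.
On some 64-bit ABIs the user-space O_LARGEFILE constant is defined as 0, so a bit test would be meaningless there.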
.PP
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_secret ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
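.PP
The inheritance rules can be sketched as follows.
The parent maps one page.
A child created with fork(2) maps its own inherited copy of the descriptor and writes into it.
Because both mappings refer to the same file, the parent then sees the child's data.
The function child_writes_secret() is a hypothetical name.
The sketch assumes headers that define SYS_memfd_secret and uses MAP_SHARED mappings so that both processes see the same pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 on success (message copied into out), 1 if the child failed,
 * or -1 with errno set if a call in the parent failed. outlen must be >= 1.
 */
static int child_writes_secret(char *out, size_t outlen)
{
#ifndef SYS_memfd_secret
    (void) out;
    (void) outlen;
    errno = ENOSYS;
    return -1;
#else
    size_t size = (size_t) sysconf(_SC_PAGESIZE);
    int fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return -1;

    char *p = MAP_FAILED;
    if (ftruncate(fd, (off_t) size) == 0)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    int rc = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: map the inherited descriptor, which refers to the same file. */
        char *q = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (q == MAP_FAILED)
            _exit(1);
        strcpy(q, "written by child");
        _exit(0);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid)
            rc = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
    }

    int saved = errno;   /* keep fork/waitpid errors across cleanup */
    if (rc == 0) {
        /* Parent: read the child's message through its own mapping. */
        size_t n = strnlen(p, size);
        if (n >= outlen)
            n = outlen - 1;
        memcpy(out, p, n);
        out[n] = '\0';
    }
    munmap(p, size);
    close(fd);
    errno = saved;
    return rc;
#endif
}
```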
.PP
The memory region is locked into memory in the same way as with
.BR mlock (2),
so that it will never be written into swap,
and hibernation is inhibited for as long as any
.BR memfd_secret ()
file descriptions exist.
However, the implementation of
.BR memfd_secret ()
will not try to populate the whole range during the
.BR mmap (2)
call that attaches the region into the process's address space;
instead, the pages are only actually allocated
as they are faulted in.
The amount of memory allowed for memory mappings
of the file descriptor obeys the same rules as
.BR mlock (2)
and cannot exceed
.BR RLIMIT_MEMLOCK .
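.PP
Because mappings are limited in the same way as with mlock(2), a program may want a rough pre-check before calling mmap(2).
The following sketch compares a requested length with the soft RLIMIT_MEMLOCK limit returned by getrlimit(2).
It is only an approximation: it ignores memory the process has already locked and the CAP_IPC_LOCK exemption that mlock(2) grants, so the kernel's decision can differ.
The names fits_memlock() and may_fit_memlock() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Pure check: does a mapping of len bytes fit under the given limit? */
static bool fits_memlock(size_t len, rlim_t limit)
{
    return limit == RLIM_INFINITY || (rlim_t) len <= limit;
}

/*
 * Rough pre-check against the current soft RLIMIT_MEMLOCK.  Approximation:
 * already-locked memory and CAP_IPC_LOCK are not taken into account.
 */
static bool may_fit_memlock(size_t len)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return false;
    return fits_memlock(len, rl.rlim_cur);
}
```

If the real mapping is refused for this reason, mmap(2) is expected to fail with EAGAIN.
Treat that as an assumption to confirm on the target kernel.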
.SH RETURN VALUE
On success,
.BR memfd_secret ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EINVAL
.I flags
included unknown bits.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
.TP
.B ENOSYS
.BR memfd_secret ()
is not implemented on this architecture,
or has not been enabled on the kernel command line with
.BR secretmem.enable =1.
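.PP
The sketch below turns the errors listed above into log hints and probes whether the call is usable at all.
Both helper names are hypothetical.
The probe treats ENOSYS as meaning that the feature is unavailable rather than as a hard failure, including the case where the headers lack SYS_memfd_secret.
The hints keep EMFILE and ENFILE apart, since they are distinct conditions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Map a memfd_secret() errno value to a short log hint. */
static const char *memfd_secret_hint(int err)
{
    switch (err) {
    case EINVAL: return "flags included unknown bits";
    case EMFILE: return "per-process descriptor limit reached";
    case ENFILE: return "system-wide open file limit reached";
    case ENOMEM: return "insufficient memory";
    case ENOSYS: return "unsupported, or not enabled with secretmem.enable=1";
    default:     return "unexpected error";
    }
}

/*
 * Probe: 1 if memfd_secret() works, 0 on ENOSYS (including headers that
 * lack SYS_memfd_secret), -1 with errno set for any other failure.
 */
static int memfd_secret_supported(void)
{
#ifdef SYS_memfd_secret
    int fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd != -1) {
        close(fd);
        return 1;
    }
    return errno == ENOSYS ? 0 : -1;
#else
    return 0;
#endif
}
```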
.SH VERSIONS
The
.BR memfd_secret ()
system call first appeared in Linux 5.14.
.SH STANDARDS
The
.BR memfd_secret ()
system call is Linux-specific.
.SH NOTES
The
.BR memfd_secret ()
system call is designed to allow a user-space process
to create a range of memory that is inaccessible to anybody else,
including the kernel.
There is no absolute guarantee that the kernel will be unable to access
memory ranges backed by
.BR memfd_secret ()
under all circumstances;
nevertheless, it is much harder to exfiltrate data from these regions.
.PP
.BR memfd_secret ()
provides the following protections:
.IP \[bu] 3
Enhanced protection
(in conjunction with all the other in-kernel attack prevention systems)
against ROP attacks.
The absence of any in-kernel primitive for accessing memory backed by
.BR memfd_secret ()
means that a one-gadget ROP attack
cannot be used to perform data exfiltration.
The attacker would need to find enough ROP gadgets
to reconstruct the missing page table entries,
which significantly increases the difficulty of the attack,
especially when other protections like the kernel stack size limit
and address space layout randomization are in place.
.IP \[bu]
Prevent cross-process user-space memory exposures.
Once a region for a
.BR memfd_secret ()
memory mapping is allocated,
the user can't accidentally pass it into the kernel
to be transmitted somewhere.
The memory pages in this region cannot be accessed via the direct map,
and they cannot be pinned through the kernel's internal
.IR get_user_pages ()
interface.
.IP \[bu]
Harden against exploited kernel flaws.
In order to access memory areas backed by
.BR memfd_secret (),
a kernel-side attack would need to
either walk the page tables and create new ones,
or spawn a new privileged user-space process to perform
secret exfiltration using
.BR ptrace (2).
.PP
The way
.BR memfd_secret ()
allocates and locks the memory may impact overall system performance;
therefore, the system call is disabled by default and is available only
if the system administrator has enabled it with the
.B secretmem.enable=1
kernel parameter.
.PP
To prevent potential data leaks of memory regions backed by
.BR memfd_secret ()
from a hibernation image,
hibernation is prevented when there are active
.BR memfd_secret ()
users.
.SH SEE ALSO
.BR memfd_create (2),