1 .\" Copyright (c) 2016, IBM Corporation.
2 .\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
3 .\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
5 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
8 .TH IOCTL_USERFAULTFD 2 2022-09-09 "Linux man-pages (unreleased)"
10 ioctl_userfaultfd \- create a file descriptor for handling page faults in user
14 .RI ( libc ", " \-lc )
17 .BR "#include <linux/userfaultfd.h>" " /* Definition of " UFFD* " constants */"
18 .B #include <sys/ioctl.h>
20 .BI "int ioctl(int " fd ", int " cmd ", ...);"
25 operations can be performed on a userfaultfd object (created by a call to
27 using calls of the form:
36 is a file descriptor referring to a userfaultfd object,
38 is one of the commands listed below, and
40 is a pointer to a data structure that is specific to
45 operations are described below.
51 operations are used to
54 These operations allow the caller to choose what features will be enabled and
55 what kinds of events will be delivered to the application.
56 The remaining operations are
59 These operations enable the calling application to resolve page-fault
64 Enable operation of the userfaultfd and perform the API handshake.
68 argument is a pointer to a
70 structure, defined as:
75 __u64 api; /* Requested API version (input) */
76 __u64 features; /* Requested features (input/output) */
77 __u64 ioctls; /* Available ioctl() operations (output) */
84 field denotes the API version requested by the application.
86 The kernel verifies that it can support the requested API version,
91 fields to bit masks representing all the available features and the generic
95 For Linux kernel versions before 4.11, the
97 field must be initialized to zero before the call to
99 and zero (i.e., no feature bits) is placed in the
101 field by the kernel upon return from
104 Starting from Linux 4.11, the
106 field can be used to ask whether particular features are supported
107 and explicitly enable userfaultfd features that are disabled by default.
108 The kernel always reports all the available features in the
112 To enable userfaultfd features, the application should set
113 a bit corresponding to each feature it wants to enable in the
116 If the kernel supports all the requested features, it will enable them.
117 Otherwise, it will zero out the returned
121 .\" FIXME add more details about feature negotiation and enablement
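The negotiation described above can be sketched in C as follows. This is a minimal sketch, not a reference implementation: it assumes uffd is a descriptor already returned by userfaultfd(2), and the helper names uffd_api_prepare(), uffd_missing_features(), and uffd_api_handshake() are hypothetical.

```c
#include <linux/userfaultfd.h> /* struct uffdio_api, UFFD_API, UFFD_FEATURE_* */
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: fill in a UFFDIO_API request asking for the
 * given feature bits (features must be zero before Linux 4.11). */
void uffd_api_prepare(struct uffdio_api *req, __u64 features)
{
    memset(req, 0, sizeof(*req));   /* ioctls is output-only */
    req->api = UFFD_API;
    req->features = features;
}

/* Hypothetical helper: the requested feature bits that are absent
 * from a feature mask reported by the kernel. */
__u64 uffd_missing_features(__u64 wanted, __u64 available)
{
    return wanted & ~available;
}

/* Hypothetical helper: perform the handshake on uffd.  On success,
 * req holds the features and ioctls reported by the kernel.
 * Returns 0, or -1 with errno set (for example, EINVAL). */
int uffd_api_handshake(int uffd, __u64 features, struct uffdio_api *req)
{
    uffd_api_prepare(req, features);
    return ioctl(uffd, UFFDIO_API, req);
}
```

Because a failed request zeroes the returned structure, uffd_missing_features() is meant to be applied to the mask reported by a successful query, not to the result of a failed request.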
123 The following feature bits may be set:
125 .BR UFFD_FEATURE_EVENT_FORK " (since Linux 4.11)"
126 When this feature is enabled,
127 the userfaultfd objects associated with a parent process are duplicated
128 into the child process during
132 event is delivered to the userfaultfd monitor
134 .BR UFFD_FEATURE_EVENT_REMAP " (since Linux 4.11)"
135 If this feature is enabled,
136 when the faulting process invokes
138 the userfaultfd monitor will receive an event of type
139 .BR UFFD_EVENT_REMAP .
141 .BR UFFD_FEATURE_EVENT_REMOVE " (since Linux 4.11)"
142 If this feature is enabled,
143 when the faulting process calls
149 advice value to free a virtual memory area,
150 the userfaultfd monitor will receive an event of type
151 .BR UFFD_EVENT_REMOVE .
153 .BR UFFD_FEATURE_EVENT_UNMAP " (since Linux 4.11)"
154 If this feature is enabled,
155 when the faulting process unmaps virtual memory either explicitly with
157 or implicitly during either
161 the userfaultfd monitor will receive an event of type
162 .BR UFFD_EVENT_UNMAP .
164 .BR UFFD_FEATURE_MISSING_HUGETLBFS " (since Linux 4.11)"
165 If this feature bit is set,
166 the kernel supports registering userfaultfd ranges on hugetlbfs
169 .BR UFFD_FEATURE_MISSING_SHMEM " (since Linux 4.11)"
170 If this feature bit is set,
171 the kernel supports registering userfaultfd ranges on shared memory areas.
172 This includes all kernel shared memory APIs:
173 System V shared memory,
181 .BR memfd_create (2),
184 .BR UFFD_FEATURE_SIGBUS " (since Linux 4.14)"
185 .\" commit 2d6d6f5a09a96cc1fec7ed992b825e05f64cb50e
186 If this feature bit is set, no page-fault events
187 .RB ( UFFD_EVENT_PAGEFAULT )
191 signal will be sent to the faulting process.
192 Applications using this
193 feature will not require the use of a userfaultfd monitor for processing
194 memory accesses to the regions registered with userfaultfd.
196 .BR UFFD_FEATURE_THREAD_ID " (since Linux 4.14)"
197 If this feature bit is set,
198 .I uffd_msg.pagefault.feat.ptid
199 will be set to the ID of the faulting thread for each page-fault message.
201 .BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
202 If this feature bit is set,
203 the kernel supports registering userfaultfd ranges
204 in minor mode on hugetlbfs-backed memory areas.
206 .BR UFFD_FEATURE_MINOR_SHMEM " (since Linux 5.14)"
207 If this feature bit is set,
208 the kernel supports registering userfaultfd ranges
209 in minor mode on shmem-backed memory areas.
213 field can contain the following bits:
214 .\" FIXME This user-space API seems not fully polished. Why are there
215 .\" not constants defined for each of the bit-mask values listed below?
220 operation is supported.
222 .B 1 << _UFFDIO_REGISTER
225 operation is supported.
227 .B 1 << _UFFDIO_UNREGISTER
230 operation is supported.
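Checking whether a particular operation appears in the returned ioctls mask reduces to testing one bit. A minimal sketch; uffd_has_ioctl() is a hypothetical helper, and the _UFFDIO_* operation numbers are taken from <linux/userfaultfd.h>.

```c
#include <linux/userfaultfd.h> /* _UFFDIO_API, _UFFDIO_REGISTER, ... */

/* Hypothetical helper: nonzero if the operation numbered nr (for
 * example, _UFFDIO_REGISTER) is set in an ioctls mask returned by
 * UFFDIO_API or UFFDIO_REGISTER.  nr must be less than 64. */
int uffd_has_ioctl(__u64 ioctls, unsigned int nr)
{
    return (int)((ioctls >> nr) & 1);
}
```

The same helper applies unchanged to the per-range mask that UFFDIO_REGISTER returns.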
234 operation returns 0 on success.
235 On error, \-1 is returned and
237 is set to indicate the error.
238 Possible errors include:
242 refers to an address that is outside the calling process's
243 accessible address space.
246 The userfaultfd has already been enabled by a previous
251 The API version requested in the
253 field is not supported by this kernel, or the
255 field passed to the kernel includes feature bits that are not supported
256 by the current kernel version.
257 .\" FIXME In the above error case, the returned 'uffdio_api' structure is
258 .\" zeroed out. Why is this done? This should be explained in the manual page.
261 .\" In my understanding the uffdio_api
262 .\" structure is zeroed to allow the caller
263 .\" to distinguish the reasons for -EINVAL.
267 Register a memory address range with the userfaultfd object.
268 The pages in the range must be "compatible".
269 See the list of register modes below
270 for the memory backends compatible with each mode.
274 argument is a pointer to a
276 structure, defined as:
280 struct uffdio_range {
281 __u64 start; /* Start of range */
282 __u64 len; /* Length of range (bytes) */
285 struct uffdio_register {
286 struct uffdio_range range;
287 __u64 mode; /* Desired mode of operation (input) */
288 __u64 ioctls; /* Available ioctl() operations (output) */
295 field defines a memory range starting at
299 bytes that should be handled by the userfaultfd.
303 field defines the mode of operation desired for this memory region.
304 The following values may be bitwise ORed to set the userfaultfd mode for
307 .B UFFDIO_REGISTER_MODE_MISSING
308 Track page faults on missing pages.
310 only private anonymous ranges are compatible.
312 hugetlbfs and shared memory ranges are also compatible.
314 .B UFFDIO_REGISTER_MODE_WP
315 Track page faults on write-protected pages.
317 only private anonymous ranges are compatible.
319 .B UFFDIO_REGISTER_MODE_MINOR
320 Track minor page faults.
322 only hugetlbfs ranges are compatible.
324 compatibility with shmem ranges was added.
326 If the operation is successful, the kernel modifies the
328 bit-mask field to indicate which
330 operations are available for the specified range.
331 This returned bit mask can contain the following bits:
336 operation is supported.
341 operation is supported.
343 .B 1 << _UFFDIO_WRITEPROTECT
345 .B UFFDIO_WRITEPROTECT
347 .B 1 << _UFFDIO_ZEROPAGE
350 operation is supported.
352 .B 1 << _UFFDIO_CONTINUE
355 operation is supported.
359 operation returns 0 on success.
360 On error, \-1 is returned and
362 is set to indicate the error.
363 Possible errors include:
364 .\" FIXME Is the following error list correct?
368 A mapping in the specified range is registered with another
373 refers to an address that is outside the calling process's
374 accessible address space.
377 An invalid or unsupported bit was specified in the
384 There is no mapping in the specified address range.
390 is not a multiple of the system page size; or
392 is zero; or these fields are otherwise invalid.
395 There was an incompatible mapping in the specified address range.
397 .\" ENOMEM if the process is exiting and the
398 .\" mm_struct has gone by the time userfault grabs it.
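Registering a range, with the alignment checked up front, can be sketched as follows. This minimal sketch assumes missing mode on a private anonymous mapping; uffd_range_ok() and uffd_register_missing() are hypothetical helpers, and uffd_range_ok() only mirrors the page-size and zero-length EINVAL conditions listed above, taking the page size from the caller (for example, from sysconf(_SC_PAGESIZE)).

```c
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: nonzero if start and len are multiples of
 * page_size and len is nonzero (see EINVAL above). */
int uffd_range_ok(__u64 start, __u64 len, __u64 page_size)
{
    return len != 0 && start % page_size == 0 && len % page_size == 0;
}

/* Hypothetical helper: register [start, start + len) in missing mode.
 * On success, *ioctls receives the mask of ioctl() operations the
 * kernel reports as available for the range.  Returns 0, or -1 with
 * errno set. */
int uffd_register_missing(int uffd, __u64 start, __u64 len, __u64 *ioctls)
{
    struct uffdio_register reg = {
        .range = { .start = start, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };

    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    *ioctls = reg.ioctls;       /* output field filled in by the kernel */
    return 0;
}
```

After a successful call, a missing-mode monitor would normally confirm that bit 1 << _UFFDIO_COPY is set in the returned mask before relying on UFFDIO_COPY to resolve faults.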
399 .SS UFFDIO_UNREGISTER
401 Unregister a memory address range from userfaultfd.
402 The pages in the range must be "compatible" (see the description of
403 .BR UFFDIO_REGISTER ).
405 The address range to unregister is specified in the
407 structure pointed to by
412 operation returns 0 on success.
413 On error, \-1 is returned and
415 is set to indicate the error.
416 Possible errors include:
425 structure was not a multiple of the system page size; or the
427 field was zero; or these fields were otherwise invalid.
430 There was an incompatible mapping in the specified address range.
433 There was no mapping in the specified address range.
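Unregistration needs only a uffdio_range. A minimal sketch; uffd_unregister() is a hypothetical wrapper name.

```c
#include <errno.h>              /* errno values reported on failure */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: stop handling faults for [start, start + len).
 * Returns 0, or -1 with errno set (for example, EINVAL for a
 * misaligned or zero-length range). */
int uffd_unregister(int uffd, __u64 start, __u64 len)
{
    struct uffdio_range range = { .start = start, .len = len };

    return ioctl(uffd, UFFDIO_UNREGISTER, &range);
}
```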
437 Atomically copy a contiguous memory chunk into the userfaultfd-registered
438 range and optionally wake up the blocked thread.
439 The source and destination addresses and the number of bytes to copy are
441 .IR src ", " dst ", and " len
444 structure pointed to by
450 __u64 dst; /* Destination of copy */
451 __u64 src; /* Source of copy */
452 __u64 len; /* Number of bytes to copy */
453 __u64 mode; /* Flags controlling behavior of copy */
454 __s64 copy; /* Number of bytes copied, or negated error */
459 The following value may be bitwise ORed in
461 to change the behavior of the
465 .B UFFDIO_COPY_MODE_DONTWAKE
466 Do not wake up the thread that waits for page-fault resolution.
468 .B UFFDIO_COPY_MODE_WP
469 Copy the page with read-only permission.
470 This allows the user to trap the next write to the page,
471 which will block and generate another write-protect userfault message.
472 This is used only when both
473 .B UFFDIO_REGISTER_MODE_MISSING
475 .B UFFDIO_REGISTER_MODE_WP
476 modes are enabled for the registered range.
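A copy request that tolerates partial progress can be sketched as follows. This is a minimal sketch under one assumption: that a partial copy shows up as an EAGAIN failure with a positive byte count left in the copy field, as described for that field. The helpers uffd_copy_advance() and uffd_copy_all() are hypothetical.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: after a partial copy, move the request past
 * the copy->copy bytes that were already copied. */
void uffd_copy_advance(struct uffdio_copy *copy)
{
    __u64 done = (__u64)copy->copy;

    copy->dst += done;
    copy->src += done;
    copy->len -= done;
    copy->copy = 0;
}

/* Hypothetical helper: copy len bytes from src into the registered
 * range at dst, retrying after partial copies.  A zero len is treated
 * as nothing to do.  Returns 0, or -1 with errno set. */
int uffd_copy_all(int uffd, __u64 dst, __u64 src, __u64 len, __u64 mode)
{
    struct uffdio_copy copy = {
        .dst = dst, .src = src, .len = len, .mode = mode,
    };

    while (copy.len > 0) {
        if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
            return 0;               /* the entire area was copied */
        /* Zero or a negated errno value in copy means no progress. */
        if (errno != EAGAIN || copy.copy <= 0)
            return -1;
        uffd_copy_advance(&copy);
    }
    return 0;
}
```

Passing UFFDIO_COPY_MODE_DONTWAKE as mode leaves the faulting thread asleep, so a later UFFDIO_WAKE over the same range is then required.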
480 field is used by the kernel to return the number of bytes
481 that were actually copied, or an error (a negated
484 .\" FIXME Above: Why is the 'copy' field used to return error values?
485 .\" This should be explained in the manual page.
486 If the value returned in
488 doesn't match the value that was specified in
490 the operation fails with the error
494 field is output-only;
495 it is not read by the
501 operation returns 0 on success.
502 In this case, the entire area was copied.
503 On error, \-1 is returned and
505 is set to indicate the error.
506 Possible errors include:
509 The number of bytes copied (i.e., the value returned in the
512 does not equal the value that was specified in the
521 was not a multiple of the system page size, or the range specified by
532 An invalid bit was specified in the
536 .BR ENOENT " (since Linux 4.11)"
537 The faulting process has changed
538 its virtual memory layout simultaneously with an outstanding
542 .BR ENOSPC " (from Linux 4.11 until Linux 4.13)"
543 The faulting process has exited at the time of a
547 .BR ESRCH " (since Linux 4.13)"
548 The faulting process has exited at the time of a
554 Zero out a memory range registered with userfaultfd.
556 The requested range is specified by the
560 structure pointed to by
565 struct uffdio_zeropage {
566 struct uffdio_range range;
567 __u64 mode; /* Flags controlling behavior of zeroing */
568 __s64 zeropage; /* Number of bytes zeroed, or negated error */
573 The following value may be bitwise ORed in
575 to change the behavior of the
579 .B UFFDIO_ZEROPAGE_MODE_DONTWAKE
580 Do not wake up the thread that waits for page-fault resolution.
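A zeropage request, together with decoding of the zeropage output field (a byte count or a negated errno value), can be sketched as follows. The helpers uffd_decode_result() and uffd_zeropage() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: decode an output field that holds either a
 * byte count or a negated errno value.  Returns the byte count, or -1
 * after storing the error number in *err. */
long long uffd_decode_result(__s64 field, int *err)
{
    if (field < 0) {
        *err = (int)-field;
        return -1;
    }
    *err = 0;
    return field;
}

/* Hypothetical helper: map zero pages over [start, start + len),
 * optionally without waking the faulting thread.  Returns the number
 * of bytes zeroed, or -1 with errno set. */
long long uffd_zeropage(int uffd, __u64 start, __u64 len, int dontwake)
{
    struct uffdio_zeropage zp = {
        .range = { .start = start, .len = len },
        .mode = dontwake ? UFFDIO_ZEROPAGE_MODE_DONTWAKE : 0,
    };
    int err;

    if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == 0)
        return (long long)len;      /* the entire area was zeroed */

    /* On failure the field may still report partial progress;
     * errno keeps the value set by ioctl(). */
    long long n = uffd_decode_result(zp.zeropage, &err);
    return n > 0 ? n : -1;
}
```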
584 field is used by the kernel to return the number of bytes
585 that were actually zeroed,
586 or an error in the same manner as
588 .\" FIXME Why is the 'zeropage' field used to return error values?
589 .\" This should be explained in the manual page.
590 If the value returned in the
592 field doesn't match the value that was specified in
594 the operation fails with the error
598 field is output-only;
599 it is not read by the
605 operation returns 0 on success.
606 In this case, the entire area was zeroed.
607 On error, \-1 is returned and
609 is set to indicate the error.
610 Possible errors include:
613 The number of bytes zeroed (i.e., the value returned in the
616 does not equal the value that was specified in the
625 was not a multiple of the system page size; or
627 was zero; or the range specified was invalid.
630 An invalid bit was specified in the
634 .BR ESRCH " (since Linux 4.13)"
635 The faulting process has exited at the time of a
641 Wake up the thread waiting for page-fault resolution on
642 a specified memory address range.
646 operation is used in conjunction with
650 operations that have the
651 .B UFFDIO_COPY_MODE_DONTWAKE
653 .B UFFDIO_ZEROPAGE_MODE_DONTWAKE
657 The userfaultfd monitor can perform several
661 operations in a batch and then explicitly wake up the faulting thread using
666 argument is a pointer to a
668 structure (shown above) that specifies the address range.
672 operation returns 0 on success.
673 On error, \-1 is returned and
675 is set to indicate the error.
676 Possible errors include:
685 structure was not a multiple of the system page size; or
687 was zero; or the specified range was otherwise invalid.
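The batching pattern described above, several DONTWAKE copies followed by one wake-up, can be sketched as follows. This minimal sketch assumes the faulting addresses are contiguous and page-aligned, and it does not retry partial copies; uffd_batch_range() and uffd_copy_batch_then_wake() are hypothetical helpers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: the range covering npages pages from start. */
struct uffdio_range uffd_batch_range(__u64 start, size_t npages,
                                     size_t page_size)
{
    struct uffdio_range r = { .start = start,
                              .len = (__u64)npages * page_size };
    return r;
}

/* Hypothetical helper: resolve npages contiguous faults with DONTWAKE
 * copies from src (npages * page_size bytes), then wake all waiters
 * with a single UFFDIO_WAKE.  Returns 0, or -1 with errno set.  If a
 * copy fails midway, pages already copied are not woken here; a real
 * monitor would still issue UFFDIO_WAKE for them. */
int uffd_copy_batch_then_wake(int uffd, __u64 start, const char *src,
                              size_t npages, size_t page_size)
{
    for (size_t i = 0; i < npages; i++) {
        struct uffdio_copy copy = {
            .dst = start + (__u64)i * page_size,
            .src = (__u64)(uintptr_t)(src + i * page_size),
            .len = page_size,
            .mode = UFFDIO_COPY_MODE_DONTWAKE,
        };
        if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
            return -1;
    }

    struct uffdio_range range = uffd_batch_range(start, npages, page_size);
    return ioctl(uffd, UFFDIO_WAKE, &range);
}
```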
688 .SS UFFDIO_WRITEPROTECT (since Linux 5.7)
689 Write-protect or write-unprotect a userfaultfd-registered memory range
691 .BR UFFDIO_REGISTER_MODE_WP .
695 argument is a pointer to a
697 structure as shown below:
701 struct uffdio_writeprotect {
702 struct uffdio_range range; /* Range to change write permission */
703 __u64 mode; /* Mode to change write permission */
708 There are two mode bits that are supported in this structure:
710 .B UFFDIO_WRITEPROTECT_MODE_WP
711 When this mode bit is set,
712 the ioctl will be a write-protect operation upon the memory range specified by
714 Otherwise, it will be a write-unprotect operation upon the specified range,
715 which can be used to resolve a userfaultfd write-protect page fault.
717 .B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
718 When this mode bit is set,
719 do not wake up any thread that waits for
720 page-fault resolution after the operation.
721 This can be specified only if
722 .B UFFDIO_WRITEPROTECT_MODE_WP
727 operation returns 0 on success.
728 On error, \-1 is returned and
730 is set to indicate the error.
731 Possible errors include:
740 structure was not a multiple of the system page size; or
742 was zero; or the specified range was otherwise invalid.
745 The process was interrupted; retry this call.
748 The range specified in
751 For example, the virtual address does not exist,
752 or it is not registered with userfaultfd write-protect mode.
755 Encountered a generic fault during processing.
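Toggling write protection can be sketched as follows. This minimal sketch assumes, as the kernel enforces, that UFFDIO_WRITEPROTECT_MODE_DONTWAKE is accepted only when UFFDIO_WRITEPROTECT_MODE_WP is clear, so the mode is checked before the ioctl; uffd_wp_mode() and uffd_writeprotect() are hypothetical helpers.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: build the UFFDIO_WRITEPROTECT mode.  DONTWAKE
 * is rejected together with WP (EINVAL), since it is only accepted
 * for a write-unprotect. */
int uffd_wp_mode(int protect, int dontwake, __u64 *mode)
{
    if (protect && dontwake) {
        errno = EINVAL;
        return -1;
    }
    *mode = (protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0) |
            (dontwake ? UFFDIO_WRITEPROTECT_MODE_DONTWAKE : 0);
    return 0;
}

/* Hypothetical helper: write-protect (protect != 0) or write-unprotect
 * [start, start + len).  Returns 0, or -1 with errno set.  On EAGAIN
 * the caller may retry, but a monitor thread may need to read pending
 * events first. */
int uffd_writeprotect(int uffd, __u64 start, __u64 len,
                      int protect, int dontwake)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = start, .len = len },
    };

    if (uffd_wp_mode(protect, dontwake, &wp.mode) == -1)
        return -1;
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

Resolving a write-protect fault is then a call with protect set to 0, which also wakes the faulting thread unless dontwake is set.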
759 Resolve a minor page fault
760 by installing page table entries
761 for existing pages in the page cache.
765 argument is a pointer to a
767 structure as shown below:
771 struct uffdio_continue {
772 struct uffdio_range range;
773 /* Range to install PTEs for and continue */
774 __u64 mode; /* Flags controlling the behavior of continue */
775 __s64 mapped; /* Number of bytes mapped, or negated error */
780 The following value may be bitwise ORed in
782 to change the behavior of the
786 .B UFFDIO_CONTINUE_MODE_DONTWAKE
787 Do not wake up the thread that waits for page-fault resolution.
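A minor-fault resolution step can be sketched as follows. This minimal sketch assumes the range is registered with UFFDIO_REGISTER_MODE_MINOR and that the backing pages are already in the page cache (for example, populated through another mapping of the same file); uffd_continue() is a hypothetical helper that returns the mapped byte count so the caller can detect a partial result.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: install page table entries for the page-cache
 * pages backing [start, start + len).  Returns the number of bytes
 * mapped, or -1 with errno set. */
long long uffd_continue(int uffd, __u64 start, __u64 len, int dontwake)
{
    struct uffdio_continue cont = {
        .range = { .start = start, .len = len },
        .mode = dontwake ? UFFDIO_CONTINUE_MODE_DONTWAKE : 0,
    };

    if (ioctl(uffd, UFFDIO_CONTINUE, &cont) == 0)
        return (long long)len;      /* the entire area was mapped */
    /* mapped holds a partial byte count or a negated errno value;
     * errno keeps the value set by ioctl(). */
    return cont.mapped > 0 ? cont.mapped : -1;
}
```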
791 field is used by the kernel
792 to return the number of bytes that were actually mapped,
793 or an error in the same manner as
795 If the value returned in the
797 field doesn't match the value that was specified in
799 the operation fails with the error
803 field is output-only;
804 it is not read by the
810 operation returns 0 on success.
812 the entire area was mapped.
813 On error, \-1 is returned and
815 is set to indicate the error.
816 Possible errors include:
819 The number of bytes mapped
820 (i.e., the value returned in the
823 does not equal the value that was specified in the
832 was not a multiple of the system page size; or
834 was zero; or the range specified was invalid.
837 An invalid bit was specified in the
842 One or more pages were already mapped in the given range.
845 The faulting process has changed its virtual memory layout simultaneously with
851 Allocating memory needed to set up the page table mappings failed.
854 No existing page could be found in the page cache for the given range.
857 The faulting process has exited at the time of a
862 See descriptions of the individual operations, above.
864 See descriptions of the individual operations, above.
865 In addition, the following general errors can occur for all of the
866 operations described above:
870 does not point to a valid memory address.
873 (For all operations except
875 The userfaultfd object has not yet been enabled (via the
881 operations are Linux-specific.
883 To detect available userfaultfd features and
884 enable some subset of those features,
885 the userfaultfd file descriptor must be closed after the first
887 operation that queries feature availability and reopened before
890 operation that actually enables the desired features.
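The close-and-reopen sequence described above can be sketched as follows. This minimal sketch assumes the raw syscall(2) interface to userfaultfd(2) with O_CLOEXEC | O_NONBLOCK flags (a common choice for poll-based monitors, not a requirement); uffd_open_with_features() and api_or_close() are hypothetical helpers. Unprivileged callers may see the userfaultfd(2) call fail with EPERM, depending on the vm.unprivileged_userfaultfd sysctl.

```c
#define _GNU_SOURCE             /* syscall() */
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: run UFFDIO_API on fd; on failure close fd and
 * return -1, preserving errno. */
static int api_or_close(int fd, struct uffdio_api *api)
{
    if (ioctl(fd, UFFDIO_API, api) == 0)
        return 0;
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: query the available features on a throwaway
 * descriptor, close it, then enable wanted on a fresh descriptor.
 * Returns the enabled descriptor, or -1 with errno set. */
int uffd_open_with_features(__u64 wanted, __u64 *available)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (fd == -1 || api_or_close(fd, &api) == -1)
        return -1;
    close(fd);                  /* UFFDIO_API cannot be repeated on it */
    *available = api.features;

    if (wanted & ~api.features) {
        errno = EINVAL;         /* a requested feature is unavailable */
        return -1;
    }

    fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    api = (struct uffdio_api){ .api = UFFD_API, .features = wanted };
    if (fd == -1 || api_or_close(fd, &api) == -1)
        return -1;
    return fd;
}
```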
899 .I Documentation/admin\-guide/mm/userfaultfd.rst
900 in the Linux kernel source tree