.\" Copyright (c) 2016, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
.\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.TH IOCTL_USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
ioctl_userfaultfd \- create a file descriptor for handling page faults in user
.BR "#include <linux/userfaultfd.h>" " /* Definition of " UFFD* " constants */"
.B #include <sys/ioctl.h>
.BI "int ioctl(int " fd ", int " cmd ", ...);"
operations can be performed on a userfaultfd object (created by a call to
using calls of the form:
is a file descriptor referring to a userfaultfd object,
is one of the commands listed below, and
is a pointer to a data structure that is specific to
operations are described below.
operations are used to
These operations allow the caller to choose what features will be enabled and
what kinds of events will be delivered to the application.
The remaining operations are
These operations enable the calling application to resolve page-fault
Enable operation of the userfaultfd and perform API handshake.
argument is a pointer to a
structure, defined as:
.in +4n
.EX
struct uffdio_api {
    __u64 api;        /* Requested API version (input) */
    __u64 features;   /* Requested features (input/output) */
    __u64 ioctls;     /* Available ioctl() operations (output) */
};
.EE
.in
field denotes the API version requested by the application.
The kernel verifies that it can support the requested API version,
fields to bit masks representing all the available features and the generic
operations available.
For Linux kernel versions before 4.11, the
field must be initialized to zero before the call to
and zero (i.e., no feature bits) is placed in the
field by the kernel upon return from
Starting from Linux 4.11, the
field can be used to ask whether particular features are supported
and explicitly enable userfaultfd features that are disabled by default.
The kernel always reports all the available features in the
To enable userfaultfd features, the application should set
a bit corresponding to each feature it wants to enable in the
If the kernel supports all the requested features, it will enable them.
Otherwise it will zero out the returned
.\" FIXME add more details about feature negotiation and enablement
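.PP
The handshake described above can be sketched as follows.
This is a minimal illustration, not a definitive implementation:
the helper name uffd_handshake() is hypothetical,
and the descriptor is assumed to be created with a raw
userfaultfd(2) system call using the O_CLOEXEC and O_NONBLOCK flags.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): create a userfaultfd
   object and perform the UFFDIO_API handshake, requesting the feature
   bits in 'features'. On success, returns the file descriptor and
   stores the kernel's reply in '*reply'. On failure, returns -1 with
   errno set. */
static int
uffd_handshake(__u64 features, struct uffdio_api *reply)
{
    struct uffdio_api api;
    int fd, saved_errno;

    fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = features;    /* Must be zero before Linux 4.11 */

    if (ioctl(fd, UFFDIO_API, &api) == -1) {
        saved_errno = errno;    /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }

    *reply = api;               /* 'features' and 'ioctls' are now bit masks */
    return fd;
}
```

A caller would typically pass zero for features on a first attempt
and then inspect reply.features and reply.ioctls.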
The following feature bits may be set:
.BR UFFD_FEATURE_EVENT_FORK " (since Linux 4.11)"
When this feature is enabled,
the userfaultfd objects associated with a parent process are duplicated
into the child process during
event is delivered to the userfaultfd monitor
.BR UFFD_FEATURE_EVENT_REMAP " (since Linux 4.11)"
If this feature is enabled,
when the faulting process invokes
the userfaultfd monitor will receive an event of type
.BR UFFD_EVENT_REMAP .
.BR UFFD_FEATURE_EVENT_REMOVE " (since Linux 4.11)"
If this feature is enabled,
when the faulting process calls
advice value to free a virtual memory area,
the userfaultfd monitor will receive an event of type
.BR UFFD_EVENT_REMOVE .
.BR UFFD_FEATURE_EVENT_UNMAP " (since Linux 4.11)"
If this feature is enabled,
when the faulting process unmaps virtual memory either explicitly with
or implicitly during either
the userfaultfd monitor will receive an event of type
.BR UFFD_EVENT_UNMAP .
.BR UFFD_FEATURE_MISSING_HUGETLBFS " (since Linux 4.11)"
If this feature bit is set,
the kernel supports registering userfaultfd ranges on hugetlbfs
.BR UFFD_FEATURE_MISSING_SHMEM " (since Linux 4.11)"
If this feature bit is set,
the kernel supports registering userfaultfd ranges on shared memory areas.
This includes all kernel shared memory APIs:
System V shared memory,
.BR memfd_create (2),
.BR UFFD_FEATURE_SIGBUS " (since Linux 4.14)"
.\" commit 2d6d6f5a09a96cc1fec7ed992b825e05f64cb50e
If this feature bit is set, no page-fault events
.RB ( UFFD_EVENT_PAGEFAULT )
signal will be sent to the faulting process.
Applications using this
feature will not require the use of a userfaultfd monitor for processing
memory accesses to the regions registered with userfaultfd.
.BR UFFD_FEATURE_THREAD_ID " (since Linux 4.14)"
If this feature bit is set,
.I uffd_msg.pagefault.feat.ptid
will be set to the ID of the faulting thread for each page-fault message.
field can contain the following bits:
.\" FIXME This user-space API seems not fully polished. Why are there
.\" not constants defined for each of the bit-mask values listed below?
operation is supported.
.B 1 << _UFFDIO_REGISTER
operation is supported.
.B 1 << _UFFDIO_UNREGISTER
operation is supported.
.B 1 << _UFFDIO_WRITEPROTECT
.B UFFDIO_WRITEPROTECT
operation is supported.
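.PP
Because the bits are expressed as shifts of the operation numbers,
a caller has to build each mask itself.
The following sketch shows one way to test a returned mask;
the helper name uffd_ioctl_supported() is hypothetical.

```c
#include <linux/userfaultfd.h>

/* Hypothetical helper: return nonzero if the bit for operation number
   'nr' (for example, _UFFDIO_REGISTER) is set in an 'ioctls' mask
   returned by the kernel. */
static int
uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
    return (ioctls & ((__u64) 1 << nr)) != 0;
}
```

For example, uffd_ioctl_supported(api.ioctls, _UFFDIO_REGISTER)
could be checked after a successful handshake.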
operation returns 0 on success.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
refers to an address that is outside the calling process's
accessible address space.
The userfaultfd has already been enabled by a previous
The API version requested in the
field is not supported by this kernel, or the
field passed to the kernel includes feature bits that are not supported
by the current kernel version.
.\" FIXME In the above error case, the returned 'uffdio_api' structure is
.\" zeroed out. Why is this done? This should be explained in the manual page.
.\" In my understanding the uffdio_api
.\" structure is zeroed to allow the caller
.\" to distinguish the reasons for -EINVAL.
Register a memory address range with the userfaultfd object.
The pages in the range must be "compatible".
Before Linux 4.11,
only private anonymous ranges are compatible for registering with
.BR UFFDIO_REGISTER .
hugetlbfs and shared memory ranges are also compatible with
.BR UFFDIO_REGISTER .
argument is a pointer to a
structure, defined as:
.in +4n
.EX
struct uffdio_range {
    __u64 start;    /* Start of range */
    __u64 len;      /* Length of range (bytes) */
};
.PP
struct uffdio_register {
    struct uffdio_range range;
    __u64 mode;     /* Desired mode of operation (input) */
    __u64 ioctls;   /* Available ioctl() operations (output) */
};
.EE
.in
field defines a memory range starting at
bytes that should be handled by the userfaultfd.
field defines the mode of operation desired for this memory region.
The following values may be bitwise ORed to set the userfaultfd mode for
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
If the operation is successful, the kernel modifies the
bit-mask field to indicate which
operations are available for the specified range.
This returned bit mask is as for
operation returns 0 on success.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
.\" FIXME Is the following error list correct?
A mapping in the specified range is registered with another
refers to an address that is outside the calling process's
accessible address space.
An invalid or unsupported bit was specified in the
There is no mapping in the specified address range.
is not a multiple of the system page size; or,
is zero; or these fields are otherwise invalid.
There was an incompatible mapping in the specified address range.
.\" ENOMEM if the process is exiting and the
.\" mm_struct has gone by the time userfault grabs it.
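.PP
As an illustration of the registration call described above,
the following sketch registers a range for missing-page tracking.
The helper name uffd_register_missing() is hypothetical,
and the caller is assumed to pass a page-aligned address and length.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: register [addr, addr + len) with the userfaultfd
   object 'uffd' in UFFDIO_REGISTER_MODE_MISSING mode. Returns 0 on
   success and stores the per-range 'ioctls' mask in '*ioctls' (if
   non-NULL). Returns -1 with errno set on failure. */
static int
uffd_register_missing(int uffd, void *addr, size_t len, __u64 *ioctls)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (__u64) (uintptr_t) addr;   /* Must be page-aligned */
    reg.range.len = len;                          /* Multiple of page size */
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;

    if (ioctls != NULL)
        *ioctls = reg.ioctls;
    return 0;
}
```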
.SS UFFDIO_UNREGISTER
Unregister a memory address range from userfaultfd.
The pages in the range must be "compatible" (see the description of
.BR UFFDIO_REGISTER ).
The address range to unregister is specified in the
structure pointed to by
operation returns 0 on success.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
structure was not a multiple of the system page size; or the
field was zero; or these fields were otherwise invalid.
There was an incompatible mapping in the specified address range.
There was no mapping in the specified address range.
Atomically copy a contiguous memory chunk into the userfault-registered
range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy are
.IR src ", " dst ", and " len
structure pointed to by
.in +4n
.EX
struct uffdio_copy {
    __u64 dst;    /* Destination of copy */
    __u64 src;    /* Source of copy */
    __u64 len;    /* Number of bytes to copy */
    __u64 mode;   /* Flags controlling behavior of copy */
    __s64 copy;   /* Number of bytes copied, or negated error */
};
.EE
.in
The following values may be bitwise ORed in
to change the behavior of the
.B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.B UFFDIO_COPY_MODE_WP
Copy the page with read-only permission.
This allows the user to trap the next write to the page,
which will block and generate another write-protect userfault message.
This is used only when both
.B UFFDIO_REGISTER_MODE_MISSING
and
.B UFFDIO_REGISTER_MODE_WP
modes are enabled for the registered range.
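.PP
A call that fills in the structure above might look like the following sketch.
The helper name uffd_copy() and its parameters are hypothetical;
the destination is assumed to lie in a range already registered with the
userfaultfd object.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: copy 'len' bytes from 'src' into the registered
   address 'dst'. If 'dontwake' is nonzero, the faulting thread is not
   woken. The value of the output-only 'copy' field is stored in
   '*copied' (if non-NULL). Returns the result of ioctl(): 0 on
   success, or -1 with errno set. */
static int
uffd_copy(int uffd, void *dst, const void *src, size_t len,
          int dontwake, long long *copied)
{
    struct uffdio_copy cp;
    int ret;

    memset(&cp, 0, sizeof(cp));
    cp.dst = (__u64) (uintptr_t) dst;
    cp.src = (__u64) (uintptr_t) src;
    cp.len = len;
    cp.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;

    ret = ioctl(uffd, UFFDIO_COPY, &cp);
    if (copied != NULL)
        *copied = cp.copy;      /* Bytes copied, or a negated error */
    return ret;
}
```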
field is used by the kernel to return the number of bytes
that was actually copied, or an error (a negated
.\" FIXME Above: Why is the 'copy' field used to return error values?
.\" This should be explained in the manual page.
If the value returned in
doesn't match the value that was specified in
the operation fails with the error
field is output-only;
it is not read by the
operation returns 0 on success.
In this case, the entire area was copied.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
The number of bytes copied (i.e., the value returned in the
does not equal the value that was specified in the
was not a multiple of the system page size, or the range specified by
An invalid bit was specified in the
.BR ENOENT " (since Linux 4.11)"
The faulting process has changed
its virtual memory layout simultaneously with an outstanding
.BR ENOSPC " (from Linux 4.11 until Linux 4.13)"
The faulting process has exited at the time of a
.BR ESRCH " (since Linux 4.13)"
The faulting process has exited at the time of a
Zero out a memory range registered with userfaultfd.
The requested range is specified by the
structure pointed to by
.in +4n
.EX
struct uffdio_zeropage {
    struct uffdio_range range;
    __u64 mode;     /* Flags controlling behavior of zeropage */
    __s64 zeropage; /* Number of bytes zeroed, or negated error */
};
.EE
.in
The following value may be bitwise ORed in
to change the behavior of the
.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
field is used by the kernel to return the number of bytes
that was actually zeroed,
or an error in the same manner as
.\" FIXME Why is the 'zeropage' field used to return error values?
.\" This should be explained in the manual page.
If the value returned in the
field doesn't match the value that was specified in
the operation fails with the error
field is output-only;
it is not read by the
operation returns 0 on success.
In this case, the entire area was zeroed.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
The number of bytes zeroed (i.e., the value returned in the
does not equal the value that was specified in the
was not a multiple of the system page size; or
was zero; or the range specified was invalid.
An invalid bit was specified in the
.BR ESRCH " (since Linux 4.13)"
The faulting process has exited at the time of a
Wake up the thread waiting for page-fault resolution on
a specified memory address range.
operation is used in conjunction with
operations that have the
.BR UFFDIO_COPY_MODE_DONTWAKE
or
.BR UFFDIO_ZEROPAGE_MODE_DONTWAKE
The userfault monitor can perform several
operations in a batch and then explicitly wake up the faulting thread using
argument is a pointer to a
structure (shown above) that specifies the address range.
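.PP
The batching pattern described above can be sketched as follows,
here using UFFDIO_ZEROPAGE with the DONTWAKE flag for each page
and a single UFFDIO_WAKE at the end.
The helper name uffd_zero_batch_and_wake() is hypothetical,
and the range is assumed to be page-aligned and already registered.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: zero 'npages' pages starting at 'start' one
   page at a time without waking the faulting thread, then wake any
   waiters on the whole range with a single UFFDIO_WAKE. Returns 0 on
   success, or -1 with errno set. */
static int
uffd_zero_batch_and_wake(int uffd, __u64 start, __u64 page_size,
                         unsigned int npages)
{
    struct uffdio_range range;
    unsigned int i;

    for (i = 0; i < npages; i++) {
        struct uffdio_zeropage zp;

        memset(&zp, 0, sizeof(zp));
        zp.range.start = start + (__u64) i * page_size;
        zp.range.len = page_size;
        zp.mode = UFFDIO_ZEROPAGE_MODE_DONTWAKE;

        if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1)
            return -1;
    }

    range.start = start;
    range.len = (__u64) npages * page_size;
    return ioctl(uffd, UFFDIO_WAKE, &range);
}
```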
operation returns 0 on success.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
structure was not a multiple of the system page size; or
was zero; or the specified range was otherwise invalid.
.SS UFFDIO_WRITEPROTECT (since Linux 5.7)
Write-protect or write-unprotect a userfaultfd-registered memory range
.BR UFFDIO_REGISTER_MODE_WP .
argument is a pointer to a
structure as shown below:
.in +4n
.EX
struct uffdio_writeprotect {
    struct uffdio_range range; /* Range to change write permission */
    __u64 mode;                /* Mode to change write permission */
};
.EE
.in
There are two mode bits that are supported in this structure:
.B UFFDIO_WRITEPROTECT_MODE_WP
When this mode bit is set,
the ioctl will be a write-protect operation upon the memory range specified by
Otherwise it will be a write-unprotect operation upon the specified range,
which can be used to resolve a userfaultfd write-protect page fault.
.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
When this mode bit is set,
do not wake up any thread that waits for
page-fault resolution after the operation.
This can be specified only if
.B UFFDIO_WRITEPROTECT_MODE_WP
operation returns 0 on success.
On error, \-1 is returned and
is set to indicate the error.
Possible errors include:
structure was not a multiple of the system page size; or
was zero; or the specified range was otherwise invalid.
The process was interrupted; retry this call.
The range specified in
For example, the virtual address does not exist,
or is not registered in userfaultfd write-protect mode.
Encountered a generic fault during processing.
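.PP
A write-protect or write-unprotect request might be issued as in the
following sketch.
The helper name uffd_set_wp() is hypothetical.
The preprocessor guard is an assumption made for illustration:
it lets the example compile against kernel headers that predate
UFFDIO_WRITEPROTECT, in which case the helper simply fails with ENOSYS.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: write-protect the range if 'protect' is
   nonzero, otherwise write-unprotect it (which also resolves a pending
   write-protect fault and wakes the faulting thread). Returns 0 on
   success, or -1 with errno set. */
static int
uffd_set_wp(int uffd, __u64 start, __u64 len, int protect)
{
#ifdef UFFDIO_WRITEPROTECT
    struct uffdio_writeprotect wp;

    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
#else
    /* Headers older than Linux 5.7: operation not available */
    (void) uffd;
    (void) start;
    (void) len;
    (void) protect;
    errno = ENOSYS;
    return -1;
#endif
}
```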
See descriptions of the individual operations, above.
See descriptions of the individual operations, above.
In addition, the following general errors can occur for all of the
operations described above:
does not point to a valid memory address.
(For all operations except
The userfaultfd object has not yet been enabled (via the
operations are Linux-specific.
In order to detect available userfault features and
enable some subset of those features,
the userfaultfd file descriptor must be closed after the first
operation that queries feature availability and reopened before
operation that actually enables the desired features.
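.PP
The two-step sequence described above can be sketched as follows.
The helper name uffd_open_with_features() is hypothetical,
and the choice to silently drop unsupported feature bits rather than fail
is an assumption made for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: probe the available features with one
   userfaultfd object, close it, then open a second object and enable
   the subset of 'wanted' that the kernel reported. Returns the new
   file descriptor and stores the enabled bits in '*enabled'; returns
   -1 with errno set on failure. */
static int
uffd_open_with_features(__u64 wanted, __u64 *enabled)
{
    struct uffdio_api api;
    int fd, saved_errno;

    /* Step 1: query feature availability, then discard the descriptor */
    fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (fd == -1)
        return -1;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        goto fail;
    close(fd);

    wanted &= api.features;     /* Keep only the supported bits */

    /* Step 2: reopen and enable the desired features */
    fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (fd == -1)
        return -1;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = wanted;
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        goto fail;

    *enabled = wanted;
    return fd;

fail:
    saved_errno = errno;        /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return -1;
}
```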
.IR Documentation/admin\-guide/mm/userfaultfd.rst
in the Linux kernel source tree