\input texinfo @c -*-texinfo-*-

@setfilename libgomp.info

Copyright @copyright{} 2006-2020 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
software.  Copies published by the Free Software Foundation raise
funds for GNU development.
@dircategory GNU Libraries
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA
@setchapternewpage odd

@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version@*
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's target construct) was added later, and the library was renamed to
the GNU Offloading and Multi Processing Runtime Library.
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column.  Do not use tabs.  This provides
@comment better formatting.
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    The GNU General Public License says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP
To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directives
in free form, the @code{c$omp}, @code{*$omp} and @code{!$omp} directives
in fixed form, the @code{!$} conditional compilation sentinels in free
form, and the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form.
The flag also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

A complete description of all OpenMP directives accepted may be found in
the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
version 4.5.
@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three sections:

Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.
* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_dynamic::             Dynamic teams setting
* omp_get_initial_device::      Device number of host device
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Current maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_supported_active_levels:: Maximum number of active regions supported
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method
Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::            Initialize simple lock
* omp_set_lock::             Wait for and set simple lock
* omp_test_lock::            Test and set simple lock if available
* omp_unset_lock::           Unset simple lock
* omp_destroy_lock::         Destroy simple lock
* omp_init_nest_lock::       Initialize nested lock
* omp_set_nest_lock::        Wait for and set nested lock
* omp_test_nest_lock::       Test and set nested lock if available
* omp_unset_nest_lock::      Unset nested lock
* omp_destroy_nest_lock::    Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::            Get timer precision.
* omp_get_wtime::            Elapsed wall clock time.
@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of parallel regions

@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
that enclose the call.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID

@item @emph{Description}:
This function returns the thread identification number of the current
thread's ancestor at the given nesting level.  For values of @var{level}
outside the range zero to @code{omp_get_level}, -1 is returned; if
@var{level} is @code{omp_get_level}, the result is identical to
@code{omp_get_thread_num}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled

@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions

@item @emph{Description}:
Get the default device for target regions without device clause.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting

@item @emph{Description}:
This function returns @code{true} if enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@node omp_get_initial_device
@section @code{omp_get_initial_device} -- Return device number of initial device

@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level

@item @emph{Description}:
This function returns the nesting level of the parallel blocks
that enclose the call.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Current maximum number of active regions

@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value
that can be set for tasks.

@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region

@item @emph{Description}:
Return the maximum number of threads used for the current parallel region
that does not use the clause @code{num_threads}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions

@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

Nested parallel regions may be initialized at startup by the
@env{OMP_NESTED} environment variable or at runtime using
@code{omp_set_nested}.  If undefined, nested parallel regions are
disabled by default.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}

@item @emph{See also}:
@ref{omp_set_nested}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices

@item @emph{Description}:
Returns the number of target devices.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online

@item @emph{Description}:
Returns the number of processors online on that device.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams

@item @emph{Description}:
Returns the number of teams in the current team region.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team

@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs

@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method

@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@node omp_get_supported_active_levels
@section @code{omp_get_supported_active_levels} -- Maximum number of active regions supported

@item @emph{Description}:
This function returns the maximum number of nested, active parallel regions
supported by this implementation.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number

@item @emph{Description}:
Returns the team number of the calling thread.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team

@item @emph{Description}:
This function returns the number of threads in the thread team to which
either the current thread or its ancestor belongs.  For values of @var{level}
outside the range zero to @code{omp_get_level}, -1 is returned; if @var{level}
is zero, 1 is returned; and for @var{level} equal to @code{omp_get_level}, the
result is identical to @code{omp_get_num_threads}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads

@item @emph{Description}:
Return the maximum number of threads of the program.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID

@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the master thread of a team is always 0.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active

@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region

@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device

@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions

@item @emph{Description}:
Set the default device for target regions without device clause.  The argument
shall be a nonnegative device number.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams

@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@node omp_set_max_active_levels
@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions

@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.  @var{max_levels} must be less than or equal to
the value returned by @code{omp_get_supported_active_levels}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
@ref{omp_get_supported_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions

@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}

@item @emph{See also}:
@ref{OMP_NESTED}, @ref{omp_get_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit

@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
sections, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item @tab @code{integer, intent(in) :: num_threads}

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method

@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}

@item @emph{See also}:
@ref{omp_get_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
1049 @section @code{omp_init_lock} -- Initialize simple lock
1051 @item @emph{Description}:
Initialize a simple lock.  After initialization, the lock is in
an unlocked state.
1056 @multitable @columnfractions .20 .80
1057 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
1060 @item @emph{Fortran}:
1061 @multitable @columnfractions .20 .80
1062 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
1063 @item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
1066 @item @emph{See also}:
1067 @ref{omp_destroy_lock}
1069 @item @emph{Reference}:
1070 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1076 @section @code{omp_set_lock} -- Wait for and set simple lock
1078 @item @emph{Description}:
1079 Before setting a simple lock, the lock variable must be initialized by
1080 @code{omp_init_lock}. The calling thread is blocked until the lock
is available.  If the lock is already held by the current thread,
a deadlock occurs.
1085 @multitable @columnfractions .20 .80
1086 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
1089 @item @emph{Fortran}:
1090 @multitable @columnfractions .20 .80
1091 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
1092 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1095 @item @emph{See also}:
1096 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
1098 @item @emph{Reference}:
1099 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1105 @section @code{omp_test_lock} -- Test and set simple lock if available
1107 @item @emph{Description}:
1108 Before setting a simple lock, the lock variable must be initialized by
1109 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1110 does not block if the lock is not available. This function returns
1111 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
1112 @code{false} represent their language-specific counterparts.
1115 @multitable @columnfractions .20 .80
1116 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1119 @item @emph{Fortran}:
1120 @multitable @columnfractions .20 .80
1121 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1122 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1125 @item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
1128 @item @emph{Reference}:
1129 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1134 @node omp_unset_lock
1135 @section @code{omp_unset_lock} -- Unset simple lock
1137 @item @emph{Description}:
1138 A simple lock about to be unset must have been locked by @code{omp_set_lock}
1139 or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}.  The lock then becomes unlocked.  If one
or more threads attempted to set the lock before, one of them is chosen to
acquire it.
1145 @multitable @columnfractions .20 .80
1146 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1149 @item @emph{Fortran}:
1150 @multitable @columnfractions .20 .80
1151 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1152 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1155 @item @emph{See also}:
1156 @ref{omp_set_lock}, @ref{omp_test_lock}
1158 @item @emph{Reference}:
1159 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1164 @node omp_destroy_lock
1165 @section @code{omp_destroy_lock} -- Destroy simple lock
1167 @item @emph{Description}:
1168 Destroy a simple lock. In order to be destroyed, a simple lock must be
1169 in the unlocked state.
1172 @multitable @columnfractions .20 .80
1173 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1176 @item @emph{Fortran}:
1177 @multitable @columnfractions .20 .80
1178 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1179 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1182 @item @emph{See also}:
1185 @item @emph{Reference}:
1186 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1191 @node omp_init_nest_lock
1192 @section @code{omp_init_nest_lock} -- Initialize nested lock
1194 @item @emph{Description}:
1195 Initialize a nested lock. After initialization, the lock is in
1196 an unlocked state and the nesting count is set to zero.
1199 @multitable @columnfractions .20 .80
1200 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1203 @item @emph{Fortran}:
1204 @multitable @columnfractions .20 .80
1205 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1206 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1209 @item @emph{See also}:
1210 @ref{omp_destroy_nest_lock}
1212 @item @emph{Reference}:
1213 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1217 @node omp_set_nest_lock
1218 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1220 @item @emph{Description}:
1221 Before setting a nested lock, the lock variable must be initialized by
1222 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
1223 is available. If the lock is already held by the current thread, the
1224 nesting count for the lock is incremented.
1227 @multitable @columnfractions .20 .80
1228 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1231 @item @emph{Fortran}:
1232 @multitable @columnfractions .20 .80
1233 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1234 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1237 @item @emph{See also}:
1238 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1240 @item @emph{Reference}:
1241 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1246 @node omp_test_nest_lock
1247 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1249 @item @emph{Description}:
1250 Before setting a nested lock, the lock variable must be initialized by
1251 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
1252 @code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is acquired, the new nesting count is returned.  Otherwise,
the return value equals zero.
1257 @multitable @columnfractions .20 .80
1258 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1261 @item @emph{Fortran}:
1262 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
1264 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1268 @item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
1271 @item @emph{Reference}:
1272 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1277 @node omp_unset_nest_lock
1278 @section @code{omp_unset_nest_lock} -- Unset nested lock
1280 @item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before.  In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}.  If the nesting count drops to zero, the
lock becomes unlocked.  If one or more threads attempted to set the lock before,
one of them is chosen to acquire it.
1288 @multitable @columnfractions .20 .80
1289 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1292 @item @emph{Fortran}:
1293 @multitable @columnfractions .20 .80
1294 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1295 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1298 @item @emph{See also}:
1299 @ref{omp_set_nest_lock}
1301 @item @emph{Reference}:
1302 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1307 @node omp_destroy_nest_lock
1308 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1310 @item @emph{Description}:
1311 Destroy a nested lock. In order to be destroyed, a nested lock must be
1312 in the unlocked state and its nesting count must equal zero.
1315 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
1319 @item @emph{Fortran}:
1320 @multitable @columnfractions .20 .80
1321 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1322 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1325 @item @emph{See also}:
1328 @item @emph{Reference}:
1329 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1335 @section @code{omp_get_wtick} -- Get timer precision
1337 @item @emph{Description}:
1338 Gets the timer precision, i.e., the number of seconds between two
1339 successive clock ticks.
1342 @multitable @columnfractions .20 .80
1343 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1346 @item @emph{Fortran}:
1347 @multitable @columnfractions .20 .80
1348 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1351 @item @emph{See also}:
1354 @item @emph{Reference}:
1355 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
1361 @section @code{omp_get_wtime} -- Elapsed wall clock time
1363 @item @emph{Description}:
Elapsed wall clock time in seconds.  The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from ``some time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.
1370 @multitable @columnfractions .20 .80
1371 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1374 @item @emph{Fortran}:
1375 @multitable @columnfractions .20 .80
1376 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1379 @item @emph{See also}:
1382 @item @emph{Reference}:
1383 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
1388 @c ---------------------------------------------------------------------
1389 @c OpenMP Environment Variables
1390 @c ---------------------------------------------------------------------
1392 @node Environment Variables
1393 @chapter OpenMP Environment Variables
The environment variables beginning with @env{OMP_} are defined by
section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.
1400 * OMP_CANCELLATION:: Set whether cancellation is activated
1401 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
1402 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
1403 * OMP_DYNAMIC:: Dynamic adjustment of threads
1404 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1405 * OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
1406 * OMP_NESTED:: Nested parallel regions
1407 * OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
1410 * OMP_STACKSIZE:: Set default thread stack size
1411 * OMP_SCHEDULE:: How threads are scheduled
1412 * OMP_TARGET_OFFLOAD:: Controls offloading behaviour
1413 * OMP_THREAD_LIMIT:: Set the maximum number of threads
1414 * OMP_WAIT_POLICY:: How waiting threads are handled
1415 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
1416 * GOMP_DEBUG:: Enable debugging output
1417 * GOMP_STACKSIZE:: Set default thread stack size
1418 * GOMP_SPINCOUNT:: Set the busy-wait spin count
1419 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
1423 @node OMP_CANCELLATION
1424 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1425 @cindex Environment Variable
1427 @item @emph{Description}:
If set to @code{TRUE}, cancellation is activated.  If set to @code{FALSE} or
1429 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1431 @item @emph{See also}:
1432 @ref{omp_get_cancellation}
1434 @item @emph{Reference}:
1435 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
1440 @node OMP_DISPLAY_ENV
1441 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1442 @cindex Environment Variable
1444 @item @emph{Description}:
1445 If set to @code{TRUE}, the OpenMP version number and the values
1446 associated with the OpenMP environment variables are printed to @code{stderr}.
1447 If set to @code{VERBOSE}, it additionally shows the value of the environment
1448 variables which are GNU extensions. If undefined or set to @code{FALSE},
1449 this information will not be shown.
1452 @item @emph{Reference}:
1453 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
1458 @node OMP_DEFAULT_DEVICE
1459 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1460 @cindex Environment Variable
1462 @item @emph{Description}:
1463 Set to choose the device which is used in a @code{target} region, unless the
1464 value is overridden by @code{omp_set_default_device} or by a @code{device}
1465 clause. The value shall be the nonnegative device number. If no device with
1466 the given device number exists, the code is executed on the host. If unset,
1467 device number 0 will be used.
1470 @item @emph{See also}:
1471 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1473 @item @emph{Reference}:
1474 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
1480 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1481 @cindex Environment Variable
1483 @item @emph{Description}:
1484 Enable or disable the dynamic adjustment of the number of threads
1485 within a team. The value of this environment variable shall be
1486 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
1487 disabled by default.
1489 @item @emph{See also}:
1490 @ref{omp_set_dynamic}
1492 @item @emph{Reference}:
1493 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
1498 @node OMP_MAX_ACTIVE_LEVELS
1499 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
1500 @cindex Environment Variable
1502 @item @emph{Description}:
1503 Specifies the initial value for the maximum number of nested parallel
1504 regions. The value of this variable shall be a positive integer.
1505 If undefined, the number of active levels is unlimited.
1507 @item @emph{See also}:
1508 @ref{omp_set_max_active_levels}
1510 @item @emph{Reference}:
1511 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
1516 @node OMP_MAX_TASK_PRIORITY
1517 @section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
1518 number that can be set for a task.
1519 @cindex Environment Variable
1521 @item @emph{Description}:
1522 Specifies the initial value for the maximum priority value that can be
1523 set for a task. The value of this variable shall be a non-negative
integer, and zero is allowed.  If undefined, the default priority is 0.
1527 @item @emph{See also}:
1528 @ref{omp_get_max_task_priority}
1530 @item @emph{Reference}:
1531 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
1537 @section @env{OMP_NESTED} -- Nested parallel regions
1538 @cindex Environment Variable
1539 @cindex Implementation specific setting
1541 @item @emph{Description}:
1542 Enable or disable nested parallel regions, i.e., whether team members
1543 are allowed to create new teams. The value of this environment variable
1544 shall be @code{TRUE} or @code{FALSE}. If undefined, nested parallel
1545 regions are disabled by default.
1547 @item @emph{See also}:
1548 @ref{omp_set_nested}
1550 @item @emph{Reference}:
1551 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
1556 @node OMP_NUM_THREADS
1557 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
1558 @cindex Environment Variable
1559 @cindex Implementation specific setting
1561 @item @emph{Description}:
1562 Specifies the default number of threads to use in parallel regions. The
1563 value of this variable shall be a comma-separated list of positive integers;
each value specifies the number of threads to use for the corresponding nesting
level.  If undefined, one thread per CPU is used.
1567 @item @emph{See also}:
1568 @ref{omp_set_num_threads}
1570 @item @emph{Reference}:
1571 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
1578 @cindex Environment Variable
1580 @item @emph{Description}:
1581 Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved.  Alternatively, a comma-separated list with the
1584 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
1585 the thread affinity policy for the corresponding nesting level. With
1586 @code{MASTER} the worker threads are in the same place partition as the
1587 master thread. With @code{CLOSE} those are kept close to the master thread
1588 in contiguous place partitions. And with @code{SPREAD} a sparse distribution
1589 across the place partitions is used.
1591 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
1592 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
1594 @item @emph{See also}:
1595 @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
1597 @item @emph{Reference}:
1598 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
1605 @cindex Environment Variable
1607 @item @emph{Description}:
1608 The thread placement can be either specified using an abstract name or by an
1609 explicit list of the places. The abstract names @code{threads}, @code{cores}
1610 and @code{sockets} can be optionally followed by a positive number in
parentheses, which denotes how many places shall be created.  With
1612 @code{threads} each place corresponds to a single hardware thread; @code{cores}
1613 to a single core with the corresponding number of hardware threads; and with
1614 @code{sockets} the place corresponds to a single socket. The resulting
placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
variable to @code{VERBOSE}.
Alternatively, the placement can be specified explicitly as a comma-separated
list of places.  A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads.  The hardware threads belonging to a
place can either be specified as a comma-separated list of nonnegative thread
numbers or using an interval.  Multiple places can also be either specified
by a comma-separated list of places or by an interval.  To specify an
interval, a colon followed by the count is placed after the hardware thread
number or the place.  Optionally, the length can be followed by a colon and
the stride number; otherwise a unit stride is assumed.  For instance, the
following place lists are equivalent:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
1631 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
1632 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
1633 between CPUs following no placement policy.
1635 @item @emph{See also}:
1636 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
1637 @ref{OMP_DISPLAY_ENV}
1639 @item @emph{Reference}:
1640 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
1646 @section @env{OMP_STACKSIZE} -- Set default thread stack size
1647 @cindex Environment Variable
1649 @item @emph{Description}:
1650 Set the default thread stack size in kilobytes, unless the number
1651 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
1652 case the size is, respectively, in bytes, kilobytes, megabytes
1653 or gigabytes. This is different from @code{pthread_attr_setstacksize}
1654 which gets the number of bytes as an argument. If the stack size cannot
1655 be set due to system constraints, an error is reported and the initial
stack size is left unchanged.  If undefined, the stack size is system
dependent.
1659 @item @emph{Reference}:
1660 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
1666 @section @env{OMP_SCHEDULE} -- How threads are scheduled
1667 @cindex Environment Variable
1668 @cindex Implementation specific setting
1670 @item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}.  The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.
1677 @item @emph{See also}:
1678 @ref{omp_set_schedule}
1680 @item @emph{Reference}:
1681 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
1686 @node OMP_TARGET_OFFLOAD
1687 @section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
1688 @cindex Environment Variable
1689 @cindex Implementation specific setting
1691 @item @emph{Description}:
1692 Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
or @code{DEFAULT}.
1696 If set to @code{MANDATORY}, the program will terminate with an error if
1697 the offload device is not present or is not supported. If set to
1698 @code{DISABLED}, then offloading is disabled and all code will run on the
1699 host. If set to @code{DEFAULT}, the program will try offloading to the
1700 device first, then fall back to running code on the host if it cannot.
1702 If undefined, then the program will behave as if @code{DEFAULT} was set.
1704 @item @emph{Reference}:
1705 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
1710 @node OMP_THREAD_LIMIT
1711 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
1712 @cindex Environment Variable
1714 @item @emph{Description}:
1715 Specifies the number of threads to use for the whole program. The
1716 value of this variable shall be a positive integer. If undefined,
1717 the number of threads is not limited.
1719 @item @emph{See also}:
1720 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
1722 @item @emph{Reference}:
1723 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
1728 @node OMP_WAIT_POLICY
1729 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
1730 @cindex Environment Variable
1732 @item @emph{Description}:
1733 Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; if the value is @code{ACTIVE}, they should.
If undefined, threads wait actively for a short time
1737 before waiting passively.
1739 @item @emph{See also}:
1740 @ref{GOMP_SPINCOUNT}
1742 @item @emph{Reference}:
1743 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
1748 @node GOMP_CPU_AFFINITY
1749 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
1750 @cindex Environment Variable
1752 @item @emph{Description}:
1753 Binds threads to specific CPUs. The variable should contain a space-separated
1754 or comma-separated list of CPUs. This list may contain different kinds of
1755 entries: either single CPU numbers in any order, a range of CPUs (M-N)
1756 or a range with some stride (M-N:S). CPU numbers are zero based. For example,
1757 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
1758 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
1759 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
1760 and 14 respectively and then start assigning back from the beginning of
1761 the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
1763 There is no libgomp library routine to determine whether a CPU affinity
1764 specification is in effect. As a workaround, language-specific library
1765 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
1766 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
1767 environment variable. A defined CPU affinity on startup cannot be changed
1768 or disabled during the runtime of the application.
1770 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence.  If neither is set, or when
@env{OMP_PROC_BIND} is set to @code{FALSE}, the host system will handle the
assignment of threads to CPUs.
1775 @item @emph{See also}:
1776 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
1782 @section @env{GOMP_DEBUG} -- Enable debugging output
1783 @cindex Environment Variable
1785 @item @emph{Description}:
1786 Enable debugging output. The variable should be set to @code{0}
1787 (disabled, also the default if not set), or @code{1} (enabled).
1789 If enabled, some debugging output will be printed during execution.
1790 This is currently not specified in more detail, and subject to change.
1795 @node GOMP_STACKSIZE
1796 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
1797 @cindex Environment Variable
1798 @cindex Implementation specific setting
1800 @item @emph{Description}:
1801 Set the default thread stack size in kilobytes. This is different from
1802 @code{pthread_attr_setstacksize} which gets the number of bytes as an
1803 argument. If the stack size cannot be set due to system constraints, an
1804 error is reported and the initial stack size is left unchanged. If undefined,
1805 the stack size is system dependent.
1807 @item @emph{See also}:
1810 @item @emph{Reference}:
1811 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
1812 GCC Patches Mailinglist},
1813 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
1814 GCC Patches Mailinglist}
1819 @node GOMP_SPINCOUNT
1820 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
1821 @cindex Environment Variable
1822 @cindex Implementation specific setting
1824 @item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power, before
waiting passively without consuming CPU power.  The value may be
1827 either @code{INFINITE}, @code{INFINITY} to always wait actively or an
1828 integer which gives the number of spins of the busy-wait loop. The
1829 integer may optionally be followed by the following suffixes acting
1830 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
1831 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
1832 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
1833 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
1834 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
1835 If there are more OpenMP threads than available CPUs, 1000 and 100
1836 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
1838 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
1840 @item @emph{See also}:
1841 @ref{OMP_WAIT_POLICY}
1846 @node GOMP_RTEMS_THREAD_POOLS
1847 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
1848 @cindex Environment Variable
1849 @cindex Implementation specific setting
1851 @item @emph{Description}:
1852 This environment variable is only used on the RTEMS real-time operating system.
1853 It determines the scheduler instance specific thread pools. The format for
1854 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
1855 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
1856 separated by @code{:} where:
1858 @item @code{<thread-pool-count>} is the thread pool count for this scheduler
1860 @item @code{$<priority>} is an optional priority for the worker threads of a
1861 thread pool according to @code{pthread_setschedparam}. In case a priority
1862 value is omitted, then a worker thread will inherit the priority of the OpenMP
1863 master thread that created it. The priority of the worker thread is not
1864 changed after creation, even if a new OpenMP master thread using the worker has
1865 a different priority.
1866 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
1867 RTEMS application configuration.
1869 In case no thread pool configuration is specified for a scheduler instance,
1870 then each OpenMP master thread of this scheduler instance will use its own
1871 dynamically allocated thread pool. To limit the worker thread count of the
1872 thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
1873 @item @emph{Example}:
Suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
1875 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
1876 @code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
1877 scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
1878 one thread pool available. Since no priority is specified for this scheduler
1879 instance, the worker thread inherits the priority of the OpenMP master thread
1880 that created it. In the scheduler instance @code{WRK1} there are three thread
1881 pools available and their worker threads run at priority four.
1886 @c ---------------------------------------------------------------------
1888 @c ---------------------------------------------------------------------
1890 @node Enabling OpenACC
1891 @chapter Enabling OpenACC
1893 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
1894 flag @option{-fopenacc} must be specified. This enables the OpenACC directive
1895 @code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
1896 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
1897 @code{!$} conditional compilation sentinels in free form and @code{c$},
1898 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
1899 arranges for automatic linking of the OpenACC runtime library
1900 (@ref{OpenACC Runtime Library Routines}).
1902 See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
1904 A complete description of all OpenACC directives accepted may be found in
1905 the @uref{https://www.openacc.org, OpenACC} Application Programming
1906 Interface manual, version 2.6.
1910 @c ---------------------------------------------------------------------
1911 @c OpenACC Runtime Library Routines
1912 @c ---------------------------------------------------------------------
1914 @node OpenACC Runtime Library Routines
1915 @chapter OpenACC Runtime Library Routines
The runtime routines described here are defined by section 3 of the OpenACC
specification, version 2.6.
1919 They have C linkage, and do not throw exceptions.
1920 Generally, they are available only for the host, with the exception of
1921 @code{acc_on_device}, which is available for both the host and the
1922 acceleration device.
* acc_get_num_devices::          Get number of devices for the given device
                                 type.
* acc_set_device_type::          Set type of device accelerator to use.
* acc_get_device_type::          Get type of device accelerator to be used.
* acc_set_device_num::           Set device number to use.
* acc_get_device_num::           Get device number to be used.
* acc_get_property::             Get device property.
* acc_async_test::               Tests for completion of a specific asynchronous
                                 operation.
* acc_async_test_all::           Tests for completion of all asynchronous
                                 operations.
* acc_wait::                     Wait for completion of a specific asynchronous
                                 operation.
* acc_wait_all::                 Waits for completion of all asynchronous
                                 operations.
* acc_wait_all_async::           Wait for completion of all asynchronous
                                 operations.
* acc_wait_async::               Wait for completion of asynchronous operations.
* acc_init::                     Initialize runtime for a specific device type.
* acc_shutdown::                 Shuts down the runtime for a specific device
                                 type.
* acc_on_device::                Whether executing on a particular device
* acc_malloc::                   Allocate device memory.
* acc_free::                     Free device memory.
* acc_copyin::                   Allocate device memory and copy host memory to
                                 it.
* acc_present_or_copyin::        If the data is not present on the device,
                                 allocate device memory and copy from host
                                 memory.
* acc_create::                   Allocate device memory and map it to host
                                 memory.
* acc_present_or_create::        If the data is not present on the device,
                                 allocate device memory and map it to host
                                 memory.
* acc_copyout::                  Copy device memory to host memory.
* acc_delete::                   Free device memory.
* acc_update_device::            Update device memory from mapped host memory.
* acc_update_self::              Update host memory from mapped device memory.
* acc_map_data::                 Map previously allocated device memory to host
                                 memory.
* acc_unmap_data::               Unmap device memory from host memory.
* acc_deviceptr::                Get device pointer associated with specific
                                 host address.
* acc_hostptr::                  Get host pointer associated with specific
                                 device address.
* acc_is_present::               Indicate whether host variable / array is
                                 present on device.
* acc_memcpy_to_device::         Copy host memory to device memory.
* acc_memcpy_from_device::       Copy device memory to host memory.
* acc_attach::                   Let device pointer point to device-pointer target.
* acc_detach::                   Let device pointer point to host-pointer target.

API routines for target platforms.

* acc_get_current_cuda_device:: Get CUDA device handle.
* acc_get_current_cuda_context:: Get CUDA context handle.
* acc_get_cuda_stream::          Get CUDA stream handle.
* acc_set_cuda_stream::          Set CUDA stream handle.

API routines for the OpenACC Profiling Interface.

* acc_prof_register::            Register callbacks.
* acc_prof_unregister::          Unregister callbacks.
* acc_prof_lookup::              Obtain inquiry functions.
* acc_register_library::         Library registration.
@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type

@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
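For example, the device counts per type can be queried as in the following
sketch; the host itself is always reported as one device of type
@code{acc_device_host}:

@smallexample
#include <stdio.h>
#include <openacc.h>

int
main (void)
@{
  printf ("host devices:     %d\n",
          acc_get_num_devices (acc_device_host));
  printf ("non-host devices: %d\n",
          acc_get_num_devices (acc_device_not_host));
  return 0;
@}
@end smallexample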
@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.

@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_device_type
@section @code{acc_get_device_type} -- Get type of device accelerator to be used.

@item @emph{Description}
This function returns the device type that will be used when executing a
parallel or kernels region.

This function returns @code{acc_device_none} if
@code{acc_get_device_type} is called from the
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
Interface}), that is, if the device is currently being initialized.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.

@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum} and associated with the specified device type
@var{devicetype}, to use.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.

@item @emph{Description}
This function returns the device number, associated with the specified device
type @var{devicetype}, that will be used when executing a parallel or kernels
region.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_property
@section @code{acc_get_property} -- Get device property.
@cindex acc_get_property
@cindex acc_get_property_string

@item @emph{Description}
These routines return the value of the specified @var{property} for the
device being queried according to @var{devicenum} and @var{devicetype}.
Integer-valued and string-valued properties are returned by
@code{acc_get_property} and @code{acc_get_property_string} respectively.
The Fortran @code{acc_get_property_string} subroutine returns the string
retrieved in its fourth argument, while the remaining entry points are
functions, which pass the return value as their result.

Note, for Fortran only: the OpenACC technical committee corrected and, hence,
modified the interface introduced in OpenACC 2.6.  The kind-value parameter
@code{acc_device_property} has been renamed to @code{acc_device_property_kind}
for consistency and the return type of the @code{acc_get_property} function is
now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
The parameter @code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
@item @tab @code{use ISO_C_Binding, only: c_size_t}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer(kind=acc_device_property_kind) property}
@item @tab @code{integer(kind=c_size_t) acc_get_property}
@item @tab @code{character(*) string}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
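A short C sketch of how the two entry points differ; the property names
@code{acc_property_name} and @code{acc_property_memory} are the standard ones
from @code{openacc.h}, and the values they yield depend on the device (they
may be empty or zero):

@smallexample
#include <stdio.h>
#include <openacc.h>

int
main (void)
@{
  acc_device_t type = acc_get_device_type ();
  int num = acc_get_device_num (type);

  /* String-valued properties use acc_get_property_string,
     integer-valued ones acc_get_property.  */
  const char *name = acc_get_property_string (num, type, acc_property_name);
  size_t mem = acc_get_property (num, type, acc_property_memory);

  printf ("device %d: %s, %zu bytes\n", num,
          name ? name : "(unknown)", mem);
  return 0;
@}
@end smallexample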
@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.

@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned to indicate that
the specified asynchronous operation has completed, while Fortran returns
@code{true}.  If the asynchronous operation has not completed, C/C++ returns
zero and Fortran returns @code{false}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
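A typical use is to poll an asynchronous queue while doing other host work;
a sketch (the queue number 1 is arbitrary):

@smallexample
#include <openacc.h>

int
main (void)
@{
  float a[256] = @{ 0 @};

  /* Start an asynchronous transfer on queue 1, then poll it.  */
  acc_copyin_async (a, sizeof a, 1);
  while (!acc_async_test (1))
    ;  /* Other host work could be done here.  */

  acc_delete (a, sizeof a);
  return 0;
@}
@end smallexample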
@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.

@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}.  If
any asynchronous operation has not completed, C/C++ returns zero and
Fortran returns @code{false}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.

@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.

@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.

@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item @tab @code{integer(acc_handle_kind) async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.

@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item @tab @code{integer(acc_handle_kind) arg, async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_init} -- Initialize runtime for a specific device type.

@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.

@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_on_device} -- Whether executing on a particular device

@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}.  In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified device
type; in Fortran, @code{true} will be returned.  If the program is not
executing on the specified device type, C/C++ will return zero, while
Fortran will return @code{false}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
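For instance, in host code outside of any offloaded region, the following
trivial check always reports the host:

@smallexample
#include <stdio.h>
#include <openacc.h>

int
main (void)
@{
  if (acc_on_device (acc_device_host))
    printf ("running on the host\n");
  return 0;
@}
@end smallexample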
@section @code{acc_malloc} -- Allocate device memory.

@item @emph{Description}
This function allocates @var{len} bytes of device memory.  It returns
the device address of the allocated memory.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_free} -- Free device memory.

@item @emph{Description}
Free previously allocated device memory at the device address @var{a}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.

@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}.  The device
address of the newly allocated device memory is returned.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
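A C sketch of the typical pairing with @code{acc_copyout} (error handling
omitted):

@smallexample
#include <openacc.h>

int
main (void)
@{
  float a[100];
  for (int i = 0; i < 100; i++)
    a[i] = i;

  /* Allocate device memory and copy the host array to it.  */
  void *d = acc_copyin (a, sizeof a);

  /* ... operate on the device copy ... */

  /* Copy the device data back to the host and release the mapping.  */
  acc_copyout (a, sizeof a);
  return (d == 0);
@}
@end smallexample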
@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.

@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not.  If it is not present, then device memory
will be allocated and the host memory copied.  The device address of
the newly allocated device memory is returned.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_create} -- Allocate device memory and map it to host memory.

@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes.  In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.

@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not.  If it is not present, then device memory
will be allocated and mapped to host memory.  In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_copyout} -- Copy device memory to host memory.

@item @emph{Description}
This function copies mapped device memory to the host memory specified
by the host address @var{a} for a length of @var{len} bytes in C/C++.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_delete} -- Free device memory.

@item @emph{Description}
This function frees previously allocated device memory associated with
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.

@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.

@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
2814 @section @code{acc_map_data} -- Map previously allocated device memory to host memory.
2816 @item @emph{Description}
2817 This function maps previously allocated device and host memory. The device
2818 memory is specified with the device address @var{d}. The host memory is
2819 specified with the host address @var{h} and a length of @var{len}.
2822 @multitable @columnfractions .20 .80
2823 @item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
2826 @item @emph{Reference}:
2827 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2833 @node acc_unmap_data
2834 @section @code{acc_unmap_data} -- Unmap device memory from host memory.
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@item @emph{Description}
This function indicates whether the specified host address in @var{a} and a
length of @var{len} bytes is present on the device. In C/C++, a non-zero
value is returned to indicate the presence of the mapped memory on the
device. A zero is returned to indicate the memory is not mapped on the
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, then @code{true} is returned. Otherwise,
@code{false} is returned to indicate the mapped memory is not present.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@item @emph{Description}
This function copies host memory specified by host address of @var{src} to
device memory specified by the device address @var{dest} for a length of
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_attach} -- Let device pointer point to device-pointer target.
@item @emph{Description}
This function updates a pointer on the device from pointing to a host-pointer
address to pointing to the corresponding device data.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{acc_detach} -- Let device pointer point to host-pointer target.
@item @emph{Description}
This function updates a pointer on the device from pointing to a device-pointer
address to pointing to the corresponding host data.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.
This cannot be used to change the stream handle associated with
@code{acc_async_sync}.
The return value is not specified.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_prof_register
@section @code{acc_prof_register} -- Register callbacks.
@item @emph{Description}:
This function registers callbacks.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
@item @emph{See also}:
@ref{OpenACC Profiling Interface}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_prof_unregister
@section @code{acc_prof_unregister} -- Unregister callbacks.
@item @emph{Description}:
This function unregisters callbacks.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
@item @emph{See also}:
@ref{OpenACC Profiling Interface}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_prof_lookup
@section @code{acc_prof_lookup} -- Obtain inquiry functions.
@item @emph{Description}:
Function to obtain inquiry functions.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
@item @emph{See also}:
@ref{OpenACC Profiling Interface}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node acc_register_library
@section @code{acc_register_library} -- Library registration.
@item @emph{Description}:
Function for library registration.
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
@item @emph{See also}:
@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------
@node OpenACC Environment Variables
@chapter OpenACC Environment Variables
The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{ACC_PROFLIB}
is defined by section 4 of the OpenACC specification in version 2.6.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@section @code{ACC_PROFLIB}
@item @emph{See also}:
@ref{acc_register_library}, @ref{OpenACC Profiling Interface}
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------
@node CUDA Streams Usage
@chapter CUDA Streams Usage
This applies to the @code{nvptx} plugin only.
The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.
The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives that make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.
Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause will be destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} then refer to a different stream.
@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------
@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability
@section Introduction
The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.
@section First invocation: NVIDIA CUBLAS library API
In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.
When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation has completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources. Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.
Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.
Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
/* Create the handle */
s = cublasCreate(&h);
if (s != CUBLAS_STATUS_SUCCESS)
    fprintf(stderr, "cublasCreate failed %d\n", s);
/* Get the device number */
e = cudaGetDevice(&dev);
if (e != cudaSuccess)
    fprintf(stderr, "cudaGetDevice failed %d\n", e);
/* Initialize OpenACC library and use device 'dev' */
acc_set_device_num(dev, acc_device_nvidia);
@section First invocation: OpenACC library API
In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More specifically,
the function @code{acc_set_device_num()}.
In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables and these will be discussed in the next section.
Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called as seen with multiple calls being made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.
acc_set_device_num(dev, acc_device_nvidia);
/* Copy the first set to the device */
d_X = acc_copyin(&h_X[0], N * sizeof (float));
    fprintf(stderr, "copyin error h_X\n");
/* Copy the second set to the device */
d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    fprintf(stderr, "copyin error h_Y1\n");
/* Create the handle */
s = cublasCreate(&h);
if (s != CUBLAS_STATUS_SUCCESS)
    fprintf(stderr, "cublasCreate failed %d\n", s);
/* Perform saxpy using CUBLAS library function */
s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
if (s != CUBLAS_STATUS_SUCCESS)
    fprintf(stderr, "cublasSaxpy failed %d\n", s);
/* Copy the results from the device */
acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
@section OpenACC library and environment variables
There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
Application Programming Interface}, Version 2.6.}
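For example, the device can be selected entirely from the environment, with
no call to @code{acc_set_device_num()} in the program. In the following
sketch, @command{./a.out} is merely a placeholder for the actual program
name:

```shell
# Equivalent to calling acc_set_device_num(0, acc_device_nvidia)
# before any other OpenACC operation; './a.out' is a placeholder.
ACC_DEVICE_TYPE=nvidia ACC_DEVICE_NUM=0 ./a.out
```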
@c ---------------------------------------------------------------------
@c OpenACC Profiling Interface
@c ---------------------------------------------------------------------
@node OpenACC Profiling Interface
@chapter OpenACC Profiling Interface
@section Implementation Status and Implementation-Defined Behavior
We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
@emph{implementation-defined behavior}, while they're still under
discussion within the OpenACC Technical Committee.
This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This is relevant, as the Profiling Interface affects all
the @emph{hot} code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.
We're not yet accounting for the fact that @cite{OpenACC events may
occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in, or dynamically loaded via @env{LD_PRELOAD}.
Initialization via @code{acc_register_library} functions dynamically
loaded via the @env{ACC_PROFLIB} environment variable does work, as
does directly calling @code{acc_prof_register},
@code{acc_prof_unregister}, and @code{acc_prof_lookup}.
As currently there are no inquiry functions defined, calls to
@code{acc_prof_lookup} will always return @code{NULL}.
There aren't separate @emph{start}, @emph{stop} events defined for the
event types @code{acc_ev_create}, @code{acc_ev_delete},
@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
should be triggered before or after the actual device-specific call is
made. We trigger them after.
Remarks about data provided to callbacks:
@item @code{acc_prof_info.event_type}
It's not clear if for @emph{nested} event callbacks (for example,
@code{acc_ev_enqueue_launch_start} as part of a parent compute
construct), this should be set for the nested event
(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
construct should remain (@code{acc_ev_compute_construct_start}). In
this implementation, the value will generally correspond to the
innermost nested event type.
@item @code{acc_prof_info.device_type}
For @code{acc_ev_compute_construct_start}, and in presence of an
@code{if} clause with @emph{false} argument, this will still refer to
the offloading device type.
It's not clear if that's the expected behavior.
Complementary to the item before, for
@code{acc_ev_compute_construct_end}, this is set to
@code{acc_device_host} in presence of an @code{if} clause with
@emph{false} argument.
It's not clear if that's the expected behavior.
@item @code{acc_prof_info.thread_id}
Always @code{-1}; not yet implemented.
@item @code{acc_prof_info.async}
Not yet implemented correctly for
@code{acc_ev_compute_construct_start}.
In a compute construct, for host-fallback
execution/@code{acc_device_host} it will always be
@code{acc_async_sync}.
It's not clear if that's the expected behavior.
For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
it will always be @code{acc_async_sync}.
It's not clear if that's the expected behavior.
@item @code{acc_prof_info.async_queue}
There is no @cite{limited number of asynchronous queues} in libgomp.
This will always have the same value as @code{acc_prof_info.async}.
@item @code{acc_prof_info.src_file}
Always @code{NULL}; not yet implemented.
@item @code{acc_prof_info.func_name}
Always @code{NULL}; not yet implemented.
@item @code{acc_prof_info.line_no}
Always @code{-1}; not yet implemented.
@item @code{acc_prof_info.end_line_no}
Always @code{-1}; not yet implemented.
@item @code{acc_prof_info.func_line_no}
Always @code{-1}; not yet implemented.
@item @code{acc_prof_info.func_end_line_no}
Always @code{-1}; not yet implemented.
@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
Relating to @code{acc_prof_info.event_type} discussed above, in this
implementation, this will always be the same value as
@code{acc_prof_info.event_type}.
@item @code{acc_event_info.*.parent_construct}
Will be @code{acc_construct_parallel} for all OpenACC compute
constructs as well as many OpenACC Runtime API calls; should be the
one matching the actual construct, or
@code{acc_construct_runtime_api}, respectively.
Will be @code{acc_construct_enter_data} or
@code{acc_construct_exit_data} when processing variable mappings
specified in OpenACC @emph{declare} directives; should be
@code{acc_construct_declare}.
For implicit @code{acc_ev_device_init_start},
@code{acc_ev_device_init_end}, and explicit as well as implicit
@code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, will be
@code{acc_construct_parallel}; should reflect the real parent
@item @code{acc_event_info.*.implicit}
For @code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
also for explicit usage.
@item @code{acc_event_info.data_event.var_name}
Always @code{NULL}; not yet implemented.
@item @code{acc_event_info.data_event.host_ptr}
For @code{acc_ev_alloc} and @code{acc_ev_free}, this is always
@item @code{typedef union acc_api_info}
@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
Information}. This should obviously be @code{typedef @emph{struct}
@item @code{acc_api_info.device_api}
Possibly not yet implemented correctly for
@code{acc_ev_compute_construct_start},
@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
will always be @code{acc_device_api_none} for these event types.
For @code{acc_ev_enter_data_start}, it will be
@code{acc_device_api_none} in some cases.
@item @code{acc_api_info.device_type}
Always the same as @code{acc_prof_info.device_type}.
@item @code{acc_api_info.vendor}
Always @code{-1}; not yet implemented.
@item @code{acc_api_info.device_handle}
Always @code{NULL}; not yet implemented.
@item @code{acc_api_info.context_handle}
Always @code{NULL}; not yet implemented.
@item @code{acc_api_info.async_handle}
Always @code{NULL}; not yet implemented.
Remarks about certain event types:
@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
events, they currently aren't @emph{nested within} the corresponding
@code{acc_ev_compute_construct_start} and
@code{acc_ev_compute_construct_end}, but are currently observed
@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do here: the standard asks us to provide a lot
of details to the @code{acc_ev_compute_construct_start} callback,
without (implicitly) initializing a device before?
Callbacks for these event types will not be invoked for calls to the
@code{acc_set_device_type} and @code{acc_set_device_num} functions.
It's not clear if they should be.
@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
Callbacks for these event types will also be invoked for OpenACC
@emph{host_data} constructs.
It's not clear if they should be.
Callbacks for these event types will also be invoked when processing
variable mappings specified in OpenACC @emph{declare} directives.
It's not clear if they should be.
Callbacks for the following event types will be invoked, but dispatch
and information provided therein has not yet been thoroughly reviewed:
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
During device initialization and finalization, respectively, callbacks
for the following event types will not yet be invoked:
3707 @item @code{acc_ev_alloc}
3708 @item @code{acc_ev_free}
3711 Callbacks for the following event types have not yet been implemented,
3712 so currently won't be invoked:
3715 @item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
3716 @item @code{acc_ev_runtime_shutdown}
3717 @item @code{acc_ev_create}, @code{acc_ev_delete}
3718 @item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
3721 For the following runtime library functions, not all expected
3722 callbacks will be invoked (mostly concerning implicit device
3726 @item @code{acc_get_num_devices}
3727 @item @code{acc_set_device_type}
3728 @item @code{acc_get_device_type}
3729 @item @code{acc_set_device_num}
3730 @item @code{acc_get_device_num}
3731 @item @code{acc_init}
3732 @item @code{acc_shutdown}
3735 Aside from implicit device initialization, for the following runtime
3736 library functions, no callbacks will be invoked for shared-memory
3737 offloading devices (it's not clear if they should be):
3740 @item @code{acc_malloc}
3741 @item @code{acc_free}
3742 @item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
3743 @item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
3744 @item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
3745 @item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
3746 @item @code{acc_update_device}, @code{acc_update_device_async}
3747 @item @code{acc_update_self}, @code{acc_update_self_async}
3748 @item @code{acc_map_data}, @code{acc_unmap_data}
3749 @item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
3750 @item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
3755 @c ---------------------------------------------------------------------
3757 @c ---------------------------------------------------------------------
3759 @node The libgomp ABI
3760 @chapter The libgomp ABI
The following sections present notes on the external ABI as
presented by libgomp.  Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu
@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this in the version run by the master thread.
Surely this is not worthwhile though...
@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name, use

@smallexample
void GOMP_critical_start (void);
void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at startup.
@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
void GOMP_atomic_enter (void)
void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.
@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.

@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
void GOMP_barrier (void)
@end smallexample
@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}.
Except that OMP allows constructors for C++ objects.  We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library.  Failing that, we can have a set
of entry points to register ctor functions to be called.
@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function.  This preserves
the semantic of new variable creation.
@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks.  Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C.  Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.
@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}.  The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.
@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
#pragma omp parallel
@{
  body;
@}
@end smallexample

@noindent
becomes

@smallexample
void subfunction (void *data)
@{
  use data;
  body;
@}

setup data;
GOMP_parallel_start (subfunction, &data, num_threads);
subfunction (&data);
GOMP_parallel_end ();
@end smallexample

@noindent
where

@smallexample
void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if present,
or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock.  It needs to
create the team structure and assign team ids.

@smallexample
void GOMP_parallel_end (void)
@end smallexample

@noindent
Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
#pragma omp parallel for
for (i = lb; i <= ub; i++)
  body;
@end smallexample

@noindent
becomes

@smallexample
void subfunction (void *data)
@{
  long _s0, _e0;
  while (GOMP_loop_static_next (&_s0, &_e0))
  @{
    long _e1 = _e0, i;
    for (i = _s0; i < _e1; i++)
      body;
  @}
  GOMP_loop_end_nowait ();
@}

GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
subfunction (NULL);
GOMP_parallel_end ();
@end smallexample

@smallexample
#pragma omp for schedule(runtime)
for (i = 0; i < n; i++)
  body;
@end smallexample

@noindent
becomes

@smallexample
@{
  long i, _s0, _e0;
  if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
    do @{
      long _e1 = _e0;
      for (i = _s0; i < _e1; i++)
        body;
    @} while (GOMP_loop_runtime_next (&_s0, &_e0));
  GOMP_loop_end ();
@}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really.  We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables.  So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations.  Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause.  Bookkeeping for that is non-trivial...
@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
void GOMP_ordered_start (void)
void GOMP_ordered_end (void)
@end smallexample
@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block as

@smallexample
#pragma omp sections
@{
  #pragma omp section
  stmt1;
  #pragma omp section
  stmt2;
  #pragma omp section
  stmt3;
@}
@end smallexample

@noindent
becomes

@smallexample
for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
  switch (i)
    @{
    case 1:
      stmt1;
      break;
    case 2:
      stmt2;
      break;
    case 3:
      stmt3;
      break;
    @}
GOMP_sections_end ();
@end smallexample
@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
#pragma omp single
@{
  body;
@}
@end smallexample

@noindent
becomes

@smallexample
if (GOMP_single_start ())
  body;
GOMP_barrier ();
@end smallexample

@noindent
while

@smallexample
#pragma omp single copyprivate(x)
  body;
@end smallexample

@noindent
becomes

@smallexample
datap = GOMP_single_copy_start ();
if (datap == NULL)
  @{
    body;
    data.x = x;
    GOMP_single_copy_end (&data);
  @}
else
  x = datap->x;
GOMP_barrier ();
@end smallexample
@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
void GOACC_parallel ()
@end smallexample
@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs
Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
``openacc'' or ``openmp'' or both to the keywords field in the bug
report, as appropriate.
@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi

@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi

@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index