\input texinfo @c -*-texinfo-*-

@setfilename libgomp.info

Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
software.  Copies published by the Free Software Foundation raise
funds for GNU development.
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).  GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA
@setchapternewpage odd

@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation

@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*

Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's @code{target} construct) was added later, and the library was
renamed to the GNU Offloading and Multi Processing Runtime Library.
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu
@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directives
in free form, the @code{c$omp}, @code{*$omp} and @code{!$omp} directives
in fixed form, the @code{!$} conditional compilation sentinels in free form
and the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form.  The flag
also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

A complete description of all OpenMP directives accepted may be found in
the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
version 4.5.
@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three groups:

Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.
* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_dynamic::             Dynamic teams setting
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method
Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::            Initialize simple lock
* omp_set_lock::             Wait for and set simple lock
* omp_test_lock::            Test and set simple lock if available
* omp_unset_lock::           Unset simple lock
* omp_destroy_lock::         Destroy simple lock
* omp_init_nest_lock::       Initialize nested lock
* omp_set_nest_lock::        Wait for and set nested lock
* omp_test_nest_lock::       Test and set nested lock if available
* omp_unset_nest_lock::      Unset nested lock
* omp_destroy_nest_lock::    Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::            Get timer precision.
* omp_get_wtime::            Elapsed wall clock time.
@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of parallel regions

@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
that enclose the call.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID

@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range zero to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled

@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions

@item @emph{Description}:
Get the default device for target regions without a @code{device} clause.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting

@item @emph{Description}:
This function returns @code{true} if the dynamic adjustment of team sizes
is enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level

@item @emph{Description}:
This function returns the nesting level of the parallel blocks
that enclose the call.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Maximum number of active regions

@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks

@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region

@item @emph{Description}:
Return the maximum number of threads used for the current parallel region
that does not use the clause @code{num_threads}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions

@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

Nested parallel regions may be initialized at startup by the
@env{OMP_NESTED} environment variable or at runtime using
@code{omp_set_nested}.  If undefined, nested parallel regions are
disabled by default.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}

@item @emph{See also}:
@ref{omp_set_nested}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices

@item @emph{Description}:
Returns the number of target devices.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.

@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online

@item @emph{Description}:
Returns the number of processors online on that device.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.

@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams

@item @emph{Description}:
Returns the number of teams in the current team region.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team

@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs

@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method

@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number

@item @emph{Description}:
Returns the team number of the calling thread.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team

@item @emph{Description}:
This function returns the number of threads in a thread team to which
either the current thread or its ancestor belongs.  For values of @var{level}
outside the range zero to @code{omp_get_level}, -1 is returned; if @var{level}
is zero, 1 is returned, and for @var{level} equal to @code{omp_get_level},
the result is identical to @code{omp_get_num_threads}.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads

@item @emph{Description}:
Return the maximum number of threads available to the program.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID

@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the master thread of a team is always 0.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active

@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region

@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device

@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions

@item @emph{Description}:
Set the default device for target regions without a @code{device} clause.
The argument shall be a nonnegative device number.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams

@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@node omp_set_max_active_levels
@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions

@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions

@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}

@item @emph{See also}:
@ref{OMP_NESTED}, @ref{omp_get_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit

@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item @tab @code{integer, intent(in) :: num_threads}

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method

@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}

@item @emph{See also}:
@ref{omp_get_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@node omp_init_lock
@section @code{omp_init_lock} -- Initialize simple lock

@item @emph{Description}:
Initialize a simple lock.  After initialization, the lock is in
an unlocked state.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@node omp_set_lock
@section @code{omp_set_lock} -- Wait for and set simple lock

@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}.  The calling thread is blocked until the lock
is available.  If the lock is already held by the current thread,
a deadlock occurs.

@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1048 @section @code{omp_test_lock} -- Test and set simple lock if available
1050 @item @emph{Description}:
1051 Before setting a simple lock, the lock variable must be initialized by
1052 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1053 does not block if the lock is not available. This function returns
1054 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
1055 @code{false} represent their language-specific counterparts.
1058 @multitable @columnfractions .20 .80
1059 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1062 @item @emph{Fortran}:
1063 @multitable @columnfractions .20 .80
1064 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1065 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1068 @item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
1071 @item @emph{Reference}:
1072 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1077 @node omp_unset_lock
1078 @section @code{omp_unset_lock} -- Unset simple lock
1080 @item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen to,
again, set the lock to itself.
1088 @multitable @columnfractions .20 .80
1089 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1092 @item @emph{Fortran}:
1093 @multitable @columnfractions .20 .80
1094 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1095 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1098 @item @emph{See also}:
1099 @ref{omp_set_lock}, @ref{omp_test_lock}
1101 @item @emph{Reference}:
1102 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1107 @node omp_destroy_lock
1108 @section @code{omp_destroy_lock} -- Destroy simple lock
1110 @item @emph{Description}:
1111 Destroy a simple lock. In order to be destroyed, a simple lock must be
1112 in the unlocked state.
1115 @multitable @columnfractions .20 .80
1116 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1119 @item @emph{Fortran}:
1120 @multitable @columnfractions .20 .80
1121 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1122 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1125 @item @emph{See also}:
1128 @item @emph{Reference}:
1129 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1134 @node omp_init_nest_lock
1135 @section @code{omp_init_nest_lock} -- Initialize nested lock
1137 @item @emph{Description}:
1138 Initialize a nested lock. After initialization, the lock is in
1139 an unlocked state and the nesting count is set to zero.
1142 @multitable @columnfractions .20 .80
1143 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1146 @item @emph{Fortran}:
1147 @multitable @columnfractions .20 .80
1148 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1149 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1152 @item @emph{See also}:
1153 @ref{omp_destroy_nest_lock}
1155 @item @emph{Reference}:
1156 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1160 @node omp_set_nest_lock
1161 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1163 @item @emph{Description}:
1164 Before setting a nested lock, the lock variable must be initialized by
1165 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
1166 is available. If the lock is already held by the current thread, the
1167 nesting count for the lock is incremented.
1170 @multitable @columnfractions .20 .80
1171 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1174 @item @emph{Fortran}:
1175 @multitable @columnfractions .20 .80
1176 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1177 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1180 @item @emph{See also}:
1181 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1183 @item @emph{Reference}:
1184 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1189 @node omp_test_nest_lock
1190 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1192 @item @emph{Description}:
1193 Before setting a nested lock, the lock variable must be initialized by
1194 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
1195 @code{omp_test_nest_lock} does not block if the lock is not available.
1196 If the lock is already held by the current thread, the new nesting count
1197 is returned. Otherwise, the return value equals zero.
1200 @multitable @columnfractions .20 .80
1201 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1204 @item @emph{Fortran}:
1205 @multitable @columnfractions .20 .80
1206 @item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
1207 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1211 @item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
1214 @item @emph{Reference}:
1215 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1220 @node omp_unset_nest_lock
1221 @section @code{omp_unset_nest_lock} -- Unset nested lock
1223 @item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to, again, set the lock to itself.
1231 @multitable @columnfractions .20 .80
1232 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1235 @item @emph{Fortran}:
1236 @multitable @columnfractions .20 .80
1237 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1238 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1241 @item @emph{See also}:
1242 @ref{omp_set_nest_lock}
1244 @item @emph{Reference}:
1245 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1250 @node omp_destroy_nest_lock
1251 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1253 @item @emph{Description}:
1254 Destroy a nested lock. In order to be destroyed, a nested lock must be
1255 in the unlocked state and its nesting count must equal zero.
1258 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
1262 @item @emph{Fortran}:
1263 @multitable @columnfractions .20 .80
1264 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1265 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1268 @item @emph{See also}:
1271 @item @emph{Reference}:
1272 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1278 @section @code{omp_get_wtick} -- Get timer precision
1280 @item @emph{Description}:
1281 Gets the timer precision, i.e., the number of seconds between two
1282 successive clock ticks.
1285 @multitable @columnfractions .20 .80
1286 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1289 @item @emph{Fortran}:
1290 @multitable @columnfractions .20 .80
1291 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1294 @item @emph{See also}:
1297 @item @emph{Reference}:
1298 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
1304 @section @code{omp_get_wtime} -- Elapsed wall clock time
1306 @item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.
1313 @multitable @columnfractions .20 .80
1314 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1317 @item @emph{Fortran}:
1318 @multitable @columnfractions .20 .80
1319 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1322 @item @emph{See also}:
1325 @item @emph{Reference}:
1326 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
1331 @c ---------------------------------------------------------------------
1332 @c OpenMP Environment Variables
1333 @c ---------------------------------------------------------------------
1335 @node Environment Variables
1336 @chapter OpenMP Environment Variables
The environment variables beginning with @env{OMP_} are defined by
1339 section 4 of the OpenMP specification in version 4.5, while those
1340 beginning with @env{GOMP_} are GNU extensions.
1343 * OMP_CANCELLATION:: Set whether cancellation is activated
1344 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
1345 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
1346 * OMP_DYNAMIC:: Dynamic adjustment of threads
1347 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1348 * OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
1349 * OMP_NESTED:: Nested parallel regions
1350 * OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
1353 * OMP_STACKSIZE:: Set default thread stack size
1354 * OMP_SCHEDULE:: How threads are scheduled
1355 * OMP_THREAD_LIMIT:: Set the maximum number of threads
1356 * OMP_WAIT_POLICY:: How waiting threads are handled
1357 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
1358 * GOMP_DEBUG:: Enable debugging output
1359 * GOMP_STACKSIZE:: Set default thread stack size
1360 * GOMP_SPINCOUNT:: Set the busy-wait spin count
1361 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
1365 @node OMP_CANCELLATION
1366 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1367 @cindex Environment Variable
1369 @item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
1371 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1373 @item @emph{See also}:
1374 @ref{omp_get_cancellation}
1376 @item @emph{Reference}:
1377 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
1382 @node OMP_DISPLAY_ENV
1383 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1384 @cindex Environment Variable
1386 @item @emph{Description}:
1387 If set to @code{TRUE}, the OpenMP version number and the values
1388 associated with the OpenMP environment variables are printed to @code{stderr}.
1389 If set to @code{VERBOSE}, it additionally shows the value of the environment
1390 variables which are GNU extensions. If undefined or set to @code{FALSE},
1391 this information will not be shown.
1394 @item @emph{Reference}:
1395 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
1400 @node OMP_DEFAULT_DEVICE
1401 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1402 @cindex Environment Variable
1404 @item @emph{Description}:
1405 Set to choose the device which is used in a @code{target} region, unless the
1406 value is overridden by @code{omp_set_default_device} or by a @code{device}
1407 clause. The value shall be the nonnegative device number. If no device with
1408 the given device number exists, the code is executed on the host. If unset,
1409 device number 0 will be used.
1412 @item @emph{See also}:
1413 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1415 @item @emph{Reference}:
1416 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
1422 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1423 @cindex Environment Variable
1425 @item @emph{Description}:
1426 Enable or disable the dynamic adjustment of the number of threads
1427 within a team. The value of this environment variable shall be
1428 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
1429 disabled by default.
1431 @item @emph{See also}:
1432 @ref{omp_set_dynamic}
1434 @item @emph{Reference}:
1435 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
1440 @node OMP_MAX_ACTIVE_LEVELS
1441 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
1442 @cindex Environment Variable
1444 @item @emph{Description}:
1445 Specifies the initial value for the maximum number of nested parallel
1446 regions. The value of this variable shall be a positive integer.
1447 If undefined, the number of active levels is unlimited.
1449 @item @emph{See also}:
1450 @ref{omp_set_max_active_levels}
1452 @item @emph{Reference}:
1453 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
1458 @node OMP_MAX_TASK_PRIORITY
1459 @section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
1460 number that can be set for a task.
1461 @cindex Environment Variable
1463 @item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer, and zero is allowed. If undefined, the default priority is
0.
1469 @item @emph{See also}:
1470 @ref{omp_get_max_task_priority}
1472 @item @emph{Reference}:
1473 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
1479 @section @env{OMP_NESTED} -- Nested parallel regions
1480 @cindex Environment Variable
1481 @cindex Implementation specific setting
1483 @item @emph{Description}:
1484 Enable or disable nested parallel regions, i.e., whether team members
1485 are allowed to create new teams. The value of this environment variable
1486 shall be @code{TRUE} or @code{FALSE}. If undefined, nested parallel
1487 regions are disabled by default.
1489 @item @emph{See also}:
1490 @ref{omp_set_nested}
1492 @item @emph{Reference}:
1493 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
1498 @node OMP_NUM_THREADS
1499 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
1500 @cindex Environment Variable
1501 @cindex Implementation specific setting
1503 @item @emph{Description}:
Specifies the default number of threads to use in parallel regions. The
value of this variable shall be a comma-separated list of positive integers;
the value specifies the number of threads to use for the corresponding nested
level. If undefined, one thread per CPU is used.
1509 @item @emph{See also}:
1510 @ref{omp_set_num_threads}
1512 @item @emph{Reference}:
1513 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
1520 @cindex Environment Variable
1522 @item @emph{Description}:
1523 Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
1525 they may be moved. Alternatively, a comma separated list with the
1526 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
1527 the thread affinity policy for the corresponding nesting level. With
1528 @code{MASTER} the worker threads are in the same place partition as the
1529 master thread. With @code{CLOSE} those are kept close to the master thread
1530 in contiguous place partitions. And with @code{SPREAD} a sparse distribution
1531 across the place partitions is used.
1533 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
1534 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
1536 @item @emph{See also}:
1537 @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
1539 @item @emph{Reference}:
1540 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
1547 @cindex Environment Variable
1549 @item @emph{Description}:
1550 The thread placement can be either specified using an abstract name or by an
1551 explicit list of the places. The abstract names @code{threads}, @code{cores}
1552 and @code{sockets} can be optionally followed by a positive number in
parentheses, which denotes how many places shall be created. With
1554 @code{threads} each place corresponds to a single hardware thread; @code{cores}
1555 to a single core with the corresponding number of hardware threads; and with
1556 @code{sockets} the place corresponds to a single socket. The resulting
placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
variable to @code{VERBOSE}.
Alternatively, the placement can be specified explicitly as a comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The hardware threads
belonging to a place can either be specified as a comma-separated list of
nonnegative thread numbers or using an interval. Multiple places can also be
either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
the hardware thread number or the place. Optionally, the length can be
followed by a colon and the stride number -- otherwise a unit stride is
assumed. For instance, the following specify the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
1573 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
1574 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
1575 between CPUs following no placement policy.
1577 @item @emph{See also}:
1578 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
1579 @ref{OMP_DISPLAY_ENV}
1581 @item @emph{Reference}:
1582 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
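The interval notation can be made concrete with a small expander. The
following is a hypothetical helper, not part of libgomp: it turns the
compact form @code{"@{base:len@}:count:stride"} into the explicit list of
places described above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libgomp): expands the compact
   OMP_PLACES form "{base:len}:count:stride" into the explicit
   comma-separated list of places.  Writes into buf and returns the
   number of characters produced. */
int expand_places(int base, int len, int count, int stride,
                  char *buf, size_t bufsz)
{
    size_t pos = 0;
    for (int p = 0; p < count; p++) {
        pos += snprintf(buf + pos, bufsz - pos, "%s{", p ? ", " : "");
        for (int t = 0; t < len; t++)
            pos += snprintf(buf + pos, bufsz - pos, "%s%d",
                            t ? "," : "", base + p * stride + t);
        pos += snprintf(buf + pos, bufsz - pos, "}");
    }
    return (int)pos;
}
```

For example, @code{expand_places(0, 3, 4, 3, ...)} yields the explicit list
corresponding to @code{"@{0:3@}:4:3"}.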
1588 @section @env{OMP_STACKSIZE} -- Set default thread stack size
1589 @cindex Environment Variable
1591 @item @emph{Description}:
1592 Set the default thread stack size in kilobytes, unless the number
1593 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
1594 case the size is, respectively, in bytes, kilobytes, megabytes
1595 or gigabytes. This is different from @code{pthread_attr_setstacksize}
1596 which gets the number of bytes as an argument. If the stack size cannot
1597 be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.
1601 @item @emph{Reference}:
1602 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
1608 @section @env{OMP_SCHEDULE} -- How threads are scheduled
1609 @cindex Environment Variable
1610 @cindex Implementation specific setting
1612 @item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
The optional @code{chunk} size shall be a positive integer. If undefined,
dynamic scheduling and a chunk size of 1 are used.
1619 @item @emph{See also}:
1620 @ref{omp_set_schedule}
1622 @item @emph{Reference}:
1623 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
1628 @node OMP_THREAD_LIMIT
1629 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
1630 @cindex Environment Variable
1632 @item @emph{Description}:
1633 Specifies the number of threads to use for the whole program. The
1634 value of this variable shall be a positive integer. If undefined,
1635 the number of threads is not limited.
1637 @item @emph{See also}:
1638 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
1640 @item @emph{Reference}:
1641 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
1646 @node OMP_WAIT_POLICY
1647 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
1648 @cindex Environment Variable
1650 @item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that
they should. If undefined, threads wait actively for a short time
before waiting passively.
1657 @item @emph{See also}:
1658 @ref{GOMP_SPINCOUNT}
1660 @item @emph{Reference}:
1661 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
1666 @node GOMP_CPU_AFFINITY
1667 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
1668 @cindex Environment Variable
1670 @item @emph{Description}:
1671 Binds threads to specific CPUs. The variable should contain a space-separated
1672 or comma-separated list of CPUs. This list may contain different kinds of
1673 entries: either single CPU numbers in any order, a range of CPUs (M-N)
1674 or a range with some stride (M-N:S). CPU numbers are zero based. For example,
1675 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
1676 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
1677 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
1678 and 14 respectively and then start assigning back from the beginning of
1679 the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
1681 There is no libgomp library routine to determine whether a CPU affinity
1682 specification is in effect. As a workaround, language-specific library
1683 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
1684 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
1685 environment variable. A defined CPU affinity on startup cannot be changed
1686 or disabled during the runtime of the application.
1688 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
1689 @env{OMP_PROC_BIND} has a higher precedence. If neither has been set and
1690 @env{OMP_PROC_BIND} is unset, or when @env{OMP_PROC_BIND} is set to
1691 @code{FALSE}, the host system will handle the assignment of threads to CPUs.
1693 @item @emph{See also}:
1694 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
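The @code{getenv} workaround described above can be sketched as follows
(the helper name @code{affinity_is_set} is invented):

```c
#include <stdlib.h>

/* Sketch of the workaround described above: libgomp offers no query
   routine for the affinity setting, so inspect the GOMP_CPU_AFFINITY
   environment variable directly with getenv.  Note that changing the
   variable after startup has no effect on libgomp itself. */
int affinity_is_set(void)
{
    const char *s = getenv("GOMP_CPU_AFFINITY");
    return s != NULL && s[0] != '\0';
}
```

In Fortran, @code{GET_ENVIRONMENT_VARIABLE} serves the same purpose.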
1700 @section @env{GOMP_DEBUG} -- Enable debugging output
1701 @cindex Environment Variable
1703 @item @emph{Description}:
1704 Enable debugging output. The variable should be set to @code{0}
1705 (disabled, also the default if not set), or @code{1} (enabled).
1707 If enabled, some debugging output will be printed during execution.
1708 This is currently not specified in more detail, and subject to change.
1713 @node GOMP_STACKSIZE
1714 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
1715 @cindex Environment Variable
1716 @cindex Implementation specific setting
1718 @item @emph{Description}:
1719 Set the default thread stack size in kilobytes. This is different from
1720 @code{pthread_attr_setstacksize} which gets the number of bytes as an
1721 argument. If the stack size cannot be set due to system constraints, an
1722 error is reported and the initial stack size is left unchanged. If undefined,
1723 the stack size is system dependent.
1725 @item @emph{See also}:
1728 @item @emph{Reference}:
1729 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
1730 GCC Patches Mailinglist},
1731 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
1732 GCC Patches Mailinglist}
1737 @node GOMP_SPINCOUNT
1738 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
1739 @cindex Environment Variable
1740 @cindex Implementation specific setting
1742 @item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
1746 integer which gives the number of spins of the busy-wait loop. The
1747 integer may optionally be followed by the following suffixes acting
1748 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
1749 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
1750 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
1751 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
1752 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
1753 If there are more OpenMP threads than available CPUs, 1000 and 100
1754 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
1755 undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
1756 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
1758 @item @emph{See also}:
1759 @ref{OMP_WAIT_POLICY}
1764 @node GOMP_RTEMS_THREAD_POOLS
1765 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
1766 @cindex Environment Variable
1767 @cindex Implementation specific setting
1769 @item @emph{Description}:
1770 This environment variable is only used on the RTEMS real-time operating system.
1771 It determines the scheduler instance specific thread pools. The format for
1772 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
1773 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
1774 separated by @code{:} where:
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
1778 @item @code{$<priority>} is an optional priority for the worker threads of a
1779 thread pool according to @code{pthread_setschedparam}. In case a priority
1780 value is omitted, then a worker thread will inherit the priority of the OpenMP
1781 master thread that created it. The priority of the worker thread is not
1782 changed after creation, even if a new OpenMP master thread using the worker has
1783 a different priority.
1784 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
1785 RTEMS application configuration.
1787 In case no thread pool configuration is specified for a scheduler instance,
1788 then each OpenMP master thread of this scheduler instance will use its own
1789 dynamically allocated thread pool. To limit the worker thread count of the
1790 thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
1791 @item @emph{Example}:
Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
1793 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
1794 @code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
1795 scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
1796 one thread pool available. Since no priority is specified for this scheduler
1797 instance, the worker thread inherits the priority of the OpenMP master thread
1798 that created it. In the scheduler instance @code{WRK1} there are three thread
1799 pools available and their worker threads run at priority four.
1804 @c ---------------------------------------------------------------------
1806 @c ---------------------------------------------------------------------
1808 @node Enabling OpenACC
1809 @chapter Enabling OpenACC
1811 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
1812 flag @option{-fopenacc} must be specified. This enables the OpenACC directive
@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
1814 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
1815 @code{!$} conditional compilation sentinels in free form and @code{c$},
1816 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
1817 arranges for automatic linking of the OpenACC runtime library
1818 (@ref{OpenACC Runtime Library Routines}).
1820 A complete description of all OpenACC directives accepted may be found in
1821 the @uref{https://www.openacc.org, OpenACC} Application Programming
1822 Interface manual, version 2.0.
1824 Note that this is an experimental feature and subject to
1825 change in future versions of GCC. See
1826 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
1830 @c ---------------------------------------------------------------------
1831 @c OpenACC Runtime Library Routines
1832 @c ---------------------------------------------------------------------
1834 @node OpenACC Runtime Library Routines
1835 @chapter OpenACC Runtime Library Routines
The runtime routines described here are defined by section 3 of the OpenACC
specification in version 2.0.
1839 They have C linkage, and do not throw exceptions.
1840 Generally, they are available only for the host, with the exception of
1841 @code{acc_on_device}, which is available for both the host and the
1842 acceleration device.
* acc_get_num_devices:: Get number of devices for the given device
                        type.
* acc_set_device_type:: Set type of device accelerator to use.
* acc_get_device_type:: Get type of device accelerator to be used.
* acc_set_device_num:: Set device number to use.
* acc_get_device_num:: Get device number to be used.
* acc_async_test:: Tests for completion of a specific asynchronous
                        operation.
* acc_async_test_all:: Tests for completion of all asynchronous
                        operations.
* acc_wait:: Wait for completion of a specific asynchronous
                        operation.
* acc_wait_all:: Waits for completion of all asynchronous
                        operations.
* acc_wait_all_async:: Wait for completion of all asynchronous
                        operations.
* acc_wait_async:: Wait for completion of asynchronous operations.
* acc_init:: Initialize runtime for a specific device type.
* acc_shutdown:: Shuts down the runtime for a specific device
                        type.
* acc_on_device:: Whether executing on a particular device.
* acc_malloc:: Allocate device memory.
* acc_free:: Free device memory.
* acc_copyin:: Allocate device memory and copy host memory to
                        it.
* acc_present_or_copyin:: If the data is not present on the device,
                        allocate device memory and copy from host
                        memory.
* acc_create:: Allocate device memory and map it to host
                        memory.
* acc_present_or_create:: If the data is not present on the device,
                        allocate device memory and map it to host
                        memory.
* acc_copyout:: Copy device memory to host memory.
* acc_delete:: Free device memory.
* acc_update_device:: Update device memory from mapped host memory.
* acc_update_self:: Update host memory from mapped device memory.
* acc_map_data:: Map previously allocated device memory to host
                        memory.
* acc_unmap_data:: Unmap device memory from host memory.
* acc_deviceptr:: Get device pointer associated with specific
                        host address.
* acc_hostptr:: Get host pointer associated with specific
                        device address.
* acc_is_present:: Indicate whether host variable / array is
                        present on device.
* acc_memcpy_to_device:: Copy host memory to device memory.
* acc_memcpy_from_device:: Copy device memory to host memory.
1894 API routines for target platforms.
1896 * acc_get_current_cuda_device:: Get CUDA device handle.
1897 * acc_get_current_cuda_context::Get CUDA context handle.
1898 * acc_get_cuda_stream:: Get CUDA stream handle.
1899 * acc_set_cuda_stream:: Set CUDA stream handle.
@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type
@table @asis
@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_get_device_type
@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
@table @asis
@item @emph{Description}
This function returns the device type that will be used when executing a
parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{num}, of the specified device type @var{devicetype}, to use.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns which device number, associated with the specified
device type @var{devicetype}, will be used when executing a parallel or
kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}. In C/C++, a non-zero value is returned to indicate that the
specified asynchronous operation has completed, while Fortran returns
@code{true}. If the asynchronous operation has not completed, C/C++ returns
zero and Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}. If
any asynchronous operation has not completed, C/C++ returns zero and
Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item @tab @code{integer(acc_handle_kind) async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item @tab @code{integer(acc_handle_kind) arg, async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_init
@section @code{acc_init} -- Initialize runtime for a specific device type.
@table @asis
@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}. In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified device
type; in Fortran, @code{true} will be returned. If the program is not
executing on the specified device type, C/C++ returns zero, while Fortran
returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory. It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
Free previously allocated device memory at the device address @var{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}. The device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of length
@var{len} is present on the device. If it is not present, then device memory
will be allocated and the host memory copied. The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes. In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of length
@var{len} is present on the device. If it is not present, then device memory
will be allocated and mapped to host memory. In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies mapped device memory to host memory, which is specified
by the host address @var{a}, for a length of @var{len} bytes in C/C++.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory specified by
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device memory to host memory. The
device memory is specified with the device address @var{d}. The host memory
is specified with the host address @var{h} and a length of @var{len}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the host
address @var{a} and a length of @var{len} bytes is present on the device.
In C/C++, a non-zero value is returned to indicate the presence of the
mapped memory on the device. A zero is returned to indicate the memory
is not mapped on the device.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, then @code{true} is returned. Otherwise,
@code{false} is returned to indicate the mapped memory is not present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table
@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table
@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives which make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause will be destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} refer to a different
CUDA stream.
2918 @c ---------------------------------------------------------------------
2919 @c OpenACC Library Interoperability
2920 @c ---------------------------------------------------------------------
2922 @node OpenACC Library Interoperability
2923 @chapter OpenACC Library Interoperability
2925 @section Introduction
2927 The OpenACC library uses the CUDA Driver API, and may interact with
2928 programs that use the Runtime library directly, or another library
2929 based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
2930 "Interactions with the CUDA Driver API" in
2931 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
2932 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
2933 for additional information on library interoperability.}.
2934 This chapter describes the use cases and what changes are
2935 required in order to use both the OpenACC library and the CUBLAS and Runtime
2936 libraries within a program.
@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library; specifically, the
function @code{cublasCreate()}.

When invoked, @code{cublasCreate()} initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller.  Once
the initialization and allocation have completed, a handle is returned to the
caller.  The OpenACC library also requires initialization and allocation of
hardware resources.  Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.
Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}.  Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this.  Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}.  In other words, both libraries will be sharing the
same context.
@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);
@end smallexample
@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library; specifically,
the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device.  In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}.  It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources.  Other methods are available through the
use of environment variables, and these will be discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called as seen with multiple calls being made to
@code{acc_copyin()}.  In addition, calls can be made to functions in the
CUBLAS library.  In this use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device.  However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host.  The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.
@smallexample
    acc_set_device_num(dev, acc_device_nvidia);
    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}
    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}
    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}
    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
@end smallexample
@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively.  These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}.  As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}.  If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the ``@uref{https://www.openacc.org, OpenACC}
Application Programming Interface'', Version 2.0.}.
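
As a hedged illustration (the accepted variable values are described in
the sections of the OpenACC specification cited above), the second use
case could select the device from the environment instead of calling
@code{acc_set_device_num()}:

@smallexample
  $ ACC_DEVICE_TYPE=nvidia ACC_DEVICE_NUM=0 ./program
@end smallexample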
@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp.  Only maintainers should need them.
@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu
@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
  if (omp_get_thread_num () == 0)
    block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this in the version run by the master thread.
Surely this is not worthwhile though...
@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name, use

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at all.
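
As an illustrative sketch (not verbatim compiler output), an unnamed
critical section protecting an increment would be lowered to:

@smallexample
  GOMP_critical_start ();
  x++;
  GOMP_critical_end ();
@end smallexample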
@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that, we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.
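
For example (a sketch only; the exact expansion depends on the target
and the operation), an atomic update such as

@smallexample
  #pragma omp atomic
  x += 1;
@end smallexample

@noindent
could expand to the builtin

@smallexample
  __sync_fetch_and_add (&x, 1);
@end smallexample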
@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.


@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample
@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}.  Except
that OMP allows constructors for C++ objects.  We can either
refuse to support this (how often is it used?) or we can
implement something akin to @code{.ctors}.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library.  Failing that, we can have a set
of entry points to register ctor functions to be called.
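
In the simple case the mapping is direct; as a sketch:

@smallexample
  int x;
  #pragma omp threadprivate(x)
@end smallexample

@noindent
maps to

@smallexample
  __thread int x;
@end smallexample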
@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function.  This preserves
the semantics of new variable creation.
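
As a sketch, with @code{x} made private in a parallel region:

@smallexample
  #pragma omp parallel private(x)
    body;
@end smallexample

@noindent
the parallel subfunction simply declares its own local @code{x}:

@smallexample
  void subfunction (void *data)
  @{
    int x;
    body;
  @}
@end smallexample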
@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks.  Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and ``small'' structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
  #pragma omp for firstprivate(x) lastprivate(y)
  for (int i = 0; i < n; ++i)
    body;
@end smallexample

@noindent
which becomes

@smallexample
  @{
    int x = x, y;

    // for stuff

    if (i == n)
      y = y;
  @}
@end smallexample

where the ``x=x'' and ``y=y'' assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C.  Presumably this only makes sense if the ``outer''
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.
@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}.  The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.
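
As a sketch, for @code{reduction(+:sum)} each thread might end with:

@smallexample
  /* Per-thread: publish the partial result.  */
  sum_array[team_id] = local_sum;
  GOMP_barrier ();

  /* Master thread only: combine the partial results.  */
  for (i = 0; i < nthreads; i++)
    sum += sum_array[i];
@end smallexample

@noindent
where @code{sum_array} is the per-team array reached through the
private struct described above.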
@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

@noindent
becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock.  It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

@noindent
becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

@noindent
becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really.  We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables.  So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations.  Which would mean that we wouldn't need to call any
of these routines.
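
As a sketch of that SCHEDULE(STATIC) arithmetic each thread could
perform directly (assuming @var{n} iterations, @var{nthreads} threads,
and a zero-based @var{team_id}):

@smallexample
  q = n / nthreads;
  r = n % nthreads;
  /* The first r threads each run q+1 iterations; the rest run q.  */
  s0 = q * team_id + (team_id < r ? team_id : r);
  e0 = s0 + q + (team_id < r ? 1 : 0);
  for (i = s0; i < e0; i++)
    body;
@end smallexample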
There are separate routines for handling loops with an ORDERED
clause.  Bookkeeping for that is non-trivial...
@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample
@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

@noindent
becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample
@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

@noindent
becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

@noindent
while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

@noindent
becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample
@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample

@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------
@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
``openacc'', or ``openmp'', or both to the keywords field in the bug
report, as appropriate.
@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi

@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi

@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index