1 \input texinfo @c -*-texinfo-*-
4 @setfilename libgomp.info
10 Copyright @copyright{} 2006-2023 Free Software Foundation, Inc.
12 Permission is granted to copy, distribute and/or modify this document
13 under the terms of the GNU Free Documentation License, Version 1.3 or
14 any later version published by the Free Software Foundation; with the
15 Invariant Sections being ``Funding Free Software'', the Front-Cover
16 texts being (a) (see below), and with the Back-Cover Texts being (b)
17 (see below). A copy of the license is included in the section entitled
18 ``GNU Free Documentation License''.
20 (a) The FSF's Front-Cover Text is:
24 (b) The FSF's Back-Cover Text is:
26 You have freedom to copy and modify this GNU Manual, like GNU
27 software. Copies published by the Free Software Foundation raise
28 funds for GNU development.
32 @dircategory GNU Libraries
34 * libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
37 This manual documents libgomp, the GNU Offloading and Multi Processing
38 Runtime library. This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and Fortran.
42 Published by the Free Software Foundation
43 51 Franklin Street, Fifth Floor
44 Boston, MA 02110-1301 USA
50 @setchapternewpage odd
53 @title GNU Offloading and Multi Processing Runtime Library
54 @subtitle The GNU OpenMP and OpenACC Implementation
56 @vskip 0pt plus 1filll
57 @comment For the @value{version-GCC} Version*
59 Published by the Free Software Foundation @*
60 51 Franklin Street, Fifth Floor@*
61 Boston, MA 02110-1301, USA@*
71 @node Top, Enabling OpenMP
75 This manual documents the usage of libgomp, the GNU Offloading and
76 Multi Processing Runtime Library. This includes the GNU
77 implementation of the @uref{https://www.openmp.org, OpenMP} Application
78 Programming Interface (API) for multi-platform shared-memory parallel
79 programming in C/C++ and Fortran, and the GNU implementation of the
80 @uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++ and Fortran.
Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's @code{target} construct) was added later, and the library was renamed
to the GNU Offloading and Multi Processing Runtime Library.
92 @comment When you add a new menu item, please keep the right hand
93 @comment aligned to the same column. Do not use tabs. This provides
94 @comment better formatting.
97 * Enabling OpenMP:: How to enable OpenMP for your applications.
98 * OpenMP Implementation Status:: List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
102 * OpenMP Environment Variables: Environment Variables.
103 Influencing OpenMP runtime behavior with
104 environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
107 * OpenACC Runtime Library Routines:: The OpenACC runtime application
108 programming interface.
109 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
110 environment variables.
111 * CUDA Streams Usage:: Notes on the implementation of
112 asynchronous operations.
113 * OpenACC Library Interoperability:: OpenACC library interoperability with the
114 NVIDIA CUBLAS library.
115 * OpenACC Profiling Interface::
* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
                               implementation.
118 * Offload-Target Specifics:: Notes on offload-target specific internals
119 * The libgomp ABI:: Notes on the external ABI presented by libgomp.
120 * Reporting Bugs:: How to report bugs in the GNU Offloading and
121 Multi Processing Runtime Library.
* Copying::                    GNU General Public License says
123 how you can copy and share libgomp.
124 * GNU Free Documentation License::
125 How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
128 * Library Index:: Index of this documentation.
132 @c ---------------------------------------------------------------------
134 @c ---------------------------------------------------------------------
136 @node Enabling OpenMP
137 @chapter Enabling OpenMP
139 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
140 flag @option{-fopenmp} must be specified. For C and C++, this enables
141 the handling of the OpenMP directives using @code{#pragma omp} and the
142 @code{[[omp::directive(...)]]}, @code{[[omp::sequence(...)]]} and
@code{[[omp::decl(...)]]} attributes.  For Fortran, it enables the
@code{!$omp} sentinel for directives and the @code{!$} conditional
compilation sentinel in free source form, and the @code{c$omp},
@code{*$omp} and @code{!$omp} sentinels for directives and the
@code{c$}, @code{*$} and @code{!$} conditional compilation sentinels
in fixed source form.
148 The flag also arranges for automatic linking of the OpenMP runtime library
149 (@ref{Runtime Library Routines}).
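For illustration only, the following minimal C program (the file name is
made up) uses a parallel region; it can be compiled and linked with
@command{gcc -fopenmp hello_omp.c}:

@smallexample
/* hello_omp.c -- a minimal OpenMP example (illustrative only).  */
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  #pragma omp parallel
  printf ("Hello from thread %d of %d\n",
          omp_get_thread_num (), omp_get_num_threads ());
  return 0;
@}
@end smallexample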
151 The @option{-fopenmp-simd} flag can be used to enable a subset of
152 OpenMP directives that do not require the linking of either the
153 OpenMP runtime library or the POSIX threads library.
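As a sketch (the function and file names are made up), the following loop
can be compiled with @command{gcc -O2 -fopenmp-simd -c dot.c}; the
@code{simd} directive is honored without linking libgomp or libpthread:

@smallexample
/* dot.c -- illustrative only.  */
float
dot (const float *a, const float *b, int n)
@{
  float sum = 0.0f;
  #pragma omp simd reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
@}
@end smallexample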
155 A complete description of all OpenMP directives may be found in the
156 @uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
157 See also @ref{OpenMP Implementation Status}.
160 @c ---------------------------------------------------------------------
161 @c OpenMP Implementation Status
162 @c ---------------------------------------------------------------------
164 @node OpenMP Implementation Status
165 @chapter OpenMP Implementation Status
168 * OpenMP 4.5:: Feature completion status to 4.5 specification
169 * OpenMP 5.0:: Feature completion status to 5.0 specification
170 * OpenMP 5.1:: Feature completion status to 5.1 specification
171 * OpenMP 5.2:: Feature completion status to 5.2 specification
172 * OpenMP Technical Report 12:: Feature completion status to second 6.0 preview
175 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
176 parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
177 the value @code{201511} (i.e. OpenMP 4.5).
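For illustration, a minimal C program can inspect the macro; when built
with @option{-fopenmp} it prints the value given above:

@smallexample
#include <stdio.h>

int
main (void)
@{
#ifdef _OPENMP
  printf ("_OPENMP = %d\n", _OPENMP);  /* 201511 for OpenMP 4.5 */
#else
  printf ("compiled without -fopenmp\n");
#endif
  return 0;
@}
@end smallexample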
182 The OpenMP 4.5 specification is fully supported.
187 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
188 @c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
190 @multitable @columnfractions .60 .10 .25
191 @headitem Description @tab Status @tab Comments
192 @item Array shaping @tab N @tab
193 @item Array sections with non-unit strides in C and C++ @tab N @tab
194 @item Iterators @tab Y @tab
195 @item @code{metadirective} directive @tab N @tab
196 @item @code{declare variant} directive
197 @tab P @tab @emph{simd} traits not handled correctly
198 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
199 env variable @tab Y @tab
200 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
201 @item @code{requires} directive @tab P
202 @tab complete but no non-host device provides @code{unified_shared_memory}
203 @item @code{teams} construct outside an enclosing target region @tab Y @tab
204 @item Non-rectangular loop nests @tab P
205 @tab Full support for C/C++, partial for Fortran
206 (@uref{https://gcc.gnu.org/PR110735,PR110735})
207 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
208 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
209 constructs @tab Y @tab
210 @item Collapse of associated loops that are imperfectly nested loops @tab Y @tab
211 @item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
212 @code{simd} construct @tab Y @tab
213 @item @code{atomic} constructs in @code{simd} @tab Y @tab
214 @item @code{loop} construct @tab Y @tab
215 @item @code{order(concurrent)} clause @tab Y @tab
216 @item @code{scan} directive and @code{in_scan} modifier for the
217 @code{reduction} clause @tab Y @tab
218 @item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
219 @item @code{in_reduction} clause on @code{target} constructs @tab P
220 @tab @code{nowait} only stub
221 @item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
222 @item @code{task} modifier to @code{reduction} clause @tab Y @tab
223 @item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
224 @item @code{detach} clause to @code{task} construct @tab Y @tab
225 @item @code{omp_fulfill_event} runtime routine @tab Y @tab
226 @item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
227 and @code{taskloop simd} constructs @tab Y @tab
228 @item @code{taskloop} construct cancelable by @code{cancel} construct
230 @item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
232 @item Predefined memory spaces, memory allocators, allocator traits
233 @tab Y @tab See also @ref{Memory allocation}
234 @item Memory management routines @tab Y @tab
235 @item @code{allocate} directive @tab P @tab Only C and Fortran, only stack variables
236 @item @code{allocate} clause @tab P @tab Initial support
237 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
238 @item @code{ancestor} modifier on @code{device} clause @tab Y @tab
239 @item Implicit declare target directive @tab Y @tab
240 @item Discontiguous array section with @code{target update} construct
242 @item C/C++'s lvalue expressions in @code{to}, @code{from}
243 and @code{map} clauses @tab N @tab
244 @item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
245 @item Nested @code{declare target} directive @tab Y @tab
246 @item Combined @code{master} constructs @tab Y @tab
247 @item @code{depend} clause on @code{taskwait} @tab Y @tab
248 @item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
250 @item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
251 @item @code{depobj} construct and depend objects @tab Y @tab
252 @item Lock hints were renamed to synchronization hints @tab Y @tab
253 @item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
254 @item Map-order clarifications @tab P @tab
255 @item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping C/C++ pointer variables and assigning the address of
device memory mapped by an array section @tab P @tab
258 @item Mapping of Fortran pointer and allocatable variables, including pointer
259 and allocatable components of variables
260 @tab P @tab Mapping of vars with allocatable components unsupported
261 @item @code{defaultmap} extensions @tab Y @tab
262 @item @code{declare mapper} directive @tab N @tab
263 @item @code{omp_get_supported_active_levels} routine @tab Y @tab
264 @item Runtime routines and environment variables to display runtime thread
265 affinity information @tab Y @tab
266 @item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
268 @item @code{omp_get_device_num} runtime routine @tab Y @tab
269 @item OMPT interface @tab N @tab
270 @item OMPD interface @tab N @tab
273 @unnumberedsubsec Other new OpenMP 5.0 features
275 @multitable @columnfractions .60 .10 .25
276 @headitem Description @tab Status @tab Comments
277 @item Supporting C++'s range-based for loop @tab Y @tab
284 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
286 @multitable @columnfractions .60 .10 .25
287 @headitem Description @tab Status @tab Comments
288 @item OpenMP directive as C++ attribute specifiers @tab Y @tab
289 @item @code{omp_all_memory} reserved locator @tab Y @tab
290 @item @emph{target_device trait} in OpenMP Context @tab N @tab
291 @item @code{target_device} selector set in context selectors @tab N @tab
292 @item C/C++'s @code{declare variant} directive: elision support of
293 preprocessed code @tab N @tab
294 @item @code{declare variant}: new clauses @code{adjust_args} and
295 @code{append_args} @tab N @tab
296 @item @code{dispatch} construct @tab N @tab
297 @item device-specific ICV settings with environment variables @tab Y @tab
298 @item @code{assume} and @code{assumes} directives @tab Y @tab
299 @item @code{nothing} directive @tab Y @tab
300 @item @code{error} directive @tab Y @tab
301 @item @code{masked} construct @tab Y @tab
302 @item @code{scope} directive @tab Y @tab
303 @item Loop transformation constructs @tab N @tab
304 @item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
305 clauses of the @code{taskloop} construct @tab Y @tab
306 @item @code{align} clause in @code{allocate} directive @tab P
307 @tab Only C and Fortran (and only stack variables)
308 @item @code{align} modifier in @code{allocate} clause @tab Y @tab
309 @item @code{thread_limit} clause to @code{target} construct @tab Y @tab
310 @item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
311 @item Iterators in @code{target update} motion clauses and @code{map}
313 @item Indirect calls to the device version of a procedure or function in
314 @code{target} regions @tab P @tab Only C and C++
315 @item @code{interop} directive @tab N @tab
316 @item @code{omp_interop_t} object support in runtime routines @tab N @tab
317 @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
318 @item Extensions to the @code{atomic} directive @tab Y @tab
319 @item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
320 @item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
321 @item @code{private} and @code{firstprivate} argument to @code{default}
322 clause in C and C++ @tab Y @tab
323 @item @code{present} argument to @code{defaultmap} clause @tab Y @tab
324 @item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
325 @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
327 @item @code{omp_target_is_accessible} runtime routine @tab Y @tab
328 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
329 runtime routines @tab Y @tab
330 @item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
331 @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
332 @code{omp_aligned_calloc} runtime routines @tab Y @tab
333 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
334 @code{omp_atv_default} changed @tab Y @tab
335 @item @code{omp_display_env} runtime routine @tab Y @tab
336 @item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
337 @item @code{ompt_sync_region_t} enum additions @tab N @tab
338 @item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
339 and @code{ompt_state_wait_barrier_teams} @tab N @tab
340 @item @code{ompt_callback_target_data_op_emi_t},
341 @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
342 and @code{ompt_callback_target_submit_emi_t} @tab N @tab
343 @item @code{ompt_callback_error_t} type @tab N @tab
344 @item @code{OMP_PLACES} syntax extensions @tab Y @tab
345 @item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
346 variables @tab Y @tab
349 @unnumberedsubsec Other new OpenMP 5.1 features
351 @multitable @columnfractions .60 .10 .25
352 @headitem Description @tab Status @tab Comments
353 @item Support of strictly structured blocks in Fortran @tab Y @tab
354 @item Support of structured block sequences in C/C++ @tab Y @tab
355 @item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
357 @item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
358 @item Pointer predetermined firstprivate getting initialized
359 to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
360 @item For Fortran, diagnose placing declarative before/between @code{USE},
361 @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
362 @item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
363 @item @code{indirect} clause in @code{declare target} @tab P @tab Only C and C++
364 @item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
365 @item @code{present} modifier to the @code{map}, @code{to} and @code{from}
373 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
375 @multitable @columnfractions .60 .10 .25
376 @headitem Description @tab Status @tab Comments
377 @item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
379 @item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
381 @tab warning for @code{ompx/omx} sentinels@footnote{The @code{ompx}
382 sentinel as C/C++ pragma and C++ attributes are warned for with
383 @code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
384 (enabled by default), respectively; for Fortran free-source code, there is
385 a warning enabled by default and, for fixed-source code, the @code{omx}
sentinel is warned for with @code{-Wsurprising} (enabled by
387 @code{-Wall}). Unknown clauses are always rejected with an error.}
388 @item Clauses on @code{end} directive can be on directive @tab Y @tab
389 @item @code{destroy} clause with destroy-var argument on @code{depobj}
391 @item Deprecation of no-argument @code{destroy} clause on @code{depobj}
393 @item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
394 @item Deprecation of minus operator for reductions @tab N @tab
395 @item Deprecation of separating @code{map} modifiers without comma @tab N @tab
396 @item @code{declare mapper} with iterator and @code{present} modifiers
398 @item If a matching mapped list item is not found in the data environment, the
399 pointer retains its original value @tab Y @tab
400 @item New @code{enter} clause as alias for @code{to} on declare target directive
402 @item Deprecation of @code{to} clause on declare target directive @tab N @tab
403 @item Extended list of directives permitted in Fortran pure procedures
405 @item New @code{allocators} directive for Fortran @tab N @tab
406 @item Deprecation of @code{allocate} directive for Fortran
407 allocatables/pointers @tab N @tab
408 @item Optional paired @code{end} directive with @code{dispatch} @tab N @tab
409 @item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
411 @item Deprecation of traits array following the allocator_handle expression in
412 @code{uses_allocators} @tab N @tab
413 @item New @code{otherwise} clause as alias for @code{default} on metadirectives
415 @item Deprecation of @code{default} clause on metadirectives @tab N @tab
416 @item Deprecation of delimited form of @code{declare target} @tab N @tab
417 @item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
418 @item @code{allocate} and @code{firstprivate} clauses on @code{scope}
420 @item @code{ompt_callback_work} @tab N @tab
421 @item Default map-type for the @code{map} clause in @code{target enter/exit data}
423 @item New @code{doacross} clause as alias for @code{depend} with
424 @code{source}/@code{sink} modifier @tab Y @tab
425 @item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
427 @item @code{omp_cur_iteration} keyword @tab Y @tab
430 @unnumberedsubsec Other new OpenMP 5.2 features
432 @multitable @columnfractions .60 .10 .25
433 @headitem Description @tab Status @tab Comments
434 @item For Fortran, optional comma between directive and clause @tab N @tab
435 @item Conforming device numbers and @code{omp_initial_device} and
436 @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
437 @item Initial value of @var{default-device-var} ICV with
438 @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
439 @item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab Y @tab
440 @item @emph{interop_types} in any position of the modifier list for the @code{init} clause
441 of the @code{interop} construct @tab N @tab
442 @item Invoke virtual member functions of C++ objects created on the host device
443 on other devices @tab N @tab
447 @node OpenMP Technical Report 12
448 @section OpenMP Technical Report 12
450 Technical Report (TR) 12 is the second preview for OpenMP 6.0.
452 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
454 @item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
455 @tab N/A @tab Backward compatibility
456 @item Full support for C23 was added @tab P @tab
457 @item Full support for C++23 was added @tab P @tab
458 @item @code{_ALL} suffix to the device-scope environment variables
459 @tab P @tab Host device number wrongly accepted
460 @item @code{num_threads} now accepts a list @tab N @tab
461 @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
462 @item Extension of @code{OMP_DEFAULT_DEVICE} and new
463 @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
464 @item New @code{OMP_THREADS_RESERVE} environment variable @tab N @tab
465 @item The @code{decl} attribute was added to the C++ attribute syntax
@item The OpenMP directive syntax was extended to include C23 attribute
specifiers @tab Y @tab
@item All inarguable clauses now take an optional Boolean argument @tab N @tab
470 @item For Fortran, @emph{locator list} can be also function reference with
471 data pointer result @tab N @tab
472 @item Concept of @emph{assumed-size arrays} in C and C++
474 @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
475 @item For Fortran, atomic with BLOCK construct and, for C/C++, with
476 unlimited curly braces supported @tab N @tab
477 @item For Fortran, atomic compare with storing the comparison result
479 @item New @code{looprange} clause @tab N @tab
480 @item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
482 @item Support for inductions @tab N @tab
483 @item Implicit reduction identifiers of C++ classes
485 @item Change of the @emph{map-type} property from @emph{ultimate} to
486 @emph{default} @tab N @tab
487 @item @code{self} modifier to @code{map} and @code{self} as
488 @code{defaultmap} argument @tab N @tab
489 @item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
491 @item @code{groupprivate} directive @tab N @tab
492 @item @code{local} clause to @code{declare target} directive @tab N @tab
493 @item @code{part_size} allocator trait @tab N @tab
494 @item @code{pin_device}, @code{preferred_device} and @code{target_access}
497 @item @code{access} allocator trait changes @tab N @tab
498 @item Extension of @code{interop} operation of @code{append_args}, allowing all
499 modifiers of the @code{init} clause
501 @item @code{interop} clause to @code{dispatch} @tab N @tab
@item @code{message} and @code{severity} clauses to @code{parallel} directive
504 @item @code{self} clause to @code{requires} directive @tab N @tab
505 @item @code{no_openmp_constructs} assumptions clause @tab N @tab
506 @item @code{reverse} loop-transformation construct @tab N @tab
507 @item @code{interchange} loop-transformation construct @tab N @tab
508 @item @code{fuse} loop-transformation construct @tab N @tab
@item @code{apply} clause to loop-transforming constructs @tab N @tab
510 @item @code{omp_curr_progress_width} identifier @tab N @tab
511 @item @code{safesync} clause to the @code{parallel} construct @tab N @tab
512 @item @code{omp_get_max_progress_width} runtime routine @tab N @tab
513 @item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
514 @item @code{atomic} permitted in a construct with @code{order(concurrent)}
516 @item @code{coexecute} directive for Fortran @tab N @tab
517 @item Fortran DO CONCURRENT as associated loop in a @code{loop} construct
519 @item @code{threadset} clause in task-generating constructs @tab N @tab
520 @item @code{nowait} clause with reverse-offload @code{target} directives
@item Boolean argument to @code{nowait} and @code{nogroup} may be non-constant
524 @item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
525 @item @code{omp_is_free_agent} and @code{omp_ancestor_is_free_agent} routines
527 @item @code{omp_target_memset} and @code{omp_target_memset_rect_async} routines
529 @item Routines for obtaining memory spaces/allocators for shared/device memory
531 @item @code{omp_get_memspace_num_resources} routine @tab N @tab
532 @item @code{omp_get_submemspace} routine @tab N @tab
533 @item @code{ompt_target_data_transfer} and @code{ompt_target_data_transfer_async}
534 values in @code{ompt_target_data_op_t} enum @tab N @tab
535 @item @code{ompt_get_buffer_limits} OMPT routine @tab N @tab
538 @unnumberedsubsec Other new TR 12 features
@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
540 @item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
541 @item Mapping lambda captures @tab N @tab
542 @item New @code{omp_pause_stop_tool} constant for omp_pause_resource @tab N @tab
547 @c ---------------------------------------------------------------------
548 @c OpenMP Runtime Library Routines
549 @c ---------------------------------------------------------------------
551 @node Runtime Library Routines
552 @chapter OpenMP Runtime Library Routines
554 The runtime routines described here are defined by Section 18 of the OpenMP
555 specification in version 5.2.
558 * Thread Team Routines::
559 * Thread Affinity Routines::
560 * Teams Region Routines::
562 @c * Resource Relinquishing Routines::
563 * Device Information Routines::
564 * Device Memory Routines::
568 @c * Interoperability Routines::
569 * Memory Management Routines::
570 @c * Tool Control Routine::
571 @c * Environment Display Routine::
576 @node Thread Team Routines
577 @section Thread Team Routines
579 Routines controlling threads in the current contention group.
580 They have C linkage and do not throw exceptions.
583 * omp_set_num_threads:: Set upper team size limit
584 * omp_get_num_threads:: Size of the active team
585 * omp_get_max_threads:: Maximum number of threads of parallel region
586 * omp_get_thread_num:: Current thread ID
587 * omp_in_parallel:: Whether a parallel region is active
588 * omp_set_dynamic:: Enable/disable dynamic teams
589 * omp_get_dynamic:: Dynamic teams setting
590 * omp_get_cancellation:: Whether cancellation support is enabled
591 * omp_set_nested:: Enable/disable nested parallel regions
592 * omp_get_nested:: Nested parallel regions
593 * omp_set_schedule:: Set the runtime scheduling method
594 * omp_get_schedule:: Obtain the runtime scheduling method
595 * omp_get_teams_thread_limit:: Maximum number of threads imposed by teams
596 * omp_get_supported_active_levels:: Maximum number of active regions supported
597 * omp_set_max_active_levels:: Limits the number of active parallel regions
598 * omp_get_max_active_levels:: Current maximum number of active regions
599 * omp_get_level:: Number of parallel regions
600 * omp_get_ancestor_thread_num:: Ancestor thread ID
601 * omp_get_team_size:: Number of threads in a team
602 * omp_get_active_level:: Number of active parallel regions
607 @node omp_set_num_threads
608 @subsection @code{omp_set_num_threads} -- Set upper team size limit
610 @item @emph{Description}:
611 Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause.  The
613 argument of @code{omp_set_num_threads} shall be a positive integer.
616 @multitable @columnfractions .20 .80
617 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
620 @item @emph{Fortran}:
621 @multitable @columnfractions .20 .80
622 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
623 @item @tab @code{integer, intent(in) :: num_threads}
626 @item @emph{See also}:
627 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
629 @item @emph{Reference}:
630 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
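A typical use, shown here only as a sketch, sets the default team size
before entering a parallel region:

@smallexample
#include <omp.h>

void
example (void)
@{
  omp_set_num_threads (4);   /* use at most four threads by default */
  #pragma omp parallel
  @{
    /* ... work shared among up to four threads ... */
  @}
@}
@end smallexample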
635 @node omp_get_num_threads
636 @subsection @code{omp_get_num_threads} -- Size of the active team
638 @item @emph{Description}:
639 Returns the number of threads in the current team. In a sequential section of
the program, @code{omp_get_num_threads} returns 1.
642 The default team size may be initialized at startup by the
643 @env{OMP_NUM_THREADS} environment variable. At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per online CPU is used.
650 @multitable @columnfractions .20 .80
651 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
654 @item @emph{Fortran}:
655 @multitable @columnfractions .20 .80
656 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
659 @item @emph{See also}:
660 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
662 @item @emph{Reference}:
663 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
668 @node omp_get_max_threads
669 @subsection @code{omp_get_max_threads} -- Maximum number of threads of parallel region
671 @item @emph{Description}:
Return the maximum number of threads that would be used for a parallel
region that does not use the @code{num_threads} clause.
676 @multitable @columnfractions .20 .80
677 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
680 @item @emph{Fortran}:
681 @multitable @columnfractions .20 .80
682 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
685 @item @emph{See also}:
686 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
688 @item @emph{Reference}:
689 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
694 @node omp_get_thread_num
695 @subsection @code{omp_get_thread_num} -- Current thread ID
697 @item @emph{Description}:
698 Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
700 always returns 0. In parallel regions the return value varies
701 from 0 to @code{omp_get_num_threads}-1 inclusive. The return
702 value of the primary thread of a team is always 0.
705 @multitable @columnfractions .20 .80
706 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
709 @item @emph{Fortran}:
710 @multitable @columnfractions .20 .80
711 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
714 @item @emph{See also}:
715 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
717 @item @emph{Reference}:
718 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
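As an illustrative sketch, the thread number can be combined with the
team size to partition work manually:

@smallexample
#include <omp.h>

void
scale (double *data, int n)
@{
  #pragma omp parallel
  @{
    int tid = omp_get_thread_num ();
    int nthreads = omp_get_num_threads ();
    /* Each thread handles every nthreads-th element.  */
    for (int i = tid; i < n; i += nthreads)
      data[i] *= 2.0;
  @}
@}
@end smallexample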
723 @node omp_in_parallel
724 @subsection @code{omp_in_parallel} -- Whether a parallel region is active
726 @item @emph{Description}:
727 This function returns @code{true} if currently running in parallel,
728 @code{false} otherwise. Here, @code{true} and @code{false} represent
729 their language-specific counterparts.
732 @multitable @columnfractions .20 .80
733 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
736 @item @emph{Fortran}:
737 @multitable @columnfractions .20 .80
738 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
741 @item @emph{Reference}:
742 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
746 @node omp_set_dynamic
747 @subsection @code{omp_set_dynamic} -- Enable/disable dynamic teams
749 @item @emph{Description}:
750 Enable or disable the dynamic adjustment of the number of threads
751 within a team. The function takes the language-specific equivalent
752 of @code{true} and @code{false}, where @code{true} enables dynamic
753 adjustment of team sizes and @code{false} disables it.
756 @multitable @columnfractions .20 .80
757 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
760 @item @emph{Fortran}:
761 @multitable @columnfractions .20 .80
762 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
763 @item @tab @code{logical, intent(in) :: dynamic_threads}
766 @item @emph{See also}:
767 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
769 @item @emph{Reference}:
770 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
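A common idiom, sketched below, disables dynamic adjustment so that a
subsequent request for a specific team size is honored exactly
(resources permitting):

@smallexample
#include <omp.h>

void
example (void)
@{
  omp_set_dynamic (0);       /* disable dynamic adjustment ...  */
  omp_set_num_threads (8);   /* ... so exactly eight threads are requested */
  #pragma omp parallel
  @{
    /* ... */
  @}
@}
@end smallexample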
775 @node omp_get_dynamic
776 @subsection @code{omp_get_dynamic} -- Dynamic teams setting
778 @item @emph{Description}:
This function returns @code{true} if dynamic adjustment of the number of
threads within a team is enabled, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.
783 The dynamic team setting may be initialized at startup by the
784 @env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is disabled by default.
789 @multitable @columnfractions .20 .80
790 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
793 @item @emph{Fortran}:
794 @multitable @columnfractions .20 .80
795 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
798 @item @emph{See also}:
799 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
801 @item @emph{Reference}:
802 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
807 @node omp_get_cancellation
808 @subsection @code{omp_get_cancellation} -- Whether cancellation support is enabled
810 @item @emph{Description}:
811 This function returns @code{true} if cancellation is activated, @code{false}
812 otherwise. Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are deactivated.
817 @multitable @columnfractions .20 .80
818 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
821 @item @emph{Fortran}:
822 @multitable @columnfractions .20 .80
823 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
826 @item @emph{See also}:
827 @ref{OMP_CANCELLATION}
829 @item @emph{Reference}:
830 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
836 @subsection @code{omp_set_nested} -- Enable/disable nested parallel regions
838 @item @emph{Description}:
839 Enable or disable nested parallel regions, i.e., whether team members
840 are allowed to create new teams. The function takes the language-specific
841 equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.
844 Enabling nested parallel regions also sets the maximum number of
845 active nested regions to the maximum supported. Disabling nested parallel
846 regions sets the maximum number of active nested regions to one.
848 Note that the @code{omp_set_nested} API routine was deprecated
849 in the OpenMP specification 5.2 in favor of @code{omp_set_max_active_levels}.
852 @multitable @columnfractions .20 .80
853 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
856 @item @emph{Fortran}:
857 @multitable @columnfractions .20 .80
858 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
859 @item @tab @code{logical, intent(in) :: nested}
862 @item @emph{See also}:
863 @ref{omp_get_nested}, @ref{omp_set_max_active_levels},
864 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
866 @item @emph{Reference}:
867 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
873 @subsection @code{omp_get_nested} -- Nested parallel regions
875 @item @emph{Description}:
876 This function returns @code{true} if nested parallel regions are
877 enabled, @code{false} otherwise. Here, @code{true} and @code{false}
878 represent their language-specific counterparts.
880 The state of nested parallel regions at startup depends on several
881 environment variables. If @env{OMP_MAX_ACTIVE_LEVELS} is defined
882 and is set to greater than one, then nested parallel regions will be
883 enabled. If not defined, then the value of the @env{OMP_NESTED}
884 environment variable will be followed if defined. If neither are
885 defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
886 are defined with a list of more than one value, then nested parallel
887 regions are enabled. If none of these are defined, then nested parallel
888 regions are disabled by default.
890 Nested parallel regions can be enabled or disabled at runtime using
891 @code{omp_set_nested}, or by setting the maximum number of nested
regions with @code{omp_set_max_active_levels} to one to disable, or
to more than one to enable them.
895 Note that the @code{omp_get_nested} API routine was deprecated
896 in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}.
899 @multitable @columnfractions .20 .80
900 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
903 @item @emph{Fortran}:
904 @multitable @columnfractions .20 .80
905 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
908 @item @emph{See also}:
909 @ref{omp_get_max_active_levels}, @ref{omp_set_nested},
910 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
912 @item @emph{Reference}:
913 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
918 @node omp_set_schedule
919 @subsection @code{omp_set_schedule} -- Set the runtime scheduling method
921 @item @emph{Description}:
922 Sets the runtime scheduling method. The @var{kind} argument can have the
923 value @code{omp_sched_static}, @code{omp_sched_dynamic},
924 @code{omp_sched_guided} or @code{omp_sched_auto}. Except for
925 @code{omp_sched_auto}, the chunk size is set to the value of
926 @var{chunk_size} if positive, or to the default value if zero or negative.
927 For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
930 @multitable @columnfractions .20 .80
931 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
934 @item @emph{Fortran}:
935 @multitable @columnfractions .20 .80
936 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
937 @item @tab @code{integer(kind=omp_sched_kind) kind}
938 @item @tab @code{integer chunk_size}
941 @item @emph{See also}:
942 @ref{omp_get_schedule}
945 @item @emph{Reference}:
946 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
951 @node omp_get_schedule
952 @subsection @code{omp_get_schedule} -- Obtain the runtime scheduling method
954 @item @emph{Description}:
955 Obtain the runtime scheduling method. The @var{kind} argument is set to
956 @code{omp_sched_static}, @code{omp_sched_dynamic},
957 @code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
958 @var{chunk_size}, is set to the chunk size.
961 @multitable @columnfractions .20 .80
962 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
965 @item @emph{Fortran}:
966 @multitable @columnfractions .20 .80
967 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
968 @item @tab @code{integer(kind=omp_sched_kind) kind}
969 @item @tab @code{integer chunk_size}
972 @item @emph{See also}:
973 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
975 @item @emph{Reference}:
976 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
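The following sketch sets the runtime schedule and reads it back:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  omp_sched_t kind;
  int chunk;

  omp_set_schedule (omp_sched_dynamic, 4);
  omp_get_schedule (&kind, &chunk);
  printf ("kind = %d, chunk_size = %d\n", (int) kind, chunk);
  return 0;
@}
@end smallexample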
980 @node omp_get_teams_thread_limit
981 @subsection @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
983 @item @emph{Description}:
984 Return the maximum number of threads that are able to participate in
985 each team created by a teams construct.
988 @multitable @columnfractions .20 .80
989 @item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
992 @item @emph{Fortran}:
993 @multitable @columnfractions .20 .80
994 @item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
997 @item @emph{See also}:
998 @ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}
1000 @item @emph{Reference}:
1001 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
1006 @node omp_get_supported_active_levels
1007 @subsection @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
1009 @item @emph{Description}:
1010 This function returns the maximum number of nested, active parallel regions
1011 supported by this implementation.
1014 @multitable @columnfractions .20 .80
1015 @item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
1018 @item @emph{Fortran}:
1019 @multitable @columnfractions .20 .80
1020 @item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
1023 @item @emph{See also}:
1024 @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
1026 @item @emph{Reference}:
1027 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
1032 @node omp_set_max_active_levels
1033 @subsection @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
1035 @item @emph{Description}:
1036 This function limits the maximum allowed number of nested, active
parallel regions.  @var{max_levels} must be less than or equal to
1038 the value returned by @code{omp_get_supported_active_levels}.
1041 @multitable @columnfractions .20 .80
1042 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
1045 @item @emph{Fortran}:
1046 @multitable @columnfractions .20 .80
1047 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
1048 @item @tab @code{integer max_levels}
1051 @item @emph{See also}:
1052 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
1053 @ref{omp_get_supported_active_levels}
1055 @item @emph{Reference}:
1056 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
1061 @node omp_get_max_active_levels
1062 @subsection @code{omp_get_max_active_levels} -- Current maximum number of active regions
1064 @item @emph{Description}:
1065 This function obtains the maximum allowed number of nested, active parallel regions.
1068 @multitable @columnfractions .20 .80
1069 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
1072 @item @emph{Fortran}:
1073 @multitable @columnfractions .20 .80
1074 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
1077 @item @emph{See also}:
1078 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
1080 @item @emph{Reference}:
1081 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
1086 @subsection @code{omp_get_level} -- Obtain the current nesting level
1088 @item @emph{Description}:
This function returns the nesting level of the parallel regions
enclosing the call.
1093 @multitable @columnfractions .20 .80
1094 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
1097 @item @emph{Fortran}:
1098 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
1102 @item @emph{See also}:
1103 @ref{omp_get_active_level}
1105 @item @emph{Reference}:
1106 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
1111 @node omp_get_ancestor_thread_num
1112 @subsection @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
1114 @item @emph{Description}:
1115 This function returns the thread identification number for the given
1116 nesting level of the current thread. For values of @var{level} outside
the range from 0 to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.
1121 @multitable @columnfractions .20 .80
1122 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
1125 @item @emph{Fortran}:
1126 @multitable @columnfractions .20 .80
1127 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
1128 @item @tab @code{integer level}
1131 @item @emph{See also}:
1132 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
1134 @item @emph{Reference}:
1135 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
1140 @node omp_get_team_size
1141 @subsection @code{omp_get_team_size} -- Number of threads in a team
1143 @item @emph{Description}:
1144 This function returns the number of threads in a thread team to which
1145 either the current thread or its ancestor belongs. For values of @var{level}
outside the range from 0 to @code{omp_get_level}, -1 is returned; if
@var{level} is zero, 1 is returned, and for @code{omp_get_level}, the result is identical
1148 to @code{omp_get_num_threads}.
1151 @multitable @columnfractions .20 .80
1152 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
1155 @item @emph{Fortran}:
1156 @multitable @columnfractions .20 .80
1157 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
1158 @item @tab @code{integer level}
1161 @item @emph{See also}:
1162 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
1164 @item @emph{Reference}:
1165 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
1170 @node omp_get_active_level
1171 @subsection @code{omp_get_active_level} -- Number of parallel regions
1173 @item @emph{Description}:
This function returns the nesting level of the active parallel regions
enclosing the call.
1178 @multitable @columnfractions .20 .80
1179 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
1182 @item @emph{Fortran}:
1183 @multitable @columnfractions .20 .80
1184 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
1187 @item @emph{See also}:
1188 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
1190 @item @emph{Reference}:
1191 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
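The interplay of these routines can be illustrated with the following
sketch, which (assuming enough threads are available) typically prints
@code{level = 2, active level = 2} once per inner team:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  omp_set_max_active_levels (2);     /* allow two active nesting levels */
  #pragma omp parallel num_threads(2)
  @{
    #pragma omp parallel num_threads(2)
    @{
      #pragma omp single
      printf ("level = %d, active level = %d\n",
              omp_get_level (), omp_get_active_level ());
    @}
  @}
  return 0;
@}
@end smallexample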
1196 @node Thread Affinity Routines
1197 @section Thread Affinity Routines
1199 Routines controlling and accessing thread-affinity policies.
1200 They have C linkage and do not throw exceptions.
1203 * omp_get_proc_bind:: Whether threads may be moved between CPUs
1204 @c * omp_get_num_places:: <fixme>
1205 @c * omp_get_place_num_procs:: <fixme>
1206 @c * omp_get_place_proc_ids:: <fixme>
1207 @c * omp_get_place_num:: <fixme>
1208 @c * omp_get_partition_num_places:: <fixme>
1209 @c * omp_get_partition_place_nums:: <fixme>
1210 @c * omp_set_affinity_format:: <fixme>
1211 @c * omp_get_affinity_format:: <fixme>
1212 @c * omp_display_affinity:: <fixme>
1213 @c * omp_capture_affinity:: <fixme>
1218 @node omp_get_proc_bind
1219 @subsection @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
1221 @item @emph{Description}:
This function returns the currently active thread affinity policy, which is
1223 set via @env{OMP_PROC_BIND}. Possible values are @code{omp_proc_bind_false},
1224 @code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
1225 @code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
1226 where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.
1229 @multitable @columnfractions .20 .80
1230 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
1233 @item @emph{Fortran}:
1234 @multitable @columnfractions .20 .80
1235 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
1238 @item @emph{See also}:
1239 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
1241 @item @emph{Reference}:
1242 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
1247 @node Teams Region Routines
1248 @section Teams Region Routines
1250 Routines controlling the league of teams that are executed in a @code{teams}
1251 region. They have C linkage and do not throw exceptions.
1254 * omp_get_num_teams:: Number of teams
1255 * omp_get_team_num:: Get team number
1256 * omp_set_num_teams:: Set upper teams limit for teams region
1257 * omp_get_max_teams:: Maximum number of teams for teams region
1258 * omp_set_teams_thread_limit:: Set upper thread limit for teams construct
1259 * omp_get_thread_limit:: Maximum number of threads
1264 @node omp_get_num_teams
1265 @subsection @code{omp_get_num_teams} -- Number of teams
1267 @item @emph{Description}:
Returns the number of teams in the current teams region.
1271 @multitable @columnfractions .20 .80
1272 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
1275 @item @emph{Fortran}:
1276 @multitable @columnfractions .20 .80
1277 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
1280 @item @emph{Reference}:
1281 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
1286 @node omp_get_team_num
1287 @subsection @code{omp_get_team_num} -- Get team number
1289 @item @emph{Description}:
1290 Returns the team number of the calling thread.
1293 @multitable @columnfractions .20 .80
1294 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
1297 @item @emph{Fortran}:
1298 @multitable @columnfractions .20 .80
1299 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
1302 @item @emph{Reference}:
1303 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
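As a sketch, assuming a compiler and libgomp with OpenMP 5.0 host
@code{teams} support, each team's initial thread executes the region once:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  #pragma omp teams num_teams(2)
  printf ("team %d of %d\n", omp_get_team_num (), omp_get_num_teams ());
  return 0;
@}
@end smallexample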
1308 @node omp_set_num_teams
1309 @subsection @code{omp_set_num_teams} -- Set upper teams limit for teams construct
1311 @item @emph{Description}:
Specifies the upper bound for the number of teams created by a teams
construct that does not specify a @code{num_teams} clause.  The
1314 argument of @code{omp_set_num_teams} shall be a positive integer.
1317 @multitable @columnfractions .20 .80
1318 @item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
1321 @item @emph{Fortran}:
1322 @multitable @columnfractions .20 .80
1323 @item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
1324 @item @tab @code{integer, intent(in) :: num_teams}
1327 @item @emph{See also}:
1328 @ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}
1330 @item @emph{Reference}:
1331 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
1336 @node omp_get_max_teams
@subsection @code{omp_get_max_teams} -- Maximum number of teams for teams region
1339 @item @emph{Description}:
Return the maximum number of teams that would be used for a teams region
that does not use the @code{num_teams} clause.
1344 @multitable @columnfractions .20 .80
1345 @item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
1348 @item @emph{Fortran}:
1349 @multitable @columnfractions .20 .80
1350 @item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
1353 @item @emph{See also}:
1354 @ref{omp_set_num_teams}, @ref{omp_get_num_teams}
1356 @item @emph{Reference}:
1357 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
1362 @node omp_set_teams_thread_limit
1363 @subsection @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
1365 @item @emph{Description}:
Specifies the upper bound for the number of threads that are available
for each team created by a teams construct that does not specify a
@code{thread_limit} clause.  The argument of
1369 @code{omp_set_teams_thread_limit} shall be a positive integer.
1372 @multitable @columnfractions .20 .80
1373 @item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
1376 @item @emph{Fortran}:
1377 @multitable @columnfractions .20 .80
1378 @item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
1379 @item @tab @code{integer, intent(in) :: thread_limit}
1382 @item @emph{See also}:
1383 @ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}
1385 @item @emph{Reference}:
1386 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
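A sketch combining the teams-related settings, assuming a libgomp that
implements these OpenMP 5.1 routines:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  omp_set_num_teams (4);
  omp_set_teams_thread_limit (2);
  printf ("max teams: %d, teams thread limit: %d\n",
          omp_get_max_teams (), omp_get_teams_thread_limit ());
  return 0;
@}
@end smallexample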
1391 @node omp_get_thread_limit
1392 @subsection @code{omp_get_thread_limit} -- Maximum number of threads
1394 @item @emph{Description}:
Return the maximum number of threads available to the program.
1398 @multitable @columnfractions .20 .80
1399 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
1402 @item @emph{Fortran}:
1403 @multitable @columnfractions .20 .80
1404 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
1407 @item @emph{See also}:
1408 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
1410 @item @emph{Reference}:
1411 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
1416 @node Tasking Routines
1417 @section Tasking Routines
1419 Routines relating to explicit tasks.
1420 They have C linkage and do not throw exceptions.
1423 * omp_get_max_task_priority:: Maximum task priority value that can be set
1424 * omp_in_explicit_task:: Whether a given task is an explicit task
1425 * omp_in_final:: Whether in final or included task region
1426 @c * omp_is_free_agent:: <fixme>/TR12
1427 @c * omp_ancestor_is_free_agent:: <fixme>/TR12
1432 @node omp_get_max_task_priority
1433 @subsection @code{omp_get_max_task_priority} -- Maximum priority value
1434 that can be set for tasks.
1436 @item @emph{Description}:
1437 This function obtains the maximum allowed priority number for tasks.
1440 @multitable @columnfractions .20 .80
1441 @item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
1444 @item @emph{Fortran}:
1445 @multitable @columnfractions .20 .80
1446 @item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
1449 @item @emph{Reference}:
1450 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
1455 @node omp_in_explicit_task
1456 @subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
1458 @item @emph{Description}:
1459 The function returns the @var{explicit-task-var} ICV; it returns true when the
1460 encountering task was generated by a task-generating construct such as
1461 @code{target}, @code{task} or @code{taskloop}. Otherwise, the encountering task
is in an implicit task region, such as one generated by an implicit or
explicit @code{parallel} region, and @code{omp_in_explicit_task} returns false.
1466 @multitable @columnfractions .20 .80
1467 @item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
1470 @item @emph{Fortran}:
1471 @multitable @columnfractions .20 .80
1472 @item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
1475 @item @emph{Reference}:
1476 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
1482 @subsection @code{omp_in_final} -- Whether in final or included task region
1484 @item @emph{Description}:
1485 This function returns @code{true} if currently running in a final
1486 or included task region, @code{false} otherwise. Here, @code{true}
1487 and @code{false} represent their language-specific counterparts.
1490 @multitable @columnfractions .20 .80
1491 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
1494 @item @emph{Fortran}:
1495 @multitable @columnfractions .20 .80
1496 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
1499 @item @emph{Reference}:
1500 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
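The two routines can be observed together with the following sketch,
assuming a libgomp recent enough to provide @code{omp_in_explicit_task}
(an OpenMP 5.2 addition); the generated task is both explicit and final,
so both calls return true:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  #pragma omp parallel
  @{
    #pragma omp single
    @{
      #pragma omp task final(1)
      printf ("explicit task: %d, in final: %d\n",
              omp_in_explicit_task (), omp_in_final ());
    @}
  @}
  return 0;
@}
@end smallexample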
1505 @c @node Resource Relinquishing Routines
1506 @c @section Resource Relinquishing Routines
1508 @c Routines releasing resources used by the OpenMP runtime.
1509 @c They have C linkage and do not throw exceptions.
1512 @c * omp_pause_resource:: <fixme>
1513 @c * omp_pause_resource_all:: <fixme>
1516 @node Device Information Routines
1517 @section Device Information Routines
1519 Routines related to devices available to an OpenMP program.
1520 They have C linkage and do not throw exceptions.
1523 * omp_get_num_procs:: Number of processors online
1524 @c * omp_get_max_progress_width:: <fixme>/TR11
1525 * omp_set_default_device:: Set the default device for target regions
1526 * omp_get_default_device:: Get the default device for target regions
1527 * omp_get_num_devices:: Number of target devices
1528 * omp_get_device_num:: Get device that current thread is running on
1529 * omp_is_initial_device:: Whether executing on the host device
1530 * omp_get_initial_device:: Device number of host device
1535 @node omp_get_num_procs
1536 @subsection @code{omp_get_num_procs} -- Number of processors online
1538 @item @emph{Description}:
Returns the number of processors online on the device on which the routine is called.
1542 @multitable @columnfractions .20 .80
1543 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
1546 @item @emph{Fortran}:
1547 @multitable @columnfractions .20 .80
1548 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
1551 @item @emph{Reference}:
1552 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
1557 @node omp_set_default_device
1558 @subsection @code{omp_set_default_device} -- Set the default device for target regions
1560 @item @emph{Description}:
Set the default device for target regions without a device clause.  The argument
1562 shall be a nonnegative device number.
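
A short, illustrative C sketch; it assumes that at least one non-host device
is available:

@smallexample
#include <omp.h>

void f (void)
@{
  if (omp_get_num_devices () > 0)
    omp_set_default_device (0);   /* Device 0 becomes the default.  */

  /* Without a device clause, this region uses the default device;
     omp_get_default_device () now returns 0.  */
  #pragma omp target
    @{ /* ... offloaded work ...  */ @}
@}
@end smallexample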
1565 @multitable @columnfractions .20 .80
1566 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
1569 @item @emph{Fortran}:
1570 @multitable @columnfractions .20 .80
1571 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
1572 @item @tab @code{integer device_num}
1575 @item @emph{See also}:
1576 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
1578 @item @emph{Reference}:
1579 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
1584 @node omp_get_default_device
1585 @subsection @code{omp_get_default_device} -- Get the default device for target regions
1587 @item @emph{Description}:
1588 Get the default device for target regions without a @code{device} clause.
1591 @multitable @columnfractions .20 .80
1592 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
1595 @item @emph{Fortran}:
1596 @multitable @columnfractions .20 .80
1597 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
1600 @item @emph{See also}:
1601 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
1603 @item @emph{Reference}:
1604 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
1609 @node omp_get_num_devices
1610 @subsection @code{omp_get_num_devices} -- Number of target devices
1612 @item @emph{Description}:
1613 Returns the number of target devices.
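
The following illustrative sketch combines this routine with
@code{omp_get_device_num} and @code{omp_is_initial_device}, documented below,
to query each available non-host device:

@smallexample
#include <omp.h>
#include <stdio.h>

int main (void)
@{
  int num_dev = omp_get_num_devices ();
  for (int dev = 0; dev < num_dev; dev++)
    @{
      int initial = -1, dev_num = -1;
      #pragma omp target device(dev) map(from: initial, dev_num)
      @{
        initial = omp_is_initial_device ();
        dev_num = omp_get_device_num ();
      @}
      printf ("device %d: initial=%d, device_num=%d\n",
              dev, initial, dev_num);
    @}
  return 0;
@}
@end smallexample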
1616 @multitable @columnfractions .20 .80
1617 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
1620 @item @emph{Fortran}:
1621 @multitable @columnfractions .20 .80
1622 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
1625 @item @emph{Reference}:
1626 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
1631 @node omp_get_device_num
1632 @subsection @code{omp_get_device_num} -- Return device number of current device
1634 @item @emph{Description}:
1635 This function returns a device number that represents the device that the
1636 current thread is executing on. For OpenMP 5.0, this must be equal to the
1637 value returned by the @code{omp_get_initial_device} function when called
1641 @multitable @columnfractions .20 .80
1642 @item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
1645 @item @emph{Fortran}:
1646 @multitable @columnfractions .20 .80
1647 @item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
1650 @item @emph{See also}:
1651 @ref{omp_get_initial_device}
1653 @item @emph{Reference}:
1654 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
1659 @node omp_is_initial_device
1660 @subsection @code{omp_is_initial_device} -- Whether executing on the host device
1662 @item @emph{Description}:
1663 This function returns @code{true} if currently running on the host device,
1664 @code{false} otherwise. Here, @code{true} and @code{false} represent
1665 their language-specific counterparts.
1668 @multitable @columnfractions .20 .80
1669 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
1672 @item @emph{Fortran}:
1673 @multitable @columnfractions .20 .80
1674 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
1677 @item @emph{Reference}:
1678 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
1683 @node omp_get_initial_device
1684 @subsection @code{omp_get_initial_device} -- Return device number of initial device
1686 @item @emph{Description}:
1687 This function returns a device number that represents the host device.
1688 For OpenMP 5.1, this must be equal to the value returned by the
1689 @code{omp_get_num_devices} function.
1692 @multitable @columnfractions .20 .80
1693 @item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
1696 @item @emph{Fortran}:
1697 @multitable @columnfractions .20 .80
1698 @item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
1701 @item @emph{See also}:
1702 @ref{omp_get_num_devices}
1704 @item @emph{Reference}:
1705 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
1710 @node Device Memory Routines
1711 @section Device Memory Routines
1713 Routines related to memory allocation and managing corresponding
1714 pointers on devices. They have C linkage and do not throw exceptions.
1717 * omp_target_alloc:: Allocate device memory
1718 * omp_target_free:: Free device memory
1719 * omp_target_is_present:: Check whether storage is mapped
1720 @c * omp_target_is_accessible:: <fixme>
1721 @c * omp_target_memcpy:: <fixme>
1722 @c * omp_target_memcpy_rect:: <fixme>
1723 @c * omp_target_memcpy_async:: <fixme>
1724 @c * omp_target_memcpy_rect_async:: <fixme>
1725 @c * omp_target_memset:: <fixme>/TR12
1726 @c * omp_target_memset_async:: <fixme>/TR12
1727 * omp_target_associate_ptr:: Associate a device pointer with a host pointer
1728 * omp_target_disassociate_ptr:: Remove device--host pointer association
1729 * omp_get_mapped_ptr:: Return device pointer to a host pointer
1734 @node omp_target_alloc
1735 @subsection @code{omp_target_alloc} -- Allocate device memory
1737 @item @emph{Description}:
1738 This routine allocates @var{size} bytes of memory in the device environment
1739 associated with the device number @var{device_num}. If successful, a device
1740 pointer is returned, otherwise a null pointer.
1742 In GCC, when the device is the host or the device shares memory with the host,
1743 the memory is allocated on the host; in that case, when @var{size} is zero,
1744 either NULL or a unique pointer value that can later be successfully passed to
1745 @code{omp_target_free} is returned. When the allocation is not performed on
1746 the host, a null pointer is returned when @var{size} is zero; in that case,
1747 additionally a diagnostic might be printed to standard error (stderr).
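
A minimal usage sketch (error handling mostly omitted): device memory is
allocated, used in a @code{target} region via the @code{is_device_ptr}
clause, and released with @code{omp_target_free}:

@smallexample
#include <omp.h>

void f (int n)
@{
  int dev = omp_get_default_device ();
  int *p = (int *) omp_target_alloc (n * sizeof (int), dev);
  if (!p)
    return;

  #pragma omp target device(dev) is_device_ptr(p)
    for (int i = 0; i < n; i++)
      p[i] = i;

  omp_target_free (p, dev);
@}
@end smallexample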
1749 Running this routine in a @code{target} region except on the initial device
1753 @multitable @columnfractions .20 .80
1754 @item @emph{Prototype}: @tab @code{void *omp_target_alloc(size_t size, int device_num)}
1757 @item @emph{Fortran}:
1758 @multitable @columnfractions .20 .80
1759 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_target_alloc(size, device_num) bind(C)}
1760 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
1761 @item @tab @code{integer(c_size_t), value :: size}
1762 @item @tab @code{integer(c_int), value :: device_num}
1765 @item @emph{See also}:
1766 @ref{omp_target_free}, @ref{omp_target_associate_ptr}
1768 @item @emph{Reference}:
1769 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.1
1774 @node omp_target_free
1775 @subsection @code{omp_target_free} -- Free device memory
1777 @item @emph{Description}:
1778 This routine frees memory allocated by the @code{omp_target_alloc} routine.
1779 The @var{device_ptr} argument must be either a null pointer or a device pointer
1780 returned by @code{omp_target_alloc} for the specified @var{device_num}. The
1781 device number @var{device_num} must be a conforming device number.
1783 Running this routine in a @code{target} region except on the initial device
1787 @multitable @columnfractions .20 .80
1788 @item @emph{Prototype}: @tab @code{void omp_target_free(void *device_ptr, int device_num)}
1791 @item @emph{Fortran}:
1792 @multitable @columnfractions .20 .80
1793 @item @emph{Interface}: @tab @code{subroutine omp_target_free(device_ptr, device_num) bind(C)}
1794 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
1795 @item @tab @code{type(c_ptr), value :: device_ptr}
1796 @item @tab @code{integer(c_int), value :: device_num}
1799 @item @emph{See also}:
1800 @ref{omp_target_alloc}, @ref{omp_target_disassociate_ptr}
1802 @item @emph{Reference}:
1803 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.2
1808 @node omp_target_is_present
1809 @subsection @code{omp_target_is_present} -- Check whether storage is mapped
1811 @item @emph{Description}:
1812 This routine tests whether storage identified by the host pointer @var{ptr}
1813 is mapped to the device specified by @var{device_num}. If so, it returns
1814 @emph{true} and otherwise @emph{false}.
1816 In GCC, this includes self mapping such that @code{omp_target_is_present}
1817 returns @emph{true} when @var{device_num} specifies the host or when the host
1818 and the device share memory. If @var{ptr} is a null pointer, @emph{true} is
1819 returned; if @var{device_num} is an invalid device number, @emph{false} is
1822 If those conditions do not apply, @emph{true} is returned if the association has
1823 been established by an explicit or implicit @code{map} clause, the
1824 @code{declare target} directive or a call to the @code{omp_target_associate_ptr}
1827 Running this routine in a @code{target} region except on the initial device
1831 @multitable @columnfractions .20 .80
1832 @item @emph{Prototype}: @tab @code{int omp_target_is_present(const void *ptr,}
1833 @item @tab @code{ int device_num)}
1836 @item @emph{Fortran}:
1837 @multitable @columnfractions .20 .80
1838 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_is_present(ptr, &}
1839 @item @tab @code{ device_num) bind(C)}
1840 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
1841 @item @tab @code{type(c_ptr), value :: ptr}
1842 @item @tab @code{integer(c_int), value :: device_num}
1845 @item @emph{See also}:
1846 @ref{omp_target_associate_ptr}
1848 @item @emph{Reference}:
1849 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.3
1854 @node omp_target_associate_ptr
1855 @subsection @code{omp_target_associate_ptr} -- Associate a device pointer with a host pointer
1857 @item @emph{Description}:
1858 This routine associates storage on the host with storage on a device identified
1859 by @var{device_num}. The device pointer is usually obtained by calling
1860 @code{omp_target_alloc} or by other means (but not by using the @code{map}
1861 clauses or the @code{declare target} directive). The host pointer should point
1862 to memory that has a storage size of at least @var{size}.
1864 The @var{device_offset} parameter specifies the offset into @var{device_ptr}
1865 that is used as the base address for the device side of the mapping; the
1866 storage size should be at least @var{device_offset} plus @var{size}.
1868 After the association, the host pointer can be used in a @code{map} clause and
1869 in the @code{to} and @code{from} clauses of the @code{target update} directive
1870 to transfer data between the associated pointers. The reference count of such
1871 associated storage is infinite. The association can be removed by calling
1872 @code{omp_target_disassociate_ptr} which should be done before the lifetime
1873 of either storage ends.
1875 The routine returns nonzero (@code{EINVAL}) when @var{device_num} is invalid or
1876 refers to the initial device or to a device that shares memory with the
1877 host. @code{omp_target_associate_ptr} returns zero if @var{host_ptr} points
1878 into storage that lies fully inside a previously associated memory
1879 region. Otherwise, zero is returned if the association was successful; if none
1880 of the cases above apply, nonzero (@code{EINVAL}) is returned.
1882 The @code{omp_target_is_present} routine can be used to test whether
1883 associated storage for a device pointer exists.
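
A minimal sketch of the usual pattern, assuming a non-host device that does
not share memory with the host: a host buffer is associated with device
storage obtained from @code{omp_target_alloc}, data is transferred with
@code{target update}, and the association is removed again:

@smallexample
#include <omp.h>

void f (void)
@{
  int dev = omp_get_default_device ();
  double host_buf[100];
  void *dev_buf = omp_target_alloc (sizeof (host_buf), dev);

  for (int i = 0; i < 100; i++)
    host_buf[i] = i;

  if (dev_buf
      && omp_target_associate_ptr (host_buf, dev_buf, sizeof (host_buf),
                                   0 /* device_offset */, dev) == 0)
    @{
      /* omp_target_is_present (host_buf, dev) is now true.  */
      #pragma omp target update to(host_buf) device(dev)
      #pragma omp target device(dev)
        for (int i = 0; i < 100; i++)
          host_buf[i] *= 2.0;
      #pragma omp target update from(host_buf) device(dev)

      omp_target_disassociate_ptr (host_buf, dev);
    @}
  omp_target_free (dev_buf, dev);
@}
@end smallexample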
1885 Running this routine in a @code{target} region except on the initial device
1889 @multitable @columnfractions .20 .80
1890 @item @emph{Prototype}: @tab @code{int omp_target_associate_ptr(const void *host_ptr,}
1891 @item @tab @code{ const void *device_ptr,}
1892 @item @tab @code{ size_t size,}
1893 @item @tab @code{ size_t device_offset,}
1894 @item @tab @code{ int device_num)}
1897 @item @emph{Fortran}:
1898 @multitable @columnfractions .20 .80
1899 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_associate_ptr(host_ptr, &}
1900 @item @tab @code{ device_ptr, size, device_offset, device_num) bind(C)}
1901 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
1902 @item @tab @code{type(c_ptr), value :: host_ptr, device_ptr}
1903 @item @tab @code{integer(c_size_t), value :: size, device_offset}
1904 @item @tab @code{integer(c_int), value :: device_num}
1907 @item @emph{See also}:
1908 @ref{omp_target_disassociate_ptr}, @ref{omp_target_is_present},
1909 @ref{omp_target_alloc}
1911 @item @emph{Reference}:
1912 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.9
1917 @node omp_target_disassociate_ptr
1918 @subsection @code{omp_target_disassociate_ptr} -- Remove device--host pointer association
1920 @item @emph{Description}:
1921 This routine removes the storage association established by calling
1922 @code{omp_target_associate_ptr} and sets the reference count to zero,
1923 even if @code{omp_target_associate_ptr} was invoked multiple times for
1924 the host pointer @var{ptr}. If applicable, the device memory needs
1925 to be freed by the user.
1927 If an associated device storage location for the @var{device_num} was
1928 found and has infinite reference count, the association is removed and
1929 zero is returned. In all other cases, nonzero (@code{EINVAL}) is returned
1930 and no other action is taken.
1932 Note that passing a host pointer where the association to the device pointer
1933 was established with the @code{declare target} directive yields undefined
1936 Running this routine in a @code{target} region except on the initial device
1940 @multitable @columnfractions .20 .80
1941 @item @emph{Prototype}: @tab @code{int omp_target_disassociate_ptr(const void *ptr,}
1942 @item @tab @code{ int device_num)}
1945 @item @emph{Fortran}:
1946 @multitable @columnfractions .20 .80
1947 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_disassociate_ptr(ptr, &}
1948 @item @tab @code{ device_num) bind(C)}
1949 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
1950 @item @tab @code{type(c_ptr), value :: ptr}
1951 @item @tab @code{integer(c_int), value :: device_num}
1954 @item @emph{See also}:
1955 @ref{omp_target_associate_ptr}
1957 @item @emph{Reference}:
1958 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.10
1963 @node omp_get_mapped_ptr
1964 @subsection @code{omp_get_mapped_ptr} -- Return device pointer to a host pointer
1966 @item @emph{Description}:
1967 If the device number refers to the initial device or to a device with
1968 memory accessible from the host (shared memory), the @code{omp_get_mapped_ptr}
1969 routine returns the value of the passed @var{ptr}. Otherwise, if associated
1970 storage for the passed host pointer @var{ptr} exists on the device associated with
1971 @var{device_num}, it returns that pointer. In all other cases and in case of
1972 an error, a null pointer is returned.
1974 The association of a storage location is established either via an explicit or
1975 implicit @code{map} clause, the @code{declare target} directive or the
1976 @code{omp_target_associate_ptr} routine.
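
As an illustrative sketch, the device pointer of storage mapped by
@code{target enter data} can be queried as follows:

@smallexample
#include <omp.h>
#include <stdio.h>

void f (void)
@{
  int dev = omp_get_default_device ();
  double x[1000];

  #pragma omp target enter data map(to: x) device(dev)

  void *dev_x = omp_get_mapped_ptr (x, dev);
  printf ("%p\n", dev_x);  /* Device address of x; equals &x[0] on the
                              host and on shared-memory devices.  */

  #pragma omp target exit data map(release: x) device(dev)
@}
@end smallexample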
1978 Running this routine in a @code{target} region except on the initial device
1982 @multitable @columnfractions .20 .80
1983 @item @emph{Prototype}: @tab @code{void *omp_get_mapped_ptr(const void *ptr, int device_num);}
1986 @item @emph{Fortran}:
1987 @multitable @columnfractions .20 .80
1988 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_get_mapped_ptr(ptr, device_num) bind(C)}
1989 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
1990 @item @tab @code{type(c_ptr), value :: ptr}
1991 @item @tab @code{integer(c_int), value :: device_num}
1994 @item @emph{See also}:
1995 @ref{omp_target_associate_ptr}
1997 @item @emph{Reference}:
1998 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.11
2004 @section Lock Routines
2006 Initialize, set, test, unset and destroy simple and nested locks.
2007 The routines have C linkage and do not throw exceptions.
2010 * omp_init_lock:: Initialize simple lock
2011 * omp_init_nest_lock:: Initialize nested lock
2012 @c * omp_init_lock_with_hint:: <fixme>
2013 @c * omp_init_nest_lock_with_hint:: <fixme>
2014 * omp_destroy_lock:: Destroy simple lock
2015 * omp_destroy_nest_lock:: Destroy nested lock
2016 * omp_set_lock:: Wait for and set simple lock
2017 * omp_set_nest_lock:: Wait for and set nested lock
2018 * omp_unset_lock:: Unset simple lock
2019 * omp_unset_nest_lock:: Unset nested lock
2020 * omp_test_lock:: Test and set simple lock if available
2021 * omp_test_nest_lock:: Test and set nested lock if available
2027 @subsection @code{omp_init_lock} -- Initialize simple lock
2029 @item @emph{Description}:
2030 Initialize a simple lock. After initialization, the lock is in
2034 @multitable @columnfractions .20 .80
2035 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
2038 @item @emph{Fortran}:
2039 @multitable @columnfractions .20 .80
2040 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
2041 @item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
2044 @item @emph{See also}:
2045 @ref{omp_destroy_lock}
2047 @item @emph{Reference}:
2048 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
2053 @node omp_init_nest_lock
2054 @subsection @code{omp_init_nest_lock} -- Initialize nested lock
2056 @item @emph{Description}:
2057 Initialize a nested lock. After initialization, the lock is in
2058 an unlocked state and the nesting count is set to zero.
2061 @multitable @columnfractions .20 .80
2062 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
2065 @item @emph{Fortran}:
2066 @multitable @columnfractions .20 .80
2067 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
2068 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
2071 @item @emph{See also}:
2072 @ref{omp_destroy_nest_lock}
2074 @item @emph{Reference}:
2075 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
2080 @node omp_destroy_lock
2081 @subsection @code{omp_destroy_lock} -- Destroy simple lock
2083 @item @emph{Description}:
2084 Destroy a simple lock. In order to be destroyed, a simple lock must be
2085 in the unlocked state.
2088 @multitable @columnfractions .20 .80
2089 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
2092 @item @emph{Fortran}:
2093 @multitable @columnfractions .20 .80
2094 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
2095 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2098 @item @emph{See also}:
2101 @item @emph{Reference}:
2102 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
2107 @node omp_destroy_nest_lock
2108 @subsection @code{omp_destroy_nest_lock} -- Destroy nested lock
2110 @item @emph{Description}:
2111 Destroy a nested lock. In order to be destroyed, a nested lock must be
2112 in the unlocked state and its nesting count must equal zero.
2115 @multitable @columnfractions .20 .80
2116 @item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
2119 @item @emph{Fortran}:
2120 @multitable @columnfractions .20 .80
2121 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
2122 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2125 @item @emph{See also}:
2128 @item @emph{Reference}:
2129 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
2135 @subsection @code{omp_set_lock} -- Wait for and set simple lock
2137 @item @emph{Description}:
2138 Before setting a simple lock, the lock variable must be initialized by
2139 @code{omp_init_lock}. The calling thread is blocked until the lock
2140 is available. If the lock is already held by the current thread,
2144 @multitable @columnfractions .20 .80
2145 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
2148 @item @emph{Fortran}:
2149 @multitable @columnfractions .20 .80
2150 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
2151 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2154 @item @emph{See also}:
2155 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
2157 @item @emph{Reference}:
2158 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
2163 @node omp_set_nest_lock
2164 @subsection @code{omp_set_nest_lock} -- Wait for and set nested lock
2166 @item @emph{Description}:
2167 Before setting a nested lock, the lock variable must be initialized by
2168 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
2169 is available. If the lock is already held by the current thread, the
2170 nesting count for the lock is incremented.
2173 @multitable @columnfractions .20 .80
2174 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
2177 @item @emph{Fortran}:
2178 @multitable @columnfractions .20 .80
2179 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
2180 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2183 @item @emph{See also}:
2184 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
2186 @item @emph{Reference}:
2187 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
2192 @node omp_unset_lock
2193 @subsection @code{omp_unset_lock} -- Unset simple lock
2195 @item @emph{Description}:
2196 A simple lock about to be unset must have been locked by @code{omp_set_lock}
2197 or @code{omp_test_lock} before. In addition, the lock must be held by the
2198 thread calling @code{omp_unset_lock}. After this call, the lock becomes unlocked.
2199 If one or more threads attempted to set the lock before, one of them is
2200 chosen to set the lock for itself.
2203 @multitable @columnfractions .20 .80
2204 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
2207 @item @emph{Fortran}:
2208 @multitable @columnfractions .20 .80
2209 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
2210 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2213 @item @emph{See also}:
2214 @ref{omp_set_lock}, @ref{omp_test_lock}
2216 @item @emph{Reference}:
2217 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
2222 @node omp_unset_nest_lock
2223 @subsection @code{omp_unset_nest_lock} -- Unset nested lock
2225 @item @emph{Description}:
2226 A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
2227 or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
2228 thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
2229 lock becomes unlocked. If one or more threads attempted to set the lock before,
2230 one of them is chosen to set the lock for itself.
2233 @multitable @columnfractions .20 .80
2234 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
2237 @item @emph{Fortran}:
2238 @multitable @columnfractions .20 .80
2239 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
2240 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2243 @item @emph{See also}:
2244 @ref{omp_set_nest_lock}
2246 @item @emph{Reference}:
2247 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
2253 @subsection @code{omp_test_lock} -- Test and set simple lock if available
2255 @item @emph{Description}:
2256 Before setting a simple lock, the lock variable must be initialized by
2257 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
2258 does not block if the lock is not available. This function returns
2259 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
2260 @code{false} represent their language-specific counterparts.
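
An illustrative sketch of the usual initialize/test/unset/destroy sequence
for a simple lock:

@smallexample
#include <omp.h>

omp_lock_t lock;

void f (void)
@{
  omp_init_lock (&lock);

  #pragma omp parallel
    @{
      if (omp_test_lock (&lock))
        @{
          /* The lock was acquired without blocking.  */
          omp_unset_lock (&lock);
        @}
      else
        @{
          /* The lock was not available; do something else.  */
        @}
    @}

  omp_destroy_lock (&lock);
@}
@end smallexample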
2263 @multitable @columnfractions .20 .80
2264 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
2267 @item @emph{Fortran}:
2268 @multitable @columnfractions .20 .80
2269 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
2270 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2273 @item @emph{See also}:
2274 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
2276 @item @emph{Reference}:
2277 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
2282 @node omp_test_nest_lock
2283 @subsection @code{omp_test_nest_lock} -- Test and set nested lock if available
2285 @item @emph{Description}:
2286 Before setting a nested lock, the lock variable must be initialized by
2287 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
2288 @code{omp_test_nest_lock} does not block if the lock is not available.
2289 If the lock is already held by the current thread, the new nesting count
2290 is returned. Otherwise, the return value equals zero.
2293 @multitable @columnfractions .20 .80
2294 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
2297 @item @emph{Fortran}:
2298 @multitable @columnfractions .20 .80
2299 @item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
2300 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2304 @item @emph{See also}:
2305 @ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
2307 @item @emph{Reference}:
2308 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
2313 @node Timing Routines
2314 @section Timing Routines
2316 Portable, thread-based, wall clock timer.
2317 The routines have C linkage and do not throw exceptions.
2320 * omp_get_wtick:: Get timer precision.
2321 * omp_get_wtime:: Elapsed wall clock time.
2327 @subsection @code{omp_get_wtick} -- Get timer precision
2329 @item @emph{Description}:
2330 Gets the timer precision, i.e., the number of seconds between two
2331 successive clock ticks.
2334 @multitable @columnfractions .20 .80
2335 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
2338 @item @emph{Fortran}:
2339 @multitable @columnfractions .20 .80
2340 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
2343 @item @emph{See also}:
2346 @item @emph{Reference}:
2347 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
2353 @subsection @code{omp_get_wtime} -- Elapsed wall clock time
2355 @item @emph{Description}:
2356 Elapsed wall clock time in seconds. The time is measured per thread; no
2357 guarantee can be made that two distinct threads measure the same time.
2358 Time is measured from some ``time in the past'', which is an arbitrary time
2359 guaranteed not to change during the execution of the program.
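
A typical, illustrative timing sketch:

@smallexample
#include <omp.h>
#include <stdio.h>

void f (void)
@{
  double start = omp_get_wtime ();
  /* ... work to be timed ...  */
  double end = omp_get_wtime ();
  printf ("elapsed: %f s (tick: %g s)\n", end - start, omp_get_wtick ());
@}
@end smallexample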
2362 @multitable @columnfractions .20 .80
2363 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
2366 @item @emph{Fortran}:
2367 @multitable @columnfractions .20 .80
2368 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
2371 @item @emph{See also}:
2374 @item @emph{Reference}:
2375 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
2381 @section Event Routine
2383 Support for event objects.
2384 The routine has C linkage and does not throw exceptions.
2387 * omp_fulfill_event:: Fulfill and destroy an OpenMP event.
2392 @node omp_fulfill_event
2393 @subsection @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
2395 @item @emph{Description}:
2396 Fulfill the event associated with the event handle argument. Currently, it
2397 is only used to fulfill events generated by @code{detach} clauses on task
2398 constructs; the effect of fulfilling the event is to allow the task to
2401 The result of calling @code{omp_fulfill_event} with an event handle other
2402 than that generated by a detach clause is undefined. Calling it with an
2403 event handle that has already been fulfilled is also undefined.
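
A hedged sketch of the intended usage together with the @code{detach} clause;
@code{start_async_work} is a hypothetical routine that arranges for
@code{omp_fulfill_event} to be called on the passed event handle once some
external, asynchronous operation has finished:

@smallexample
#include <omp.h>

extern void start_async_work (omp_event_handle_t ev);  /* Hypothetical.  */

void f (void)
@{
  omp_event_handle_t ev;

  #pragma omp task detach(ev)
    @{
      /* The task body finishes quickly, but the task only completes
         once omp_fulfill_event (ev) has been called.  */
      start_async_work (ev);
    @}

  #pragma omp taskwait  /* Returns only after the event was fulfilled.  */
@}
@end smallexample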
2406 @multitable @columnfractions .20 .80
2407 @item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
2410 @item @emph{Fortran}:
2411 @multitable @columnfractions .20 .80
2412 @item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
2413 @item @tab @code{integer (kind=omp_event_handle_kind) :: event}
2416 @item @emph{Reference}:
2417 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
2422 @c @node Interoperability Routines
2423 @c @section Interoperability Routines
2425 @c Routines to obtain properties from an @code{omp_interop_t} object.
2426 @c They have C linkage and do not throw exceptions.
2429 @c * omp_get_num_interop_properties:: <fixme>
2430 @c * omp_get_interop_int:: <fixme>
2431 @c * omp_get_interop_ptr:: <fixme>
2432 @c * omp_get_interop_str:: <fixme>
2433 @c * omp_get_interop_name:: <fixme>
2434 @c * omp_get_interop_type_desc:: <fixme>
2435 @c * omp_get_interop_rc_desc:: <fixme>
2438 @node Memory Management Routines
2439 @section Memory Management Routines
2441 Routines to manage and allocate memory on the current device.
2442 They have C linkage and do not throw exceptions.
2445 * omp_init_allocator:: Create an allocator
2446 * omp_destroy_allocator:: Destroy an allocator
2447 * omp_set_default_allocator:: Set the default allocator
2448 * omp_get_default_allocator:: Get the default allocator
2449 * omp_alloc:: Memory allocation with an allocator
2450 * omp_aligned_alloc:: Memory allocation with an allocator and alignment
2451 * omp_free:: Freeing memory allocated with OpenMP routines
2452 * omp_calloc:: Allocate nullified memory with an allocator
2453 * omp_aligned_calloc:: Allocate nullified aligned memory with an allocator
2454 * omp_realloc:: Reallocate memory allocated with OpenMP routines
2455 @c * omp_get_memspace_num_resources:: <fixme>/TR11
2456 @c * omp_get_submemspace:: <fixme>/TR11
2461 @node omp_init_allocator
2462 @subsection @code{omp_init_allocator} -- Create an allocator
2464 @item @emph{Description}:
2465 Create an allocator that uses the specified memory space and has the specified
2466 traits; if an allocator that fulfills the requirements cannot be created,
2467 @code{omp_null_allocator} is returned.
2469 The predefined memory spaces and available traits can be found at
2470 @ref{OMP_ALLOCATOR}, where the trait names have to be prefixed by
2471 @code{omp_atk_} (e.g. @code{omp_atk_pinned}) and the named trait values by
2472 @code{omp_atv_} (e.g. @code{omp_atv_true}); additionally, @code{omp_atv_default}
2473 may be used as trait value to specify that the default value should be used.
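
An illustrative sketch that creates an allocator for pinned memory with a
null-pointer fallback and uses it with @code{omp_alloc} and @code{omp_free}:

@smallexample
#include <omp.h>
#include <stddef.h>

void f (size_t n)
@{
  omp_alloctrait_t traits[] = @{
    @{ omp_atk_pinned,   omp_atv_true @},
    @{ omp_atk_fallback, omp_atv_null_fb @}
  @};
  omp_allocator_handle_t a
    = omp_init_allocator (omp_default_mem_space, 2, traits);

  if (a != omp_null_allocator)
    @{
      double *p = (double *) omp_alloc (n * sizeof (double), a);
      /* ... use p ...  */
      omp_free (p, a);
      omp_destroy_allocator (a);
    @}
@}
@end smallexample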
2476 @multitable @columnfractions .20 .80
2477 @item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_init_allocator(}
2478 @item @tab @code{ omp_memspace_handle_t memspace,}
2479 @item @tab @code{ int ntraits,}
2480 @item @tab @code{ const omp_alloctrait_t traits[]);}
2483 @item @emph{Fortran}:
2484 @multitable @columnfractions .20 .80
2485 @item @emph{Interface}: @tab @code{function omp_init_allocator(memspace, ntraits, traits)}
2486 @item @tab @code{integer (omp_allocator_handle_kind) :: omp_init_allocator}
2487 @item @tab @code{integer (omp_memspace_handle_kind), intent(in) :: memspace}
2488 @item @tab @code{integer, intent(in) :: ntraits}
2489 @item @tab @code{type (omp_alloctrait), intent(in) :: traits(*)}
2492 @item @emph{See also}:
2493 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_destroy_allocator}
2495 @item @emph{Reference}:
2496 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.2
2501 @node omp_destroy_allocator
2502 @subsection @code{omp_destroy_allocator} -- Destroy an allocator
2504 @item @emph{Description}:
2505 Releases all resources used by a memory allocator, which must not represent
2506 a predefined memory allocator. Accessing memory after its allocator has been
2507 destroyed has unspecified behavior. Passing @code{omp_null_allocator} to the
2508 routine is permitted but has no effect.
2512 @multitable @columnfractions .20 .80
2513 @item @emph{Prototype}: @tab @code{void omp_destroy_allocator (omp_allocator_handle_t allocator);}
2516 @item @emph{Fortran}:
2517 @multitable @columnfractions .20 .80
2518 @item @emph{Interface}: @tab @code{subroutine omp_destroy_allocator(allocator)}
2519 @item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
2522 @item @emph{See also}:
2523 @ref{omp_init_allocator}
2525 @item @emph{Reference}:
2526 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.3
2531 @node omp_set_default_allocator
2532 @subsection @code{omp_set_default_allocator} -- Set the default allocator
2534 @item @emph{Description}:
2535 Sets the default allocator that is used when no allocator has been specified
2536 in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
2537 routine is invoked with the @code{omp_null_allocator} allocator.
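
For illustration, after changing the default allocator, memory routines
invoked with @code{omp_null_allocator} use the new default:

@smallexample
#include <omp.h>

void f (void)
@{
  omp_set_default_allocator (omp_low_lat_mem_alloc);

  /* Allocates via omp_low_lat_mem_alloc (the def-allocator-var ICV).  */
  int *p = (int *) omp_alloc (64 * sizeof (int), omp_null_allocator);
  omp_free (p, omp_null_allocator);
@}
@end smallexample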
2540 @multitable @columnfractions .20 .80
2541 @item @emph{Prototype}: @tab @code{void omp_set_default_allocator(omp_allocator_handle_t allocator);}
2544 @item @emph{Fortran}:
2545 @multitable @columnfractions .20 .80
2546 @item @emph{Interface}: @tab @code{subroutine omp_set_default_allocator(allocator)}
2547 @item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
2550 @item @emph{See also}:
2551 @ref{omp_get_default_allocator}, @ref{omp_init_allocator}, @ref{OMP_ALLOCATOR},
2552 @ref{Memory allocation}
2554 @item @emph{Reference}:
2555 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.4
2560 @node omp_get_default_allocator
2561 @subsection @code{omp_get_default_allocator} -- Get the default allocator
2563 @item @emph{Description}:
2564 The routine returns the default allocator that is used when no allocator has
2565 been specified in the @code{allocate} or @code{allocator} clause or if an
2566 OpenMP memory routine is invoked with the @code{omp_null_allocator} allocator.
2569 @multitable @columnfractions .20 .80
2570 @item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_get_default_allocator(void);}
2573 @item @emph{Fortran}:
2574 @multitable @columnfractions .20 .80
2575 @item @emph{Interface}: @tab @code{function omp_get_default_allocator()}
2576 @item @tab @code{integer (omp_allocator_handle_kind) :: omp_get_default_allocator}
2579 @item @emph{See also}:
2580 @ref{omp_set_default_allocator}, @ref{OMP_ALLOCATOR}
2582 @item @emph{Reference}:
2583 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.5
2589 @subsection @code{omp_alloc} -- Memory allocation with an allocator
2591 @item @emph{Description}:
2592 Allocate memory with the specified allocator, which can either be a predefined
2593 allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
2594 is @code{omp_null_allocator}, the allocator specified by the
2595 @var{def-allocator-var} ICV is used. @var{size} must be a nonnegative number
2596 denoting the number of bytes to be allocated; if @var{size} is zero,
2597 @code{omp_alloc} will return a null pointer. If successful, a pointer to the
2598 allocated memory is returned, otherwise the @code{fallback} trait of the
2599 allocator determines the behavior. The content of the allocated memory is
2602 In @code{target} regions, either the @code{dynamic_allocators} clause must
2603 appear on a @code{requires} directive in the same compilation unit, or the
2604 @var{allocator} argument may only be a constant expression with the value of
2605 one of the predefined allocators and may not be @code{omp_null_allocator}.
2607 Memory allocated by @code{omp_alloc} must be freed using @code{omp_free}.
2610 @multitable @columnfractions .20 .80
2611 @item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
2612 @item @tab @code{ omp_allocator_handle_t allocator)}
2616 @multitable @columnfractions .20 .80
2617 @item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
2618 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2621 @item @emph{Fortran}:
2622 @multitable @columnfractions .20 .80
2623 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_alloc(size, allocator) bind(C)}
2624 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2625 @item @tab @code{integer (c_size_t), value :: size}
2626 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2629 @item @emph{See also}:
2630 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2631 @ref{omp_free}, @ref{omp_init_allocator}
2633 @item @emph{Reference}:
2634 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.6
2639 @node omp_aligned_alloc
2640 @subsection @code{omp_aligned_alloc} -- Memory allocation with an allocator and alignment
2642 @item @emph{Description}:
2643 Allocate memory with the specified allocator, which can either be a predefined
2644 allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
2645 is @code{omp_null_allocator}, the allocator specified by the
2646 @var{def-allocator-var} ICV is used. @var{alignment} must be a positive power
2647 of two and @var{size} must be a nonnegative number that is a multiple of the
2648 alignment and denotes the number of bytes to be allocated; if @var{size} is
2649 zero, @code{omp_aligned_alloc} will return a null pointer. The alignment will
2650 be at least the maximal value required by the @code{alignment} trait of the
2651 allocator and the value of the passed @var{alignment} argument. If successful,
2652 a pointer to the allocated memory is returned, otherwise the @code{fallback}
2653 trait of the allocator determines the behavior. The content of the allocated
2654 memory is unspecified.
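
A brief, illustrative sketch requesting 64-byte alignment from the default
memory allocator; @var{size} is chosen as a multiple of the alignment:

@smallexample
#include <omp.h>

void f (void)
@{
  /* 1024 bytes is a multiple of the requested 64-byte alignment.  */
  double *p
    = (double *) omp_aligned_alloc (64, 1024, omp_default_mem_alloc);
  /* ... use p ...  */
  omp_free (p, omp_default_mem_alloc);
@}
@end smallexample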
2656 In @code{target} regions, either the @code{dynamic_allocators} clause must
2657 appear on a @code{requires} directive in the same compilation unit, or the
2658 @var{allocator} argument may only be a constant expression with the value of
2659 one of the predefined allocators and may not be @code{omp_null_allocator}.
2661 Memory allocated by @code{omp_aligned_alloc} must be freed using
2665 @multitable @columnfractions .20 .80
2666 @item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
2667 @item @tab @code{ size_t size,}
2668 @item @tab @code{ omp_allocator_handle_t allocator)}
2672 @multitable @columnfractions .20 .80
2673 @item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
2674 @item @tab @code{ size_t size,}
2675 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2678 @item @emph{Fortran}:
2679 @multitable @columnfractions .20 .80
2680 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_alloc(alignment, size, allocator) bind(C)}
2681 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2682 @item @tab @code{integer (c_size_t), value :: alignment, size}
2683 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2686 @item @emph{See also}:
2687 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2688 @ref{omp_free}, @ref{omp_init_allocator}
2690 @item @emph{Reference}:
2691 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.6
2697 @subsection @code{omp_free} -- Freeing memory allocated with OpenMP routines
2699 @item @emph{Description}:
2700 The @code{omp_free} routine deallocates memory previously allocated by an
2701 OpenMP memory-management routine. The @var{ptr} argument must point to such
2702 memory or be a null pointer; if it is a null pointer, no operation is
2703 performed. If specified, the @var{allocator} argument must be either the
2704 memory allocator that was used for the allocation or @code{omp_null_allocator};
2705 if it is @code{omp_null_allocator}, the implementation will determine the value
2708 Calling @code{omp_free} invokes undefined behavior if the memory
2709 was already deallocated or when the used allocator has already been destroyed.
2712 @multitable @columnfractions .20 .80
2713 @item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
2714 @item @tab @code{ omp_allocator_handle_t allocator)}
2718 @multitable @columnfractions .20 .80
2719 @item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
2720 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2723 @item @emph{Fortran}:
2724 @multitable @columnfractions .20 .80
2725 @item @emph{Interface}: @tab @code{subroutine omp_free(ptr, allocator) bind(C)}
2726 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr}
2727 @item @tab @code{type (c_ptr), value :: ptr}
2728 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2731 @item @emph{See also}:
2732 @ref{omp_alloc}, @ref{omp_aligned_alloc}, @ref{omp_calloc},
2733 @ref{omp_aligned_calloc}, @ref{omp_realloc}
2735 @item @emph{Reference}:
2736 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.7
2742 @subsection @code{omp_calloc} -- Allocate nullified memory with an allocator
2744 @item @emph{Description}:
2745 Allocate zero-initialized memory with the specified allocator, which can either
2746 be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
2747 the allocator is @code{omp_null_allocator}, the allocator specified by the
2748 @var{def-allocator-var} ICV is used. The memory to be allocated is for an
2749 array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
2750 @var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
2751 zero, @code{omp_calloc} will return a null pointer. If successful, a pointer to
2752 the zero-initialized allocated memory is returned, otherwise the @code{fallback}
2753 trait of the allocator determines the behavior.
2755 In @code{target} regions, either the @code{dynamic_allocators} clause must
2756 appear on a @code{requires} directive in the same compilation unit, or the
2757 @var{allocator} argument may only be a constant expression with the value of
2758 one of the predefined allocators and may not be @code{omp_null_allocator}.
2760 Memory allocated by @code{omp_calloc} must be freed using @code{omp_free}.
2763 @multitable @columnfractions .20 .80
2764 @item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
2765 @item @tab @code{ omp_allocator_handle_t allocator)}
2769 @multitable @columnfractions .20 .80
2770 @item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
2771 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2774 @item @emph{Fortran}:
2775 @multitable @columnfractions .20 .80
2776 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_calloc(nmemb, size, allocator) bind(C)}
2777 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2778 @item @tab @code{integer (c_size_t), value :: nmemb, size}
2779 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2782 @item @emph{See also}:
2783 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2784 @ref{omp_free}, @ref{omp_init_allocator}
2786 @item @emph{Reference}:
2787 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
2792 @node omp_aligned_calloc
2793 @subsection @code{omp_aligned_calloc} -- Allocate aligned nullified memory with an allocator
2795 @item @emph{Description}:
2796 Allocate zero-initialized memory with the specified allocator, which can either
2797 be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
2798 the allocator is @code{omp_null_allocator}, the allocator specified by the
2799 @var{def-allocator-var} ICV is used. The memory to be allocated is for an
2800 array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
2801 @var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
2802 zero, @code{omp_aligned_calloc} will return a null pointer. @var{alignment}
2803 must be a positive power of two and @var{size} must be a multiple of the
2804 alignment; the alignment will be at least the maximal value required by
2805 the @code{alignment} trait of the allocator and the value of the passed
2806 @var{alignment} argument. If successful, a pointer to the zero-initialized
2807 allocated memory is returned, otherwise the @code{fallback} trait of the
2808 allocator determines the behavior.
2810 In @code{target} regions, either the @code{dynamic_allocators} clause must
2811 appear on a @code{requires} directive in the same compilation unit, or the
2812 @var{allocator} argument may only be a constant expression with the value of
2813 one of the predefined allocators and may not be @code{omp_null_allocator}.
2815 Memory allocated by @code{omp_aligned_calloc} must be freed using
2819 @multitable @columnfractions .20 .80
2820 @item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment,}
2821 @item @tab @code{             size_t nmemb, size_t size, omp_allocator_handle_t allocator)}
2825 @multitable @columnfractions .20 .80
2826 @item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment,}
2827 @item @tab @code{             size_t nmemb, size_t size, omp_allocator_handle_t allocator=omp_null_allocator)}
2830 @item @emph{Fortran}:
2831 @multitable @columnfractions .20 .80
2832 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_calloc(alignment, nmemb, size, allocator) bind(C)}
2833 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2834 @item @tab @code{integer (c_size_t), value :: alignment, nmemb, size}
2835 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2838 @item @emph{See also}:
2839 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2840 @ref{omp_free}, @ref{omp_init_allocator}
2842 @item @emph{Reference}:
2843 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
2849 @subsection @code{omp_realloc} -- Reallocate memory allocated with OpenMP routines
2851 @item @emph{Description}:
2852 The @code{omp_realloc} routine deallocates the memory to which @var{ptr} points
2853 and allocates new memory with the specified @var{allocator} argument; the
2854 new memory has the content of the old memory up to the minimum of the
2855 old size and the new @var{size}; beyond that, the content of the returned memory
2856 is unspecified. If the new allocator is the same as the old one, the routine
2857 tries to resize the existing memory allocation, returning the same address as
2858 @var{ptr} if successful. @var{ptr} must point to memory allocated by an OpenMP
2859 memory-management routine.
2861 The @var{allocator} and @var{free_allocator} arguments must be a predefined
2862 allocator, an allocator handle or @code{omp_null_allocator}. If
2863 @var{free_allocator} is @code{omp_null_allocator}, the implementation
2864 automatically determines the allocator used for the allocation of @var{ptr}.
2865 If @var{allocator} is @code{omp_null_allocator} and @var{ptr} is not a
2866 null pointer, the same allocator as @var{free_allocator} is used;
2867 when @var{ptr} is a null pointer, the allocator specified by the
2868 @var{def-allocator-var} ICV is used.
2870 @var{size} must be a nonnegative number denoting the number of bytes to be
2871 allocated; if @var{size} is zero, @code{omp_realloc} frees the
2872 memory and returns a null pointer. When @var{size} is nonzero: if successful,
2873 a pointer to the allocated memory is returned, otherwise the @code{fallback}
2874 trait of the allocator determines the behavior.
2876 In @code{target} regions, either the @code{dynamic_allocators} clause must
2877 appear on a @code{requires} directive in the same compilation unit, or the
2878 @var{free_allocator} and @var{allocator} arguments may only be a constant
2879 expression with the value of one of the predefined allocators and may not be
2880 @code{omp_null_allocator}.
2882 Memory allocated by @code{omp_realloc} must be freed using @code{omp_free}.
2883 Calling @code{omp_free} invokes undefined behavior if the memory
2884 was already deallocated or when the used allocator has already been destroyed.
2887 @multitable @columnfractions .20 .80
2888 @item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
2889 @item @tab @code{ omp_allocator_handle_t allocator,}
2890 @item @tab @code{ omp_allocator_handle_t free_allocator)}
2894 @multitable @columnfractions .20 .80
2895 @item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
2896 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator,}
2897 @item @tab @code{ omp_allocator_handle_t free_allocator=omp_null_allocator)}
2900 @item @emph{Fortran}:
2901 @multitable @columnfractions .20 .80
2902 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_realloc(ptr, size, allocator, free_allocator) bind(C)}
2903 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2904 @item @tab @code{type(C_ptr), value :: ptr}
2905 @item @tab @code{integer (c_size_t), value :: size}
2906 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator, free_allocator}
2909 @item @emph{See also}:
2910 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2911 @ref{omp_free}, @ref{omp_init_allocator}
2913 @item @emph{Reference}:
2914 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.9
2919 @c @node Tool Control Routine
2923 @c @node Environment Display Routine
2924 @c @section Environment Display Routine
2926 @c Routine to display the OpenMP number and the initial value of ICVs.
2927 @c It has C linkage and do not throw exceptions.
2930 @c * omp_display_env:: <fixme>
2933 @c ---------------------------------------------------------------------
2934 @c OpenMP Environment Variables
2935 @c ---------------------------------------------------------------------
2937 @node Environment Variables
2938 @chapter OpenMP Environment Variables
2940 The environment variables that begin with @env{OMP_} are defined by
2941 section 4 of the OpenMP specification in version 4.5 or in a later version
2942 of the specification, while those beginning with @env{GOMP_} are GNU extensions.
2943 Most @env{OMP_} environment variables have an associated internal control
2946 For any OpenMP environment variable that sets an ICV and is neither
2947 @code{OMP_DEFAULT_DEVICE} nor has global ICV scope, associated
2948 device-specific environment variables exist. For them, the environment
2949 variable without suffix affects the host. The suffix @code{_DEV_} followed
2950 by a non-negative device number less than the number of available devices sets
2951 the ICV for the corresponding device. The suffix @code{_DEV} sets the ICV
2952 of all non-host devices for which a device-specific corresponding environment
2953 variable has not been set while the @code{_ALL} suffix sets the ICV of all
2954 host and non-host devices for which a more specific corresponding environment
2955 variable is not set.
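
For instance, with the following illustrative settings, the host uses eight
threads, device 2 uses two threads, and all other non-host devices use four
threads:

@smallexample
OMP_NUM_THREADS=8
OMP_NUM_THREADS_DEV_2=2
OMP_NUM_THREADS_DEV=4
@end smallexample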
2958 * OMP_ALLOCATOR:: Set the default allocator
2959 * OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
2960 * OMP_CANCELLATION:: Set whether cancellation is activated
2961 * OMP_DISPLAY_AFFINITY:: Display thread affinity information
2962 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
2963 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
2964 * OMP_DYNAMIC:: Dynamic adjustment of threads
2965 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
2966 * OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
2967 * OMP_NESTED:: Nested parallel regions
2968 * OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
2969 * OMP_NUM_THREADS:: Specifies the number of threads to use
2970 * OMP_PROC_BIND:: Whether threads may be moved between CPUs
2971 * OMP_PLACES:: Specifies on which CPUs the threads should be placed
2972 * OMP_STACKSIZE:: Set default thread stack size
2973 * OMP_SCHEDULE:: How threads are scheduled
2974 * OMP_TARGET_OFFLOAD:: Controls offloading behavior
2975 * OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
2976 * OMP_THREAD_LIMIT:: Set the maximum number of threads
2977 * OMP_WAIT_POLICY:: How waiting threads are handled
2978 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
2979 * GOMP_DEBUG:: Enable debugging output
2980 * GOMP_STACKSIZE:: Set default thread stack size
2981 * GOMP_SPINCOUNT:: Set the busy-wait spin count
2982 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
2987 @section @env{OMP_ALLOCATOR} -- Set the default allocator
2988 @cindex Environment Variable
2990 @item @emph{ICV:} @var{def-allocator-var}
2991 @item @emph{Scope:} data environment
2992 @item @emph{Description}:
2993 Sets the default allocator that is used when no allocator has been specified
2994 in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
2995 routine is invoked with the @code{omp_null_allocator} allocator.
2996 If unset, @code{omp_default_mem_alloc} is used.
2998 The value can be a predefined allocator, a predefined memory space,
2999 or a predefined memory space followed by a colon and a comma-separated list
3000 of trait and value pairs in the form @code{@var{trait}=@var{value}}.
3002 Note: The corresponding device environment variables are currently not
3003 supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
3004 initialized to @code{omp_default_mem_alloc}. However, on all devices,
3005 the @code{omp_set_default_allocator} API routine can be used to change
3008 @multitable @columnfractions .45 .45
3009 @headitem Predefined allocators @tab Associated predefined memory spaces
3010 @item omp_default_mem_alloc @tab omp_default_mem_space
3011 @item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
3012 @item omp_const_mem_alloc @tab omp_const_mem_space
3013 @item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
3014 @item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
3015 @item omp_cgroup_mem_alloc @tab --
3016 @item omp_pteam_mem_alloc @tab --
3017 @item omp_thread_mem_alloc @tab --
3020 The predefined allocators use the default values for the traits,
3021 as listed below, except that the last three allocators have the
3022 @code{access} trait set to @code{cgroup}, @code{pteam}, and
3023 @code{thread}, respectively.
3025 @multitable @columnfractions .25 .40 .25
3026 @headitem Trait @tab Allowed values @tab Default value
3027 @item @code{sync_hint} @tab @code{contended}, @code{uncontended},
3028 @code{serialized}, @code{private}
3029 @tab @code{contended}
3030 @item @code{alignment} @tab Positive integer being a power of two
3032 @item @code{access} @tab @code{all}, @code{cgroup},
3033 @code{pteam}, @code{thread}
3035 @item @code{pool_size} @tab Positive integer
3036 @tab See @ref{Memory allocation}
3037 @item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
3038 @code{abort_fb}, @code{allocator_fb}
3040 @item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
3042 @item @code{pinned} @tab @code{true}, @code{false}
3044 @item @code{partition} @tab @code{environment}, @code{nearest},
3045 @code{blocked}, @code{interleaved}
3046 @tab @code{environment}
3049 For the @code{fallback} trait, the default value is @code{null_fb} for the
3050 @code{omp_default_mem_alloc} allocator and any allocator that is associated
3051 with device memory; for all other allocators, it is @code{default_mem_fb}
3056 OMP_ALLOCATOR=omp_high_bw_mem_alloc
3057 OMP_ALLOCATOR=omp_large_cap_mem_space
3058 OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
3061 @item @emph{See also}:
3062 @ref{Memory allocation}, @ref{omp_get_default_allocator},
3063 @ref{omp_set_default_allocator}
3065 @item @emph{Reference}:
3066 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
3071 @node OMP_AFFINITY_FORMAT
3072 @section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
3073 @cindex Environment Variable
3075 @item @emph{ICV:} @var{affinity-format-var}
3076 @item @emph{Scope:} device
3077 @item @emph{Description}:
3078 Sets the format string used when displaying OpenMP thread affinity information.
3079 Special values are output using @code{%} followed by an optional size
3080 specification and then either the single-character field type or its long
3081 name enclosed in curly braces; using @code{%%} displays a literal percent.
3082 The size specification consists of an optional @code{0.} or @code{.} followed
3083 by a positive integer, specifying the minimal width of the output. With
3084 @code{0.} and numerical values, the output is padded with zeros on the left;
3085 with @code{.}, the output is padded by spaces on the left; otherwise, the
3086 output is padded by spaces on the right. If unset, the value is
3087 ``@code{level %L thread %i affinity %A}''.
3089 Supported field types are:
3091 @multitable @columnfractions .10 .25 .60
3092 @item t @tab team_num @tab value returned by @code{omp_get_team_num}
3093 @item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
3094 @item L @tab nesting_level @tab value returned by @code{omp_get_level}
3095 @item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
3096 @item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
3097 @item a @tab ancestor_tnum
3098 @tab value returned by
3099 @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
3100 @item H @tab host @tab name of the host that executes the thread
3101 @item P @tab process_id @tab process identifier
3102 @item i @tab native_thread_id @tab native thread identifier
3103 @item A @tab thread_affinity
3104 @tab comma separated list of integer values or ranges, representing the
3105 processors on which a process might execute, subject to affinity
3109 For instance, after setting
3112 OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
3115 with @env{OMP_DISPLAY_AFFINITY} set to @code{TRUE} or when calling
3116 @code{omp_display_affinity} with @code{NULL} or an empty string, the program
3117 might display the following:
3120 00!0! 1!4; 0;01;0;1;0-11
3121 00!3! 1!4; 0;01;0;1;0-11
3122 00!2! 1!4; 0;01;0;1;0-11
3123 00!1! 1!4; 0;01;0;1;0-11
3126 @item @emph{See also}:
3127 @ref{OMP_DISPLAY_AFFINITY}
3129 @item @emph{Reference}:
3130 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
3135 @node OMP_CANCELLATION
3136 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
3137 @cindex Environment Variable
3139 @item @emph{ICV:} @var{cancel-var}
3140 @item @emph{Scope:} global
3141 @item @emph{Description}:
3142 If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
3143 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
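
@item @emph{Example}:
As an illustrative sketch (not part of the specification text), the
following C program contains a @code{cancel} construct that only takes
effect when @env{OMP_CANCELLATION} is set to @code{TRUE}:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* Non-zero only if cancellation was activated at startup.  */
  printf ("cancellation activated: %d\n", omp_get_cancellation ());

  #pragma omp parallel
  @{
    /* Ignored unless cancellation is activated.  */
    #pragma omp cancel parallel
    printf ("thread %d was not cancelled\n", omp_get_thread_num ());
  @}
  return 0;
@}
@end smallexample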
3145 @item @emph{See also}:
3146 @ref{omp_get_cancellation}
3148 @item @emph{Reference}:
3149 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
3154 @node OMP_DISPLAY_AFFINITY
3155 @section @env{OMP_DISPLAY_AFFINITY} -- Display thread affinity information
3156 @cindex Environment Variable
3158 @item @emph{ICV:} @var{display-affinity-var}
3159 @item @emph{Scope:} global
3160 @item @emph{Description}:
3161 If set to @code{FALSE} or if unset, affinity displaying is disabled.
3162 If set to @code{TRUE}, the runtime displays affinity information about
3163 OpenMP threads in a parallel region upon entering the region and every time
3166 @item @emph{See also}:
3167 @ref{OMP_AFFINITY_FORMAT}
3169 @item @emph{Reference}:
3170 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.13
3176 @node OMP_DISPLAY_ENV
3177 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
3178 @cindex Environment Variable
3180 @item @emph{ICV:} none
3181 @item @emph{Scope:} not applicable
3182 @item @emph{Description}:
3183 If set to @code{TRUE}, the OpenMP version number and the values
3184 associated with the OpenMP environment variables are printed to @code{stderr}.
3185 If set to @code{VERBOSE}, it additionally shows the value of the environment
3186 variables which are GNU extensions. If undefined or set to @code{FALSE},
3187 this information is not shown.
3190 @item @emph{Reference}:
3191 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
3196 @node OMP_DEFAULT_DEVICE
3197 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
3198 @cindex Environment Variable
3200 @item @emph{ICV:} @var{default-device-var}
3201 @item @emph{Scope:} data environment
3202 @item @emph{Description}:
3203 Set to choose the device which is used in a @code{target} region, unless the
3204 value is overridden by @code{omp_set_default_device} or by a @code{device}
3205 clause. The value shall be the nonnegative device number. If no device with
3206 the given device number exists, the code is executed on the host. If unset
3207 and @env{OMP_TARGET_OFFLOAD} is @code{mandatory} and no non-host devices are
3208 available, it is set to @code{omp_invalid_device}. Otherwise, if unset,
3209 device number 0 is used.
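
@item @emph{Example}:
The following C sketch is illustrative only; it queries the default
device selected via this environment variable and checks whether a
@code{target} region is executed on the host or on a device:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* Reflects OMP_DEFAULT_DEVICE unless changed at run time.  */
  printf ("default device: %d of %d non-host device(s)\n",
          omp_get_default_device (), omp_get_num_devices ());

  int on_host = -1;
  #pragma omp target map(from: on_host)
    on_host = omp_is_initial_device ();
  printf ("target region ran on the %s\n", on_host ? "host" : "device");
  return 0;
@}
@end smallexample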
3212 @item @emph{See also}:
3213 @ref{omp_get_default_device}, @ref{omp_set_default_device},
3214 @ref{OMP_TARGET_OFFLOAD}
3216 @item @emph{Reference}:
3217 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.7
3223 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
3224 @cindex Environment Variable
3226 @item @emph{ICV:} @var{dyn-var}
3227 @item @emph{Scope:} global
3228 @item @emph{Description}:
3229 Enable or disable the dynamic adjustment of the number of threads
3230 within a team. The value of this environment variable shall be
3231 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
3232 disabled by default.
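
@item @emph{Example}:
A minimal, illustrative C sketch; with dynamic adjustment enabled, the
runtime may choose a smaller team than the requested number of threads:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* Non-zero when OMP_DYNAMIC=true or after omp_set_dynamic (1).  */
  printf ("dynamic adjustment: %d\n", omp_get_dynamic ());

  omp_set_dynamic (1);  /* Overrides the environment variable.  */
  #pragma omp parallel
  #pragma omp single
  printf ("team size chosen by the runtime: %d\n", omp_get_num_threads ());
  return 0;
@}
@end smallexample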
3234 @item @emph{See also}:
3235 @ref{omp_set_dynamic}
3237 @item @emph{Reference}:
3238 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
3243 @node OMP_MAX_ACTIVE_LEVELS
3244 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
3245 @cindex Environment Variable
3247 @item @emph{ICV:} @var{max-active-levels-var}
3248 @item @emph{Scope:} data environment
3249 @item @emph{Description}:
3250 Specifies the initial value for the maximum number of nested parallel
3251 regions. The value of this variable shall be a positive integer.
3252 If undefined, then if @env{OMP_NESTED} is defined and set to true, or
3253 if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
3254 a list with more than one item, the maximum number of nested parallel
3255 regions is initialized to the largest number supported, otherwise
3258 @item @emph{See also}:
3259 @ref{omp_set_max_active_levels}, @ref{OMP_NESTED}, @ref{OMP_PROC_BIND},
3260 @ref{OMP_NUM_THREADS}
3263 @item @emph{Reference}:
3264 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
3269 @node OMP_MAX_TASK_PRIORITY
3270 @section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
3271 number that can be set for a task.
3272 @cindex Environment Variable
3274 @item @emph{ICV:} @var{max-task-priority-var}
3275 @item @emph{Scope:} global
3276 @item @emph{Description}:
3277 Specifies the initial value for the maximum priority value that can be
3278 set for a task. The value of this variable shall be a non-negative
3279 integer, and zero is allowed. If undefined, the default priority is
3282 @item @emph{See also}:
3283 @ref{omp_get_max_task_priority}
3285 @item @emph{Reference}:
3286 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
3292 @section @env{OMP_NESTED} -- Nested parallel regions
3293 @cindex Environment Variable
3294 @cindex Implementation specific setting
3296 @item @emph{ICV:} @var{max-active-levels-var}
3297 @item @emph{Scope:} data environment
3298 @item @emph{Description}:
3299 Enable or disable nested parallel regions, i.e., whether team members
3300 are allowed to create new teams. The value of this environment variable
3301 shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the maximum
3302 number of active nested regions is by default set to the
3303 maximum supported, otherwise it is set to one. If
3304 @env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting overrides this
3305 setting. If both are undefined, nested parallel regions are enabled if
3306 @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} is defined and set to a list with
3307 more than one item; otherwise they are disabled by default.
3309 Note that the @code{OMP_NESTED} environment variable was deprecated in
3310 the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.
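
@item @emph{Example}:
The following C sketch (illustrative only) creates a nested parallel
region; the inner region runs with more than one thread only when
nesting is enabled via this variable or one of the related settings
described above:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  #pragma omp parallel num_threads(2)
  #pragma omp parallel num_threads(2)
  #pragma omp single
  printf ("nesting level %d runs with %d thread(s)\n",
          omp_get_level (), omp_get_num_threads ());
  return 0;
@}
@end smallexample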
3312 @item @emph{See also}:
3313 @ref{omp_set_max_active_levels}, @ref{omp_set_nested},
3314 @ref{OMP_MAX_ACTIVE_LEVELS}
3316 @item @emph{Reference}:
3317 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
3323 @section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
3324 @cindex Environment Variable
3326 @item @emph{ICV:} @var{nteams-var}
3327 @item @emph{Scope:} device
3328 @item @emph{Description}:
3329 Specifies the upper bound for the number of teams to use in teams regions
3330 without an explicit @code{num_teams} clause. The value of this variable shall
3331 be a positive integer. If undefined, it defaults to 0, which means an
3332 implementation-defined upper bound.
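
@item @emph{Example}:
An illustrative C sketch showing how the ICV initialized from this
variable can be queried and overridden at run time:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* 0 means an implementation-defined upper bound.  */
  printf ("initial nteams-var: %d\n", omp_get_max_teams ());

  /* Overrides OMP_NUM_TEAMS for subsequent teams regions.  */
  omp_set_num_teams (4);
  printf ("new nteams-var: %d\n", omp_get_max_teams ());
  return 0;
@}
@end smallexample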
3334 @item @emph{See also}:
3335 @ref{omp_set_num_teams}
3337 @item @emph{Reference}:
3338 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
3343 @node OMP_NUM_THREADS
3344 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
3345 @cindex Environment Variable
3346 @cindex Implementation specific setting
3348 @item @emph{ICV:} @var{nthreads-var}
3349 @item @emph{Scope:} data environment
3350 @item @emph{Description}:
3351 Specifies the default number of threads to use in parallel regions. The
3352 value of this variable shall be a comma-separated list of positive integers;
3353 the value specifies the number of threads to use for the corresponding nested
3354 level. Specifying more than one item in the list automatically enables
3355 nesting by default. If undefined, one thread per CPU is used.
3357 When a list with more than one value is specified, it also affects the
3358 @var{max-active-levels-var} ICV as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
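
@item @emph{Example}:
The following illustrative C sketch prints the value of the
@var{nthreads-var} ICV, which is initialized from this environment
variable, and the team size actually used:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* Initialized from the first item of OMP_NUM_THREADS, if set.  */
  printf ("nthreads-var: %d\n", omp_get_max_threads ());

  #pragma omp parallel
  #pragma omp single
  printf ("actual team size: %d\n", omp_get_num_threads ());
  return 0;
@}
@end smallexample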
3360 @item @emph{See also}:
3361 @ref{omp_set_num_threads}, @ref{OMP_MAX_ACTIVE_LEVELS}
3363 @item @emph{Reference}:
3364 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
3370 @section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
3371 @cindex Environment Variable
3373 @item @emph{ICV:} @var{bind-var}
3374 @item @emph{Scope:} data environment
3375 @item @emph{Description}:
3376 Specifies whether threads may be moved between processors. If set to
3377 @code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
3378 they may be moved. Alternatively, a comma-separated list with the
3379 values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
3380 be used to specify the thread-affinity policy for the corresponding nesting
3381 level. With @code{PRIMARY} and @code{MASTER}, the worker threads are in the
3382 same place partition as the primary thread. With @code{CLOSE}, they are
3383 kept close to the primary thread in contiguous place partitions. With
3384 @code{SPREAD}, a sparse distribution
3385 across the place partitions is used. Specifying more than one item in the
3386 list automatically enables nesting by default.
3388 When a list is specified, it also affects the @var{max-active-levels-var} ICV
3389 as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
3391 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
3392 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
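
@item @emph{Example}:
An illustrative C sketch that queries the binding policy of the current
task, which is initialized from this environment variable:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  /* omp_proc_bind_false means threads may be moved freely.  */
  omp_proc_bind_t bind = omp_get_proc_bind ();
  printf ("bind-var: %d (0 = false, 1 = true, 2 = primary/master, "
          "3 = close, 4 = spread)\n", (int) bind);
  return 0;
@}
@end smallexample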
3394 @item @emph{See also}:
3395 @ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY}, @ref{OMP_PLACES},
3396 @ref{OMP_MAX_ACTIVE_LEVELS}
3398 @item @emph{Reference}:
3399 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
3405 @section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
3406 @cindex Environment Variable
3408 @item @emph{ICV:} @var{place-partition-var}
3409 @item @emph{Scope:} implicit tasks
3410 @item @emph{Description}:
3411 The thread placement can be either specified using an abstract name or by an
3412 explicit list of the places. The abstract names @code{threads}, @code{cores},
3413 @code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
3414 followed by a positive number in parentheses, which denotes how many places
3415 shall be created. With @code{threads} each place corresponds to a single
3416 hardware thread; @code{cores} to a single core with the corresponding number of
3417 hardware threads; with @code{sockets} the place corresponds to a single
3418 socket; with @code{ll_caches} to a set of cores that shares the last level
3419 cache on the device; and @code{numa_domains} to a set of cores for which their
3420 closest memory on the device is the same memory and at a similar distance from
3421 the cores. The resulting placement can be shown by setting the
3422 @env{OMP_DISPLAY_ENV} environment variable.
3424 Alternatively, the placement can be specified explicitly as a comma-separated
3425 list of places. A place is specified by a set of nonnegative numbers in curly
3426 braces, denoting the hardware threads. The curly braces can be omitted
3427 when only a single number has been specified. The hardware threads
3428 belonging to a place can either be specified as a comma-separated list of
3429 nonnegative thread numbers or using an interval. Multiple places can also be
3430 specified either by a comma-separated list of places or by an interval. To
3431 specify an interval, a colon followed by the length is placed after
3432 the hardware thread number or the place. Optionally, the length can be
3433 followed by a colon and the stride number -- otherwise a unit stride is
3434 assumed. Placing an exclamation mark (@code{!}) directly before a curly
3435 brace or numbers inside the curly braces (excluding intervals)
3436 excludes those hardware threads.
3438 For instance, the following all specify the same places list:
3439 @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
3440 @code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.
3442 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
3443 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
3444 between CPUs following no placement policy.
3446 @item @emph{See also}:
3447 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
3448 @ref{OMP_DISPLAY_ENV}
3450 @item @emph{Reference}:
3451 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
3457 @section @env{OMP_STACKSIZE} -- Set default thread stack size
3458 @cindex Environment Variable
3460 @item @emph{ICV:} @var{stacksize-var}
3461 @item @emph{Scope:} device
3462 @item @emph{Description}:
3463 Set the default thread stack size in kilobytes, unless the number
3464 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
3465 case the size is, respectively, in bytes, kilobytes, megabytes
3466 or gigabytes. This is different from @code{pthread_attr_setstacksize}
3467 which gets the number of bytes as an argument. If the stack size cannot
3468 be set due to system constraints, an error is reported and the initial
3469 stack size is left unchanged. If undefined, the stack size is system
3472 @item @emph{See also}:
3473 @ref{GOMP_STACKSIZE}
3475 @item @emph{Reference}:
3476 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
3482 @section @env{OMP_SCHEDULE} -- How threads are scheduled
3483 @cindex Environment Variable
3484 @cindex Implementation specific setting
3486 @item @emph{ICV:} @var{run-sched-var}
3487 @item @emph{Scope:} data environment
3488 @item @emph{Description}:
3489 Allows specifying the @code{schedule type} and @code{chunk size}.
3490 The value of the variable shall have the form @code{type[,chunk]}, where
3491 @code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
3492 The optional @code{chunk} size shall be a positive integer. If undefined,
3493 dynamic scheduling and a chunk size of 1 is used.
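
@item @emph{Example}:
The variable only affects loops with the @code{runtime} schedule; the
following C sketch (illustrative only) prints the resulting
@var{run-sched-var} ICV and uses it in such a loop:

@smallexample
#include <stdio.h>
#include <omp.h>

int
main (void)
@{
  omp_sched_t kind;
  int chunk;
  omp_get_schedule (&kind, &chunk);  /* Reflects OMP_SCHEDULE.  */
  printf ("run-sched-var: kind %d, chunk %d\n", (int) kind, chunk);

  #pragma omp parallel for schedule(runtime)
  for (int i = 0; i < 8; i++)
    printf ("iteration %d on thread %d\n", i, omp_get_thread_num ());
  return 0;
@}
@end smallexample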
3495 @item @emph{See also}:
3496 @ref{omp_set_schedule}
3498 @item @emph{Reference}:
3499 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
3504 @node OMP_TARGET_OFFLOAD
3505 @section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behavior
3506 @cindex Environment Variable
3507 @cindex Implementation specific setting
3509 @item @emph{ICV:} @var{target-offload-var}
3510 @item @emph{Scope:} global
3511 @item @emph{Description}:
3512 Specifies the behavior with regard to offloading code to a device. This
3513 variable can be set to one of three values - @code{MANDATORY}, @code{DISABLED}
3516 If set to @code{MANDATORY}, the program terminates with an error if
3517 any device construct or device memory routine uses a device that is unavailable
3518 or not supported by the implementation, or uses a non-conforming device number.
3519 If set to @code{DISABLED}, then offloading is disabled and all code runs on
3520 the host. If set to @code{DEFAULT}, the program tries offloading to the
3521 device first, then falls back to running code on the host if it cannot.
3523 If undefined, then the program behaves as if @code{DEFAULT} was set.
3525 Note: Even with @code{MANDATORY}, no run-time termination is performed when
3526 the device number in a @code{device} clause or an argument to a device memory
3527 routine refers to the host, which includes using the device number in the
3528 @var{default-device-var} ICV. However, the initial value of
3529 the @var{default-device-var} ICV is affected by @code{MANDATORY}.
3531 @item @emph{See also}:
3532 @ref{OMP_DEFAULT_DEVICE}
3534 @item @emph{Reference}:
3535 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.8
3540 @node OMP_TEAMS_THREAD_LIMIT
3541 @section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
3542 @cindex Environment Variable
3544 @item @emph{ICV:} @var{teams-thread-limit-var}
3545 @item @emph{Scope:} device
3546 @item @emph{Description}:
3547 Specifies an upper bound for the number of threads used by each contention
3548 group created by a @code{teams} construct without an explicit @code{thread_limit}
3549 clause. The value of this variable shall be a positive integer. If undefined,
3550 the value 0 is used, which stands for an implementation-defined upper
3553 @item @emph{See also}:
3554 @ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}
3556 @item @emph{Reference}:
3557 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
3562 @node OMP_THREAD_LIMIT
3563 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
3564 @cindex Environment Variable
3566 @item @emph{ICV:} @var{thread-limit-var}
3567 @item @emph{Scope:} data environment
3568 @item @emph{Description}:
3569 Specifies the number of threads to use for the whole program. The
3570 value of this variable shall be a positive integer. If undefined,
3571 the number of threads is not limited.
3573 @item @emph{See also}:
3574 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
3576 @item @emph{Reference}:
3577 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
3582 @node OMP_WAIT_POLICY
3583 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
3584 @cindex Environment Variable
3586 @item @emph{Description}:
3587 Specifies whether waiting threads should be active or passive. If
3588 the value is @code{PASSIVE}, waiting threads should not consume CPU
3589 power while waiting; if the value is @code{ACTIVE}, they should.
3590 If undefined, threads wait actively for a short time
3591 before waiting passively.
3593 @item @emph{See also}:
3594 @ref{GOMP_SPINCOUNT}
3596 @item @emph{Reference}:
3597 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
3602 @node GOMP_CPU_AFFINITY
3603 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
3604 @cindex Environment Variable
3606 @item @emph{Description}:
3607 Binds threads to specific CPUs. The variable should contain a space-separated
3608 or comma-separated list of CPUs. This list may contain different kinds of
3609 entries: either single CPU numbers in any order, a range of CPUs (M-N)
3610 or a range with some stride (M-N:S). CPU numbers are zero based. For example,
3611 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} binds the initial thread
3612 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
3613 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
3614 and 14 respectively and then starts assigning back from the beginning of
3615 the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
3617 There is no libgomp library routine to determine whether a CPU affinity
3618 specification is in effect. As a workaround, language-specific library
3619 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
3620 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
3621 environment variable. A defined CPU affinity on startup cannot be changed
3622 or disabled during the runtime of the application.
3624 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
3625 @env{OMP_PROC_BIND} has a higher precedence. If neither is set, or when
3626 @env{OMP_PROC_BIND} is set to @code{FALSE}, the host system handles the
3627 assignment of threads to CPUs.
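
@item @emph{Example}:
The following C sketch illustrates the workaround mentioned above for
querying the affinity specification from within the program:

@smallexample
#include <stdio.h>
#include <stdlib.h>

int
main (void)
@{
  /* There is no library routine for this; inspect the environment.  */
  const char *affinity = getenv ("GOMP_CPU_AFFINITY");
  if (affinity)
    printf ("GOMP_CPU_AFFINITY is set to \"%s\"\n", affinity);
  else
    printf ("GOMP_CPU_AFFINITY is not set\n");
  return 0;
@}
@end smallexample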
3629 @item @emph{See also}:
3630 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
3636 @section @env{GOMP_DEBUG} -- Enable debugging output
3637 @cindex Environment Variable
3639 @item @emph{Description}:
3640 Enable debugging output. The variable should be set to @code{0}
3641 (disabled, also the default if not set), or @code{1} (enabled).
3643 If enabled, some debugging output is printed during execution.
3644 This is currently not specified in more detail, and subject to change.
3649 @node GOMP_STACKSIZE
3650 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
3651 @cindex Environment Variable
3652 @cindex Implementation specific setting
3654 @item @emph{Description}:
3655 Set the default thread stack size in kilobytes. This is different from
3656 @code{pthread_attr_setstacksize} which gets the number of bytes as an
3657 argument. If the stack size cannot be set due to system constraints, an
3658 error is reported and the initial stack size is left unchanged. If undefined,
3659 the stack size is system dependent.
3661 @item @emph{See also}:
3664 @item @emph{Reference}:
3665 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
3666 GCC Patches Mailinglist},
3667 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
3668 GCC Patches Mailinglist}
3673 @node GOMP_SPINCOUNT
3674 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
3675 @cindex Environment Variable
3676 @cindex Implementation specific setting
3678 @item @emph{Description}:
3679 Determines how long a thread waits actively while consuming CPU power
3680 before waiting passively without consuming CPU power. The value may be
3681 either @code{INFINITE}, @code{INFINITY} to always wait actively or an
3682 integer which gives the number of spins of the busy-wait loop. The
3683 integer may optionally be followed by the following suffixes acting
3684 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
3685 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
3686 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
3687 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
3688 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
3689 If there are more OpenMP threads than available CPUs, 1000 and 100
3690 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
3691 undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
3692 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
3694 @item @emph{See also}:
3695 @ref{OMP_WAIT_POLICY}
3700 @node GOMP_RTEMS_THREAD_POOLS
3701 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
3702 @cindex Environment Variable
3703 @cindex Implementation specific setting
3705 @item @emph{Description}:
3706 This environment variable is only used on the RTEMS real-time operating system.
3707 It determines the scheduler-instance-specific thread pools. The format for
3708 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
3709 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
3710 separated by @code{:} where:
3712 @item @code{<thread-pool-count>} is the thread pool count for this scheduler
3714 @item @code{$<priority>} is an optional priority for the worker threads of a
3715 thread pool according to @code{pthread_setschedparam}. In case a priority
3716 value is omitted, then a worker thread inherits the priority of the OpenMP
3717 primary thread that created it. The priority of the worker thread is not
3718 changed after creation, even if a new OpenMP primary thread using the worker has
3719 a different priority.
3720 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
3721 RTEMS application configuration.
3723 In case no thread pool configuration is specified for a scheduler instance,
3724 then each OpenMP primary thread of this scheduler instance uses its own
3725 dynamically allocated thread pool. To limit the worker thread count of the
3726 thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
3727 @item @emph{Example}:
3728 Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
3729 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
3730 @code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
3731 scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
3732 one thread pool available. Since no priority is specified for this scheduler
3733 instance, the worker thread inherits the priority of the OpenMP primary thread
3734 that created it. In the scheduler instance @code{WRK1} there are three thread
3735 pools available and their worker threads run at priority four.
3740 @c ---------------------------------------------------------------------
3742 @c ---------------------------------------------------------------------
3744 @node Enabling OpenACC
3745 @chapter Enabling OpenACC
3747 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
3748 flag @option{-fopenacc} must be specified. This enables the OpenACC directive
3749 @samp{#pragma acc} in C/C++ and, in Fortran, the @samp{!$acc} sentinel in free
3750 source form and the @samp{c$acc}, @samp{*$acc} and @samp{!$acc} sentinels in
3751 fixed source form. The flag also arranges for automatic linking of the OpenACC
3752 runtime library (@ref{OpenACC Runtime Library Routines}).
3754 See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
3756 A complete description of all OpenACC directives accepted may be found in
3757 the @uref{https://www.openacc.org, OpenACC} Application Programming
3758 Interface manual, version 2.6.
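
For example, the following C program (an illustration only) uses an
OpenACC compute construct; when compiled with @option{-fopenacc} the
loop may be offloaded, while without the flag the directive is ignored
and the loop runs sequentially on the host:

@smallexample
#include <stdio.h>

int
main (void)
@{
  double a[1000];

  #pragma acc parallel loop copyout(a)
  for (int i = 0; i < 1000; i++)
    a[i] = 2.0 * i;

  printf ("a[999] = %g\n", a[999]);
  return 0;
@}
@end smallexample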
3762 @c ---------------------------------------------------------------------
3763 @c OpenACC Runtime Library Routines
3764 @c ---------------------------------------------------------------------
3766 @node OpenACC Runtime Library Routines
3767 @chapter OpenACC Runtime Library Routines
3769 The runtime routines described here are defined by section 3 of the OpenACC
3770 specification, version 2.6.
3771 They have C linkage, and do not throw exceptions.
3772 Generally, they are available only for the host, with the exception of
3773 @code{acc_on_device}, which is available for both the host and the
3774 acceleration device.
3777 * acc_get_num_devices:: Get number of devices for the given device
3779 * acc_set_device_type:: Set type of device accelerator to use.
3780 * acc_get_device_type:: Get type of device accelerator to be used.
3781 * acc_set_device_num:: Set device number to use.
3782 * acc_get_device_num:: Get device number to be used.
3783 * acc_get_property:: Get device property.
3784 * acc_async_test:: Tests for completion of a specific asynchronous
3786 * acc_async_test_all:: Tests for completion of all asynchronous
3788 * acc_wait:: Wait for completion of a specific asynchronous
3790 * acc_wait_all:: Waits for completion of all asynchronous
3792 * acc_wait_all_async:: Wait for completion of all asynchronous
3794 * acc_wait_async:: Wait for completion of asynchronous operations.
3795 * acc_init:: Initialize runtime for a specific device type.
3796 * acc_shutdown:: Shuts down the runtime for a specific device
3798 * acc_on_device:: Whether executing on a particular device
3799 * acc_malloc:: Allocate device memory.
3800 * acc_free:: Free device memory.
3801 * acc_copyin:: Allocate device memory and copy host memory to
3803 * acc_present_or_copyin:: If the data is not present on the device,
3804 allocate device memory and copy from host
3806 * acc_create:: Allocate device memory and map it to host
3808 * acc_present_or_create:: If the data is not present on the device,
3809 allocate device memory and map it to host
3811 * acc_copyout:: Copy device memory to host memory.
3812 * acc_delete:: Free device memory.
3813 * acc_update_device:: Update device memory from mapped host memory.
3814 * acc_update_self:: Update host memory from mapped device memory.
3815 * acc_map_data:: Map previously allocated device memory to host
3817 * acc_unmap_data:: Unmap device memory from host memory.
3818 * acc_deviceptr:: Get device pointer associated with specific
3820 * acc_hostptr:: Get host pointer associated with specific
3822 * acc_is_present:: Indicate whether host variable / array is
3824 * acc_memcpy_to_device:: Copy host memory to device memory.
3825 * acc_memcpy_from_device:: Copy device memory to host memory.
3826 * acc_attach:: Let device pointer point to device-pointer target.
3827 * acc_detach:: Let device pointer point to host-pointer target.
3829 API routines for target platforms.
3831 * acc_get_current_cuda_device:: Get CUDA device handle.
3832 * acc_get_current_cuda_context::Get CUDA context handle.
3833 * acc_get_cuda_stream:: Get CUDA stream handle.
3834 * acc_set_cuda_stream:: Set CUDA stream handle.
3836 API routines for the OpenACC Profiling Interface.
3838 * acc_prof_register:: Register callbacks.
3839 * acc_prof_unregister:: Unregister callbacks.
3840 * acc_prof_lookup:: Obtain inquiry functions.
3841 * acc_register_library:: Library registration.
3846 @node acc_get_num_devices
3847 @section @code{acc_get_num_devices} -- Get number of devices for given device type
3849 @item @emph{Description}
3850 This function returns a value indicating the number of devices available
3851 for the device type specified in @var{devicetype}.
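
For illustration, a short C sketch (not taken from the specification)
that queries the number of available devices; @code{acc_device_nvidia}
is only meaningful when nvptx offloading is configured:

@smallexample
#include <stdio.h>
#include <openacc.h>

int
main (void)
@{
  printf ("non-host devices: %d\n",
          acc_get_num_devices (acc_device_not_host));
  printf ("NVIDIA devices:   %d\n",
          acc_get_num_devices (acc_device_nvidia));
  return 0;
@}
@end smallexample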
3854 @multitable @columnfractions .20 .80
3855 @item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
3858 @item @emph{Fortran}:
3859 @multitable @columnfractions .20 .80
3860 @item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
3861 @item @tab @code{integer(kind=acc_device_kind) devicetype}
3864 @item @emph{Reference}:
3865 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3871 @node acc_set_device_type
3872 @section @code{acc_set_device_type} -- Set type of device accelerator to use.
3874 @item @emph{Description}
3875 This function indicates to the runtime library which device type, specified
3876 in @var{devicetype}, to use when executing a parallel or kernels region.
3879 @multitable @columnfractions .20 .80
3880 @item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
3883 @item @emph{Fortran}:
3884 @multitable @columnfractions .20 .80
3885 @item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
3886 @item @tab @code{integer(kind=acc_device_kind) devicetype}
3889 @item @emph{Reference}:
3890 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3896 @node acc_get_device_type
3897 @section @code{acc_get_device_type} -- Get type of device accelerator to be used.
3899 @item @emph{Description}
3900 This function returns what device type will be used when executing a
3901 parallel or kernels region.
3903 This function returns @code{acc_device_none} if
3904 @code{acc_get_device_type} is called from
3905 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
3906 callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
3907 Interface}), that is, if the device is currently being initialized.
3910 @multitable @columnfractions .20 .80
3911 @item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
3914 @item @emph{Fortran}:
3915 @multitable @columnfractions .20 .80
3916 @item @emph{Interface}: @tab @code{function acc_get_device_type()}
3917 @item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
3920 @item @emph{Reference}:
3921 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3927 @node acc_set_device_num
3928 @section @code{acc_set_device_num} -- Set device number to use.
3930 @item @emph{Description}
3931 This function indicates to the runtime which device number,
3932 specified by @var{devicenum}, of the specified device
3933 type @var{devicetype} to use.
3936 @multitable @columnfractions .20 .80
3937 @item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}
3940 @item @emph{Fortran}:
3941 @multitable @columnfractions .20 .80
3942 @item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
3943 @item @tab @code{integer devicenum}
3944 @item @tab @code{integer(kind=acc_device_kind) devicetype}
3947 @item @emph{Reference}:
3948 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3954 @node acc_get_device_num
3955 @section @code{acc_get_device_num} -- Get device number to be used.
3957 @item @emph{Description}
3958 This function returns which device number associated with the specified device
3959 type @var{devicetype}, will be used when executing a parallel or kernels
3963 @multitable @columnfractions .20 .80
3964 @item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
3967 @item @emph{Fortran}:
3968 @multitable @columnfractions .20 .80
3969 @item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
3970 @item @tab @code{integer(kind=acc_device_kind) devicetype}
3971 @item @tab @code{integer acc_get_device_num}
3974 @item @emph{Reference}:
3975 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3981 @node acc_get_property
3982 @section @code{acc_get_property} -- Get device property.
3983 @cindex acc_get_property
3984 @cindex acc_get_property_string
3986 @item @emph{Description}
3987 These routines return the value of the specified @var{property} for the
3988 device being queried according to @var{devicenum} and @var{devicetype}.
3989 Integer-valued and string-valued properties are returned by
3990 @code{acc_get_property} and @code{acc_get_property_string} respectively.
3991 The Fortran @code{acc_get_property_string} subroutine returns the string
3992 retrieved in its fourth argument while the remaining entry points are
3993 functions, which pass the return value as their result.
3995 Note, for Fortran only: the OpenACC technical committee corrected and, hence,
3996 modified the interface introduced in OpenACC 2.6. The kind-value parameter
3997 @code{acc_device_property} has been renamed to @code{acc_device_property_kind}
3998 for consistency and the return type of the @code{acc_get_property} function is
3999 now a @code{c_size_t} integer instead of a @code{acc_device_property} integer.
4000 The parameter @code{acc_device_property} is still provided,
4001 but might be removed in a future version of GCC.
4004 @multitable @columnfractions .20 .80
4005 @item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4006 @item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4009 @item @emph{Fortran}:
4010 @multitable @columnfractions .20 .80
4011 @item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
4012 @item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
4013 @item @tab @code{use ISO_C_Binding, only: c_size_t}
4014 @item @tab @code{integer devicenum}
4015 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4016 @item @tab @code{integer(kind=acc_device_property_kind) property}
4017 @item @tab @code{integer(kind=c_size_t) acc_get_property}
4018 @item @tab @code{character(*) string}
4021 @item @emph{Reference}:
4022 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4028 @node acc_async_test
4029 @section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
4031 @item @emph{Description}
4032 This function tests for completion of the asynchronous operation specified
4033 in @var{arg}. In C/C++, a non-zero value is returned to indicate that
4034 the specified asynchronous operation has completed, while Fortran returns
4035 @code{true}. If the asynchronous operation has not completed, C/C++ returns
4036 zero and Fortran returns @code{false}.
4039 @multitable @columnfractions .20 .80
4040 @item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
4043 @item @emph{Fortran}:
4044 @multitable @columnfractions .20 .80
4045 @item @emph{Interface}: @tab @code{function acc_async_test(arg)}
4046 @item @tab @code{integer(kind=acc_handle_kind) arg}
4047 @item @tab @code{logical acc_async_test}
4050 @item @emph{Reference}:
4051 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4057 @node acc_async_test_all
4058 @section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
4060 @item @emph{Description}
4061 This function tests for completion of all asynchronous operations.
4062 In C/C++, a non-zero value is returned to indicate that all asynchronous
4063 operations have completed, while Fortran returns @code{true}. If
4064 any asynchronous operation has not completed, C/C++ returns zero and
4065 Fortran returns @code{false}.
4068 @multitable @columnfractions .20 .80
4069 @item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
4072 @item @emph{Fortran}:
4073 @multitable @columnfractions .20 .80
4074 @item @emph{Interface}: @tab @code{function acc_async_test_all()}
4075 @item @tab @code{logical acc_async_test_all}
4078 @item @emph{Reference}:
4079 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4086 @section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
4088 @item @emph{Description}
4089 This function waits for completion of the asynchronous operation
4090 specified in @var{arg}.
4093 @multitable @columnfractions .20 .80
4094 @item @emph{Prototype}: @tab @code{acc_wait(int arg);}
4095 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
4098 @item @emph{Fortran}:
4099 @multitable @columnfractions .20 .80
4100 @item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
4101 @item @tab @code{integer(acc_handle_kind) arg}
4102 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
4103 @item @tab @code{integer(acc_handle_kind) arg}
4106 @item @emph{Reference}:
4107 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4114 @section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
4116 @item @emph{Description}
4117 This function waits for the completion of all asynchronous operations.
4120 @multitable @columnfractions .20 .80
4121 @item @emph{Prototype}: @tab @code{acc_wait_all(void);}
4122 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
4125 @item @emph{Fortran}:
4126 @multitable @columnfractions .20 .80
4127 @item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
4128 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
4131 @item @emph{Reference}:
4132 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4138 @node acc_wait_all_async
4139 @section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
4141 @item @emph{Description}
4142 This function enqueues a wait operation on the queue @var{async} for any
4143 and all asynchronous operations that have been previously enqueued on
4147 @multitable @columnfractions .20 .80
4148 @item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
4151 @item @emph{Fortran}:
4152 @multitable @columnfractions .20 .80
4153 @item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
4154 @item @tab @code{integer(acc_handle_kind) async}
4157 @item @emph{Reference}:
4158 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4164 @node acc_wait_async
4165 @section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
4167 @item @emph{Description}
4168 This function enqueues a wait operation on queue @var{async} for any and all
4169 asynchronous operations enqueued on queue @var{arg}.
4172 @multitable @columnfractions .20 .80
4173 @item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
4176 @item @emph{Fortran}:
4177 @multitable @columnfractions .20 .80
4178 @item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
4179 @item @tab @code{integer(acc_handle_kind) arg, async}
4182 @item @emph{Reference}:
4183 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4190 @section @code{acc_init} -- Initialize runtime for a specific device type.
4192 @item @emph{Description}
4193 This function initializes the runtime for the device type specified in
4197 @multitable @columnfractions .20 .80
4198 @item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
4201 @item @emph{Fortran}:
4202 @multitable @columnfractions .20 .80
4203 @item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
4204 @item @tab @code{integer(acc_device_kind) devicetype}
4207 @item @emph{Reference}:
4208 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4215 @section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
4217 @item @emph{Description}
4218 This function shuts down the runtime for the device type specified in
4222 @multitable @columnfractions .20 .80
4223 @item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
4226 @item @emph{Fortran}:
4227 @multitable @columnfractions .20 .80
4228 @item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
4229 @item @tab @code{integer(acc_device_kind) devicetype}
4232 @item @emph{Reference}:
4233 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4240 @section @code{acc_on_device} -- Whether executing on a particular device
4242 @item @emph{Description}:
4243 This function returns whether the program is executing on a particular
4244 device specified in @var{devicetype}. In C/C++, a non-zero value is
4245 returned to indicate that the program is executing on the specified device type.
4246 In Fortran, @code{true} is returned. If the program is not executing
4247 on the specified device type, C/C++ returns zero, while Fortran
4248 returns @code{false}.
4251 @multitable @columnfractions .20 .80
4252 @item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
4255 @item @emph{Fortran}:
4256 @multitable @columnfractions .20 .80
4257 @item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
4258 @item @tab @code{integer(acc_device_kind) devicetype}
4259 @item @tab @code{logical acc_on_device}
4263 @item @emph{Reference}:
4264 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4271 @section @code{acc_malloc} -- Allocate device memory.
4273 @item @emph{Description}
4274 This function allocates @var{len} bytes of device memory. It returns
4275 the device address of the allocated memory.
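
For illustration, a short C sketch (assuming the routines behave as
described in this chapter) that allocates device memory, transfers data
to and from it, and releases it again:

@smallexample
#include <openacc.h>

int
main (void)
@{
  float host_buf[256] = @{ 0 @};

  void *dev_buf = acc_malloc (sizeof host_buf);
  acc_memcpy_to_device (dev_buf, host_buf, sizeof host_buf);
  /* ... use dev_buf, e.g. via a deviceptr clause ... */
  acc_memcpy_from_device (host_buf, dev_buf, sizeof host_buf);
  acc_free (dev_buf);
  return 0;
@}
@end smallexample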
4278 @multitable @columnfractions .20 .80
4279 @item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
4282 @item @emph{Reference}:
4283 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4290 @section @code{acc_free} -- Free device memory.
4292 @item @emph{Description}
4293 Free previously allocated device memory at the device address @code{a}.
4296 @multitable @columnfractions .20 .80
4297 @item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
4300 @item @emph{Reference}:
4301 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4308 @section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
4310 @item @emph{Description}
4311 In C/C++, this function allocates @var{len} bytes of device memory
4312 and maps it to the specified host address in @var{a}. The device
4313 address of the newly allocated device memory is returned.
4315 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4316 a contiguous array section. In the second form, @var{a} specifies a
4317 variable or array element and @var{len} specifies the length in bytes.
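
For illustration, a short C sketch (not normative) that copies a host
array to the device and later releases the mapping again with
@code{acc_copyout}:

@smallexample
#include <openacc.h>

int
main (void)
@{
  float a[1024] = @{ 0 @};

  /* Allocate device memory for A and copy the host data to it.  */
  float *d_a = (float *) acc_copyin (a, sizeof a);
  (void) d_a;  /* The device address could be used with deviceptr().  */

  /* Copy the data back to the host and free the device memory.  */
  acc_copyout (a, sizeof a);
  return 0;
@}
@end smallexample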
4320 @multitable @columnfractions .20 .80
4321 @item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
4322 @item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
4325 @item @emph{Fortran}:
4326 @multitable @columnfractions .20 .80
4327 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
4328 @item @tab @code{type, dimension(:[,:]...) :: a}
4329 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
4330 @item @tab @code{type, dimension(:[,:]...) :: a}
4331 @item @tab @code{integer len}
4332 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
4333 @item @tab @code{type, dimension(:[,:]...) :: a}
4334 @item @tab @code{integer(acc_handle_kind) :: async}
4335 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
4336 @item @tab @code{type, dimension(:[,:]...) :: a}
4337 @item @tab @code{integer len}
4338 @item @tab @code{integer(acc_handle_kind) :: async}
4341 @item @emph{Reference}:
4342 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4348 @node acc_present_or_copyin
4349 @section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
4351 @item @emph{Description}
4352 This function tests if the host data specified by @var{a} and of length
4353 @var{len} is present or not. If it is not present, device memory
4354 is allocated and the host memory copied. The device address of
4355 the newly allocated device memory is returned.
4357 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4358 a contiguous array section. In the second form, @var{a} specifies a variable or
4359 array element and @var{len} specifies the length in bytes.
4361 Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
4362 backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.
4365 @multitable @columnfractions .20 .80
4366 @item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
4367 @item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
4370 @item @emph{Fortran}:
4371 @multitable @columnfractions .20 .80
4372 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
4373 @item @tab @code{type, dimension(:[,:]...) :: a}
4374 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
4375 @item @tab @code{type, dimension(:[,:]...) :: a}
4376 @item @tab @code{integer len}
4377 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
4378 @item @tab @code{type, dimension(:[,:]...) :: a}
4379 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
4380 @item @tab @code{type, dimension(:[,:]...) :: a}
4381 @item @tab @code{integer len}
4384 @item @emph{Reference}:
4385 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4392 @section @code{acc_create} -- Allocate device memory and map it to host memory.
4394 @item @emph{Description}
4395 This function allocates device memory and maps it to host memory specified
4396 by the host address @var{a} with a length of @var{len} bytes. In C/C++,
4397 the function returns the device address of the allocated device memory.
4399 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4400 a contiguous array section. In the second form, @var{a} specifies a variable or
4401 array element and @var{len} specifies the length in bytes.
4404 @multitable @columnfractions .20 .80
4405 @item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
4406 @item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
4409 @item @emph{Fortran}:
4410 @multitable @columnfractions .20 .80
4411 @item @emph{Interface}: @tab @code{subroutine acc_create(a)}
4412 @item @tab @code{type, dimension(:[,:]...) :: a}
4413 @item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
4414 @item @tab @code{type, dimension(:[,:]...) :: a}
4415 @item @tab @code{integer len}
4416 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
4417 @item @tab @code{type, dimension(:[,:]...) :: a}
4418 @item @tab @code{integer(acc_handle_kind) :: async}
4419 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
4420 @item @tab @code{type, dimension(:[,:]...) :: a}
4421 @item @tab @code{integer len}
4422 @item @tab @code{integer(acc_handle_kind) :: async}
4425 @item @emph{Reference}:
4426 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4432 @node acc_present_or_create
4433 @section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
4435 @item @emph{Description}
4436 This function tests if the host data specified by @var{a} and of length
4437 @var{len} is present or not. If it is not present, device memory
4438 is allocated and mapped to host memory. In C/C++, the device address
4439 of the newly allocated device memory is returned.
4441 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4442 a contiguous array section. In the second form, @var{a} specifies a variable or
4443 array element and @var{len} specifies the length in bytes.
4445 Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
4446 backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.
4449 @multitable @columnfractions .20 .80
4450 @item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
4451 @item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
4454 @item @emph{Fortran}:
4455 @multitable @columnfractions .20 .80
4456 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
4457 @item @tab @code{type, dimension(:[,:]...) :: a}
4458 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
4459 @item @tab @code{type, dimension(:[,:]...) :: a}
4460 @item @tab @code{integer len}
4461 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
4462 @item @tab @code{type, dimension(:[,:]...) :: a}
4463 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
4464 @item @tab @code{type, dimension(:[,:]...) :: a}
4465 @item @tab @code{integer len}
4468 @item @emph{Reference}:
4469 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4476 @section @code{acc_copyout} -- Copy device memory to host memory.
4478 @item @emph{Description}
4479 This function copies mapped device memory to host memory, which is specified
4480 by the host address @var{a}, for a length of @var{len} bytes in C/C++.
4482 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4483 a contiguous array section. In the second form, @var{a} specifies a variable or
4484 array element and @var{len} specifies the length in bytes.
4487 @multitable @columnfractions .20 .80
4488 @item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
4489 @item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);}
4490 @item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);}
4491 @item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);}
4494 @item @emph{Fortran}:
4495 @multitable @columnfractions .20 .80
4496 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
4497 @item @tab @code{type, dimension(:[,:]...) :: a}
4498 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
4499 @item @tab @code{type, dimension(:[,:]...) :: a}
4500 @item @tab @code{integer len}
4501 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
4502 @item @tab @code{type, dimension(:[,:]...) :: a}
4503 @item @tab @code{integer(acc_handle_kind) :: async}
4504 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
4505 @item @tab @code{type, dimension(:[,:]...) :: a}
4506 @item @tab @code{integer len}
4507 @item @tab @code{integer(acc_handle_kind) :: async}
4508 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
4509 @item @tab @code{type, dimension(:[,:]...) :: a}
4510 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
4511 @item @tab @code{type, dimension(:[,:]...) :: a}
4512 @item @tab @code{integer len}
4513 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
4514 @item @tab @code{type, dimension(:[,:]...) :: a}
4515 @item @tab @code{integer(acc_handle_kind) :: async}
4516 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
4517 @item @tab @code{type, dimension(:[,:]...) :: a}
4518 @item @tab @code{integer len}
4519 @item @tab @code{integer(acc_handle_kind) :: async}
4522 @item @emph{Reference}:
4523 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4530 @section @code{acc_delete} -- Free device memory.
4532 @item @emph{Description}
4533 This function frees previously allocated device memory associated with
4534 the host address @var{a} and a length of @var{len} bytes.
4536 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4537 a contiguous array section. In the second form, @var{a} specifies a variable or
4538 array element and @var{len} specifies the length in bytes.
4541 @multitable @columnfractions .20 .80
4542 @item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
4543 @item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
4544 @item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
4545 @item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
4548 @item @emph{Fortran}:
4549 @multitable @columnfractions .20 .80
4550 @item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
4551 @item @tab @code{type, dimension(:[,:]...) :: a}
4552 @item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
4553 @item @tab @code{type, dimension(:[,:]...) :: a}
4554 @item @tab @code{integer len}
4555 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
4556 @item @tab @code{type, dimension(:[,:]...) :: a}
4557 @item @tab @code{integer(acc_handle_kind) :: async}
4558 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
4559 @item @tab @code{type, dimension(:[,:]...) :: a}
4560 @item @tab @code{integer len}
4561 @item @tab @code{integer(acc_handle_kind) :: async}
4562 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
4563 @item @tab @code{type, dimension(:[,:]...) :: a}
4564 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
4565 @item @tab @code{type, dimension(:[,:]...) :: a}
4566 @item @tab @code{integer len}
4567 @item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, async)}
4568 @item @tab @code{type, dimension(:[,:]...) :: a}
4569 @item @tab @code{integer(acc_handle_kind) :: async}
4570 @item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, len, async)}
4571 @item @tab @code{type, dimension(:[,:]...) :: a}
4572 @item @tab @code{integer len}
4573 @item @tab @code{integer(acc_handle_kind) :: async}
4576 @item @emph{Reference}:
4577 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4583 @node acc_update_device
4584 @section @code{acc_update_device} -- Update device memory from mapped host memory.
4586 @item @emph{Description}
4587 This function updates the device copy from the previously mapped host memory.
4588 The host memory is specified with the host address @var{a} and a length of
4591 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4592 a contiguous array section. In the second form, @var{a} specifies a variable or
4593 array element and @var{len} specifies the length in bytes.
4596 @multitable @columnfractions .20 .80
4597 @item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
4598 @item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
4601 @item @emph{Fortran}:
4602 @multitable @columnfractions .20 .80
4603 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
4604 @item @tab @code{type, dimension(:[,:]...) :: a}
4605 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
4606 @item @tab @code{type, dimension(:[,:]...) :: a}
4607 @item @tab @code{integer len}
4608 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
4609 @item @tab @code{type, dimension(:[,:]...) :: a}
4610 @item @tab @code{integer(acc_handle_kind) :: async}
4611 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
4612 @item @tab @code{type, dimension(:[,:]...) :: a}
4613 @item @tab @code{integer len}
4614 @item @tab @code{integer(acc_handle_kind) :: async}
4617 @item @emph{Reference}:
4618 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4624 @node acc_update_self
4625 @section @code{acc_update_self} -- Update host memory from mapped device memory.
4627 @item @emph{Description}
4628 This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.
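The following sketch (function and variable names are illustrative) shows the
typical round trip: @code{acc_update_device} pushes host changes to the device
copy, and @code{acc_update_self} pulls device results back:

@smallexample
#include <openacc.h>
#include <stddef.h>

void
update_example (float *a, size_t n)
@{
  acc_copyin (a, n * sizeof (float));        /* Map and copy to the device.  */

  a[0] = 1.0f;                               /* Host copy changed ...  */
  acc_update_device (a, n * sizeof (float)); /* ... so refresh the device copy.  */

  /* ... device computation modifies the device copy ...  */
  acc_update_self (a, n * sizeof (float));   /* Copy the results back.  */

  acc_delete (a, n * sizeof (float));
@}
@end smallexample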
4637 @multitable @columnfractions .20 .80
4638 @item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
4639 @item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
4642 @item @emph{Fortran}:
4643 @multitable @columnfractions .20 .80
4644 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
4645 @item @tab @code{type, dimension(:[,:]...) :: a}
4646 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
4647 @item @tab @code{type, dimension(:[,:]...) :: a}
4648 @item @tab @code{integer len}
4649 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
4650 @item @tab @code{type, dimension(:[,:]...) :: a}
4651 @item @tab @code{integer(acc_handle_kind) :: async}
4652 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
4653 @item @tab @code{type, dimension(:[,:]...) :: a}
4654 @item @tab @code{integer len}
4655 @item @tab @code{integer(acc_handle_kind) :: async}
4658 @item @emph{Reference}:
4659 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4666 @section @code{acc_map_data} -- Map previously allocated device memory to host memory.
4668 @item @emph{Description}
4669 This function maps previously allocated device and host memory. The device
4670 memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.
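A minimal sketch, assuming the device memory was obtained with
@code{acc_malloc} (names and sizes are illustrative):

@smallexample
#include <openacc.h>
#include <stdlib.h>

void
map_example (void)
@{
  size_t bytes = 1024 * sizeof (float);
  float *host = malloc (bytes);
  void *dev = acc_malloc (bytes);   /* Device memory allocated separately.  */

  acc_map_data (host, dev, bytes);  /* 'host' is now present on the device.  */
  /* ... data movement is done explicitly, e.g. with acc_update_device ...  */
  acc_unmap_data (host);            /* Remove the mapping only.  */

  acc_free (dev);
  free (host);
@}
@end smallexample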
4674 @multitable @columnfractions .20 .80
4675 @item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
4678 @item @emph{Reference}:
4679 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4685 @node acc_unmap_data
4686 @section @code{acc_unmap_data} -- Unmap device memory from host memory.
4688 @item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by the host address @var{h}.
4693 @multitable @columnfractions .20 .80
4694 @item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
4697 @item @emph{Reference}:
4698 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4705 @section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
4707 @item @emph{Description}
4708 This function returns the device address that has been mapped to the
4709 host address specified by @var{h}.
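A short sketch, assuming @var{a} has already been mapped (e.g. by
@code{acc_copyin}); the returned device address can then be handed to routines
that expect device pointers, such as @code{acc_memcpy_to_device} or a CUDA
library call:

@smallexample
#include <assert.h>
#include <openacc.h>
#include <stddef.h>

void
deviceptr_example (float *a, size_t n)
@{
  acc_copyin (a, n * sizeof (float));

  void *d = acc_deviceptr (a);    /* Device address backing 'a'.  */
  assert (acc_hostptr (d) == a);  /* acc_hostptr is the inverse mapping.  */

  acc_delete (a, n * sizeof (float));
@}
@end smallexample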
4712 @multitable @columnfractions .20 .80
4713 @item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
4716 @item @emph{Reference}:
4717 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4724 @section @code{acc_hostptr} -- Get host pointer associated with specific device address.
4726 @item @emph{Description}
4727 This function returns the host address that has been mapped to the
4728 device address specified by @var{d}.
4731 @multitable @columnfractions .20 .80
4732 @item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
4735 @item @emph{Reference}:
4736 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4742 @node acc_is_present
4743 @section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
4745 @item @emph{Description}
4746 This function indicates whether the specified host address in @var{a} and a
4747 length of @var{len} bytes is present on the device. In C/C++, a non-zero
4748 value is returned to indicate the presence of the mapped memory on the
device. A zero is returned to indicate the memory is not mapped on the
device.
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, @code{true} is returned. Otherwise,
@code{false} is returned to indicate the mapped memory is not present.
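For example, a minimal C sketch that maps data only when it is not already
present (names are illustrative):

@smallexample
#include <openacc.h>
#include <stddef.h>

void
ensure_present (float *a, size_t n)
@{
  if (!acc_is_present (a, n * sizeof (float)))
    acc_copyin (a, n * sizeof (float));  /* Map only if not mapped yet.  */
@}
@end smallexample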
4759 @multitable @columnfractions .20 .80
4760 @item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
4763 @item @emph{Fortran}:
4764 @multitable @columnfractions .20 .80
4765 @item @emph{Interface}: @tab @code{function acc_is_present(a)}
4766 @item @tab @code{type, dimension(:[,:]...) :: a}
4767 @item @tab @code{logical acc_is_present}
4768 @item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
4769 @item @tab @code{type, dimension(:[,:]...) :: a}
4770 @item @tab @code{integer len}
4771 @item @tab @code{logical acc_is_present}
4774 @item @emph{Reference}:
4775 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4781 @node acc_memcpy_to_device
4782 @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
4784 @item @emph{Description}
This function copies host memory, specified by the host address @var{src}, to
device memory, specified by the device address @var{dest}, for a length of
@var{bytes} bytes.
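A minimal sketch, assuming the device buffer was obtained with
@code{acc_malloc}; the reverse direction uses @code{acc_memcpy_from_device}
(see the next section):

@smallexample
#include <openacc.h>
#include <stdlib.h>
#include <string.h>

void
memcpy_example (void)
@{
  size_t bytes = 256;
  char *host = malloc (bytes);
  void *dev = acc_malloc (bytes);

  memset (host, 0, bytes);
  acc_memcpy_to_device (dev, host, bytes);    /* host -> device */
  /* ... device code fills the buffer ... */
  acc_memcpy_from_device (host, dev, bytes);  /* device -> host */

  acc_free (dev);
  free (host);
@}
@end smallexample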
4790 @multitable @columnfractions .20 .80
4791 @item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
4794 @item @emph{Reference}:
4795 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4801 @node acc_memcpy_from_device
4802 @section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
4804 @item @emph{Description}
This function copies device memory, specified by the device address @var{src},
to host memory, specified by the host address @var{dest}, for a length of
@var{bytes} bytes.
4810 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
4814 @item @emph{Reference}:
4815 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4822 @section @code{acc_attach} -- Let device pointer point to device-pointer target.
4824 @item @emph{Description}
4825 This function updates a pointer on the device from pointing to a host-pointer
4826 address to pointing to the corresponding device data.
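The routine is typically used for pointer members of mapped aggregates. A
sketch under that assumption (the structure and names are illustrative):

@smallexample
#include <openacc.h>
#include <stddef.h>

struct vec @{ size_t n; float *data; @};

void
attach_example (struct vec *v)
@{
  acc_copyin (v, sizeof (*v));                  /* Shallow copy of the struct.  */
  acc_copyin (v->data, v->n * sizeof (float));  /* Copy the pointed-to array.  */
  acc_attach ((void **) &v->data);  /* Device copy of 'data' now points to the
                                       device copy of the array.  */

  /* ... compute regions may dereference 'v->data' on the device ...  */

  acc_detach ((void **) &v->data);
  acc_delete (v->data, v->n * sizeof (float));
  acc_delete (v, sizeof (*v));
@}
@end smallexample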
4829 @multitable @columnfractions .20 .80
4830 @item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
4831 @item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
4834 @item @emph{Reference}:
4835 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4842 @section @code{acc_detach} -- Let device pointer point to host-pointer target.
4844 @item @emph{Description}
4845 This function updates a pointer on the device from pointing to a device-pointer
4846 address to pointing to the corresponding host data.
4849 @multitable @columnfractions .20 .80
4850 @item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
4851 @item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
4852 @item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
4853 @item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
4856 @item @emph{Reference}:
4857 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4863 @node acc_get_current_cuda_device
4864 @section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
4866 @item @emph{Description}
4867 This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
4871 @multitable @columnfractions .20 .80
4872 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
4875 @item @emph{Reference}:
4876 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4882 @node acc_get_current_cuda_context
4883 @section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
4885 @item @emph{Description}
4886 This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
4890 @multitable @columnfractions .20 .80
4891 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
4894 @item @emph{Reference}:
4895 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4901 @node acc_get_cuda_stream
4902 @section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
4904 @item @emph{Description}
4905 This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.
4909 @multitable @columnfractions .20 .80
4910 @item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
4913 @item @emph{Reference}:
4914 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4920 @node acc_set_cuda_stream
4921 @section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
4923 @item @emph{Description}
4924 This function associates the stream handle specified by @var{stream} with
4925 the queue @var{async}.
4927 This cannot be used to change the stream handle associated with
4928 @code{acc_async_sync}.
4930 The return value is not specified.
4933 @multitable @columnfractions .20 .80
4934 @item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
4937 @item @emph{Reference}:
4938 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4944 @node acc_prof_register
4945 @section @code{acc_prof_register} -- Register callbacks.
4947 @item @emph{Description}:
4948 This function registers callbacks.
4951 @multitable @columnfractions .20 .80
4952 @item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
4955 @item @emph{See also}:
4956 @ref{OpenACC Profiling Interface}
4958 @item @emph{Reference}:
4959 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4965 @node acc_prof_unregister
4966 @section @code{acc_prof_unregister} -- Unregister callbacks.
4968 @item @emph{Description}:
4969 This function unregisters callbacks.
4972 @multitable @columnfractions .20 .80
4973 @item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
4976 @item @emph{See also}:
4977 @ref{OpenACC Profiling Interface}
4979 @item @emph{Reference}:
4980 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4986 @node acc_prof_lookup
4987 @section @code{acc_prof_lookup} -- Obtain inquiry functions.
4989 @item @emph{Description}:
4990 Function to obtain inquiry functions.
4993 @multitable @columnfractions .20 .80
4994 @item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
4997 @item @emph{See also}:
4998 @ref{OpenACC Profiling Interface}
5000 @item @emph{Reference}:
5001 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5007 @node acc_register_library
5008 @section @code{acc_register_library} -- Library registration.
5010 @item @emph{Description}:
5011 Function for library registration.
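As a sketch of how such a library might look (the callback body and the
chosen event are illustrative, not prescribed by libgomp), a shared object
named in @env{ACC_PROFLIB} could provide:

@smallexample
#include <acc_prof.h>
#include <stdio.h>

/* Illustrative callback: report each kernel-launch start event.  */
static void
cb_launch (acc_prof_info *pi, acc_event_info *ei, acc_api_info *ai)
@{
  (void) ei; (void) ai;
  fprintf (stderr, "launch on device %d\n", pi->device_number);
@}

void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
@{
  (void) unreg; (void) lookup;
  reg (acc_ev_enqueue_launch_start, cb_launch, acc_reg);
@}
@end smallexample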
5014 @multitable @columnfractions .20 .80
5015 @item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
5018 @item @emph{See also}:
5019 @ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
5021 @item @emph{Reference}:
5022 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5028 @c ---------------------------------------------------------------------
5029 @c OpenACC Environment Variables
5030 @c ---------------------------------------------------------------------
5032 @node OpenACC Environment Variables
5033 @chapter OpenACC Environment Variables
5035 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
5036 are defined by section 4 of the OpenACC specification in version 2.0.
5037 The variable @env{ACC_PROFLIB}
5038 is defined by section 4 of the OpenACC specification in version 2.6.
5048 @node ACC_DEVICE_TYPE
5049 @section @code{ACC_DEVICE_TYPE}
5051 @item @emph{Description}:
5052 Control the default device type to use when executing compute regions.
If unset, the code can be run on any device type, favoring a non-host
device type.

Supported values in GCC (if compiled in) are @code{host}, @code{nvidia},
and @code{radeon}.
5062 @item @emph{Reference}:
5063 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5069 @node ACC_DEVICE_NUM
5070 @section @code{ACC_DEVICE_NUM}
5072 @item @emph{Description}:
5073 Control which device, identified by device number, is the default device.
5074 The value must be a nonnegative integer less than the number of devices.
5075 If unset, device number zero is used.
5076 @item @emph{Reference}:
5077 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5084 @section @code{ACC_PROFLIB}
5086 @item @emph{Description}:
5087 Semicolon-separated list of dynamic libraries that are loaded as profiling
5088 libraries. Each library must provide at least the @code{acc_register_library}
routine. Each library file is located as described in the documentation of
@code{dlopen} for your operating system.
5091 @item @emph{See also}:
5092 @ref{acc_register_library}, @ref{OpenACC Profiling Interface}
5094 @item @emph{Reference}:
5095 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5101 @c ---------------------------------------------------------------------
5102 @c CUDA Streams Usage
5103 @c ---------------------------------------------------------------------
5105 @node CUDA Streams Usage
5106 @chapter CUDA Streams Usage
5108 This applies to the @code{nvptx} plugin only.
5110 The library provides elements that perform asynchronous movement of
5111 data and asynchronous operation of computing constructs. This
5112 asynchronous functionality is implemented by making use of CUDA
5113 streams@footnote{See "Stream Management" in "CUDA Driver API",
5114 TRM-06703-001, Version 5.5, for additional information}.
The primary means by which the asynchronous functionality is accessed
5117 is through the use of those OpenACC directives which make use of the
5118 @code{async} and @code{wait} clauses. When the @code{async} clause is
5119 first used with a directive, it creates a CUDA stream. If an
5120 @code{async-argument} is used with the @code{async} clause, then the
5121 stream is associated with the specified @code{async-argument}.
5123 Following the creation of an association between a CUDA stream and the
5124 @code{async-argument} of an @code{async} clause, both the @code{wait}
5125 clause and the @code{wait} directive can be used. When either the
5126 clause or directive is used after stream creation, it creates a
5127 rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, the stream, have
been completed.
Normally, the management of the streams created as a result of
using the @code{async} clause is done without any intervention by the
5133 caller. This implies the association between the @code{async-argument}
5134 and the CUDA stream is maintained for the lifetime of the program.
5135 However, this association can be changed through the use of the library
5136 function @code{acc_set_cuda_stream}. When the function
5137 @code{acc_set_cuda_stream} is called, the CUDA stream that was
5138 originally associated with the @code{async} clause is destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} then refer to a different
CUDA stream.
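As a sketch of the interplay described above (assuming the nvptx device is
active and the CUDA runtime is available; the queue number is illustrative):

@smallexample
#include <openacc.h>
#include <cuda_runtime.h>

void
stream_example (void)
@{
  /* Replace the CUDA stream backing async queue 1; the stream previously
     associated with the queue is destroyed (see above).  */
  cudaStream_t s;
  cudaStreamCreate (&s);
  acc_set_cuda_stream (1, (void *) s);

  /* Inspect the stream currently associated with a queue.  */
  void *cur = acc_get_cuda_stream (1);
  (void) cur;
@}
@end smallexample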
5145 @c ---------------------------------------------------------------------
5146 @c OpenACC Library Interoperability
5147 @c ---------------------------------------------------------------------
5149 @node OpenACC Library Interoperability
5150 @chapter OpenACC Library Interoperability
5152 @section Introduction
5154 The OpenACC library uses the CUDA Driver API, and may interact with
5155 programs that use the Runtime library directly, or another library
5156 based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
5157 "Interactions with the CUDA Driver API" in
5158 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
5159 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
5160 for additional information on library interoperability.}.
5161 This chapter describes the use cases and what changes are
5162 required in order to use both the OpenACC library and the CUBLAS and Runtime
5163 libraries within a program.
5165 @section First invocation: NVIDIA CUBLAS library API
5167 In this first use case (see below), a function in the CUBLAS library is called
5168 prior to any of the functions in the OpenACC library. More specifically, the
5169 function @code{cublasCreate()}.
5171 When invoked, the function initializes the library and allocates the
5172 hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
5174 caller. The OpenACC library also requires initialization and allocation of
5175 hardware resources. Since the CUBLAS library has already allocated the
5176 hardware resources for the device, all that is left to do is to initialize
5177 the OpenACC library and acquire the hardware resources on the host.
5179 Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
5183 acquired, the device number is passed along with the device type as
5184 parameters to the OpenACC library function @code{acc_set_device_num()}.
5186 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
5187 library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries share the same
context.
@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
        fprintf(stderr, "cublasCreate failed %d\n", s);

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
        fprintf(stderr, "cudaGetDevice failed %d\n", e);

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);
@end smallexample
5214 @section First invocation: OpenACC library API
5216 In this second use case (see below), a function in the OpenACC library is
5217 called prior to any of the functions in the CUBLAS library. More specifically,
5218 the function @code{acc_set_device_num()}.
5220 In the use case presented here, the function @code{acc_set_device_num()}
5221 is used to both initialize the OpenACC library and allocate the hardware
5222 resources on the host and the device. In the call to the function, the
5223 call parameters specify which device to use and what device
5224 type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
5225 is but one method to initialize the OpenACC library and allocate the
5226 appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.
5229 Once the call to @code{acc_set_device_num()} has completed, other OpenACC
5230 functions can be called as seen with multiple calls being made to
5231 @code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In this use case, a call to @code{cublasCreate()} is made
5233 subsequent to the calls to @code{acc_copyin()}.
5234 As seen in the previous use case, a call to @code{cublasCreate()}
5235 initializes the CUBLAS library and allocates the hardware resources on the
5236 host and the device. However, since the device has already been allocated,
5237 @code{cublasCreate()} only initializes the CUBLAS library and allocates
5238 the appropriate hardware resources on the host. The context that was created
5239 as part of the OpenACC initialization is shared with the CUBLAS library,
5240 similarly to the first use case.
@smallexample
    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
        fprintf(stderr, "copyin error h_X\n");

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
        fprintf(stderr, "copyin error h_Y1\n");

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
        fprintf(stderr, "cublasCreate failed %d\n", s);

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
        fprintf(stderr, "cublasSaxpy failed %d\n", s);

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
@end smallexample
5285 @section OpenACC library and environment variables
5287 There are two environment variables associated with the OpenACC library
5288 that may be used to control the device type and device number:
5289 @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
5290 environment variables can be used as an alternative to calling
5291 @code{acc_set_device_num()}. As seen in the second use case, the device
5292 type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
5294 call to @code{acc_set_device_num()} would not be required.
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the ``@uref{https://www.openacc.org, OpenACC}
Application Programming Interface'', Version 2.6.}
5307 @c ---------------------------------------------------------------------
5308 @c OpenACC Profiling Interface
5309 @c ---------------------------------------------------------------------
5311 @node OpenACC Profiling Interface
5312 @chapter OpenACC Profiling Interface
5314 @section Implementation Status and Implementation-Defined Behavior
5316 We're implementing the OpenACC Profiling Interface as defined by the
5317 OpenACC 2.6 specification. We're clarifying some aspects here as
5318 @emph{implementation-defined behavior}, while they're still under
5319 discussion within the OpenACC Technical Committee.
5321 This implementation is tuned to keep the performance impact as low as
5322 possible for the (very common) case that the Profiling Interface is
5323 not enabled. This is relevant, as the Profiling Interface affects all
5324 the @emph{hot} code paths (in the target code, not in the offloaded
5325 code). Users of the OpenACC Profiling Interface can be expected to
5326 understand that performance is impacted to some degree once the
5327 Profiling Interface is enabled: for example, because of the
5328 @emph{runtime} (libgomp) calling into a third-party @emph{library} for
5329 every event that has been registered.
5331 We're not yet accounting for the fact that @cite{OpenACC events may
5332 occur during event processing}.
5333 We just handle one case specially, as required by CUDA 9.0
5334 @command{nvprof}, that @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.
We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in or dynamically loaded via @env{LD_PRELOAD}.
5342 Initialization via @code{acc_register_library} functions dynamically
5343 loaded via the @env{ACC_PROFLIB} environment variable does work, as
5344 does directly calling @code{acc_prof_register},
5345 @code{acc_prof_unregister}, @code{acc_prof_lookup}.
5347 As currently there are no inquiry functions defined, calls to
@code{acc_prof_lookup} always return @code{NULL}.
There aren't separate @emph{start} and @emph{stop} events defined for the
5351 event types @code{acc_ev_create}, @code{acc_ev_delete},
5352 @code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
5353 should be triggered before or after the actual device-specific call is
5354 made. We trigger them after.
5356 Remarks about data provided to callbacks:
5360 @item @code{acc_prof_info.event_type}
5361 It's not clear if for @emph{nested} event callbacks (for example,
5362 @code{acc_ev_enqueue_launch_start} as part of a parent compute
5363 construct), this should be set for the nested event
5364 (@code{acc_ev_enqueue_launch_start}), or if the value of the parent
5365 construct should remain (@code{acc_ev_compute_construct_start}). In
5366 this implementation, the value generally corresponds to the
5367 innermost nested event type.
5369 @item @code{acc_prof_info.device_type}
5373 For @code{acc_ev_compute_construct_start}, and in presence of an
5374 @code{if} clause with @emph{false} argument, this still refers to
5375 the offloading device type.
5376 It's not clear if that's the expected behavior.
5379 Complementary to the item before, for
5380 @code{acc_ev_compute_construct_end}, this is set to
5381 @code{acc_device_host} in presence of an @code{if} clause with
5382 @emph{false} argument.
5383 It's not clear if that's the expected behavior.
5387 @item @code{acc_prof_info.thread_id}
5388 Always @code{-1}; not yet implemented.
5390 @item @code{acc_prof_info.async}
5394 Not yet implemented correctly for
5395 @code{acc_ev_compute_construct_start}.
5398 In a compute construct, for host-fallback
5399 execution/@code{acc_device_host} it always is
5400 @code{acc_async_sync}.
5401 It is unclear if that is the expected behavior.
5404 For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
5405 it will always be @code{acc_async_sync}.
5406 It is unclear if that is the expected behavior.
5410 @item @code{acc_prof_info.async_queue}
5411 There is no @cite{limited number of asynchronous queues} in libgomp.
5412 This always has the same value as @code{acc_prof_info.async}.
5414 @item @code{acc_prof_info.src_file}
5415 Always @code{NULL}; not yet implemented.
5417 @item @code{acc_prof_info.func_name}
5418 Always @code{NULL}; not yet implemented.
5420 @item @code{acc_prof_info.line_no}
5421 Always @code{-1}; not yet implemented.
5423 @item @code{acc_prof_info.end_line_no}
5424 Always @code{-1}; not yet implemented.
5426 @item @code{acc_prof_info.func_line_no}
5427 Always @code{-1}; not yet implemented.
5429 @item @code{acc_prof_info.func_end_line_no}
5430 Always @code{-1}; not yet implemented.
5432 @item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
5433 Relating to @code{acc_prof_info.event_type} discussed above, in this
5434 implementation, this will always be the same value as
5435 @code{acc_prof_info.event_type}.
5437 @item @code{acc_event_info.*.parent_construct}
5441 Will be @code{acc_construct_parallel} for all OpenACC compute
5442 constructs as well as many OpenACC Runtime API calls; should be the
5443 one matching the actual construct, or
5444 @code{acc_construct_runtime_api}, respectively.
5447 Will be @code{acc_construct_enter_data} or
5448 @code{acc_construct_exit_data} when processing variable mappings
5449 specified in OpenACC @emph{declare} directives; should be
5450 @code{acc_construct_declare}.
5453 For implicit @code{acc_ev_device_init_start},
5454 @code{acc_ev_device_init_end}, and explicit as well as implicit
5455 @code{acc_ev_alloc}, @code{acc_ev_free},
5456 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
5457 @code{acc_ev_enqueue_download_start}, and
5458 @code{acc_ev_enqueue_download_end}, will be
5459 @code{acc_construct_parallel}; should reflect the real parent
5464 @item @code{acc_event_info.*.implicit}
5465 For @code{acc_ev_alloc}, @code{acc_ev_free},
5466 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
5467 @code{acc_ev_enqueue_download_start}, and
5468 @code{acc_ev_enqueue_download_end}, this currently will be @code{1}
5469 also for explicit usage.
5471 @item @code{acc_event_info.data_event.var_name}
5472 Always @code{NULL}; not yet implemented.
5474 @item @code{acc_event_info.data_event.host_ptr}
For @code{acc_ev_alloc} and @code{acc_ev_free}, this is always
@code{NULL}.
5478 @item @code{typedef union acc_api_info}
5479 @dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
Information}. This should obviously be @code{typedef @emph{struct}
acc_api_info}.
5483 @item @code{acc_api_info.device_api}
5484 Possibly not yet implemented correctly for
5485 @code{acc_ev_compute_construct_start},
5486 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
5487 will always be @code{acc_device_api_none} for these event types.
5488 For @code{acc_ev_enter_data_start}, it will be
5489 @code{acc_device_api_none} in some cases.
5491 @item @code{acc_api_info.device_type}
5492 Always the same as @code{acc_prof_info.device_type}.
5494 @item @code{acc_api_info.vendor}
5495 Always @code{-1}; not yet implemented.
5497 @item @code{acc_api_info.device_handle}
5498 Always @code{NULL}; not yet implemented.
5500 @item @code{acc_api_info.context_handle}
5501 Always @code{NULL}; not yet implemented.
5503 @item @code{acc_api_info.async_handle}
5504 Always @code{NULL}; not yet implemented.
5508 Remarks about certain event types:
5512 @item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
5516 @c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
5517 @c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
5518 @c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
5519 When a compute construct triggers implicit
5520 @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
5521 events, they currently aren't @emph{nested within} the corresponding
5522 @code{acc_ev_compute_construct_start} and
5523 @code{acc_ev_compute_construct_end}, but they're currently observed
5524 @emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do: the standard asks us to provide a lot of
5526 details to the @code{acc_ev_compute_construct_start} callback, without
5527 (implicitly) initializing a device before?
5530 Callbacks for these event types will not be invoked for calls to the
5531 @code{acc_set_device_type} and @code{acc_set_device_num} functions.
5532 It's not clear if they should be.
5536 @item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
5540 Callbacks for these event types will also be invoked for OpenACC
5541 @emph{host_data} constructs.
5542 It's not clear if they should be.
5545 Callbacks for these event types will also be invoked when processing
5546 variable mappings specified in OpenACC @emph{declare} directives.
5547 It's not clear if they should be.
5553 Callbacks for the following event types will be invoked, but dispatch
and the information provided therein have not yet been thoroughly reviewed:
5557 @item @code{acc_ev_alloc}
5558 @item @code{acc_ev_free}
5559 @item @code{acc_ev_update_start}, @code{acc_ev_update_end}
5560 @item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
5561 @item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
During device initialization and finalization, respectively,
5565 callbacks for the following event types will not yet be invoked:
5568 @item @code{acc_ev_alloc}
5569 @item @code{acc_ev_free}
5572 Callbacks for the following event types have not yet been implemented,
5573 so currently won't be invoked:
5576 @item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
5577 @item @code{acc_ev_runtime_shutdown}
5578 @item @code{acc_ev_create}, @code{acc_ev_delete}
5579 @item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
5582 For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):
5587 @item @code{acc_get_num_devices}
5588 @item @code{acc_set_device_type}
5589 @item @code{acc_get_device_type}
5590 @item @code{acc_set_device_num}
5591 @item @code{acc_get_device_num}
5592 @item @code{acc_init}
5593 @item @code{acc_shutdown}
5596 Aside from implicit device initialization, for the following runtime
5597 library functions, no callbacks will be invoked for shared-memory
5598 offloading devices (it's not clear if they should be):
5601 @item @code{acc_malloc}
5602 @item @code{acc_free}
5603 @item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
5604 @item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
5605 @item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
5606 @item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
5607 @item @code{acc_update_device}, @code{acc_update_device_async}
5608 @item @code{acc_update_self}, @code{acc_update_self_async}
5609 @item @code{acc_map_data}, @code{acc_unmap_data}
5610 @item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
5611 @item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
5614 @c ---------------------------------------------------------------------
5615 @c OpenMP-Implementation Specifics
5616 @c ---------------------------------------------------------------------
5618 @node OpenMP-Implementation Specifics
5619 @chapter OpenMP-Implementation Specifics
5622 * Implementation-defined ICV Initialization::
5623 * OpenMP Context Selectors::
5624 * Memory allocation::
5627 @node Implementation-defined ICV Initialization
5628 @section Implementation-defined ICV Initialization
5629 @cindex Implementation specific setting
5631 @multitable @columnfractions .30 .70
5632 @item @var{affinity-format-var} @tab See @ref{OMP_AFFINITY_FORMAT}.
5633 @item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
5634 @item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
5635 @item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
5636 @item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
5637 @item @var{num-devices-var} @tab Number of non-host devices found
5638 by GCC's run-time library
5639 @item @var{num-procs-var} @tab The number of CPU cores on the
5640 initial device, except that affinity settings might lead to a
5641 smaller number. On non-host devices, the value of the
5642 @var{nthreads-var} ICV.
5643 @item @var{place-partition-var} @tab See @ref{OMP_PLACES}.
5644 @item @var{run-sched-var} @tab See @ref{OMP_SCHEDULE}.
5645 @item @var{stacksize-var} @tab See @ref{OMP_STACKSIZE}.
5646 @item @var{thread-limit-var} @tab See @ref{OMP_TEAMS_THREAD_LIMIT}
5647 @item @var{wait-policy-var} @tab See @ref{OMP_WAIT_POLICY} and
5648 @ref{GOMP_SPINCOUNT}
5651 @node OpenMP Context Selectors
5652 @section OpenMP Context Selectors
5654 @code{vendor} is always @code{gnu}. References are to the GCC manual.
5656 @c NOTE: Only the following selectors have been implemented. To add
5657 @c additional traits for target architecture, TARGET_OMP_DEVICE_KIND_ARCH_ISA
5658 @c has to be implemented; cf. also PR target/105640.
5659 @c For offload devices, add *additionally* gcc/config/*/t-omp-device.
5661 For the host compiler, @code{kind} always matches @code{host}; for the
5662 offloading architectures AMD GCN and Nvidia PTX, @code{kind} always matches
@code{gpu}. For the x86 family of computers, AMD GCN, and Nvidia PTX,
5664 the following traits are supported in addition; while OpenMP is supported
5665 on more architectures, GCC currently does not match any @code{arch} or
5666 @code{isa} traits for those.
5668 @multitable @columnfractions .65 .30
5669 @headitem @code{arch} @tab @code{isa}
5670 @item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
5671 @code{i586}, @code{i686}, @code{ia32}
5672 @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
5673 @item @code{amdgcn}, @code{gcn}
5674 @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
5675 @code{gfx803} is supported as an alias for @code{fiji}.}
@item @code{nvptx}
@tab See @code{-march=} in ``Nvidia PTX Options''
5680 @node Memory allocation
5681 @section Memory allocation
5683 The description below applies to:
5686 @item Explicit use of the OpenMP API routines, see
5687 @ref{Memory Management Routines}.
5688 @item The @code{allocate} clause, except when the @code{allocator} modifier is a
5689 constant expression with value @code{omp_default_mem_alloc} and no
5690 @code{align} modifier has been specified. (In that case, the normal
5691 @code{malloc} allocation is used.)
5692 @item Using the @code{allocate} directive for automatic/stack variables, except
5693 when the @code{allocator} clause is a constant expression with value
5694 @code{omp_default_mem_alloc} and no @code{align} clause has been
5695 specified. (In that case, the normal allocation is used: stack allocation
5696 and, sometimes for Fortran, also @code{malloc} [depending on flags such as
5697 @option{-fstack-arrays}].)
@item Using the @code{allocate} directive for variables in static memory is
currently not supported (compile-time error).
@item Using the @code{allocators} directive for Fortran pointers and
allocatables is currently not supported (compile-time error).
5704 For the available predefined allocators and, as applicable, their associated
5705 predefined memory spaces and for the available traits and their default values,
5706 see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
5707 space use the @code{omp_default_mem_space} memory space.
5709 For the memory spaces, the following applies:
5711 @item @code{omp_default_mem_space} is supported
5712 @item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
5713 @item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
5714 @item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
5715 unless the memkind library is available
5716 @item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
5717 unless the memkind library is available
5720 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
5721 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
5722 creating memory allocators requesting
5725 @item the memory space @code{omp_high_bw_mem_space}
5726 @item the memory space @code{omp_large_cap_mem_space}
5727 @item the @code{partition} trait @code{interleaved}; note that for
5728 @code{omp_large_cap_mem_space} the allocation will not be interleaved
5731 On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
library} (@code{libnuma.so.1}) is available at runtime, it is used when creating
5733 memory allocators requesting
5736 @item the @code{partition} trait @code{nearest}, except when both the
5737 libmemkind library is available and the memory space is either
5738 @code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
5741 Note that the numa library will round up the allocation size to a multiple of
5742 the system page size; therefore, consider using it only with large data or
5743 by sharing allocations via the @code{pool_size} trait. Furthermore, the Linux
5744 kernel does not guarantee that an allocation will always be on the nearest NUMA
5745 node nor that after reallocation the same node will be used. Note additionally
5746 that, on Linux, the default setting of the memory placement policy is to use the
5747 current node; therefore, unless the memory placement policy has been overridden,
5748 the @code{partition} trait @code{environment} (the default) will be effectively
5749 a @code{nearest} allocation.
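As a sketch of requesting such a @code{nearest} allocation through the API
routines (see @ref{Memory Management Routines}); the trait list and the data
size are illustrative:

@smallexample
#include <omp.h>
#include <stddef.h>

void
nearest_example (size_t n)
@{
  omp_alloctrait_t traits[] = @{ @{ omp_atk_partition, omp_atv_nearest @} @};
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 1, traits);

  double *p = omp_alloc (n * sizeof (double), al);
  /* ... use 'p'; with libnuma available the pages come from the NUMA node
     nearest to the calling thread ...  */
  omp_free (p, al);
  omp_destroy_allocator (al);
@}
@end smallexample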
5751 Additional notes regarding the traits:
5753 @item The @code{pinned} trait is unsupported.
5754 @item The default for the @code{pool_size} trait is no pool and for every
5755 (re)allocation the associated library routine is called, which might
5756 internally use a memory pool.
5757 @item For the @code{partition} trait, the partition part size will be the same
5758 as the requested size (i.e. @code{interleaved} or @code{blocked} has no
5759 effect), except for @code{interleaved} when the memkind library is
5760 available. Furthermore, for @code{nearest} and unless the numa library
is available, the memory might not be on the same NUMA node as the thread
5762 that allocated the memory; on Linux, this is in particular the case when
5763 the memory placement policy is set to preferred.
@item The @code{access} trait has no effect; memory is always
accessible by all threads.
5766 @item The @code{sync_hint} trait has no effect.
5769 @c ---------------------------------------------------------------------
5770 @c Offload-Target Specifics
5771 @c ---------------------------------------------------------------------
5773 @node Offload-Target Specifics
5774 @chapter Offload-Target Specifics
5776 The following sections present notes on the offload-target specifics
5784 @section AMD Radeon (GCN)
5786 On the hardware side, there is the hierarchy (fine to coarse):
5788 @item work item (thread)
5791 @item compute unit (CU)
5794 All OpenMP and OpenACC levels are used, i.e.
5796 @item OpenMP's simd and OpenACC's vector map to work items (thread)
@item OpenMP's threads (``parallel'') and OpenACC's workers map
to wavefronts
5799 @item OpenMP's teams and OpenACC's gang use a threadpool with the
5800 size of the number of teams or gangs, respectively.
5805 @item Number of teams is the specified @code{num_teams} (OpenMP) or
5806 @code{num_gangs} (OpenACC) or otherwise the number of CU. It is limited
5807 by two times the number of CU.
5808 @item Number of wavefronts is 4 for gfx900 and 16 otherwise;
5809 @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
5810 overrides this if smaller.
5811 @item The wavefront has 102 scalars and 64 vectors
5812 @item Number of workitems is always 64
5813 @item The hardware permits maximally 40 workgroups/CU and
5814 16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers in non-kernel functions
5816 (the chosen procedure-calling API).
5817 @item For the kernel itself: as many as register pressure demands (number of
5818 teams and number of threads, scaled down if registers are exhausted)
Implementation remarks:
5823 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
5824 using the C library @code{printf} functions and the Fortran
5825 @code{print}/@code{write} statements.
5826 @item Reverse offload regions (i.e. @code{target} regions with
5827 @code{device(ancestor:1)}) are processed serially per @code{target} region
such that the next reverse offload region is only executed after the previous
one has completed.
5830 @item OpenMP code that has a @code{requires} directive with
5831 @code{unified_shared_memory} will remove any GCN device from the list of
5832 available devices (``host fallback'').
5833 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
5834 environment variable; the default is 32 kiB per thread.
5842 On the hardware side, there is the hierarchy (fine to coarse):
5847 @item streaming multiprocessor
5850 All OpenMP and OpenACC levels are used, i.e.
5852 @item OpenMP's simd and OpenACC's vector map to threads
5853 @item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
5854 @item OpenMP's teams and OpenACC's gang use a threadpool with the
5855 size of the number of teams or gangs, respectively.
5860 @item The @code{warp_size} is always 32
5861 @item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
5862 @item The number of teams is limited by the number of blocks the device can
5863 host simultaneously.
Additional information can be obtained by setting the environment variable
@env{GOMP_DEBUG} to @code{1} (very verbose; grep for @code{kernel.*launch} for
launch parameters).
GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA;
CUDA caches the JIT result in the user's directory (see the CUDA documentation;
this can be tuned with the environment variables
@code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).
Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} command-line
5875 options still affect the used PTX ISA code and, thus, the requirements on
5876 CUDA version and hardware.
Implementation remarks:
5880 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
5881 using the C library @code{printf} functions. Note that the Fortran
5882 @code{print}/@code{write} statements are not supported, yet.
@item Compiling OpenMP code that contains @code{requires reverse_offload}
requires at least @code{-march=sm_35}; compiling for @code{-march=sm_30}
is not supported.
5886 @item For code containing reverse offload (i.e. @code{target} regions with
5887 @code{device(ancestor:1)}), there is a slight performance penalty
for @emph{all} target regions, consisting mostly of shutdown delay.
5889 Per device, reverse offload regions are processed serially such that
the next reverse offload region is only executed after the previous
one has completed.
5892 @item OpenMP code that has a @code{requires} directive with
5893 @code{unified_shared_memory} will remove any nvptx device from the
5894 list of available devices (``host fallback'').
5895 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
5897 @item The OpenMP routines @code{omp_target_memcpy_rect} and
5898 @code{omp_target_memcpy_rect_async} and the @code{target update}
5899 directive for non-contiguous list items will use the 2D and 3D
5900 memory-copy functions of the CUDA library. Higher dimensions will
5901 call those functions in a loop and are therefore supported.
5905 @c ---------------------------------------------------------------------
5907 @c ---------------------------------------------------------------------
5909 @node The libgomp ABI
5910 @chapter The libgomp ABI
5912 The following sections present notes on the external ABI as
5913 presented by libgomp. Only maintainers should need them.
5916 * Implementing MASTER construct::
5917 * Implementing CRITICAL construct::
5918 * Implementing ATOMIC construct::
5919 * Implementing FLUSH construct::
5920 * Implementing BARRIER construct::
5921 * Implementing THREADPRIVATE construct::
5922 * Implementing PRIVATE clause::
5923 * Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
5924 * Implementing REDUCTION clause::
5925 * Implementing PARALLEL construct::
5926 * Implementing FOR construct::
5927 * Implementing ORDERED construct::
5928 * Implementing SECTIONS construct::
5929 * Implementing SINGLE construct::
5930 * Implementing OpenACC's PARALLEL construct::
5934 @node Implementing MASTER construct
5935 @section Implementing MASTER construct
5938 if (omp_get_thread_num () == 0)
5942 Alternately, we generate two copies of the parallel subfunction
5943 and only include this in the version run by the primary thread.
5944 Surely this is not worthwhile though...
5948 @node Implementing CRITICAL construct
5949 @section Implementing CRITICAL construct
5951 Without a specified name,
5954 void GOMP_critical_start (void);
5955 void GOMP_critical_end (void);
so that we don't get COPY relocations from libgomp to the main
application.
5961 With a specified name, use omp_set_lock and omp_unset_lock with
5962 name being transformed into a variable declared like
5965 omp_lock_t gomp_critical_user_<name> __attribute__((common))
5968 Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at all.
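For the unnamed case, a compact lowering sketch (illustrative; the exact code
GCC emits may differ):

@smallexample
  /* #pragma omp critical
       body;  */
  GOMP_critical_start ();
  body;
  GOMP_critical_end ();
@end smallexample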
5974 @node Implementing ATOMIC construct
5975 @section Implementing ATOMIC construct
5977 The target should implement the @code{__sync} builtins.
5979 Failing that we could add
5982 void GOMP_atomic_enter (void)
5983 void GOMP_atomic_exit (void)
5986 which reuses the regular lock code, but with yet another lock
5987 object private to the library.
5991 @node Implementing FLUSH construct
5992 @section Implementing FLUSH construct
5994 Expands to the @code{__sync_synchronize} builtin.
5998 @node Implementing BARRIER construct
5999 @section Implementing BARRIER construct
6002 void GOMP_barrier (void)
6006 @node Implementing THREADPRIVATE construct
6007 @section Implementing THREADPRIVATE construct
In @emph{most} cases we can map this directly to @code{__thread}, except
that OMP allows constructors for C++ objects. We can either
6011 refuse to support this (how often is it used?) or we can
6012 implement something akin to .ctors.
6014 Even more ideally, this ctor feature is handled by extensions
6015 to the main pthreads library. Failing that, we can have a set
6016 of entry points to register ctor functions to be called.
6020 @node Implementing PRIVATE clause
6021 @section Implementing PRIVATE clause
6023 In association with a PARALLEL, or within the lexical extent
6024 of a PARALLEL block, the variable becomes a local variable in
6025 the parallel subfunction.
6027 In association with FOR or SECTIONS blocks, create a new
6028 automatic variable within the current function. This preserves
6029 the semantic of new variable creation.
6033 @node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
6034 @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
6036 This seems simple enough for PARALLEL blocks. Create a private
6037 struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and ``small'' structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
6040 subfunction, copy the value into the local variable.
6042 It is not clear what to do with bare FOR or SECTION blocks.
6043 The only thing I can figure is that we do something like:
6046 #pragma omp for firstprivate(x) lastprivate(y)
6047 for (int i = 0; i < n; ++i)
6064 where the "x=x" and "y=y" assignments actually have different
6065 uids for the two variables, i.e. not something you could write
6066 directly in C. Presumably this only makes sense if the "outer"
6067 x and y are global variables.
6069 COPYPRIVATE would work the same way, except the structure
6070 broadcast would have to happen via SINGLE machinery instead.
6074 @node Implementing REDUCTION clause
6075 @section Implementing REDUCTION clause
6077 The private struct mentioned in the previous section should have
6078 a pointer to an array of the type of the variable, indexed by the
6079 thread's @var{team_id}. The thread stores its final value into the
6080 array, and after the barrier, the primary thread iterates over the
6081 array to collect the values.
6084 @node Implementing PARALLEL construct
6085 @section Implementing PARALLEL construct
6088 #pragma omp parallel
6097 void subfunction (void *data)
6104 GOMP_parallel_start (subfunction, &data, num_threads);
6105 subfunction (&data);
6106 GOMP_parallel_end ();
6110 void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
6113 The @var{FN} argument is the subfunction to be run in parallel.
6115 The @var{DATA} argument is a pointer to a structure used to
6116 communicate data in and out of the subfunction, as discussed
6117 above with respect to FIRSTPRIVATE et al.
6119 The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.
6123 The function needs to create the appropriate number of
6124 threads and/or launch them from the dock. It needs to
6125 create the team structure and assign team ids.
6128 void GOMP_parallel_end (void)
6131 Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
6135 @node Implementing FOR construct
6136 @section Implementing FOR construct
6139 #pragma omp parallel for
6140 for (i = lb; i <= ub; i++)
6147 void subfunction (void *data)
6150 while (GOMP_loop_static_next (&_s0, &_e0))
6153 for (i = _s0; i < _e1; i++)
6156 GOMP_loop_end_nowait ();
6159 GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
6161 GOMP_parallel_end ();
6165 #pragma omp for schedule(runtime)
6166 for (i = 0; i < n; i++)
6175 if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
for (i = _s0; i < _e0; i++)
@} while (GOMP_loop_runtime_next (&_s0, &_e0));
6185 Note that while it looks like there is trickiness to propagating
6186 a non-constant STEP, there isn't really. We're explicitly allowed
6187 to evaluate it as many times as we want, and any variables involved
6188 should automatically be handled as PRIVATE or SHARED like any other
6189 variables. So the expression should remain evaluable in the
6190 subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also choose not to.
6193 If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
6194 able to get away with no work-sharing context at all, since we can
6195 simply perform the arithmetic directly in each thread to divide up
6196 the iterations. Which would mean that we wouldn't need to call any
6199 There are separate routines for handling loops with an ORDERED
6200 clause. Bookkeeping for that is non-trivial...
6204 @node Implementing ORDERED construct
6205 @section Implementing ORDERED construct
6208 void GOMP_ordered_start (void)
6209 void GOMP_ordered_end (void)
6214 @node Implementing SECTIONS construct
6215 @section Implementing SECTIONS construct
6220 #pragma omp sections
6234 for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
6251 @node Implementing SINGLE construct
6252 @section Implementing SINGLE construct
6266 if (GOMP_single_start ())
6274 #pragma omp single copyprivate(x)
6281 datap = GOMP_single_copy_start ();
6286 GOMP_single_copy_end (&data);
6295 @node Implementing OpenACC's PARALLEL construct
6296 @section Implementing OpenACC's PARALLEL construct
6299 void GOACC_parallel ()
6304 @c ---------------------------------------------------------------------
6306 @c ---------------------------------------------------------------------
6308 @node Reporting Bugs
6309 @chapter Reporting Bugs
6311 Bugs in the GNU Offloading and Multi Processing Runtime Library should
6312 be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
6313 "openacc", or "openmp", or both to the keywords field in the bug
6314 report, as appropriate.
6318 @c ---------------------------------------------------------------------
6319 @c GNU General Public License
6320 @c ---------------------------------------------------------------------
6322 @include gpl_v3.texi
6326 @c ---------------------------------------------------------------------
6327 @c GNU Free Documentation License
6328 @c ---------------------------------------------------------------------
6334 @c ---------------------------------------------------------------------
6335 @c Funding Free Software
6336 @c ---------------------------------------------------------------------
6338 @include funding.texi
6340 @c ---------------------------------------------------------------------
6342 @c ---------------------------------------------------------------------
6345 @unnumbered Library Index