libgomp/libgomp.texi

   1 \input texinfo @c -*-texinfo-*-
   2
   3 @c %**start of header
   4 @setfilename libgomp.info
   5 @settitle GNU libgomp
   6 @c %**end of header
   7
   8
   9 @copying
  10 Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.
  11
  12 Permission is granted to copy, distribute and/or modify this document
  13 under the terms of the GNU Free Documentation License, Version 1.3 or
  14 any later version published by the Free Software Foundation; with the
  15 Invariant Sections being ``Funding Free Software'', the Front-Cover
  16 texts being (a) (see below), and with the Back-Cover Texts being (b)
  17 (see below).  A copy of the license is included in the section entitled
  18 ``GNU Free Documentation License''.
  19
  20 (a) The FSF's Front-Cover Text is:
  21
  22      A GNU Manual
  23
  24 (b) The FSF's Back-Cover Text is:
  25
  26      You have freedom to copy and modify this GNU Manual, like GNU
  27      software.  Copies published by the Free Software Foundation raise
  28      funds for GNU development.
  29 @end copying
  30
  31 @ifinfo
  32 @dircategory GNU Libraries
  33 @direntry
  34 * libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
  35 @end direntry
  36
  37 This manual documents libgomp, the GNU Offloading and Multi Processing
  38 Runtime library.  This is the GNU implementation of the OpenMP and
  39 OpenACC APIs for parallel and accelerator programming in C/C++ and
  40 Fortran.
  41
  42 Published by the Free Software Foundation
  43 51 Franklin Street, Fifth Floor
  44 Boston, MA 02110-1301 USA
  45
  46 @insertcopying
  47 @end ifinfo
  48
  49
  50 @setchapternewpage odd
  51
  52 @titlepage
  53 @title GNU Offloading and Multi Processing Runtime Library
  54 @subtitle The GNU OpenMP and OpenACC Implementation
  55 @page
  56 @vskip 0pt plus 1filll
  57 @comment For the @value{version-GCC} Version*
  58 @sp 1
  59 Published by the Free Software Foundation @*
  60 51 Franklin Street, Fifth Floor@*
  61 Boston, MA 02110-1301, USA@*
  62 @sp 1
  63 @insertcopying
  64 @end titlepage
  65
  66 @summarycontents
  67 @contents
  68 @page
  69
  70
  71 @node Top
  72 @top Introduction
  73 @cindex Introduction
  74
  75 This manual documents the usage of libgomp, the GNU Offloading and
  76 Multi Processing Runtime Library.  This includes the GNU
  77 implementation of the @uref{https://www.openmp.org, OpenMP} Application
  78 Programming Interface (API) for multi-platform shared-memory parallel
  79 programming in C/C++ and Fortran, and the GNU implementation of the
  80 @uref{https://www.openacc.org, OpenACC} Application Programming
  81 Interface (API) for offloading of code to accelerator devices in C/C++
  82 and Fortran.
  83
  84 Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
  85 on this, support for OpenACC and offloading (both OpenACC and OpenMP
  86 4's target construct) has been added later on, and the library's name
  87 changed to GNU Offloading and Multi Processing Runtime Library.
  88
  89
  90
  91 @comment
  92 @comment  When you add a new menu item, please keep the right hand
  93 @comment  aligned to the same column.  Do not use tabs.  This provides
  94 @comment  better formatting.
  95 @comment
  96 @menu
  97 * Enabling OpenMP::            How to enable OpenMP for your applications.
  98 * OpenMP Runtime Library Routines: Runtime Library Routines.
  99                                The OpenMP runtime application programming
 100                                interface.
 101 * OpenMP Environment Variables: Environment Variables.
 102                                Influencing OpenMP runtime behavior with
 103                                environment variables.
 104 * Enabling OpenACC::           How to enable OpenACC for your
 105                                applications.
 106 * OpenACC Runtime Library Routines:: The OpenACC runtime application
 107                                programming interface.
 108 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
 109                                environment variables.
 110 * CUDA Streams Usage::         Notes on the implementation of
 111                                asynchronous operations.
 112 * OpenACC Library Interoperability:: OpenACC library interoperability with the
 113                                NVIDIA CUBLAS library.
 114 * The libgomp ABI::            Notes on the external ABI presented by libgomp.
 115 * Reporting Bugs::             How to report bugs in the GNU Offloading and
 116                                Multi Processing Runtime Library.
 117 * Copying::                    GNU general public license says
 118                                how you can copy and share libgomp.
 119 * GNU Free Documentation License::
 120                                How you can copy and share this manual.
 121 * Funding::                    How to help assure continued work for free
 122                                software.
 123 * Library Index::              Index of this documentation.
 124 @end menu
 125
 126
 127 @c ---------------------------------------------------------------------
 128 @c Enabling OpenMP
 129 @c ---------------------------------------------------------------------
 130
 131 @node Enabling OpenMP
 132 @chapter Enabling OpenMP
 133
 134 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
 135 flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
 136 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form,
 137 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form,
 138 @code{!$} conditional compilation sentinels in free form and @code{c$},
 139 @code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 140 arranges for automatic linking of the OpenMP runtime library
 141 (@ref{Runtime Library Routines}).
 142
 143 A complete description of all OpenMP directives accepted may be found in
 144 the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
 145 version 4.5.
 146
 147
 148 @c ---------------------------------------------------------------------
 149 @c OpenMP Runtime Library Routines
 150 @c ---------------------------------------------------------------------
 151
 152 @node Runtime Library Routines
 153 @chapter OpenMP Runtime Library Routines
 154
 155 The runtime routines described here are defined by Section 3 of the OpenMP
 156 specification in version 4.5.  The routines are structured in following
 157 three parts:
 158
 159 @menu
 160 Control threads, processors and the parallel environment.  They have C
 161 linkage, and do not throw exceptions.
 162
 163 * omp_get_active_level::        Number of active parallel regions
 164 * omp_get_ancestor_thread_num:: Ancestor thread ID
 165 * omp_get_cancellation::        Whether cancellation support is enabled
 166 * omp_get_default_device::      Get the default device for target regions
 167 * omp_get_dynamic::             Dynamic teams setting
 168 * omp_get_level::               Number of parallel regions
 169 * omp_get_max_active_levels::   Maximum number of active regions
 170 * omp_get_max_task_priority::   Maximum task priority value that can be set
 171 * omp_get_max_threads::         Maximum number of threads of parallel region
 172 * omp_get_nested::              Nested parallel regions
 173 * omp_get_num_devices::         Number of target devices
 174 * omp_get_num_procs::           Number of processors online
 175 * omp_get_num_teams::           Number of teams
 176 * omp_get_num_threads::         Size of the active team
 177 * omp_get_proc_bind::           Whether theads may be moved between CPUs
 178 * omp_get_schedule::            Obtain the runtime scheduling method
 179 * omp_get_team_num::            Get team number
 180 * omp_get_team_size::           Number of threads in a team
 181 * omp_get_thread_limit::        Maximum number of threads
 182 * omp_get_thread_num::          Current thread ID
 183 * omp_in_parallel::             Whether a parallel region is active
 184 * omp_in_final::                Whether in final or included task region
 185 * omp_is_initial_device::       Whether executing on the host device
 186 * omp_set_default_device::      Set the default device for target regions
 187 * omp_set_dynamic::             Enable/disable dynamic teams
 188 * omp_set_max_active_levels::   Limits the number of active parallel regions
 189 * omp_set_nested::              Enable/disable nested parallel regions
 190 * omp_set_num_threads::         Set upper team size limit
 191 * omp_set_schedule::            Set the runtime scheduling method
 192
 193 Initialize, set, test, unset and destroy simple and nested locks.
 194
 195 * omp_init_lock::            Initialize simple lock
 196 * omp_set_lock::             Wait for and set simple lock
 197 * omp_test_lock::            Test and set simple lock if available
 198 * omp_unset_lock::           Unset simple lock
 199 * omp_destroy_lock::         Destroy simple lock
 200 * omp_init_nest_lock::       Initialize nested lock
 201 * omp_set_nest_lock::        Wait for and set simple lock
 202 * omp_test_nest_lock::       Test and set nested lock if available
 203 * omp_unset_nest_lock::      Unset nested lock
 204 * omp_destroy_nest_lock::    Destroy nested lock
 205
 206 Portable, thread-based, wall clock timer.
 207
 208 * omp_get_wtick::            Get timer precision.
 209 * omp_get_wtime::            Elapsed wall clock time.
 210 @end menu
 211
 212
 213
 214 @node omp_get_active_level
 215 @section @code{omp_get_active_level} -- Number of parallel regions
 216 @table @asis
 217 @item @emph{Description}:
 218 This function returns the nesting level for the active parallel blocks,
 219 which enclose the calling call.
 220
 221 @item @emph{C/C++}
 222 @multitable @columnfractions .20 .80
 223 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
 224 @end multitable
 225
 226 @item @emph{Fortran}:
 227 @multitable @columnfractions .20 .80
 228 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
 229 @end multitable
 230
 231 @item @emph{See also}:
 232 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
 233
 234 @item @emph{Reference}:
 235 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
 236 @end table
 237
 238
 239
 240 @node omp_get_ancestor_thread_num
 241 @section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
 242 @table @asis
 243 @item @emph{Description}:
 244 This function returns the thread identification number for the given
 245 nesting level of the current thread.  For values of @var{level} outside
 246 zero to @code{omp_get_level} -1 is returned; if @var{level} is
 247 @code{omp_get_level} the result is identical to @code{omp_get_thread_num}.
 248
 249 @item @emph{C/C++}
 250 @multitable @columnfractions .20 .80
 251 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
 252 @end multitable
 253
 254 @item @emph{Fortran}:
 255 @multitable @columnfractions .20 .80
 256 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
 257 @item                   @tab @code{integer level}
 258 @end multitable
 259
 260 @item @emph{See also}:
 261 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
 262
 263 @item @emph{Reference}:
 264 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
 265 @end table
 266
 267
 268
 269 @node omp_get_cancellation
 270 @section @code{omp_get_cancellation} -- Whether cancellation support is enabled
 271 @table @asis
 272 @item @emph{Description}:
 273 This function returns @code{true} if cancellation is activated, @code{false}
 274 otherwise.  Here, @code{true} and @code{false} represent their language-specific
 275 counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
 276 deactivated.
 277
 278 @item @emph{C/C++}:
 279 @multitable @columnfractions .20 .80
 280 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
 281 @end multitable
 282
 283 @item @emph{Fortran}:
 284 @multitable @columnfractions .20 .80
 285 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
 286 @end multitable
 287
 288 @item @emph{See also}:
 289 @ref{OMP_CANCELLATION}
 290
 291 @item @emph{Reference}:
 292 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
 293 @end table
 294
 295
 296
 297 @node omp_get_default_device
 298 @section @code{omp_get_default_device} -- Get the default device for target regions
 299 @table @asis
 300 @item @emph{Description}:
 301 Get the default device for target regions without device clause.
 302
 303 @item @emph{C/C++}:
 304 @multitable @columnfractions .20 .80
 305 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
 306 @end multitable
 307
 308 @item @emph{Fortran}:
 309 @multitable @columnfractions .20 .80
 310 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
 311 @end multitable
 312
 313 @item @emph{See also}:
 314 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
 315
 316 @item @emph{Reference}:
 317 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
 318 @end table
 319
 320
 321
 322 @node omp_get_dynamic
 323 @section @code{omp_get_dynamic} -- Dynamic teams setting
 324 @table @asis
 325 @item @emph{Description}:
 326 This function returns @code{true} if enabled, @code{false} otherwise.
 327 Here, @code{true} and @code{false} represent their language-specific
 328 counterparts.
 329
 330 The dynamic team setting may be initialized at startup by the
 331 @env{OMP_DYNAMIC} environment variable or at runtime using
 332 @code{omp_set_dynamic}.  If undefined, dynamic adjustment is
 333 disabled by default.
 334
 335 @item @emph{C/C++}:
 336 @multitable @columnfractions .20 .80
 337 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
 338 @end multitable
 339
 340 @item @emph{Fortran}:
 341 @multitable @columnfractions .20 .80
 342 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
 343 @end multitable
 344
 345 @item @emph{See also}:
 346 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
 347
 348 @item @emph{Reference}:
 349 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
 350 @end table
 351
 352
 353
 354 @node omp_get_level
 355 @section @code{omp_get_level} -- Obtain the current nesting level
 356 @table @asis
 357 @item @emph{Description}:
 358 This function returns the nesting level for the parallel blocks,
 359 which enclose the calling call.
 360
 361 @item @emph{C/C++}
 362 @multitable @columnfractions .20 .80
 363 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
 364 @end multitable
 365
 366 @item @emph{Fortran}:
 367 @multitable @columnfractions .20 .80
 368 @item @emph{Interface}: @tab @code{integer function omp_level()}
 369 @end multitable
 370
 371 @item @emph{See also}:
 372 @ref{omp_get_active_level}
 373
 374 @item @emph{Reference}:
 375 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
 376 @end table
 377
 378
 379
 380 @node omp_get_max_active_levels
 381 @section @code{omp_get_max_active_levels} -- Maximum number of active regions
 382 @table @asis
 383 @item @emph{Description}:
 384 This function obtains the maximum allowed number of nested, active parallel regions.
 385
 386 @item @emph{C/C++}
 387 @multitable @columnfractions .20 .80
 388 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
 389 @end multitable
 390
 391 @item @emph{Fortran}:
 392 @multitable @columnfractions .20 .80
 393 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
 394 @end multitable
 395
 396 @item @emph{See also}:
 397 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
 398
 399 @item @emph{Reference}:
 400 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
 401 @end table
 402
 403
 404 @node omp_get_max_task_priority
 405 @section @code{omp_get_max_task_priority} -- Maximum priority value
 406 that can be set for tasks.
 407 @table @asis
 408 @item @emph{Description}:
 409 This function obtains the maximum allowed priority number for tasks.
 410
 411 @item @emph{C/C++}
 412 @multitable @columnfractions .20 .80
 413 @item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
 414 @end multitable
 415
 416 @item @emph{Fortran}:
 417 @multitable @columnfractions .20 .80
 418 @item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
 419 @end multitable
 420
 421 @item @emph{Reference}:
 422 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
 423 @end table
 424
 425
 426 @node omp_get_max_threads
 427 @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
 428 @table @asis
 429 @item @emph{Description}:
 430 Return the maximum number of threads used for the current parallel region
 431 that does not use the clause @code{num_threads}.
 432
 433 @item @emph{C/C++}:
 434 @multitable @columnfractions .20 .80
 435 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
 436 @end multitable
 437
 438 @item @emph{Fortran}:
 439 @multitable @columnfractions .20 .80
 440 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
 441 @end multitable
 442
 443 @item @emph{See also}:
 444 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
 445
 446 @item @emph{Reference}:
 447 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
 448 @end table
 449
 450
 451
 452 @node omp_get_nested
 453 @section @code{omp_get_nested} -- Nested parallel regions
 454 @table @asis
 455 @item @emph{Description}:
 456 This function returns @code{true} if nested parallel regions are
 457 enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
 458 represent their language-specific counterparts.
 459
 460 Nested parallel regions may be initialized at startup by the
 461 @env{OMP_NESTED} environment variable or at runtime using
 462 @code{omp_set_nested}.  If undefined, nested parallel regions are
 463 disabled by default.
 464
 465 @item @emph{C/C++}:
 466 @multitable @columnfractions .20 .80
 467 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
 468 @end multitable
 469
 470 @item @emph{Fortran}:
 471 @multitable @columnfractions .20 .80
 472 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
 473 @end multitable
 474
 475 @item @emph{See also}:
 476 @ref{omp_set_nested}, @ref{OMP_NESTED}
 477
 478 @item @emph{Reference}:
 479 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
 480 @end table
 481
 482
 483
 484 @node omp_get_num_devices
 485 @section @code{omp_get_num_devices} -- Number of target devices
 486 @table @asis
 487 @item @emph{Description}:
 488 Returns the number of target devices.
 489
 490 @item @emph{C/C++}:
 491 @multitable @columnfractions .20 .80
 492 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
 493 @end multitable
 494
 495 @item @emph{Fortran}:
 496 @multitable @columnfractions .20 .80
 497 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
 498 @end multitable
 499
 500 @item @emph{Reference}:
 501 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
 502 @end table
 503
 504
 505
 506 @node omp_get_num_procs
 507 @section @code{omp_get_num_procs} -- Number of processors online
 508 @table @asis
 509 @item @emph{Description}:
 510 Returns the number of processors online on that device.
 511
 512 @item @emph{C/C++}:
 513 @multitable @columnfractions .20 .80
 514 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
 515 @end multitable
 516
 517 @item @emph{Fortran}:
 518 @multitable @columnfractions .20 .80
 519 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
 520 @end multitable
 521
 522 @item @emph{Reference}:
 523 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
 524 @end table
 525
 526
 527
 528 @node omp_get_num_teams
 529 @section @code{omp_get_num_teams} -- Number of teams
 530 @table @asis
 531 @item @emph{Description}:
 532 Returns the number of teams in the current team region.
 533
 534 @item @emph{C/C++}:
 535 @multitable @columnfractions .20 .80
 536 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
 537 @end multitable
 538
 539 @item @emph{Fortran}:
 540 @multitable @columnfractions .20 .80
 541 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
 542 @end multitable
 543
 544 @item @emph{Reference}:
 545 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
 546 @end table
 547
 548
 549
 550 @node omp_get_num_threads
 551 @section @code{omp_get_num_threads} -- Size of the active team
 552 @table @asis
 553 @item @emph{Description}:
 554 Returns the number of threads in the current team.  In a sequential section of
 555 the program @code{omp_get_num_threads} returns 1.
 556
 557 The default team size may be initialized at startup by the
 558 @env{OMP_NUM_THREADS} environment variable.  At runtime, the size
 559 of the current team may be set either by the @code{NUM_THREADS}
 560 clause or by @code{omp_set_num_threads}.  If none of the above were
 561 used to define a specific value and @env{OMP_DYNAMIC} is disabled,
 562 one thread per CPU online is used.
 563
 564 @item @emph{C/C++}:
 565 @multitable @columnfractions .20 .80
 566 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
 567 @end multitable
 568
 569 @item @emph{Fortran}:
 570 @multitable @columnfractions .20 .80
 571 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
 572 @end multitable
 573
 574 @item @emph{See also}:
 575 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
 576
 577 @item @emph{Reference}:
 578 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
 579 @end table
 580
 581
 582
 583 @node omp_get_proc_bind
 584 @section @code{omp_get_proc_bind} -- Whether theads may be moved between CPUs
 585 @table @asis
 586 @item @emph{Description}:
 587 This functions returns the currently active thread affinity policy, which is
 588 set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
 589 @code{omp_proc_bind_true}, @code{omp_proc_bind_master},
 590 @code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.
 591
 592 @item @emph{C/C++}:
 593 @multitable @columnfractions .20 .80
 594 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
 595 @end multitable
 596
 597 @item @emph{Fortran}:
 598 @multitable @columnfractions .20 .80
 599 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
 600 @end multitable
 601
 602 @item @emph{See also}:
 603 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
 604
 605 @item @emph{Reference}:
 606 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
 607 @end table
 608
 609
 610
 611 @node omp_get_schedule
 612 @section @code{omp_get_schedule} -- Obtain the runtime scheduling method
 613 @table @asis
 614 @item @emph{Description}:
 615 Obtain the runtime scheduling method.  The @var{kind} argument will be
 616 set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
 617 @code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
 618 @var{chunk_size}, is set to the chunk size.
 619
 620 @item @emph{C/C++}
 621 @multitable @columnfractions .20 .80
 622 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
 623 @end multitable
 624
 625 @item @emph{Fortran}:
 626 @multitable @columnfractions .20 .80
 627 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
 628 @item                   @tab @code{integer(kind=omp_sched_kind) kind}
 629 @item                   @tab @code{integer chunk_size}
 630 @end multitable
 631
 632 @item @emph{See also}:
 633 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
 634
 635 @item @emph{Reference}:
 636 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
 637 @end table
 638
 639
 640
 641 @node omp_get_team_num
 642 @section @code{omp_get_team_num} -- Get team number
 643 @table @asis
 644 @item @emph{Description}:
 645 Returns the team number of the calling thread.
 646
 647 @item @emph{C/C++}:
 648 @multitable @columnfractions .20 .80
 649 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
 650 @end multitable
 651
 652 @item @emph{Fortran}:
 653 @multitable @columnfractions .20 .80
 654 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
 655 @end multitable
 656
 657 @item @emph{Reference}:
 658 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
 659 @end table
 660
 661
 662
 663 @node omp_get_team_size
 664 @section @code{omp_get_team_size} -- Number of threads in a team
 665 @table @asis
 666 @item @emph{Description}:
 667 This function returns the number of threads in a thread team to which
 668 either the current thread or its ancestor belongs.  For values of @var{level}
 669 outside zero to @code{omp_get_level}, -1 is returned; if @var{level} is zero,
 670 1 is returned, and for @code{omp_get_level}, the result is identical
 671 to @code{omp_get_num_threads}.
 672
 673 @item @emph{C/C++}:
 674 @multitable @columnfractions .20 .80
 675 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
 676 @end multitable
 677
 678 @item @emph{Fortran}:
 679 @multitable @columnfractions .20 .80
 680 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
 681 @item                   @tab @code{integer level}
 682 @end multitable
 683
 684 @item @emph{See also}:
 685 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
 686
 687 @item @emph{Reference}:
 688 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
 689 @end table
 690
 691
 692
 693 @node omp_get_thread_limit
 694 @section @code{omp_get_thread_limit} -- Maximum number of threads
 695 @table @asis
 696 @item @emph{Description}:
 697 Return the maximum number of threads of the program.
 698
 699 @item @emph{C/C++}:
 700 @multitable @columnfractions .20 .80
 701 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
 702 @end multitable
 703
 704 @item @emph{Fortran}:
 705 @multitable @columnfractions .20 .80
 706 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
 707 @end multitable
 708
 709 @item @emph{See also}:
 710 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
 711
 712 @item @emph{Reference}:
 713 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
 714 @end table
 715
 716
 717
 718 @node omp_get_thread_num
 719 @section @code{omp_get_thread_num} -- Current thread ID
 720 @table @asis
 721 @item @emph{Description}:
 722 Returns a unique thread identification number within the current team.
 723 In a sequential parts of the program, @code{omp_get_thread_num}
 724 always returns 0.  In parallel regions the return value varies
 725 from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
 726 value of the master thread of a team is always 0.
 727
 728 @item @emph{C/C++}:
 729 @multitable @columnfractions .20 .80
 730 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
 731 @end multitable
 732
 733 @item @emph{Fortran}:
 734 @multitable @columnfractions .20 .80
 735 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
 736 @end multitable
 737
 738 @item @emph{See also}:
 739 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
 740
 741 @item @emph{Reference}:
 742 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
 743 @end table
 744
 745
 746
 747 @node omp_in_parallel
 748 @section @code{omp_in_parallel} -- Whether a parallel region is active
 749 @table @asis
 750 @item @emph{Description}:
 751 This function returns @code{true} if currently running in parallel,
 752 @code{false} otherwise.  Here, @code{true} and @code{false} represent
 753 their language-specific counterparts.
 754
 755 @item @emph{C/C++}:
 756 @multitable @columnfractions .20 .80
 757 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
 758 @end multitable
 759
 760 @item @emph{Fortran}:
 761 @multitable @columnfractions .20 .80
 762 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
 763 @end multitable
 764
 765 @item @emph{Reference}:
 766 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
 767 @end table
 768
 769
 770 @node omp_in_final
 771 @section @code{omp_in_final} -- Whether in final or included task region
 772 @table @asis
 773 @item @emph{Description}:
 774 This function returns @code{true} if currently running in a final
 775 or included task region, @code{false} otherwise.  Here, @code{true}
 776 and @code{false} represent their language-specific counterparts.
 777
 778 @item @emph{C/C++}:
 779 @multitable @columnfractions .20 .80
 780 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
 781 @end multitable
 782
 783 @item @emph{Fortran}:
 784 @multitable @columnfractions .20 .80
 785 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
 786 @end multitable
 787
 788 @item @emph{Reference}:
 789 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
 790 @end table
 791
 792
 793
 794 @node omp_is_initial_device
 795 @section @code{omp_is_initial_device} -- Whether executing on the host device
 796 @table @asis
 797 @item @emph{Description}:
 798 This function returns @code{true} if currently running on the host device,
 799 @code{false} otherwise.  Here, @code{true} and @code{false} represent
 800 their language-specific counterparts.
 801
 802 @item @emph{C/C++}:
 803 @multitable @columnfractions .20 .80
 804 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
 805 @end multitable
 806
 807 @item @emph{Fortran}:
 808 @multitable @columnfractions .20 .80
 809 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
 810 @end multitable
 811
 812 @item @emph{Reference}:
 813 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
 814 @end table
 815
 816
 817
 818 @node omp_set_default_device
 819 @section @code{omp_set_default_device} -- Set the default device for target regions
 820 @table @asis
 821 @item @emph{Description}:
 822 Set the default device for target regions without device clause.  The argument
 823 shall be a nonnegative device number.
 824
 825 @item @emph{C/C++}:
 826 @multitable @columnfractions .20 .80
 827 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
 828 @end multitable
 829
 830 @item @emph{Fortran}:
 831 @multitable @columnfractions .20 .80
 832 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
 833 @item                   @tab @code{integer device_num}
 834 @end multitable
 835
 836 @item @emph{See also}:
 837 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
 838
 839 @item @emph{Reference}:
 840 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
 841 @end table
 842
 843
 844
 845 @node omp_set_dynamic
 846 @section @code{omp_set_dynamic} -- Enable/disable dynamic teams
 847 @table @asis
 848 @item @emph{Description}:
 849 Enable or disable the dynamic adjustment of the number of threads
 850 within a team.  The function takes the language-specific equivalent
 851 of @code{true} and @code{false}, where @code{true} enables dynamic
 852 adjustment of team sizes and @code{false} disables it.
 853
 854 @item @emph{C/C++}:
 855 @multitable @columnfractions .20 .80
 856 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
 857 @end multitable
 858
 859 @item @emph{Fortran}:
 860 @multitable @columnfractions .20 .80
 861 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
 862 @item                   @tab @code{logical, intent(in) :: dynamic_threads}
 863 @end multitable
 864
 865 @item @emph{See also}:
 866 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
 867
 868 @item @emph{Reference}:
 869 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
 870 @end table
 871
 872
 873
 874 @node omp_set_max_active_levels
 875 @section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
 876 @table @asis
 877 @item @emph{Description}:
 878 This function limits the maximum allowed number of nested, active
 879 parallel regions.
 880
 881 @item @emph{C/C++}
 882 @multitable @columnfractions .20 .80
 883 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
 884 @end multitable
 885
 886 @item @emph{Fortran}:
 887 @multitable @columnfractions .20 .80
 888 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
 889 @item                   @tab @code{integer max_levels}
 890 @end multitable
 891
 892 @item @emph{See also}:
 893 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
 894
 895 @item @emph{Reference}:
 896 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
 897 @end table
 898
 899
 900
 901 @node omp_set_nested
 902 @section @code{omp_set_nested} -- Enable/disable nested parallel regions
 903 @table @asis
 904 @item @emph{Description}:
 905 Enable or disable nested parallel regions, i.e., whether team members
 906 are allowed to create new teams.  The function takes the language-specific
 907 equivalent of @code{true} and @code{false}, where @code{true} enables
 908 dynamic adjustment of team sizes and @code{false} disables it.
 909
 910 @item @emph{C/C++}:
 911 @multitable @columnfractions .20 .80
 912 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
 913 @end multitable
 914
 915 @item @emph{Fortran}:
 916 @multitable @columnfractions .20 .80
 917 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
 918 @item                   @tab @code{logical, intent(in) :: nested}
 919 @end multitable
 920
 921 @item @emph{See also}:
 922 @ref{OMP_NESTED}, @ref{omp_get_nested}
 923
 924 @item @emph{Reference}:
 925 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
 926 @end table
 927
 928
 929
 930 @node omp_set_num_threads
 931 @section @code{omp_set_num_threads} -- Set upper team size limit
 932 @table @asis
 933 @item @emph{Description}:
 934 Specifies the number of threads used by default in subsequent parallel
 935 sections, if those do not specify a @code{num_threads} clause.  The
 936 argument of @code{omp_set_num_threads} shall be a positive integer.
 937
 938 @item @emph{C/C++}:
 939 @multitable @columnfractions .20 .80
 940 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
 941 @end multitable
 942
 943 @item @emph{Fortran}:
 944 @multitable @columnfractions .20 .80
 945 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
 946 @item                   @tab @code{integer, intent(in) :: num_threads}
 947 @end multitable
 948
 949 @item @emph{See also}:
 950 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
 951
 952 @item @emph{Reference}:
 953 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
 954 @end table
 955
 956
 957
 958 @node omp_set_schedule
 959 @section @code{omp_set_schedule} -- Set the runtime scheduling method
 960 @table @asis
 961 @item @emph{Description}:
 962 Sets the runtime scheduling method.  The @var{kind} argument can have the
 963 value @code{omp_sched_static}, @code{omp_sched_dynamic},
 964 @code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
 965 @code{omp_sched_auto}, the chunk size is set to the value of
 966 @var{chunk_size} if positive, or to the default value if zero or negative.
 967 For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
 968
 969 @item @emph{C/C++}
 970 @multitable @columnfractions .20 .80
 971 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
 972 @end multitable
 973
 974 @item @emph{Fortran}:
 975 @multitable @columnfractions .20 .80
 976 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
 977 @item                   @tab @code{integer(kind=omp_sched_kind) kind}
 978 @item                   @tab @code{integer chunk_size}
 979 @end multitable
 980
 981 @item @emph{See also}:
 982 @ref{omp_get_schedule}
 983 @ref{OMP_SCHEDULE}
 984
 985 @item @emph{Reference}:
 986 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
 987 @end table
 988
 989
 990
 991 @node omp_init_lock
 992 @section @code{omp_init_lock} -- Initialize simple lock
 993 @table @asis
 994 @item @emph{Description}:
 995 Initialize a simple lock.  After initialization, the lock is in
 996 an unlocked state.
 997
 998 @item @emph{C/C++}:
 999 @multitable @columnfractions .20 .80
1000 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
1001 @end multitable
1002
1003 @item @emph{Fortran}:
1004 @multitable @columnfractions .20 .80
1005 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
1006 @item                   @tab @code{integer(omp_lock_kind), intent(out) :: svar}
1007 @end multitable
1008
1009 @item @emph{See also}:
1010 @ref{omp_destroy_lock}
1011
1012 @item @emph{Reference}:
1013 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1014 @end table
1015
1016
1017
1018 @node omp_set_lock
1019 @section @code{omp_set_lock} -- Wait for and set simple lock
1020 @table @asis
1021 @item @emph{Description}:
1022 Before setting a simple lock, the lock variable must be initialized by
1023 @code{omp_init_lock}.  The calling thread is blocked until the lock
1024 is available.  If the lock is already held by the current thread,
1025 a deadlock occurs.
1026
1027 @item @emph{C/C++}:
1028 @multitable @columnfractions .20 .80
1029 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
1030 @end multitable
1031
1032 @item @emph{Fortran}:
1033 @multitable @columnfractions .20 .80
1034 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
1035 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1036 @end multitable
1037
1038 @item @emph{See also}:
1039 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
1040
1041 @item @emph{Reference}:
1042 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1043 @end table
1044
1045
1046
1047 @node omp_test_lock
1048 @section @code{omp_test_lock} -- Test and set simple lock if available
1049 @table @asis
1050 @item @emph{Description}:
1051 Before setting a simple lock, the lock variable must be initialized by
1052 @code{omp_init_lock}.  Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1053 does not block if the lock is not available.  This function returns
1054 @code{true} upon success, @code{false} otherwise.  Here, @code{true} and
1055 @code{false} represent their language-specific counterparts.
1056
1057 @item @emph{C/C++}:
1058 @multitable @columnfractions .20 .80
1059 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1060 @end multitable
1061
1062 @item @emph{Fortran}:
1063 @multitable @columnfractions .20 .80
1064 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1065 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1066 @end multitable
1067
1068 @item @emph{See also}:
1069 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
1070
1071 @item @emph{Reference}:
1072 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1073 @end table
1074
1075
1076
1077 @node omp_unset_lock
1078 @section @code{omp_unset_lock} -- Unset simple lock
1079 @table @asis
1080 @item @emph{Description}:
1081 A simple lock about to be unset must have been locked by @code{omp_set_lock}
1082 or @code{omp_test_lock} before.  In addition, the lock must be held by the
1083 thread calling @code{omp_unset_lock}.  Then, the lock becomes unlocked.  If one
1084 or more threads attempted to set the lock before, one of them is chosen to,
1085 again, set the lock to itself.
1086
1087 @item @emph{C/C++}:
1088 @multitable @columnfractions .20 .80
1089 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1090 @end multitable
1091
1092 @item @emph{Fortran}:
1093 @multitable @columnfractions .20 .80
1094 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1095 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1096 @end multitable
1097
1098 @item @emph{See also}:
1099 @ref{omp_set_lock}, @ref{omp_test_lock}
1100
1101 @item @emph{Reference}:
1102 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1103 @end table
1104
1105
1106
1107 @node omp_destroy_lock
1108 @section @code{omp_destroy_lock} -- Destroy simple lock
1109 @table @asis
1110 @item @emph{Description}:
1111 Destroy a simple lock.  In order to be destroyed, a simple lock must be
1112 in the unlocked state.
1113
1114 @item @emph{C/C++}:
1115 @multitable @columnfractions .20 .80
1116 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1117 @end multitable
1118
1119 @item @emph{Fortran}:
1120 @multitable @columnfractions .20 .80
1121 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1122 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1123 @end multitable
1124
1125 @item @emph{See also}:
1126 @ref{omp_init_lock}
1127
1128 @item @emph{Reference}:
1129 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1130 @end table
1131
1132
1133
1134 @node omp_init_nest_lock
1135 @section @code{omp_init_nest_lock} -- Initialize nested lock
1136 @table @asis
1137 @item @emph{Description}:
1138 Initialize a nested lock.  After initialization, the lock is in
1139 an unlocked state and the nesting count is set to zero.
1140
1141 @item @emph{C/C++}:
1142 @multitable @columnfractions .20 .80
1143 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1144 @end multitable
1145
1146 @item @emph{Fortran}:
1147 @multitable @columnfractions .20 .80
1148 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1149 @item                   @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1150 @end multitable
1151
1152 @item @emph{See also}:
1153 @ref{omp_destroy_nest_lock}
1154
1155 @item @emph{Reference}:
1156 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1157 @end table
1158
1159
1160 @node omp_set_nest_lock
1161 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1162 @table @asis
1163 @item @emph{Description}:
1164 Before setting a nested lock, the lock variable must be initialized by
1165 @code{omp_init_nest_lock}.  The calling thread is blocked until the lock
1166 is available.  If the lock is already held by the current thread, the
1167 nesting count for the lock is incremented.
1168
1169 @item @emph{C/C++}:
1170 @multitable @columnfractions .20 .80
1171 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1172 @end multitable
1173
1174 @item @emph{Fortran}:
1175 @multitable @columnfractions .20 .80
1176 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1177 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1178 @end multitable
1179
1180 @item @emph{See also}:
1181 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1182
1183 @item @emph{Reference}:
1184 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1185 @end table
1186
1187
1188
1189 @node omp_test_nest_lock
1190 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1191 @table @asis
1192 @item @emph{Description}:
1193 Before setting a nested lock, the lock variable must be initialized by
1194 @code{omp_init_nest_lock}.  Contrary to @code{omp_set_nest_lock},
1195 @code{omp_test_nest_lock} does not block if the lock is not available.
1196 If the lock is already held by the current thread, the new nesting count
1197 is returned.  Otherwise, the return value equals zero.
1198
1199 @item @emph{C/C++}:
1200 @multitable @columnfractions .20 .80
1201 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1202 @end multitable
1203
1204 @item @emph{Fortran}:
1205 @multitable @columnfractions .20 .80
1206 @item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
1207 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1208 @end multitable
1209
1210
1211 @item @emph{See also}:
1212 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
1213
1214 @item @emph{Reference}:
1215 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1216 @end table
1217
1218
1219
1220 @node omp_unset_nest_lock
1221 @section @code{omp_unset_nest_lock} -- Unset nested lock
1222 @table @asis
1223 @item @emph{Description}:
1224 A nested lock about to be unset must have been locked by @code{omp_set_nested_lock}
1225 or @code{omp_test_nested_lock} before.  In addition, the lock must be held by the
1226 thread calling @code{omp_unset_nested_lock}.  If the nesting count drops to zero, the
1227 lock becomes unlocked.  If one ore more threads attempted to set the lock before,
1228 one of them is chosen to, again, set the lock to itself.
1229
1230 @item @emph{C/C++}:
1231 @multitable @columnfractions .20 .80
1232 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1233 @end multitable
1234
1235 @item @emph{Fortran}:
1236 @multitable @columnfractions .20 .80
1237 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1238 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1239 @end multitable
1240
1241 @item @emph{See also}:
1242 @ref{omp_set_nest_lock}
1243
1244 @item @emph{Reference}:
1245 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1246 @end table
1247
1248
1249
1250 @node omp_destroy_nest_lock
1251 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1252 @table @asis
1253 @item @emph{Description}:
1254 Destroy a nested lock.  In order to be destroyed, a nested lock must be
1255 in the unlocked state and its nesting count must equal zero.
1256
1257 @item @emph{C/C++}:
1258 @multitable @columnfractions .20 .80
1259 @item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *);}
1260 @end multitable
1261
1262 @item @emph{Fortran}:
1263 @multitable @columnfractions .20 .80
1264 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1265 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1266 @end multitable
1267
1268 @item @emph{See also}:
1269 @ref{omp_init_lock}
1270
1271 @item @emph{Reference}:
1272 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1273 @end table
1274
1275
1276
1277 @node omp_get_wtick
1278 @section @code{omp_get_wtick} -- Get timer precision
1279 @table @asis
1280 @item @emph{Description}:
1281 Gets the timer precision, i.e., the number of seconds between two
1282 successive clock ticks.
1283
1284 @item @emph{C/C++}:
1285 @multitable @columnfractions .20 .80
1286 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1287 @end multitable
1288
1289 @item @emph{Fortran}:
1290 @multitable @columnfractions .20 .80
1291 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1292 @end multitable
1293
1294 @item @emph{See also}:
1295 @ref{omp_get_wtime}
1296
1297 @item @emph{Reference}:
1298 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
1299 @end table
1300
1301
1302
1303 @node omp_get_wtime
1304 @section @code{omp_get_wtime} -- Elapsed wall clock time
1305 @table @asis
1306 @item @emph{Description}:
1307 Elapsed wall clock time in seconds.  The time is measured per thread, no
1308 guarantee can be made that two distinct threads measure the same time.
1309 Time is measured from some "time in the past", which is an arbitrary time
1310 guaranteed not to change during the execution of the program.
1311
1312 @item @emph{C/C++}:
1313 @multitable @columnfractions .20 .80
1314 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1315 @end multitable
1316
1317 @item @emph{Fortran}:
1318 @multitable @columnfractions .20 .80
1319 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1320 @end multitable
1321
1322 @item @emph{See also}:
1323 @ref{omp_get_wtick}
1324
1325 @item @emph{Reference}:
1326 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
1327 @end table
1328
1329
1330
1331 @c ---------------------------------------------------------------------
1332 @c OpenMP Environment Variables
1333 @c ---------------------------------------------------------------------
1334
1335 @node Environment Variables
1336 @chapter OpenMP Environment Variables
1337
1338 The environment variables which beginning with @env{OMP_} are defined by
1339 section 4 of the OpenMP specification in version 4.5, while those
1340 beginning with @env{GOMP_} are GNU extensions.
1341
1342 @menu
1343 * OMP_CANCELLATION::        Set whether cancellation is activated
1344 * OMP_DISPLAY_ENV::         Show OpenMP version and environment variables
1345 * OMP_DEFAULT_DEVICE::      Set the device used in target regions
1346 * OMP_DYNAMIC::             Dynamic adjustment of threads
1347 * OMP_MAX_ACTIVE_LEVELS::   Set the maximum number of nested parallel regions
1348 * OMP_MAX_TASK_PRIORITY::   Set the maximum task priority value
1349 * OMP_NESTED::              Nested parallel regions
1350 * OMP_NUM_THREADS::         Specifies the number of threads to use
1351 * OMP_PROC_BIND::           Whether theads may be moved between CPUs
1352 * OMP_PLACES::              Specifies on which CPUs the theads should be placed
1353 * OMP_STACKSIZE::           Set default thread stack size
1354 * OMP_SCHEDULE::            How threads are scheduled
1355 * OMP_THREAD_LIMIT::        Set the maximum number of threads
1356 * OMP_WAIT_POLICY::         How waiting threads are handled
1357 * GOMP_CPU_AFFINITY::       Bind threads to specific CPUs
1358 * GOMP_DEBUG::              Enable debugging output
1359 * GOMP_STACKSIZE::          Set default thread stack size
1360 * GOMP_SPINCOUNT::          Set the busy-wait spin count
1361 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
1362 @end menu
1363
1364
1365 @node OMP_CANCELLATION
1366 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1367 @cindex Environment Variable
1368 @table @asis
1369 @item @emph{Description}:
1370 If set to @code{TRUE}, the cancellation is activated.  If set to @code{FALSE} or
1371 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1372
1373 @item @emph{See also}:
1374 @ref{omp_get_cancellation}
1375
1376 @item @emph{Reference}:
1377 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
1378 @end table
1379
1380
1381
1382 @node OMP_DISPLAY_ENV
1383 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1384 @cindex Environment Variable
1385 @table @asis
1386 @item @emph{Description}:
1387 If set to @code{TRUE}, the OpenMP version number and the values
1388 associated with the OpenMP environment variables are printed to @code{stderr}.
1389 If set to @code{VERBOSE}, it additionally shows the value of the environment
1390 variables which are GNU extensions.  If undefined or set to @code{FALSE},
1391 this information will not be shown.
1392
1393
1394 @item @emph{Reference}:
1395 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
1396 @end table
1397
1398
1399
1400 @node OMP_DEFAULT_DEVICE
1401 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1402 @cindex Environment Variable
1403 @table @asis
1404 @item @emph{Description}:
1405 Set to choose the device which is used in a @code{target} region, unless the
1406 value is overridden by @code{omp_set_default_device} or by a @code{device}
1407 clause.  The value shall be the nonnegative device number. If no device with
1408 the given device number exists, the code is executed on the host.  If unset,
1409 device number 0 will be used.
1410
1411
1412 @item @emph{See also}:
1413 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1414
1415 @item @emph{Reference}:
1416 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
1417 @end table
1418
1419
1420
1421 @node OMP_DYNAMIC
1422 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1423 @cindex Environment Variable
1424 @table @asis
1425 @item @emph{Description}:
1426 Enable or disable the dynamic adjustment of the number of threads
1427 within a team.  The value of this environment variable shall be
1428 @code{TRUE} or @code{FALSE}.  If undefined, dynamic adjustment is
1429 disabled by default.
1430
1431 @item @emph{See also}:
1432 @ref{omp_set_dynamic}
1433
1434 @item @emph{Reference}:
1435 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
1436 @end table
1437
1438
1439
1440 @node OMP_MAX_ACTIVE_LEVELS
1441 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
1442 @cindex Environment Variable
1443 @table @asis
1444 @item @emph{Description}:
1445 Specifies the initial value for the maximum number of nested parallel
1446 regions.  The value of this variable shall be a positive integer.
1447 If undefined, the number of active levels is unlimited.
1448
1449 @item @emph{See also}:
1450 @ref{omp_set_max_active_levels}
1451
1452 @item @emph{Reference}:
1453 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
1454 @end table
1455
1456
1457
1458 @node OMP_MAX_TASK_PRIORITY
1459 @section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
1460 number that can be set for a task.
1461 @cindex Environment Variable
1462 @table @asis
1463 @item @emph{Description}:
1464 Specifies the initial value for the maximum priority value that can be
1465 set for a task.  The value of this variable shall be a non-negative
1466 integer, and zero is allowed.  If undefined, the default priority is
1467 0.
1468
1469 @item @emph{See also}:
1470 @ref{omp_get_max_task_priority}
1471
1472 @item @emph{Reference}:
1473 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
1474 @end table
1475
1476
1477
1478 @node OMP_NESTED
1479 @section @env{OMP_NESTED} -- Nested parallel regions
1480 @cindex Environment Variable
1481 @cindex Implementation specific setting
1482 @table @asis
1483 @item @emph{Description}:
1484 Enable or disable nested parallel regions, i.e., whether team members
1485 are allowed to create new teams.  The value of this environment variable
1486 shall be @code{TRUE} or @code{FALSE}.  If undefined, nested parallel
1487 regions are disabled by default.
1488
1489 @item @emph{See also}:
1490 @ref{omp_set_nested}
1491
1492 @item @emph{Reference}:
1493 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
1494 @end table
1495
1496
1497
1498 @node OMP_NUM_THREADS
1499 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
1500 @cindex Environment Variable
1501 @cindex Implementation specific setting
1502 @table @asis
1503 @item @emph{Description}:
1504 Specifies the default number of threads to use in parallel regions.  The
1505 value of this variable shall be a comma-separated list of positive integers;
1506 the value specified the number of threads to use for the corresponding nested
1507 level.  If undefined one thread per CPU is used.
1508
1509 @item @emph{See also}:
1510 @ref{omp_set_num_threads}
1511
1512 @item @emph{Reference}:
1513 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
1514 @end table
1515
1516
1517
1518 @node OMP_PROC_BIND
1519 @section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
1520 @cindex Environment Variable
1521 @table @asis
1522 @item @emph{Description}:
1523 Specifies whether threads may be moved between processors.  If set to
1524 @code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
1525 they may be moved.  Alternatively, a comma separated list with the
1526 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
1527 the thread affinity policy for the corresponding nesting level.  With
1528 @code{MASTER} the worker threads are in the same place partition as the
1529 master thread.  With @code{CLOSE} those are kept close to the master thread
1530 in contiguous place partitions.  And with @code{SPREAD} a sparse distribution
1531 across the place partitions is used.
1532
1533 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
1534 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
1535
1536 @item @emph{See also}:
1537 @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
1538
1539 @item @emph{Reference}:
1540 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
1541 @end table
1542
1543
1544
1545 @node OMP_PLACES
1546 @section @env{OMP_PLACES} -- Specifies on which CPUs the theads should be placed
1547 @cindex Environment Variable
1548 @table @asis
1549 @item @emph{Description}:
1550 The thread placement can be either specified using an abstract name or by an
1551 explicit list of the places.  The abstract names @code{threads}, @code{cores}
1552 and @code{sockets} can be optionally followed by a positive number in
1553 parentheses, which denotes the how many places shall be created.  With
1554 @code{threads} each place corresponds to a single hardware thread; @code{cores}
1555 to a single core with the corresponding number of hardware threads; and with
1556 @code{sockets} the place corresponds to a single socket.  The resulting
1557 placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
1558 variable.
1559
1560 Alternatively, the placement can be specified explicitly as comma-separated
1561 list of places.  A place is specified by set of nonnegative numbers in curly
1562 braces, denoting the denoting the hardware threads.  The hardware threads
1563 belonging to a place can either be specified as comma-separated list of
1564 nonnegative thread numbers or using an interval.  Multiple places can also be
1565 either specified by a comma-separated list of places or by an interval.  To
1566 specify an interval, a colon followed by the count is placed after after
1567 the hardware thread number or the place.  Optionally, the length can be
1568 followed by a colon and the stride number -- otherwise a unit stride is
1569 assumed.  For instance, the following specifies the same places list:
1570 @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
1571 @code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.
1572
1573 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
1574 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
1575 between CPUs following no placement policy.
1576
1577 @item @emph{See also}:
1578 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
1579 @ref{OMP_DISPLAY_ENV}
1580
1581 @item @emph{Reference}:
1582 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
1583 @end table
1584
1585
1586
1587 @node OMP_STACKSIZE
1588 @section @env{OMP_STACKSIZE} -- Set default thread stack size
1589 @cindex Environment Variable
1590 @table @asis
1591 @item @emph{Description}:
1592 Set the default thread stack size in kilobytes, unless the number
1593 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
1594 case the size is, respectively, in bytes, kilobytes, megabytes
1595 or gigabytes.  This is different from @code{pthread_attr_setstacksize}
1596 which gets the number of bytes as an argument.  If the stack size cannot
1597 be set due to system constraints, an error is reported and the initial
1598 stack size is left unchanged.  If undefined, the stack size is system
1599 dependent.
1600
1601 @item @emph{Reference}:
1602 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
1603 @end table
1604
1605
1606
1607 @node OMP_SCHEDULE
1608 @section @env{OMP_SCHEDULE} -- How threads are scheduled
1609 @cindex Environment Variable
1610 @cindex Implementation specific setting
1611 @table @asis
1612 @item @emph{Description}:
1613 Allows to specify @code{schedule type} and @code{chunk size}.
1614 The value of the variable shall have the form: @code{type[,chunk]} where
1615 @code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}
1616 The optional @code{chunk} size shall be a positive integer.  If undefined,
1617 dynamic scheduling and a chunk size of 1 is used.
1618
1619 @item @emph{See also}:
1620 @ref{omp_set_schedule}
1621
1622 @item @emph{Reference}:
1623 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
1624 @end table
1625
1626
1627
1628 @node OMP_THREAD_LIMIT
1629 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
1630 @cindex Environment Variable
1631 @table @asis
1632 @item @emph{Description}:
1633 Specifies the number of threads to use for the whole program.  The
1634 value of this variable shall be a positive integer.  If undefined,
1635 the number of threads is not limited.
1636
1637 @item @emph{See also}:
1638 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
1639
1640 @item @emph{Reference}:
1641 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
1642 @end table
1643
1644
1645
1646 @node OMP_WAIT_POLICY
1647 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
1648 @cindex Environment Variable
1649 @table @asis
1650 @item @emph{Description}:
1651 Specifies whether waiting threads should be active or passive.  If
1652 the value is @code{PASSIVE}, waiting threads should not consume CPU
1653 power while waiting; while the value is @code{ACTIVE} specifies that
1654 they should.  If undefined, threads wait actively for a short time
1655 before waiting passively.
1656
1657 @item @emph{See also}:
1658 @ref{GOMP_SPINCOUNT}
1659
1660 @item @emph{Reference}:
1661 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
1662 @end table
1663
1664
1665
1666 @node GOMP_CPU_AFFINITY
1667 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
1668 @cindex Environment Variable
1669 @table @asis
1670 @item @emph{Description}:
1671 Binds threads to specific CPUs.  The variable should contain a space-separated
1672 or comma-separated list of CPUs.  This list may contain different kinds of
1673 entries: either single CPU numbers in any order, a range of CPUs (M-N)
1674 or a range with some stride (M-N:S).  CPU numbers are zero based.  For example,
1675 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
1676 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
1677 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
1678 and 14 respectively and then start assigning back from the beginning of
1679 the list.  @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
1680
1681 There is no libgomp library routine to determine whether a CPU affinity
1682 specification is in effect.  As a workaround, language-specific library
1683 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
1684 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
1685 environment variable.  A defined CPU affinity on startup cannot be changed
1686 or disabled during the runtime of the application.
1687
1688 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
1689 @env{OMP_PROC_BIND} has a higher precedence.  If neither has been set and
1690 @env{OMP_PROC_BIND} is unset, or when @env{OMP_PROC_BIND} is set to
1691 @code{FALSE}, the host system will handle the assignment of threads to CPUs.
1692
1693 @item @emph{See also}:
1694 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
1695 @end table
1696
1697
1698
1699 @node GOMP_DEBUG
1700 @section @env{GOMP_DEBUG} -- Enable debugging output
1701 @cindex Environment Variable
1702 @table @asis
1703 @item @emph{Description}:
1704 Enable debugging output.  The variable should be set to @code{0}
1705 (disabled, also the default if not set), or @code{1} (enabled).
1706
1707 If enabled, some debugging output will be printed during execution.
1708 This is currently not specified in more detail, and subject to change.
1709 @end table
1710
1711
1712
1713 @node GOMP_STACKSIZE
1714 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
1715 @cindex Environment Variable
1716 @cindex Implementation specific setting
1717 @table @asis
1718 @item @emph{Description}:
1719 Set the default thread stack size in kilobytes.  This is different from
1720 @code{pthread_attr_setstacksize} which gets the number of bytes as an
1721 argument.  If the stack size cannot be set due to system constraints, an
1722 error is reported and the initial stack size is left unchanged.  If undefined,
1723 the stack size is system dependent.
1724
1725 @item @emph{See also}:
1726 @ref{OMP_STACKSIZE}
1727
1728 @item @emph{Reference}:
1729 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
1730 GCC Patches Mailinglist},
1731 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
1732 GCC Patches Mailinglist}
1733 @end table
1734
1735
1736
1737 @node GOMP_SPINCOUNT
1738 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
1739 @cindex Environment Variable
1740 @cindex Implementation specific setting
1741 @table @asis
1742 @item @emph{Description}:
1743 Determines how long a threads waits actively with consuming CPU power
1744 before waiting passively without consuming CPU power.  The value may be
1745 either @code{INFINITE}, @code{INFINITY} to always wait actively or an
1746 integer which gives the number of spins of the busy-wait loop.  The
1747 integer may optionally be followed by the following suffixes acting
1748 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
1749 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
1750 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
1751 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
1752 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
1753 If there are more OpenMP threads than available CPUs, 1000 and 100
1754 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
1755 undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
1756 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
1757
1758 @item @emph{See also}:
1759 @ref{OMP_WAIT_POLICY}
1760 @end table
1761
1762
1763
1764 @node GOMP_RTEMS_THREAD_POOLS
1765 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
1766 @cindex Environment Variable
1767 @cindex Implementation specific setting
1768 @table @asis
1769 @item @emph{Description}:
1770 This environment variable is only used on the RTEMS real-time operating system.
1771 It determines the scheduler instance specific thread pools.  The format for
1772 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
1773 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
1774 separated by @code{:} where:
1775 @itemize @bullet
1776 @item @code{<thread-pool-count>} is the thread pool count for this scheduler
1777 instance.
1778 @item @code{$<priority>} is an optional priority for the worker threads of a
1779 thread pool according to @code{pthread_setschedparam}.  In case a priority
1780 value is omitted, then a worker thread will inherit the priority of the OpenMP
1781 master thread that created it.  The priority of the worker thread is not
1782 changed after creation, even if a new OpenMP master thread using the worker has
1783 a different priority.
1784 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
1785 RTEMS application configuration.
1786 @end itemize
1787 In case no thread pool configuration is specified for a scheduler instance,
1788 then each OpenMP master thread of this scheduler instance will use its own
1789 dynamically allocated thread pool.  To limit the worker thread count of the
1790 thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
1791 @item @emph{Example}:
1792 Lets suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
1793 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
1794 @code{"1@@WRK0:3$4@@WRK1"}.  Then there are no thread pool restrictions for
1795 scheduler instance @code{IO}.  In the scheduler instance @code{WRK0} there is
1796 one thread pool available.  Since no priority is specified for this scheduler
1797 instance, the worker thread inherits the priority of the OpenMP master thread
1798 that created it.  In the scheduler instance @code{WRK1} there are three thread
1799 pools available and their worker threads run at priority four.
1800 @end table
1801
1802
1803
1804 @c ---------------------------------------------------------------------
1805 @c Enabling OpenACC
1806 @c ---------------------------------------------------------------------
1807
1808 @node Enabling OpenACC
1809 @chapter Enabling OpenACC
1810
1811 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
1812 flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
1813 @code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
1814 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
1815 @code{!$} conditional compilation sentinels in free form and @code{c$},
1816 @code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
1817 arranges for automatic linking of the OpenACC runtime library
1818 (@ref{OpenACC Runtime Library Routines}).
1819
1820 A complete description of all OpenACC directives accepted may be found in
1821 the @uref{https://www.openacc.org, OpenACC} Application Programming
1822 Interface manual, version 2.0.
1823
1824 Note that this is an experimental feature and subject to
1825 change in future versions of GCC.  See
1826 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
1827
1828
1829
1830 @c ---------------------------------------------------------------------
1831 @c OpenACC Runtime Library Routines
1832 @c ---------------------------------------------------------------------
1833
1834 @node OpenACC Runtime Library Routines
1835 @chapter OpenACC Runtime Library Routines
1836
1837 The runtime routines described here are defined by section 3 of the OpenACC
1838 specifications in version 2.0.
1839 They have C linkage, and do not throw exceptions.
1840 Generally, they are available only for the host, with the exception of
1841 @code{acc_on_device}, which is available for both the host and the
1842 acceleration device.
1843
1844 @menu
1845 * acc_get_num_devices::         Get number of devices for the given device
1846                                 type.
1847 * acc_set_device_type::         Set type of device accelerator to use.
1848 * acc_get_device_type::         Get type of device accelerator to be used.
1849 * acc_set_device_num::          Set device number to use.
1850 * acc_get_device_num::          Get device number to be used.
1851 * acc_async_test::              Tests for completion of a specific asynchronous
1852                                 operation.
1853 * acc_async_test_all::          Tests for completion of all asychronous
1854                                 operations.
1855 * acc_wait::                    Wait for completion of a specific asynchronous
1856                                 operation.
1857 * acc_wait_all::                Waits for completion of all asyncrhonous
1858                                 operations.
1859 * acc_wait_all_async::          Wait for completion of all asynchronous
1860                                 operations.
1861 * acc_wait_async::              Wait for completion of asynchronous operations.
1862 * acc_init::                    Initialize runtime for a specific device type.
1863 * acc_shutdown::                Shuts down the runtime for a specific device
1864                                 type.
1865 * acc_on_device::               Whether executing on a particular device
1866 * acc_malloc::                  Allocate device memory.
1867 * acc_free::                    Free device memory.
1868 * acc_copyin::                  Allocate device memory and copy host memory to
1869                                 it.
1870 * acc_present_or_copyin::       If the data is not present on the device,
1871                                 allocate device memory and copy from host
1872                                 memory.
1873 * acc_create::                  Allocate device memory and map it to host
1874                                 memory.
1875 * acc_present_or_create::       If the data is not present on the device,
1876                                 allocate device memory and map it to host
1877                                 memory.
1878 * acc_copyout::                 Copy device memory to host memory.
1879 * acc_delete::                  Free device memory.
1880 * acc_update_device::           Update device memory from mapped host memory.
1881 * acc_update_self::             Update host memory from mapped device memory.
1882 * acc_map_data::                Map previously allocated device memory to host
1883                                 memory.
1884 * acc_unmap_data::              Unmap device memory from host memory.
1885 * acc_deviceptr::               Get device pointer associated with specific
1886                                 host address.
1887 * acc_hostptr::                 Get host pointer associated with specific
1888                                 device address.
1889 * acc_is_present::              Indiciate whether host variable / array is
1890                                 present on device.
1891 * acc_memcpy_to_device::        Copy host memory to device memory.
1892 * acc_memcpy_from_device::      Copy device memory to host memory.
1893
1894 API routines for target platforms.
1895
1896 * acc_get_current_cuda_device:: Get CUDA device handle.
1897 * acc_get_current_cuda_context::Get CUDA context handle.
1898 * acc_get_cuda_stream::         Get CUDA stream handle.
1899 * acc_set_cuda_stream::         Set CUDA stream handle.
1900 @end menu
1901
1902
1903
1904 @node acc_get_num_devices
1905 @section @code{acc_get_num_devices} -- Get number of devices for given device type
1906 @table @asis
1907 @item @emph{Description}
1908 This function returns a value indicating the number of devices available
1909 for the device type specified in @var{devicetype}.
1910
1911 @item @emph{C/C++}:
1912 @multitable @columnfractions .20 .80
1913 @item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
1914 @end multitable
1915
1916 @item @emph{Fortran}:
1917 @multitable @columnfractions .20 .80
1918 @item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
1919 @item                  @tab @code{integer(kind=acc_device_kind) devicetype}
1920 @end multitable
1921
1922 @item @emph{Reference}:
1923 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
1924 3.2.1.
1925 @end table
1926
1927
1928
1929 @node acc_set_device_type
1930 @section @code{acc_set_device_type} -- Set type of device accelerator to use.
1931 @table @asis
1932 @item @emph{Description}
1933 This function indicates to the runtime library which device typr, specified
1934 in @var{devicetype}, to use when executing a parallel or kernels region.
1935
1936 @item @emph{C/C++}:
1937 @multitable @columnfractions .20 .80
1938 @item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
1939 @end multitable
1940
1941 @item @emph{Fortran}:
1942 @multitable @columnfractions .20 .80
1943 @item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
1944 @item                   @tab @code{integer(kind=acc_device_kind) devicetype}
1945 @end multitable
1946
1947 @item @emph{Reference}:
1948 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
1949 3.2.2.
1950 @end table
1951
1952
1953
1954 @node acc_get_device_type
1955 @section @code{acc_get_device_type} -- Get type of device accelerator to be used.
1956 @table @asis
1957 @item @emph{Description}
1958 This function returns what device type will be used when executing a
1959 parallel or kernels region.
1960
1961 @item @emph{C/C++}:
1962 @multitable @columnfractions .20 .80
1963 @item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
1964 @end multitable
1965
1966 @item @emph{Fortran}:
1967 @multitable @columnfractions .20 .80
1968 @item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
1969 @item                  @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
1970 @end multitable
1971
1972 @item @emph{Reference}:
1973 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
1974 3.2.3.
1975 @end table
1976
1977
1978
1979 @node acc_set_device_num
1980 @section @code{acc_set_device_num} -- Set device number to use.
1981 @table @asis
1982 @item @emph{Description}
1983 This function will indicate to the runtime which device number,
1984 specified by @var{num}, associated with the specifed device
1985 type @var{devicetype}.
1986
1987 @item @emph{C/C++}:
1988 @multitable @columnfractions .20 .80
1989 @item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
1990 @end multitable
1991
1992 @item @emph{Fortran}:
1993 @multitable @columnfractions .20 .80
1994 @item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
1995 @item                   @tab @code{integer devicenum}
1996 @item                   @tab @code{integer(kind=acc_device_kind) devicetype}
1997 @end multitable
1998
1999 @item @emph{Reference}:
2000 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2001 3.2.4.
2002 @end table
2003
2004
2005
2006 @node acc_get_device_num
2007 @section @code{acc_get_device_num} -- Get device number to be used.
2008 @table @asis
2009 @item @emph{Description}
2010 This function returns which device number associated with the specified device
2011 type @var{devicetype}, will be used when executing a parallel or kernels
2012 region.
2013
2014 @item @emph{C/C++}:
2015 @multitable @columnfractions .20 .80
2016 @item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
2017 @end multitable
2018
2019 @item @emph{Fortran}:
2020 @multitable @columnfractions .20 .80
2021 @item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
2022 @item                   @tab @code{integer(kind=acc_device_kind) devicetype}
2023 @item                   @tab @code{integer acc_get_device_num}
2024 @end multitable
2025
2026 @item @emph{Reference}:
2027 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2028 3.2.5.
2029 @end table
2030
2031
2032
2033 @node acc_async_test
2034 @section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
2035 @table @asis
2036 @item @emph{Description}
2037 This function tests for completion of the asynchrounous operation specified
2038 in @var{arg}. In C/C++, a non-zero value will be returned to indicate
2039 the specified asynchronous operation has completed. While Fortran will return
2040 a @code{true}. If the asynchrounous operation has not completed, C/C++ returns
2041 a zero and Fortran returns a @code{false}.
2042
2043 @item @emph{C/C++}:
2044 @multitable @columnfractions .20 .80
2045 @item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
2046 @end multitable
2047
2048 @item @emph{Fortran}:
2049 @multitable @columnfractions .20 .80
2050 @item @emph{Interface}: @tab @code{function acc_async_test(arg)}
2051 @item                   @tab @code{integer(kind=acc_handle_kind) arg}
2052 @item                   @tab @code{logical acc_async_test}
2053 @end multitable
2054
2055 @item @emph{Reference}:
2056 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2057 3.2.6.
2058 @end table
2059
2060
2061
2062 @node acc_async_test_all
2063 @section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
2064 @table @asis
2065 @item @emph{Description}
2066 This function tests for completion of all asynchrounous operations.
2067 In C/C++, a non-zero value will be returned to indicate all asynchronous
2068 operations have completed. While Fortran will return a @code{true}. If
2069 any asynchronous operation has not completed, C/C++ returns a zero and
2070 Fortran returns a @code{false}.
2071
2072 @item @emph{C/C++}:
2073 @multitable @columnfractions .20 .80
2074 @item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
2075 @end multitable
2076
2077 @item @emph{Fortran}:
2078 @multitable @columnfractions .20 .80
2079 @item @emph{Interface}: @tab @code{function acc_async_test()}
2080 @item                   @tab @code{logical acc_get_device_num}
2081 @end multitable
2082
2083 @item @emph{Reference}:
2084 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2085 3.2.7.
2086 @end table
2087
2088
2089
2090 @node acc_wait
2091 @section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
2092 @table @asis
2093 @item @emph{Description}
2094 This function waits for completion of the asynchronous operation
2095 specified in @var{arg}.
2096
2097 @item @emph{C/C++}:
2098 @multitable @columnfractions .20 .80
2099 @item @emph{Prototype}: @tab @code{acc_wait(arg);}
2100 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(arg);}
2101 @end multitable
2102
2103 @item @emph{Fortran}:
2104 @multitable @columnfractions .20 .80
2105 @item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
2106 @item                   @tab @code{integer(acc_handle_kind) arg}
2107 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
2108 @item                                               @tab @code{integer(acc_handle_kind) arg}
2109 @end multitable
2110
2111 @item @emph{Reference}:
2112 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2113 3.2.8.
2114 @end table
2115
2116
2117
2118 @node acc_wait_all
2119 @section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
2120 @table @asis
2121 @item @emph{Description}
2122 This function waits for the completion of all asynchronous operations.
2123
2124 @item @emph{C/C++}:
2125 @multitable @columnfractions .20 .80
2126 @item @emph{Prototype}: @tab @code{acc_wait_all(void);}
2127 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
2128 @end multitable
2129
2130 @item @emph{Fortran}:
2131 @multitable @columnfractions .20 .80
2132 @item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
2133 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
2134 @end multitable
2135
2136 @item @emph{Reference}:
2137 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2138 3.2.10.
2139 @end table
2140
2141
2142
2143 @node acc_wait_all_async
2144 @section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
2145 @table @asis
2146 @item @emph{Description}
2147 This function enqueues a wait operation on the queue @var{async} for any
2148 and all asynchronous operations that have been previously enqueued on
2149 any queue.
2150
2151 @item @emph{C/C++}:
2152 @multitable @columnfractions .20 .80
2153 @item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
2154 @end multitable
2155
2156 @item @emph{Fortran}:
2157 @multitable @columnfractions .20 .80
2158 @item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
2159 @item                   @tab @code{integer(acc_handle_kind) async}
2160 @end multitable
2161
2162 @item @emph{Reference}:
2163 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2164 3.2.11.
2165 @end table
2166
2167
2168
2169 @node acc_wait_async
2170 @section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
2171 @table @asis
2172 @item @emph{Description}
2173 This function enqueues a wait operation on queue @var{async} for any and all
2174 asynchronous operations enqueued on queue @var{arg}.
2175
2176 @item @emph{C/C++}:
2177 @multitable @columnfractions .20 .80
2178 @item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
2179 @end multitable
2180
2181 @item @emph{Fortran}:
2182 @multitable @columnfractions .20 .80
2183 @item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
2184 @item                   @tab @code{integer(acc_handle_kind) arg, async}
2185 @end multitable
2186
2187 @item @emph{Reference}:
2188 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2189 3.2.9.
2190 @end table
2191
2192
2193
2194 @node acc_init
2195 @section @code{acc_init} -- Initialize runtime for a specific device type.
2196 @table @asis
2197 @item @emph{Description}
2198 This function initializes the runtime for the device type specified in
2199 @var{devicetype}.
2200
2201 @item @emph{C/C++}:
2202 @multitable @columnfractions .20 .80
2203 @item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
2204 @end multitable
2205
2206 @item @emph{Fortran}:
2207 @multitable @columnfractions .20 .80
2208 @item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
2209 @item                   @tab @code{integer(acc_device_kind) devicetype}
2210 @end multitable
2211
2212 @item @emph{Reference}:
2213 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2214 3.2.12.
2215 @end table
2216
2217
2218
2219 @node acc_shutdown
2220 @section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
2221 @table @asis
2222 @item @emph{Description}
2223 This function shuts down the runtime for the device type specified in
2224 @var{devicetype}.
2225
2226 @item @emph{C/C++}:
2227 @multitable @columnfractions .20 .80
2228 @item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
2229 @end multitable
2230
2231 @item @emph{Fortran}:
2232 @multitable @columnfractions .20 .80
2233 @item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
2234 @item                   @tab @code{integer(acc_device_kind) devicetype}
2235 @end multitable
2236
2237 @item @emph{Reference}:
2238 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2239 3.2.13.
2240 @end table
2241
2242
2243
2244 @node acc_on_device
2245 @section @code{acc_on_device} -- Whether executing on a particular device
2246 @table @asis
2247 @item @emph{Description}:
2248 This function returns whether the program is executing on a particular
2249 device specified in @var{devicetype}. In C/C++ a non-zero value is
2250 returned to indicate the device is execiting on the specified device type.
2251 In Fortran, @code{true} will be returned. If the program is not executing
2252 on the specified device type C/C++ will return a zero, while Fortran will
2253 return @code{false}.
2254
2255 @item @emph{C/C++}:
2256 @multitable @columnfractions .20 .80
2257 @item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
2258 @end multitable
2259
2260 @item @emph{Fortran}:
2261 @multitable @columnfractions .20 .80
2262 @item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
2263 @item                   @tab @code{integer(acc_device_kind) devicetype}
2264 @item                   @tab @code{logical acc_on_device}
2265 @end multitable
2266
2267
2268 @item @emph{Reference}:
2269 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2270 3.2.14.
2271 @end table
2272
2273
2274
2275 @node acc_malloc
2276 @section @code{acc_malloc} -- Allocate device memory.
2277 @table @asis
2278 @item @emph{Description}
2279 This function allocates @var{len} bytes of device memory. It returns
2280 the device address of the allocated memory.
2281
2282 @item @emph{C/C++}:
2283 @multitable @columnfractions .20 .80
2284 @item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
2285 @end multitable
2286
2287 @item @emph{Reference}:
2288 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2289 3.2.15.
2290 @end table
2291
2292
2293
2294 @node acc_free
2295 @section @code{acc_free} -- Free device memory.
2296 @table @asis
2297 @item @emph{Description}
2298 Free previously allocated device memory at the device address @code{a}.
2299
2300 @item @emph{C/C++}:
2301 @multitable @columnfractions .20 .80
2302 @item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
2303 @end multitable
2304
2305 @item @emph{Reference}:
2306 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2307 3.2.16.
2308 @end table
2309
2310
2311
2312 @node acc_copyin
2313 @section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
2314 @table @asis
2315 @item @emph{Description}
2316 In C/C++, this function allocates @var{len} bytes of device memory
2317 and maps it to the specified host address in @var{a}. The device
2318 address of the newly allocated device memory is returned.
2319
2320 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2321 a contiguous array section. The second form @var{a} specifies a
2322 variable or array element and @var{len} specifies the length in bytes.
2323
2324 @item @emph{C/C++}:
2325 @multitable @columnfractions .20 .80
2326 @item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
2327 @end multitable
2328
2329 @item @emph{Fortran}:
2330 @multitable @columnfractions .20 .80
2331 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
2332 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2333 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
2334 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2335 @item                   @tab @code{integer len}
2336 @end multitable
2337
2338 @item @emph{Reference}:
2339 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2340 3.2.17.
2341 @end table
2342
2343
2344
2345 @node acc_present_or_copyin
2346 @section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
2347 @table @asis
2348 @item @emph{Description}
2349 This function tests if the host data specifed by @var{a} and of length
2350 @var{len} is present or not. If it is not present, then device memory
2351 will be allocated and the host memory copied. The device address of
2352 the newly allocated device memory is returned.
2353
2354 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2355 a contiguous array section. The second form @var{a} specifies a variable or
2356 array element and @var{len} specifies the length in bytes.
2357
2358 @item @emph{C/C++}:
2359 @multitable @columnfractions .20 .80
2360 @item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
2361 @item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
2362 @end multitable
2363
2364 @item @emph{Fortran}:
2365 @multitable @columnfractions .20 .80
2366 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
2367 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2368 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
2369 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2370 @item                   @tab @code{integer len}
2371 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
2372 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2373 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
2374 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2375 @item                   @tab @code{integer len}
2376 @end multitable
2377
2378 @item @emph{Reference}:
2379 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2380 3.2.18.
2381 @end table
2382
2383
2384
2385 @node acc_create
2386 @section @code{acc_create} -- Allocate device memory and map it to host memory.
2387 @table @asis
2388 @item @emph{Description}
2389 This function allocates device memory and maps it to host memory specified
2390 by the host address @var{a} with a length of @var{len} bytes. In C/C++,
2391 the function returns the device address of the allocated device memory.
2392
2393 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2394 a contiguous array section. The second form @var{a} specifies a variable or
2395 array element and @var{len} specifies the length in bytes.
2396
2397 @item @emph{C/C++}:
2398 @multitable @columnfractions .20 .80
2399 @item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
2400 @end multitable
2401
2402 @item @emph{Fortran}:
2403 @multitable @columnfractions .20 .80
2404 @item @emph{Interface}: @tab @code{subroutine acc_create(a)}
2405 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2406 @item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
2407 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2408 @item                   @tab @code{integer len}
2409 @end multitable
2410
2411 @item @emph{Reference}:
2412 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2413 3.2.19.
2414 @end table
2415
2416
2417
2418 @node acc_present_or_create
2419 @section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
2420 @table @asis
2421 @item @emph{Description}
2422 This function tests if the host data specifed by @var{a} and of length
2423 @var{len} is present or not. If it is not present, then device memory
2424 will be allocated and mapped to host memory. In C/C++, the device address
2425 of the newly allocated device memory is returned.
2426
2427 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2428 a contiguous array section. The second form @var{a} specifies a variable or
2429 array element and @var{len} specifies the length in bytes.
2430
2431
2432 @item @emph{C/C++}:
2433 @multitable @columnfractions .20 .80
2434 @item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
2435 @item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
2436 @end multitable
2437
2438 @item @emph{Fortran}:
2439 @multitable @columnfractions .20 .80
2440 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
2441 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2442 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
2443 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2444 @item                   @tab @code{integer len}
2445 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
2446 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2447 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
2448 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2449 @item                   @tab @code{integer len}
2450 @end multitable
2451
2452 @item @emph{Reference}:
2453 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2454 3.2.20.
2455 @end table
2456
2457
2458
2459 @node acc_copyout
2460 @section @code{acc_copyout} -- Copy device memory to host memory.
2461 @table @asis
2462 @item @emph{Description}
2463 This function copies mapped device memory to host memory which is specified
2464 by host address @var{a} for a length @var{len} bytes in C/C++.
2465
2466 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2467 a contiguous array section. The second form @var{a} specifies a variable or
2468 array element and @var{len} specifies the length in bytes.
2469
2470 @item @emph{C/C++}:
2471 @multitable @columnfractions .20 .80
2472 @item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
2473 @end multitable
2474
2475 @item @emph{Fortran}:
2476 @multitable @columnfractions .20 .80
2477 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
2478 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2479 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
2480 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2481 @item                   @tab @code{integer len}
2482 @end multitable
2483
2484 @item @emph{Reference}:
2485 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2486 3.2.21.
2487 @end table
2488
2489
2490
2491 @node acc_delete
2492 @section @code{acc_delete} -- Free device memory.
2493 @table @asis
2494 @item @emph{Description}
2495 This function frees previously allocated device memory specified by
2496 the device address @var{a} and the length of @var{len} bytes.
2497
2498 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2499 a contiguous array section. The second form @var{a} specifies a variable or
2500 array element and @var{len} specifies the length in bytes.
2501
2502 @item @emph{C/C++}:
2503 @multitable @columnfractions .20 .80
2504 @item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
2505 @end multitable
2506
2507 @item @emph{Fortran}:
2508 @multitable @columnfractions .20 .80
2509 @item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
2510 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2511 @item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
2512 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2513 @item                   @tab @code{integer len}
2514 @end multitable
2515
2516 @item @emph{Reference}:
2517 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2518 3.2.22.
2519 @end table
2520
2521
2522
2523 @node acc_update_device
2524 @section @code{acc_update_device} -- Update device memory from mapped host memory.
2525 @table @asis
2526 @item @emph{Description}
2527 This function updates the device copy from the previously mapped host memory.
2528 The host memory is specified with the host address @var{a} and a length of
2529 @var{len} bytes.
2530
2531 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2532 a contiguous array section. The second form @var{a} specifies a variable or
2533 array element and @var{len} specifies the length in bytes.
2534
2535 @item @emph{C/C++}:
2536 @multitable @columnfractions .20 .80
2537 @item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
2538 @end multitable
2539
2540 @item @emph{Fortran}:
2541 @multitable @columnfractions .20 .80
2542 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
2543 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2544 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
2545 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2546 @item                   @tab @code{integer len}
2547 @end multitable
2548
2549 @item @emph{Reference}:
2550 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2551 3.2.23.
2552 @end table
2553
2554
2555
2556 @node acc_update_self
2557 @section @code{acc_update_self} -- Update host memory from mapped device memory.
2558 @table @asis
2559 @item @emph{Description}
2560 This function updates the host copy from the previously mapped device memory.
2561 The host memory is specified with the host address @var{a} and a length of
2562 @var{len} bytes.
2563
2564 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2565 a contiguous array section. The second form @var{a} specifies a variable or
2566 array element and @var{len} specifies the length in bytes.
2567
2568 @item @emph{C/C++}:
2569 @multitable @columnfractions .20 .80
2570 @item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
2571 @end multitable
2572
2573 @item @emph{Fortran}:
2574 @multitable @columnfractions .20 .80
2575 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
2576 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2577 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
2578 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2579 @item                   @tab @code{integer len}
2580 @end multitable
2581
2582 @item @emph{Reference}:
2583 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2584 3.2.24.
2585 @end table
2586
2587
2588
2589 @node acc_map_data
2590 @section @code{acc_map_data} -- Map previously allocated device memory to host memory.
2591 @table @asis
2592 @item @emph{Description}
2593 This function maps previously allocated device and host memory. The device
2594 memory is specified with the device address @var{d}. The host memory is
2595 specified with the host address @var{h} and a length of @var{len}.
2596
2597 @item @emph{C/C++}:
2598 @multitable @columnfractions .20 .80
2599 @item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
2600 @end multitable
2601
2602 @item @emph{Reference}:
2603 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2604 3.2.25.
2605 @end table
2606
2607
2608
2609 @node acc_unmap_data
2610 @section @code{acc_unmap_data} -- Unmap device memory from host memory.
2611 @table @asis
2612 @item @emph{Description}
2613 This function unmaps previously mapped device and host memory. The latter
2614 specified by @var{h}.
2615
2616 @item @emph{C/C++}:
2617 @multitable @columnfractions .20 .80
2618 @item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
2619 @end multitable
2620
2621 @item @emph{Reference}:
2622 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2623 3.2.26.
2624 @end table
2625
2626
2627
2628 @node acc_deviceptr
2629 @section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
2630 @table @asis
2631 @item @emph{Description}
2632 This function returns the device address that has been mapped to the
2633 host address specified by @var{h}.
2634
2635 @item @emph{C/C++}:
2636 @multitable @columnfractions .20 .80
2637 @item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
2638 @end multitable
2639
2640 @item @emph{Reference}:
2641 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2642 3.2.27.
2643 @end table
2644
2645
2646
2647 @node acc_hostptr
2648 @section @code{acc_hostptr} -- Get host pointer associated with specific device address.
2649 @table @asis
2650 @item @emph{Description}
2651 This function returns the host address that has been mapped to the
2652 device address specified by @var{d}.
2653
2654 @item @emph{C/C++}:
2655 @multitable @columnfractions .20 .80
2656 @item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
2657 @end multitable
2658
2659 @item @emph{Reference}:
2660 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2661 3.2.28.
2662 @end table
2663
2664
2665
2666 @node acc_is_present
2667 @section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
2668 @table @asis
2669 @item @emph{Description}
2670 This function indicates whether the specified host address in @var{a} and a
2671 length of @var{len} bytes is present on the device. In C/C++, a non-zero
2672 value is returned to indicate the presence of the mapped memory on the
2673 device. A zero is returned to indicate the memory is not mapped on the
2674 device.
2675
2676 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2677 a contiguous array section. The second form @var{a} specifies a variable or
2678 array element and @var{len} specifies the length in bytes. If the host
2679 memory is mapped to device memory, then a @code{true} is returned. Otherwise,
2680 a @code{false} is return to indicate the mapped memory is not present.
2681
2682 @item @emph{C/C++}:
2683 @multitable @columnfractions .20 .80
2684 @item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
2685 @end multitable
2686
2687 @item @emph{Fortran}:
2688 @multitable @columnfractions .20 .80
2689 @item @emph{Interface}: @tab @code{function acc_is_present(a)}
2690 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2691 @item                   @tab @code{logical acc_is_present}
2692 @item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
2693 @item                   @tab @code{type, dimension(:[,:]...) :: a}
2694 @item                   @tab @code{integer len}
2695 @item                   @tab @code{logical acc_is_present}
2696 @end multitable
2697
2698 @item @emph{Reference}:
2699 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2700 3.2.29.
2701 @end table
2702
2703
2704
2705 @node acc_memcpy_to_device
2706 @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
2707 @table @asis
2708 @item @emph{Description}
2709 This function copies host memory specified by host address of @var{src} to
2710 device memory specified by the device address @var{dest} for a length of
2711 @var{bytes} bytes.
2712
2713 @item @emph{C/C++}:
2714 @multitable @columnfractions .20 .80
2715 @item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
2716 @end multitable
2717
2718 @item @emph{Reference}:
2719 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2720 3.2.30.
2721 @end table
2722
2723
2724
2725 @node acc_memcpy_from_device
2726 @section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
2727 @table @asis
2728 @item @emph{Description}
2729 This function copies host memory specified by host address of @var{src} from
2730 device memory specified by the device address @var{dest} for a length of
2731 @var{bytes} bytes.
2732
2733 @item @emph{C/C++}:
2734 @multitable @columnfractions .20 .80
2735 @item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
2736 @end multitable
2737
2738 @item @emph{Reference}:
2739 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2740 3.2.31.
2741 @end table
2742
2743
2744
2745 @node acc_get_current_cuda_device
2746 @section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
2747 @table @asis
2748 @item @emph{Description}
2749 This function returns the CUDA device handle. This handle is the same
2750 as used by the CUDA Runtime or Driver API's.
2751
2752 @item @emph{C/C++}:
2753 @multitable @columnfractions .20 .80
2754 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
2755 @end multitable
2756
2757 @item @emph{Reference}:
2758 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2759 A.2.1.1.
2760 @end table
2761
2762
2763
2764 @node acc_get_current_cuda_context
2765 @section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
2766 @table @asis
2767 @item @emph{Description}
2768 This function returns the CUDA context handle. This handle is the same
2769 as used by the CUDA Runtime or Driver API's.
2770
2771 @item @emph{C/C++}:
2772 @multitable @columnfractions .20 .80
2773 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
2774 @end multitable
2775
2776 @item @emph{Reference}:
2777 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2778 A.2.1.2.
2779 @end table
2780
2781
2782
2783 @node acc_get_cuda_stream
2784 @section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
2785 @table @asis
2786 @item @emph{Description}
2787 This function returns the CUDA stream handle for the queue @var{async}.
2788 This handle is the same as used by the CUDA Runtime or Driver API's.
2789
2790 @item @emph{C/C++}:
2791 @multitable @columnfractions .20 .80
2792 @item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
2793 @end multitable
2794
2795 @item @emph{Reference}:
2796 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2797 A.2.1.3.
2798 @end table
2799
2800
2801
2802 @node acc_set_cuda_stream
2803 @section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
2804 @table @asis
2805 @item @emph{Description}
2806 This function associates the stream handle specified by @var{stream} with
2807 the queue @var{async}.
2808
2809 This cannot be used to change the stream handle associated with
2810 @code{acc_async_sync}.
2811
2812 The return value is not specified.
2813
2814 @item @emph{C/C++}:
2815 @multitable @columnfractions .20 .80
2816 @item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
2817 @end multitable
2818
2819 @item @emph{Reference}:
2820 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2821 A.2.1.4.
2822 @end table
2823
2824
2825
2826 @c ---------------------------------------------------------------------
2827 @c OpenACC Environment Variables
2828 @c ---------------------------------------------------------------------
2829
2830 @node OpenACC Environment Variables
2831 @chapter OpenACC Environment Variables
2832
2833 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
2834 are defined by section 4 of the OpenACC specification in version 2.0.
2835 The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
2836
2837 @menu
2838 * ACC_DEVICE_TYPE::
2839 * ACC_DEVICE_NUM::
2840 * GCC_ACC_NOTIFY::
2841 @end menu
2842
2843
2844
2845 @node ACC_DEVICE_TYPE
2846 @section @code{ACC_DEVICE_TYPE}
2847 @table @asis
2848 @item @emph{Reference}:
2849 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2850 4.1.
2851 @end table
2852
2853
2854
2855 @node ACC_DEVICE_NUM
2856 @section @code{ACC_DEVICE_NUM}
2857 @table @asis
2858 @item @emph{Reference}:
2859 @uref{https://www.openacc.org, OpenACC specification v2.0}, section
2860 4.2.
2861 @end table
2862
2863
2864
2865 @node GCC_ACC_NOTIFY
2866 @section @code{GCC_ACC_NOTIFY}
2867 @table @asis
2868 @item @emph{Description}:
2869 Print debug information pertaining to the accelerator.
2870 @end table
2871
2872
2873
2874 @c ---------------------------------------------------------------------
2875 @c CUDA Streams Usage
2876 @c ---------------------------------------------------------------------
2877
2878 @node CUDA Streams Usage
2879 @chapter CUDA Streams Usage
2880
2881 This applies to the @code{nvptx} plugin only.
2882
2883 The library provides elements that perform asynchronous movement of
2884 data and asynchronous operation of computing constructs.  This
2885 asynchronous functionality is implemented by making use of CUDA
2886 streams@footnote{See "Stream Management" in "CUDA Driver API",
2887 TRM-06703-001, Version 5.5, for additional information}.
2888
2889 The primary means by that the asychronous functionality is accessed
2890 is through the use of those OpenACC directives which make use of the
2891 @code{async} and @code{wait} clauses.  When the @code{async} clause is
2892 first used with a directive, it creates a CUDA stream.  If an
2893 @code{async-argument} is used with the @code{async} clause, then the
2894 stream is associated with the specified @code{async-argument}.
2895
2896 Following the creation of an association between a CUDA stream and the
2897 @code{async-argument} of an @code{async} clause, both the @code{wait}
2898 clause and the @code{wait} directive can be used.  When either the
2899 clause or directive is used after stream creation, it creates a
2900 rendezvous point whereby execution waits until all operations
2901 associated with the @code{async-argument}, that is, stream, have
2902 completed.
2903
2904 Normally, the management of the streams that are created as a result of
2905 using the @code{async} clause, is done without any intervention by the
2906 caller.  This implies the association between the @code{async-argument}
2907 and the CUDA stream will be maintained for the lifetime of the program.
2908 However, this association can be changed through the use of the library
2909 function @code{acc_set_cuda_stream}.  When the function
2910 @code{acc_set_cuda_stream} is called, the CUDA stream that was
2911 originally associated with the @code{async} clause will be destroyed.
2912 Caution should be taken when changing the association as subsequent
2913 references to the @code{async-argument} refer to a different
2914 CUDA stream.
2915
2916
2917
2918 @c ---------------------------------------------------------------------
2919 @c OpenACC Library Interoperability
2920 @c ---------------------------------------------------------------------
2921
2922 @node OpenACC Library Interoperability
2923 @chapter OpenACC Library Interoperability
2924
2925 @section Introduction
2926
2927 The OpenACC library uses the CUDA Driver API, and may interact with
2928 programs that use the Runtime library directly, or another library
2929 based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
2930 "Interactions with the CUDA Driver API" in
2931 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
2932 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
2933 for additional information on library interoperability.}.
2934 This chapter describes the use cases and what changes are
2935 required in order to use both the OpenACC library and the CUBLAS and Runtime
2936 libraries within a program.
2937
2938 @section First invocation: NVIDIA CUBLAS library API
2939
2940 In this first use case (see below), a function in the CUBLAS library is called
2941 prior to any of the functions in the OpenACC library. More specifically, the
2942 function @code{cublasCreate()}.
2943
2944 When invoked, the function initializes the library and allocates the
2945 hardware resources on the host and the device on behalf of the caller. Once
2946 the initialization and allocation has completed, a handle is returned to the
2947 caller. The OpenACC library also requires initialization and allocation of
2948 hardware resources. Since the CUBLAS library has already allocated the
2949 hardware resources for the device, all that is left to do is to initialize
2950 the OpenACC library and acquire the hardware resources on the host.
2951
2952 Prior to calling the OpenACC function that initializes the library and
2953 allocate the host hardware resources, you need to acquire the device number
2954 that was allocated during the call to @code{cublasCreate()}. The invoking of the
2955 runtime library function @code{cudaGetDevice()} accomplishes this. Once
2956 acquired, the device number is passed along with the device type as
2957 parameters to the OpenACC library function @code{acc_set_device_num()}.
2958
2959 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
2960 library uses the  context that was created during the call to
2961 @code{cublasCreate()}. In other words, both libraries will be sharing the
2962 same context.
2963
2964 @smallexample
2965     /* Create the handle */
2966     s = cublasCreate(&h);
2967     if (s != CUBLAS_STATUS_SUCCESS)
2968     @{
2969         fprintf(stderr, "cublasCreate failed %d\n", s);
2970         exit(EXIT_FAILURE);
2971     @}
2972
2973     /* Get the device number */
2974     e = cudaGetDevice(&dev);
2975     if (e != cudaSuccess)
2976     @{
2977         fprintf(stderr, "cudaGetDevice failed %d\n", e);
2978         exit(EXIT_FAILURE);
2979     @}
2980
2981     /* Initialize OpenACC library and use device 'dev' */
2982     acc_set_device_num(dev, acc_device_nvidia);
2983
2984 @end smallexample
2985 @center Use Case 1
2986
2987 @section First invocation: OpenACC library API
2988
2989 In this second use case (see below), a function in the OpenACC library is
2990 called prior to any of the functions in the CUBLAS library. More specificially,
2991 the function @code{acc_set_device_num()}.
2992
2993 In the use case presented here, the function @code{acc_set_device_num()}
2994 is used to both initialize the OpenACC library and allocate the hardware
2995 resources on the host and the device. In the call to the function, the
2996 call parameters specify which device to use and what device
2997 type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
2998 is but one method to initialize the OpenACC library and allocate the
2999 appropriate hardware resources. Other methods are available through the
3000 use of environment variables and these will be discussed in the next section.
3001
3002 Once the call to @code{acc_set_device_num()} has completed, other OpenACC
3003 functions can be called as seen with multiple calls being made to
3004 @code{acc_copyin()}. In addition, calls can be made to functions in the
3005 CUBLAS library. In the use case a call to @code{cublasCreate()} is made
3006 subsequent to the calls to @code{acc_copyin()}.
3007 As seen in the previous use case, a call to @code{cublasCreate()}
3008 initializes the CUBLAS library and allocates the hardware resources on the
3009 host and the device.  However, since the device has already been allocated,
3010 @code{cublasCreate()} will only initialize the CUBLAS library and allocate
3011 the appropriate hardware resources on the host. The context that was created
3012 as part of the OpenACC initialization is shared with the CUBLAS library,
3013 similarly to the first use case.
3014
3015 @smallexample
3016     dev = 0;
3017
3018     acc_set_device_num(dev, acc_device_nvidia);
3019
3020     /* Copy the first set to the device */
3021     d_X = acc_copyin(&h_X[0], N * sizeof (float));
3022     if (d_X == NULL)
3023     @{
3024         fprintf(stderr, "copyin error h_X\n");
3025         exit(EXIT_FAILURE);
3026     @}
3027
3028     /* Copy the second set to the device */
3029     d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
3030     if (d_Y == NULL)
3031     @{
3032         fprintf(stderr, "copyin error h_Y1\n");
3033         exit(EXIT_FAILURE);
3034     @}
3035
3036     /* Create the handle */
3037     s = cublasCreate(&h);
3038     if (s != CUBLAS_STATUS_SUCCESS)
3039     @{
3040         fprintf(stderr, "cublasCreate failed %d\n", s);
3041         exit(EXIT_FAILURE);
3042     @}
3043
3044     /* Perform saxpy using CUBLAS library function */
3045     s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
3046     if (s != CUBLAS_STATUS_SUCCESS)
3047     @{
3048         fprintf(stderr, "cublasSaxpy failed %d\n", s);
3049         exit(EXIT_FAILURE);
3050     @}
3051
3052     /* Copy the results from the device */
3053     acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
3054
3055 @end smallexample
3056 @center Use Case 2
3057
3058 @section OpenACC library and environment variables
3059
3060 There are two environment variables associated with the OpenACC library
3061 that may be used to control the device type and device number:
3062 @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respecively. These two
3063 environement variables can be used as an alternative to calling
3064 @code{acc_set_device_num()}. As seen in the second use case, the device
3065 type and device number were specified using @code{acc_set_device_num()}.
3066 If however, the aforementioned environment variables were set, then the
3067 call to @code{acc_set_device_num()} would not be required.
3068
3069
3070 The use of the environment variables is only relevant when an OpenACC function
3071 is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
3072 is called prior to a call to an OpenACC function, then you must call
3073 @code{acc_set_device_num()}@footnote{More complete information
3074 about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
3075 sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC}
3076 Application Programming Interface”, Version 2.0.}
3077
3078
3079
3080 @c ---------------------------------------------------------------------
3081 @c The libgomp ABI
3082 @c ---------------------------------------------------------------------
3083
3084 @node The libgomp ABI
3085 @chapter The libgomp ABI
3086
3087 The following sections present notes on the external ABI as
3088 presented by libgomp.  Only maintainers should need them.
3089
3090 @menu
3091 * Implementing MASTER construct::
3092 * Implementing CRITICAL construct::
3093 * Implementing ATOMIC construct::
3094 * Implementing FLUSH construct::
3095 * Implementing BARRIER construct::
3096 * Implementing THREADPRIVATE construct::
3097 * Implementing PRIVATE clause::
3098 * Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
3099 * Implementing REDUCTION clause::
3100 * Implementing PARALLEL construct::
3101 * Implementing FOR construct::
3102 * Implementing ORDERED construct::
3103 * Implementing SECTIONS construct::
3104 * Implementing SINGLE construct::
3105 * Implementing OpenACC's PARALLEL construct::
3106 @end menu
3107
3108
3109 @node Implementing MASTER construct
3110 @section Implementing MASTER construct
3111
3112 @smallexample
3113 if (omp_get_thread_num () == 0)
3114   block
3115 @end smallexample
3116
3117 Alternately, we generate two copies of the parallel subfunction
3118 and only include this in the version run by the master thread.
3119 Surely this is not worthwhile though...
3120
3121
3122
3123 @node Implementing CRITICAL construct
3124 @section Implementing CRITICAL construct
3125
3126 Without a specified name,
3127
3128 @smallexample
3129   void GOMP_critical_start (void);
3130   void GOMP_critical_end (void);
3131 @end smallexample
3132
3133 so that we don't get COPY relocations from libgomp to the main
3134 application.
3135
3136 With a specified name, use omp_set_lock and omp_unset_lock with
3137 name being transformed into a variable declared like
3138
3139 @smallexample
3140   omp_lock_t gomp_critical_user_<name> __attribute__((common))
3141 @end smallexample
3142
3143 Ideally the ABI would specify that all zero is a valid unlocked
3144 state, and so we wouldn't need to initialize this at
3145 startup.
3146
3147
3148
3149 @node Implementing ATOMIC construct
3150 @section Implementing ATOMIC construct
3151
3152 The target should implement the @code{__sync} builtins.
3153
3154 Failing that we could add
3155
3156 @smallexample
3157   void GOMP_atomic_enter (void)
3158   void GOMP_atomic_exit (void)
3159 @end smallexample
3160
3161 which reuses the regular lock code, but with yet another lock
3162 object private to the library.
3163
3164
3165
3166 @node Implementing FLUSH construct
3167 @section Implementing FLUSH construct
3168
3169 Expands to the @code{__sync_synchronize} builtin.
3170
3171
3172
3173 @node Implementing BARRIER construct
3174 @section Implementing BARRIER construct
3175
3176 @smallexample
3177   void GOMP_barrier (void)
3178 @end smallexample
3179
3180
3181 @node Implementing THREADPRIVATE construct
3182 @section Implementing THREADPRIVATE construct
3183
3184 In _most_ cases we can map this directly to @code{__thread}.  Except
3185 that OMP allows constructors for C++ objects.  We can either
3186 refuse to support this (how often is it used?) or we can
3187 implement something akin to .ctors.
3188
3189 Even more ideally, this ctor feature is handled by extensions
3190 to the main pthreads library.  Failing that, we can have a set
3191 of entry points to register ctor functions to be called.
3192
3193
3194
3195 @node Implementing PRIVATE clause
3196 @section Implementing PRIVATE clause
3197
3198 In association with a PARALLEL, or within the lexical extent
3199 of a PARALLEL block, the variable becomes a local variable in
3200 the parallel subfunction.
3201
3202 In association with FOR or SECTIONS blocks, create a new
3203 automatic variable within the current function.  This preserves
3204 the semantic of new variable creation.
3205
3206
3207
3208 @node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
3209 @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
3210
3211 This seems simple enough for PARALLEL blocks.  Create a private
3212 struct for communicating between the parent and subfunction.
3213 In the parent, copy in values for scalar and "small" structs;
3214 copy in addresses for others TREE_ADDRESSABLE types.  In the
3215 subfunction, copy the value into the local variable.
3216
3217 It is not clear what to do with bare FOR or SECTION blocks.
3218 The only thing I can figure is that we do something like:
3219
3220 @smallexample
3221 #pragma omp for firstprivate(x) lastprivate(y)
3222 for (int i = 0; i < n; ++i)
3223   body;
3224 @end smallexample
3225
3226 which becomes
3227
3228 @smallexample
3229 @{
3230   int x = x, y;
3231
3232   // for stuff
3233
3234   if (i == n)
3235     y = y;
3236 @}
3237 @end smallexample
3238
3239 where the "x=x" and "y=y" assignments actually have different
3240 uids for the two variables, i.e. not something you could write
3241 directly in C.  Presumably this only makes sense if the "outer"
3242 x and y are global variables.
3243
3244 COPYPRIVATE would work the same way, except the structure
3245 broadcast would have to happen via SINGLE machinery instead.
3246
3247
3248
3249 @node Implementing REDUCTION clause
3250 @section Implementing REDUCTION clause
3251
3252 The private struct mentioned in the previous section should have
3253 a pointer to an array of the type of the variable, indexed by the
3254 thread's @var{team_id}.  The thread stores its final value into the
3255 array, and after the barrier, the master thread iterates over the
3256 array to collect the values.
3257
3258
3259 @node Implementing PARALLEL construct
3260 @section Implementing PARALLEL construct
3261
3262 @smallexample
3263   #pragma omp parallel
3264   @{
3265     body;
3266   @}
3267 @end smallexample
3268
3269 becomes
3270
3271 @smallexample
3272   void subfunction (void *data)
3273   @{
3274     use data;
3275     body;
3276   @}
3277
3278   setup data;
3279   GOMP_parallel_start (subfunction, &data, num_threads);
3280   subfunction (&data);
3281   GOMP_parallel_end ();
3282 @end smallexample
3283
3284 @smallexample
3285   void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
3286 @end smallexample
3287
3288 The @var{FN} argument is the subfunction to be run in parallel.
3289
3290 The @var{DATA} argument is a pointer to a structure used to
3291 communicate data in and out of the subfunction, as discussed
3292 above with respect to FIRSTPRIVATE et al.
3293
3294 The @var{NUM_THREADS} argument is 1 if an IF clause is present
3295 and false, or the value of the NUM_THREADS clause, if
3296 present, or 0.
3297
3298 The function needs to create the appropriate number of
3299 threads and/or launch them from the dock.  It needs to
3300 create the team structure and assign team ids.
3301
3302 @smallexample
3303   void GOMP_parallel_end (void)
3304 @end smallexample
3305
3306 Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
3307
3308
3309
3310 @node Implementing FOR construct
3311 @section Implementing FOR construct
3312
3313 @smallexample
3314   #pragma omp parallel for
3315   for (i = lb; i <= ub; i++)
3316     body;
3317 @end smallexample
3318
3319 becomes
3320
3321 @smallexample
3322   void subfunction (void *data)
3323   @{
3324     long _s0, _e0;
3325     while (GOMP_loop_static_next (&_s0, &_e0))
3326     @{
3327       long _e1 = _e0, i;
3328       for (i = _s0; i < _e1; i++)
3329         body;
3330     @}
3331     GOMP_loop_end_nowait ();
3332   @}
3333
3334   GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
3335   subfunction (NULL);
3336   GOMP_parallel_end ();
3337 @end smallexample
3338
3339 @smallexample
3340   #pragma omp for schedule(runtime)
3341   for (i = 0; i < n; i++)
3342     body;
3343 @end smallexample
3344
3345 becomes
3346
3347 @smallexample
3348   @{
3349     long i, _s0, _e0;
3350     if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
3351       do @{
3352         long _e1 = _e0;
3353         for (i = _s0, i < _e0; i++)
3354           body;
3355       @} while (GOMP_loop_runtime_next (&_s0, _&e0));
3356     GOMP_loop_end ();
3357   @}
3358 @end smallexample
3359
3360 Note that while it looks like there is trickiness to propagating
3361 a non-constant STEP, there isn't really.  We're explicitly allowed
3362 to evaluate it as many times as we want, and any variables involved
3363 should automatically be handled as PRIVATE or SHARED like any other
3364 variables.  So the expression should remain evaluable in the
3365 subfunction.  We can also pull it into a local variable if we like,
3366 but since its supposed to remain unchanged, we can also not if we like.
3367
3368 If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
3369 able to get away with no work-sharing context at all, since we can
3370 simply perform the arithmetic directly in each thread to divide up
3371 the iterations.  Which would mean that we wouldn't need to call any
3372 of these routines.
3373
3374 There are separate routines for handling loops with an ORDERED
3375 clause.  Bookkeeping for that is non-trivial...
3376
3377
3378
3379 @node Implementing ORDERED construct
3380 @section Implementing ORDERED construct
3381
3382 @smallexample
3383   void GOMP_ordered_start (void)
3384   void GOMP_ordered_end (void)
3385 @end smallexample
3386
3387
3388
3389 @node Implementing SECTIONS construct
3390 @section Implementing SECTIONS construct
3391
3392 A block as
3393
3394 @smallexample
3395   #pragma omp sections
3396   @{
3397     #pragma omp section
3398     stmt1;
3399     #pragma omp section
3400     stmt2;
3401     #pragma omp section
3402     stmt3;
3403   @}
3404 @end smallexample
3405
3406 becomes
3407
3408 @smallexample
3409   for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
3410     switch (i)
3411       @{
3412       case 1:
3413         stmt1;
3414         break;
3415       case 2:
3416         stmt2;
3417         break;
3418       case 3:
3419         stmt3;
3420         break;
3421       @}
3422   GOMP_barrier ();
3423 @end smallexample
3424
3425
3426 @node Implementing SINGLE construct
3427 @section Implementing SINGLE construct
3428
3429 A block like
3430
3431 @smallexample
3432   #pragma omp single
3433   @{
3434     body;
3435   @}
3436 @end smallexample
3437
3438 becomes
3439
3440 @smallexample
3441   if (GOMP_single_start ())
3442     body;
3443   GOMP_barrier ();
3444 @end smallexample
3445
3446 while
3447
3448 @smallexample
3449   #pragma omp single copyprivate(x)
3450     body;
3451 @end smallexample
3452
3453 becomes
3454
3455 @smallexample
3456   datap = GOMP_single_copy_start ();
3457   if (datap == NULL)
3458     @{
3459       body;
3460       data.x = x;
3461       GOMP_single_copy_end (&data);
3462     @}
3463   else
3464     x = datap->x;
3465   GOMP_barrier ();
3466 @end smallexample
3467
3468
3469
3470 @node Implementing OpenACC's PARALLEL construct
3471 @section Implementing OpenACC's PARALLEL construct
3472
3473 @smallexample
3474   void GOACC_parallel ()
3475 @end smallexample
3476
3477
3478
3479 @c ---------------------------------------------------------------------
3480 @c Reporting Bugs
3481 @c ---------------------------------------------------------------------
3482
3483 @node Reporting Bugs
3484 @chapter Reporting Bugs
3485
3486 Bugs in the GNU Offloading and Multi Processing Runtime Library should
3487 be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
3488 "openacc", or "openmp", or both to the keywords field in the bug
3489 report, as appropriate.
3490
3491
3492
3493 @c ---------------------------------------------------------------------
3494 @c GNU General Public License
3495 @c ---------------------------------------------------------------------
3496
3497 @include gpl_v3.texi
3498
3499
3500
3501 @c ---------------------------------------------------------------------
3502 @c GNU Free Documentation License
3503 @c ---------------------------------------------------------------------
3504
3505 @include fdl.texi
3506
3507
3508
3509 @c ---------------------------------------------------------------------
3510 @c Funding Free Software
3511 @c ---------------------------------------------------------------------
3512
3513 @include funding.texi
3514
3515 @c ---------------------------------------------------------------------
3516 @c Index
3517 @c ---------------------------------------------------------------------
3518
3519 @node Library Index
3520 @unnumbered Library Index
3521
3522 @printindex cp
3523
3524 @bye