libgomp/libgomp.texi

   1 \input texinfo @c -*-texinfo-*-
   2
   3 @c %**start of header
   4 @setfilename libgomp.info
   5 @settitle GNU libgomp
   6 @c %**end of header
   7
   8
   9 @copying
  10 Copyright @copyright{} 2006-2015 Free Software Foundation, Inc.
  11
  12 Permission is granted to copy, distribute and/or modify this document
  13 under the terms of the GNU Free Documentation License, Version 1.3 or
  14 any later version published by the Free Software Foundation; with the
  15 Invariant Sections being ``Funding Free Software'', the Front-Cover
  16 texts being (a) (see below), and with the Back-Cover Texts being (b)
  17 (see below).  A copy of the license is included in the section entitled
  18 ``GNU Free Documentation License''.
  19
  20 (a) The FSF's Front-Cover Text is:
  21
  22      A GNU Manual
  23
  24 (b) The FSF's Back-Cover Text is:
  25
  26      You have freedom to copy and modify this GNU Manual, like GNU
  27      software.  Copies published by the Free Software Foundation raise
  28      funds for GNU development.
  29 @end copying
  30
  31 @ifinfo
  32 @dircategory GNU Libraries
  33 @direntry
  34 * libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
  35 @end direntry
  36
  37 This manual documents libgomp, the GNU Offloading and Multi Processing
  38 Runtime library.  This is the GNU implementation of the OpenMP and
  39 OpenACC APIs for parallel and accelerator programming in C/C++ and
  40 Fortran.
  41
  42 Published by the Free Software Foundation
  43 51 Franklin Street, Fifth Floor
  44 Boston, MA 02110-1301 USA
  45
  46 @insertcopying
  47 @end ifinfo
  48
  49
  50 @setchapternewpage odd
  51
  52 @titlepage
  53 @title GNU Offloading and Multi Processing Runtime Library
  54 @subtitle The GNU OpenMP and OpenACC Implementation
  55 @page
  56 @vskip 0pt plus 1filll
  57 @comment For the @value{version-GCC} Version*
  58 @sp 1
  59 Published by the Free Software Foundation @*
  60 51 Franklin Street, Fifth Floor@*
  61 Boston, MA 02110-1301, USA@*
  62 @sp 1
  63 @insertcopying
  64 @end titlepage
  65
  66 @summarycontents
  67 @contents
  68 @page
  69
  70
  71 @node Top
  72 @top Introduction
  73 @cindex Introduction
  74
  75 This manual documents the usage of libgomp, the GNU Offloading and
  76 Multi Processing Runtime Library.  This includes the GNU
  77 implementation of the @uref{http://www.openmp.org, OpenMP} Application
  78 Programming Interface (API) for multi-platform shared-memory parallel
  79 programming in C/C++ and Fortran, and the GNU implementation of the
  80 @uref{http://www.openacc.org/, OpenACC} Application Programming
  81 Interface (API) for offloading of code to accelerator devices in C/C++
  82 and Fortran.
  83
  84 Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
  85 on this, support for OpenACC and offloading (both OpenACC and OpenMP
  86 4's target construct) has been added later on, and the library's name
  87 changed to GNU Offloading and Multi Processing Runtime Library.
  88
  89
  90
  91 @comment
  92 @comment  When you add a new menu item, please keep the right hand
  93 @comment  aligned to the same column.  Do not use tabs.  This provides
  94 @comment  better formatting.
  95 @comment
  96 @menu
  97 * Enabling OpenACC::                 How to enable OpenACC for your
  98                                      applications.
  99 * OpenACC Runtime Library Routines:: The OpenACC runtime application
 100                                       programming interface.
 101 * OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
 102                                      environment variables.
 103 * OpenACC Library Interoperability:: OpenACC library interoperability with the
 104                                      NVIDIA CUBLAS library.
 105 * Enabling OpenMP::                  How to enable OpenMP for your
 106                                      applications.
 107 * OpenMP Runtime Library Routines: Runtime Library Routines.
 108                                      The OpenMP runtime application programming
 109                                      interface.
 110 * OpenMP Environment Variables: Environment Variables.
 111                                      Influencing OpenMP runtime behavior with
 112                                      environment variables.
 113 * The libgomp ABI::                  Notes on the external libgomp ABI.
 114 * Reporting Bugs::                   How to report bugs in the GNU Offloading
 115                                      and Multi Processing Runtime Library.
 116 * Copying::                          GNU general public license says how you
 117                                      can copy and share libgomp.
 118 * GNU Free Documentation License::   How you can copy and share this manual.
 119 * Funding::                          How to help assure continued work for free
 120                                      software.
 121 * Library Index::                    Index of this documentation.
 122 @end menu
 123
 124
 125
 126 @c ---------------------------------------------------------------------
 127 @c Enabling OpenACC
 128 @c ---------------------------------------------------------------------
 129
 130 @node Enabling OpenACC
 131 @chapter Enabling OpenACC
 132
 133 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
 134 flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
 135 @code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
 136 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
 137 @code{!$} conditional compilation sentinels in free form and @code{c$},
 138 @code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 139 arranges for automatic linking of the OpenACC runtime library
 140 (@ref{OpenACC Runtime Library Routines}).
 141
 142 A complete description of all OpenACC directives accepted may be found in
 143 the @uref{http://www.openacc.org/, OpenMP Application Programming
 144 Interface} manual, version 2.0.
 145
 146 Note that this is an experimental feature, incomplete, and subject to
 147 change in future versions of GCC.  See
 148 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
 149
 150
 151
 152 @c ---------------------------------------------------------------------
 153 @c OpenACC Runtime Library Routines
 154 @c ---------------------------------------------------------------------
 155
 156 @node OpenACC Runtime Library Routines
 157 @chapter OpenACC Runtime Library Routines
 158
 159 The runtime routines described here are defined by section 3 of the OpenACC
 160 specifications in version 2.0.
 161 They have C linkage, and do not throw exceptions.
 162 Generally, they are available only for the host, with the exception of
 163 @code{acc_on_device}, which is available for both the host and the
 164 acceleration device.
 165
 166 @menu
 167 * acc_get_num_devices::         Get number of devices for the given device type
 168 * acc_set_device_type::
 169 * acc_get_device_type::
 170 * acc_set_device_num::
 171 * acc_get_device_num::
 172 * acc_init::
 173 * acc_shutdown::
 174 * acc_on_device::               Whether executing on a particular device
 175 * acc_malloc::
 176 * acc_free::
 177 * acc_copyin::
 178 * acc_present_or_copyin::
 179 * acc_create::
 180 * acc_present_or_create::
 181 * acc_copyout::
 182 * acc_delete::
 183 * acc_update_device::
 184 * acc_update_self::
 185 * acc_map_data::
 186 * acc_unmap_data::
 187 * acc_deviceptr::
 188 * acc_hostptr::
 189 * acc_is_present::
 190 * acc_memcpy_to_device::
 191 * acc_memcpy_from_device::
 192
 193 API routines for target platforms.
 194
 195 * acc_get_current_cuda_device::
 196 * acc_get_current_cuda_context::
 197 * acc_get_cuda_stream::
 198 * acc_set_cuda_stream::
 199 @end menu
 200
 201
 202
 203 @node acc_get_num_devices
 204 @section @code{acc_get_num_devices} -- Get number of devices for given device type
 205 @table @asis
 206 @item @emph{Description}
 207 This routine returns a value indicating the
 208 number of devices available for the given device type.  It determines
 209 the number of devices in a @emph{passive} manner.  In other words, it
 210 does not alter the state within the runtime environment aside from
 211 possibly initializing an uninitialized device.  This aspect allows
 212 the routine to be called without concern for altering the interaction
 213 with an attached accelerator device.
 214
 215 @item @emph{Reference}:
 216 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 217 3.2.1.
 218 @end table
 219
 220
 221
 222 @node acc_set_device_type
 223 @section @code{acc_set_device_type}
 224 @table @asis
 225 @item @emph{Reference}:
 226 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 227 3.2.2.
 228 @end table
 229
 230
 231
 232 @node acc_get_device_type
 233 @section @code{acc_get_device_type}
 234 @table @asis
 235 @item @emph{Reference}:
 236 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 237 3.2.3.
 238 @end table
 239
 240
 241
 242 @node acc_set_device_num
 243 @section @code{acc_set_device_num}
 244 @table @asis
 245 @item @emph{Reference}:
 246 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 247 3.2.4.
 248 @end table
 249
 250
 251
 252 @node acc_get_device_num
 253 @section @code{acc_get_device_num}
 254 @table @asis
 255 @item @emph{Reference}:
 256 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 257 3.2.5.
 258 @end table
 259
 260
 261
 262 @node acc_init
 263 @section @code{acc_init}
 264 @table @asis
 265 @item @emph{Reference}:
 266 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 267 3.2.12.
 268 @end table
 269
 270
 271
 272 @node acc_shutdown
 273 @section @code{acc_shutdown}
 274 @table @asis
 275 @item @emph{Reference}:
 276 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 277 3.2.13.
 278 @end table
 279
 280
 281
 282 @node acc_on_device
 283 @section @code{acc_on_device} -- Whether executing on a particular device
 284 @table @asis
 285 @item @emph{Description}:
 286 This routine tells the program whether it is executing on a particular
 287 device.  Based on the argument passed, GCC tries to evaluate this to a
 288 constant at compile time, but library functions are also provided, for
 289 both the host and the acceleration device.
 290
 291 @item @emph{Reference}:
 292 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 293 3.2.14.
 294 @end table
 295
 296
 297
 298 @node acc_malloc
 299 @section @code{acc_malloc}
 300 @table @asis
 301 @item @emph{Reference}:
 302 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 303 3.2.15.
 304 @end table
 305
 306
 307
 308 @node acc_free
 309 @section @code{acc_free}
 310 @table @asis
 311 @item @emph{Reference}:
 312 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 313 3.2.16.
 314 @end table
 315
 316
 317
 318 @node acc_copyin
 319 @section @code{acc_copyin}
 320 @table @asis
 321 @item @emph{Reference}:
 322 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 323 3.2.17.
 324 @end table
 325
 326
 327
 328 @node acc_present_or_copyin
 329 @section @code{acc_present_or_copyin}
 330 @table @asis
 331 @item @emph{Reference}:
 332 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 333 3.2.18.
 334 @end table
 335
 336
 337
 338 @node acc_create
 339 @section @code{acc_create}
 340 @table @asis
 341 @item @emph{Reference}:
 342 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 343 3.2.19.
 344 @end table
 345
 346
 347
 348 @node acc_present_or_create
 349 @section @code{acc_present_or_create}
 350 @table @asis
 351 @item @emph{Reference}:
 352 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 353 3.2.20.
 354 @end table
 355
 356
 357
 358 @node acc_copyout
 359 @section @code{acc_copyout}
 360 @table @asis
 361 @item @emph{Reference}:
 362 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 363 3.2.21.
 364 @end table
 365
 366
 367
 368 @node acc_delete
 369 @section @code{acc_delete}
 370 @table @asis
 371 @item @emph{Reference}:
 372 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 373 3.2.22.
 374 @end table
 375
 376
 377
 378 @node acc_update_device
 379 @section @code{acc_update_device}
 380 @table @asis
 381 @item @emph{Reference}:
 382 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 383 3.2.23.
 384 @end table
 385
 386
 387
 388 @node acc_update_self
 389 @section @code{acc_update_self}
 390 @table @asis
 391 @item @emph{Reference}:
 392 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 393 3.2.24.
 394 @end table
 395
 396
 397
 398 @node acc_map_data
 399 @section @code{acc_map_data}
 400 @table @asis
 401 @item @emph{Reference}:
 402 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 403 3.2.25.
 404 @end table
 405
 406
 407
 408 @node acc_unmap_data
 409 @section @code{acc_unmap_data}
 410 @table @asis
 411 @item @emph{Reference}:
 412 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 413 3.2.26.
 414 @end table
 415
 416
 417
 418 @node acc_deviceptr
 419 @section @code{acc_deviceptr}
 420 @table @asis
 421 @item @emph{Reference}:
 422 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 423 3.2.27.
 424 @end table
 425
 426
 427
 428 @node acc_hostptr
 429 @section @code{acc_hostptr}
 430 @table @asis
 431 @item @emph{Reference}:
 432 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 433 3.2.28.
 434 @end table
 435
 436
 437
 438 @node acc_is_present
 439 @section @code{acc_is_present}
 440 @table @asis
 441 @item @emph{Reference}:
 442 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 443 3.2.29.
 444 @end table
 445
 446
 447
 448 @node acc_memcpy_to_device
 449 @section @code{acc_memcpy_to_device}
 450 @table @asis
 451 @item @emph{Reference}:
 452 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 453 3.2.30.
 454 @end table
 455
 456
 457
 458 @node acc_memcpy_from_device
 459 @section @code{acc_memcpy_from_device}
 460 @table @asis
 461 @item @emph{Reference}:
 462 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 463 3.2.31.
 464 @end table
 465
 466
 467
 468 @node acc_get_current_cuda_device
 469 @section @code{acc_get_current_cuda_device}
 470 @table @asis
 471 @item @emph{Reference}:
 472 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 473 A.2.1.1.
 474 @end table
 475
 476
 477
 478 @node acc_get_current_cuda_context
 479 @section @code{acc_get_current_cuda_context}
 480 @table @asis
 481 @item @emph{Reference}:
 482 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 483 A.2.1.2.
 484 @end table
 485
 486
 487
 488 @node acc_get_cuda_stream
 489 @section @code{acc_get_cuda_stream}
 490 @table @asis
 491 @item @emph{Reference}:
 492 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 493 A.2.1.3.
 494 @end table
 495
 496
 497
 498 @node acc_set_cuda_stream
 499 @section @code{acc_set_cuda_stream}
 500 @table @asis
 501 @item @emph{Reference}:
 502 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 503 A.2.1.4.
 504 @end table
 505
 506
 507
 508 @c ---------------------------------------------------------------------
 509 @c OpenACC Environment Variables
 510 @c ---------------------------------------------------------------------
 511
 512 @node OpenACC Environment Variables
 513 @chapter OpenACC Environment Variables
 514
 515 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
 516 are defined by section 4 of the OpenACC specification in version 2.0.
 517 The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
 518
 519 @menu
 520 * ACC_DEVICE_TYPE::
 521 * ACC_DEVICE_NUM::
 522 * GCC_ACC_NOTIFY::
 523 @end menu
 524
 525
 526
 527 @node ACC_DEVICE_TYPE
 528 @section @code{ACC_DEVICE_TYPE}
 529 @table @asis
 530 @item @emph{Reference}:
 531 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 532 4.1.
 533 @end table
 534
 535
 536
 537 @node ACC_DEVICE_NUM
 538 @section @code{ACC_DEVICE_NUM}
 539 @table @asis
 540 @item @emph{Reference}:
 541 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
 542 4.2.
 543 @end table
 544
 545
 546
 547 @node GCC_ACC_NOTIFY
 548 @section @code{GCC_ACC_NOTIFY}
 549 @table @asis
 550 @item @emph{Description}:
 551 Print debug information pertaining to the accelerator.
 552 @end table
 553
 554
 555 @c ---------------------------------------------------------------------
 556 @c OpenACC Library Interoperability
 557 @c ---------------------------------------------------------------------
 558
 559 @node OpenACC Library Interoperability
 560 @chapter OpenACC Library Interoperability
 561
 562 @section Introduction
 563
 564 As the OpenACC library is built using the CUDA Driver API, the question has
 565 arisen on what impact does using the OpenACC library have on a program that
 566 uses the Runtime library, or a library based on the Runtime library, e.g.,
 567 CUBLAS@footnote{Seee section 2.26, "Interactions with the CUDA Driver API" in
 568 "CUDA Runtime API", Version 5.5, July 2013 and section 2.27, "VDPAU
 569 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
 570 July 2013, for additional information on library interoperability.}.
 571 This chapter will describe the use cases and what changes are
 572 required in order to use both the OpenACC library and the CUBLAS and Runtime
 573 libraries within a program.
 574
 575 @section First invocation: NVIDIA CUBLAS library API
 576
 577 In this first use case (see below), a function in the CUBLAS library is called
 578 prior to any of the functions in the OpenACC library. More specifically, the
 579 function @code{cublasCreate()}.
 580
 581 When invoked, the function will initialize the library and allocate the
 582 hardware resources on the host and the device on behalf of the caller. Once
 583 the initialization and allocation has completed, a handle is returned to the
 584 caller. The OpenACC library also requires initialization and allocation of
 585 hardware resources. Since the CUBLAS library has already allocated the
 586 hardware resources for the device, all that is left to do is to initialize
 587 the OpenACC library and acquire the hardware resources on the host.
 588
 589 Prior to calling the OpenACC function that will initialize the library and
 590 allocate the host hardware resources, one needs to acquire the device number
 591 that was allocated during the call to @code{cublasCreate()}. The invoking of the
 592 runtime library function @code{cudaGetDevice()} will accomplish this. Once
 593 acquired, the device number is passed along with the device type as
 594 parameters to the OpenACC library function @code{acc_set_device_num()}.
 595
 596 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
 597 library will be using the  context that was created during the call to
 598 @code{cublasCreate()}. In other words, both libraries will be sharing the
 599 same context.
 600
 601 @verbatim
 602     /* Create the handle */
 603     s = cublasCreate(&h);
 604     if (s != CUBLAS_STATUS_SUCCESS)
 605     {
 606         fprintf(stderr, "cublasCreate failed %d\n", s);
 607         exit(EXIT_FAILURE);
 608     }
 609
 610     /* Get the device number */
 611     e = cudaGetDevice(&dev);
 612     if (e != cudaSuccess)
 613     {
 614         fprintf(stderr, "cudaGetDevice failed %d\n", e);
 615         exit(EXIT_FAILURE);
 616     }
 617
 618     /* Initialize OpenACC library and use device 'dev' */
 619     acc_set_device_num(dev, acc_device_nvidia);
 620
 621 @end verbatim
 622 @center Use Case 1
 623
 624 @section First invocation: OpenACC library API
 625
 626 In this second use case (see below), a function in the OpenACC library is
 627 called prior to any of the functions in the CUBLAS library. More specificially,
 628 the function acc_set_device_num().
 629
 630 In the use case presented here, the function @code{acc_set_device_num()}
 631 is used to both initialize the OpenACC library and allocate the hardware
 632 resources on the host and the device. In the call to the function, the
 633 call parameters specify which device to use, i.e., 'dev', and what device
 634 type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
 635 is but one method to initialize the OpenACC library and allocate the
 636 appropriate hardware resources. Other methods are available through the
 637 use of environment variables and these will be discussed in the next section.
 638
 639 Once the call to @code{acc_set_device_num()} has completed, other OpenACC
 640 functions can be called as seen with multiple calls being made to
 641 @code{acc_copyin()}. In addition, calls can be made to functions in the
 642 CUBLAS library. In the use case a call to @code{cublasCreate()} is made
 643 subsequent to the calls to @code{acc_copyin()}.
 644 As seen in the previous use case, a call to @code{cublasCreate()} will
 645 initialize the CUBLAS library and allocate the hardware resources on the
 646 host and the device.  However, since the device has already been allocated,
 647 @code{cublasCreate()} will only initialize the CUBLAS library and allocate
 648 the appropriate hardware resources on the host. The context that was created
 649 as part of the OpenACC initialization will be shared with the CUBLAS library,
 650 similarly to the first use case.
 651
 652 @verbatim
 653     dev = 0;
 654
 655     acc_set_device_num(dev, acc_device_nvidia);
 656
 657     /* Copy the first set to the device */
 658     d_X = acc_copyin(&h_X[0], N * sizeof (float));
 659     if (d_X == NULL)
 660     {
 661         fprintf(stderr, "copyin error h_X\n");
 662         exit(EXIT_FAILURE);
 663     }
 664
 665     /* Copy the second set to the device */
 666     d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
 667     if (d_Y == NULL)
 668     {
 669         fprintf(stderr, "copyin error h_Y1\n");
 670         exit(EXIT_FAILURE);
 671     }
 672
 673     /* Create the handle */
 674     s = cublasCreate(&h);
 675     if (s != CUBLAS_STATUS_SUCCESS)
 676     {
 677         fprintf(stderr, "cublasCreate failed %d\n", s);
 678         exit(EXIT_FAILURE);
 679     }
 680
 681     /* Perform saxpy using CUBLAS library function */
 682     s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
 683     if (s != CUBLAS_STATUS_SUCCESS)
 684     {
 685         fprintf(stderr, "cublasSaxpy failed %d\n", s);
 686         exit(EXIT_FAILURE);
 687     }
 688
 689     /* Copy the results from the device */
 690     acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
 691
 692 }
 693 @end verbatim
 694 @center Use Case 2
 695
 696 @section OpenACC library and environment variables
 697
 698 There are two environment variables associated with the OpenACC library that
 699 may be used to control the device type and device number.
 700 Namely, @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}. In the second
 701 use case, the device type and device number were specified using
 702 @code{acc_set_device_num()}. However, @env{ACC_DEVICE_TYPE} and
 703 @env{ACC_DEVICE_NUM} could have been defined and the call to
 704 @code{acc_set_device_num()} would be not be required. At the time of the
 705 call to @code{acc_copyin()}, these two environment variables would be
 706 sampled and their values used.
 707
 708 The use of the environment variables is only relevant when an OpenACC function
 709 is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
 710 is called prior to a call to an OpenACC function, then a call to
 711 @code{acc_set_device_num()}, must be done@footnote{More complete information
 712 about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
 713 sections 4.1 and 4.2 of the “The OpenACC
 714 Application Programming Interface”, Version 2.0, June, 2013.}.
 715
 716
 717
 718 @c ---------------------------------------------------------------------
 719 @c Enabling OpenMP
 720 @c ---------------------------------------------------------------------
 721
 722 @node Enabling OpenMP
 723 @chapter Enabling OpenMP
 724
 725 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
 726 flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
 727 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form,
 728 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form,
 729 @code{!$} conditional compilation sentinels in free form and @code{c$},
 730 @code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
 731 arranges for automatic linking of the OpenMP runtime library
 732 (@ref{Runtime Library Routines}).
 733
 734 A complete description of all OpenMP directives accepted may be found in
 735 the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
 736 version 4.0.
 737
 738
 739 @c ---------------------------------------------------------------------
 740 @c OpenMP Runtime Library Routines
 741 @c ---------------------------------------------------------------------
 742
 743 @node Runtime Library Routines
 744 @chapter OpenMP Runtime Library Routines
 745
 746 The runtime routines described here are defined by Section 3 of the OpenMP
 747 specification in version 4.0.  The routines are structured in following
 748 three parts:
 749
 750 @menu
 751 Control threads, processors and the parallel environment.  They have C
 752 linkage, and do not throw exceptions.
 753
 754 * omp_get_active_level::        Number of active parallel regions
 755 * omp_get_ancestor_thread_num:: Ancestor thread ID
 756 * omp_get_cancellation::        Whether cancellation support is enabled
 757 * omp_get_default_device::      Get the default device for target regions
 758 * omp_get_dynamic::             Dynamic teams setting
 759 * omp_get_level::               Number of parallel regions
 760 * omp_get_max_active_levels::   Maximum number of active regions
 761 * omp_get_max_threads::         Maximum number of threads of parallel region
 762 * omp_get_nested::              Nested parallel regions
 763 * omp_get_num_devices::         Number of target devices
 764 * omp_get_num_procs::           Number of processors online
 765 * omp_get_num_teams::           Number of teams
 766 * omp_get_num_threads::         Size of the active team
 767 * omp_get_proc_bind::           Whether theads may be moved between CPUs
 768 * omp_get_schedule::            Obtain the runtime scheduling method
 769 * omp_get_team_num::            Get team number
 770 * omp_get_team_size::           Number of threads in a team
 771 * omp_get_thread_limit::        Maximum number of threads
 772 * omp_get_thread_num::          Current thread ID
 773 * omp_in_parallel::             Whether a parallel region is active
 774 * omp_in_final::                Whether in final or included task region
 775 * omp_is_initial_device::       Whether executing on the host device
 776 * omp_set_default_device::      Set the default device for target regions
 777 * omp_set_dynamic::             Enable/disable dynamic teams
 778 * omp_set_max_active_levels::   Limits the number of active parallel regions
 779 * omp_set_nested::              Enable/disable nested parallel regions
 780 * omp_set_num_threads::         Set upper team size limit
 781 * omp_set_schedule::            Set the runtime scheduling method
 782
 783 Initialize, set, test, unset and destroy simple and nested locks.
 784
 785 * omp_init_lock::            Initialize simple lock
 786 * omp_set_lock::             Wait for and set simple lock
 787 * omp_test_lock::            Test and set simple lock if available
 788 * omp_unset_lock::           Unset simple lock
 789 * omp_destroy_lock::         Destroy simple lock
 790 * omp_init_nest_lock::       Initialize nested lock
 791 * omp_set_nest_lock::        Wait for and set simple lock
 792 * omp_test_nest_lock::       Test and set nested lock if available
 793 * omp_unset_nest_lock::      Unset nested lock
 794 * omp_destroy_nest_lock::    Destroy nested lock
 795
 796 Portable, thread-based, wall clock timer.
 797
 798 * omp_get_wtick::            Get timer precision.
 799 * omp_get_wtime::            Elapsed wall clock time.
 800 @end menu
 801
 802
 803
 804 @node omp_get_active_level
 805 @section @code{omp_get_active_level} -- Number of parallel regions
 806 @table @asis
 807 @item @emph{Description}:
 808 This function returns the nesting level for the active parallel blocks,
 809 which enclose the calling call.
 810
 811 @item @emph{C/C++}
 812 @multitable @columnfractions .20 .80
 813 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
 814 @end multitable
 815
 816 @item @emph{Fortran}:
 817 @multitable @columnfractions .20 .80
 818 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
 819 @end multitable
 820
 821 @item @emph{See also}:
 822 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
 823
 824 @item @emph{Reference}:
 825 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.20.
 826 @end table
 827
 828
 829
 830 @node omp_get_ancestor_thread_num
 831 @section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
 832 @table @asis
 833 @item @emph{Description}:
 834 This function returns the thread identification number for the given
 835 nesting level of the current thread.  For values of @var{level} outside
 836 zero to @code{omp_get_level} -1 is returned; if @var{level} is
 837 @code{omp_get_level} the result is identical to @code{omp_get_thread_num}.
 838
 839 @item @emph{C/C++}
 840 @multitable @columnfractions .20 .80
 841 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
 842 @end multitable
 843
 844 @item @emph{Fortran}:
 845 @multitable @columnfractions .20 .80
 846 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
 847 @item                   @tab @code{integer level}
 848 @end multitable
 849
 850 @item @emph{See also}:
 851 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
 852
 853 @item @emph{Reference}:
 854 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.18.
 855 @end table
 856
 857
 858
 859 @node omp_get_cancellation
 860 @section @code{omp_get_cancellation} -- Whether cancellation support is enabled
 861 @table @asis
 862 @item @emph{Description}:
 863 This function returns @code{true} if cancellation is activated, @code{false}
 864 otherwise.  Here, @code{true} and @code{false} represent their language-specific
 865 counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
 866 deactivated.
 867
 868 @item @emph{C/C++}:
 869 @multitable @columnfractions .20 .80
 870 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
 871 @end multitable
 872
 873 @item @emph{Fortran}:
 874 @multitable @columnfractions .20 .80
 875 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
 876 @end multitable
 877
 878 @item @emph{See also}:
 879 @ref{OMP_CANCELLATION}
 880
 881 @item @emph{Reference}:
 882 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.9.
 883 @end table
 884
 885
 886
 887 @node omp_get_default_device
 888 @section @code{omp_get_default_device} -- Get the default device for target regions
 889 @table @asis
 890 @item @emph{Description}:
 891 Get the default device for target regions without device clause.
 892
 893 @item @emph{C/C++}:
 894 @multitable @columnfractions .20 .80
 895 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
 896 @end multitable
 897
 898 @item @emph{Fortran}:
 899 @multitable @columnfractions .20 .80
 900 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
 901 @end multitable
 902
 903 @item @emph{See also}:
 904 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
 905
 906 @item @emph{Reference}:
 907 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.24.
 908 @end table
 909
 910
 911
 912 @node omp_get_dynamic
 913 @section @code{omp_get_dynamic} -- Dynamic teams setting
 914 @table @asis
 915 @item @emph{Description}:
 916 This function returns @code{true} if enabled, @code{false} otherwise.
 917 Here, @code{true} and @code{false} represent their language-specific
 918 counterparts.
 919
 920 The dynamic team setting may be initialized at startup by the
 921 @env{OMP_DYNAMIC} environment variable or at runtime using
 922 @code{omp_set_dynamic}.  If undefined, dynamic adjustment is
 923 disabled by default.
 924
 925 @item @emph{C/C++}:
 926 @multitable @columnfractions .20 .80
 927 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
 928 @end multitable
 929
 930 @item @emph{Fortran}:
 931 @multitable @columnfractions .20 .80
 932 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
 933 @end multitable
 934
 935 @item @emph{See also}:
 936 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
 937
 938 @item @emph{Reference}:
 939 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.8.
 940 @end table
 941
 942
 943
 944 @node omp_get_level
 945 @section @code{omp_get_level} -- Obtain the current nesting level
 946 @table @asis
 947 @item @emph{Description}:
 948 This function returns the nesting level for the parallel blocks,
 949 which enclose the calling call.
 950
 951 @item @emph{C/C++}
 952 @multitable @columnfractions .20 .80
 953 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
 954 @end multitable
 955
 956 @item @emph{Fortran}:
 957 @multitable @columnfractions .20 .80
 958 @item @emph{Interface}: @tab @code{integer function omp_level()}
 959 @end multitable
 960
 961 @item @emph{See also}:
 962 @ref{omp_get_active_level}
 963
 964 @item @emph{Reference}:
 965 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.17.
 966 @end table
 967
 968
 969
 970 @node omp_get_max_active_levels
 971 @section @code{omp_get_max_active_levels} -- Maximum number of active regions
 972 @table @asis
 973 @item @emph{Description}:
 974 This function obtains the maximum allowed number of nested, active parallel regions.
 975
 976 @item @emph{C/C++}
 977 @multitable @columnfractions .20 .80
 978 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
 979 @end multitable
 980
 981 @item @emph{Fortran}:
 982 @multitable @columnfractions .20 .80
 983 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
 984 @end multitable
 985
 986 @item @emph{See also}:
 987 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
 988
 989 @item @emph{Reference}:
 990 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.16.
 991 @end table
 992
 993
 994
 995 @node omp_get_max_threads
 996 @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
 997 @table @asis
 998 @item @emph{Description}:
 999 Return the maximum number of threads used for the current parallel region
1000 that does not use the clause @code{num_threads}.
1001
1002 @item @emph{C/C++}:
1003 @multitable @columnfractions .20 .80
1004 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
1005 @end multitable
1006
1007 @item @emph{Fortran}:
1008 @multitable @columnfractions .20 .80
1009 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
1010 @end multitable
1011
1012 @item @emph{See also}:
1013 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
1014
1015 @item @emph{Reference}:
1016 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.3.
1017 @end table
1018
1019
1020
1021 @node omp_get_nested
1022 @section @code{omp_get_nested} -- Nested parallel regions
1023 @table @asis
1024 @item @emph{Description}:
1025 This function returns @code{true} if nested parallel regions are
1026 enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
1027 represent their language-specific counterparts.
1028
1029 Nested parallel regions may be initialized at startup by the
1030 @env{OMP_NESTED} environment variable or at runtime using
1031 @code{omp_set_nested}.  If undefined, nested parallel regions are
1032 disabled by default.
1033
1034 @item @emph{C/C++}:
1035 @multitable @columnfractions .20 .80
1036 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
1037 @end multitable
1038
1039 @item @emph{Fortran}:
1040 @multitable @columnfractions .20 .80
1041 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
1042 @end multitable
1043
1044 @item @emph{See also}:
1045 @ref{omp_set_nested}, @ref{OMP_NESTED}
1046
1047 @item @emph{Reference}:
1048 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.11.
1049 @end table
1050
1051
1052
1053 @node omp_get_num_devices
1054 @section @code{omp_get_num_devices} -- Number of target devices
1055 @table @asis
1056 @item @emph{Description}:
1057 Returns the number of target devices.
1058
1059 @item @emph{C/C++}:
1060 @multitable @columnfractions .20 .80
1061 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
1062 @end multitable
1063
1064 @item @emph{Fortran}:
1065 @multitable @columnfractions .20 .80
1066 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
1067 @end multitable
1068
1069 @item @emph{Reference}:
1070 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.25.
1071 @end table
1072
1073
1074
1075 @node omp_get_num_procs
1076 @section @code{omp_get_num_procs} -- Number of processors online
1077 @table @asis
1078 @item @emph{Description}:
1079 Returns the number of processors online on that device.
1080
1081 @item @emph{C/C++}:
1082 @multitable @columnfractions .20 .80
1083 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
1084 @end multitable
1085
1086 @item @emph{Fortran}:
1087 @multitable @columnfractions .20 .80
1088 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
1089 @end multitable
1090
1091 @item @emph{Reference}:
1092 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.5.
1093 @end table
1094
1095
1096
1097 @node omp_get_num_teams
1098 @section @code{omp_get_num_teams} -- Number of teams
1099 @table @asis
1100 @item @emph{Description}:
1101 Returns the number of teams in the current team region.
1102
1103 @item @emph{C/C++}:
1104 @multitable @columnfractions .20 .80
1105 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
1106 @end multitable
1107
1108 @item @emph{Fortran}:
1109 @multitable @columnfractions .20 .80
1110 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
1111 @end multitable
1112
1113 @item @emph{Reference}:
1114 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.26.
1115 @end table
1116
1117
1118
1119 @node omp_get_num_threads
1120 @section @code{omp_get_num_threads} -- Size of the active team
1121 @table @asis
1122 @item @emph{Description}:
1123 Returns the number of threads in the current team.  In a sequential section of
1124 the program @code{omp_get_num_threads} returns 1.
1125
1126 The default team size may be initialized at startup by the
1127 @env{OMP_NUM_THREADS} environment variable.  At runtime, the size
1128 of the current team may be set either by the @code{NUM_THREADS}
1129 clause or by @code{omp_set_num_threads}.  If none of the above were
1130 used to define a specific value and @env{OMP_DYNAMIC} is disabled,
1131 one thread per CPU online is used.
1132
1133 @item @emph{C/C++}:
1134 @multitable @columnfractions .20 .80
1135 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
1136 @end multitable
1137
1138 @item @emph{Fortran}:
1139 @multitable @columnfractions .20 .80
1140 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
1141 @end multitable
1142
1143 @item @emph{See also}:
1144 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
1145
1146 @item @emph{Reference}:
1147 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.2.
1148 @end table
1149
1150
1151
1152 @node omp_get_proc_bind
1153 @section @code{omp_get_proc_bind} -- Whether theads may be moved between CPUs
1154 @table @asis
1155 @item @emph{Description}:
1156 This functions returns the currently active thread affinity policy, which is
1157 set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
1158 @code{omp_proc_bind_true}, @code{omp_proc_bind_master},
1159 @code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.
1160
1161 @item @emph{C/C++}:
1162 @multitable @columnfractions .20 .80
1163 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
1164 @end multitable
1165
1166 @item @emph{Fortran}:
1167 @multitable @columnfractions .20 .80
1168 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
1169 @end multitable
1170
1171 @item @emph{See also}:
1172 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
1173
1174 @item @emph{Reference}:
1175 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.22.
1176 @end table
1177
1178
1179
1180 @node omp_get_schedule
1181 @section @code{omp_get_schedule} -- Obtain the runtime scheduling method
1182 @table @asis
1183 @item @emph{Description}:
1184 Obtain the runtime scheduling method.  The @var{kind} argument will be
1185 set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
1186 @code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
1187 @var{modifier}, is set to the chunk size.
1188
1189 @item @emph{C/C++}
1190 @multitable @columnfractions .20 .80
1191 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *modifier);}
1192 @end multitable
1193
1194 @item @emph{Fortran}:
1195 @multitable @columnfractions .20 .80
1196 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, modifier)}
1197 @item                   @tab @code{integer(kind=omp_sched_kind) kind}
1198 @item                   @tab @code{integer modifier}
1199 @end multitable
1200
1201 @item @emph{See also}:
1202 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
1203
1204 @item @emph{Reference}:
1205 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.13.
1206 @end table
1207
1208
1209
1210 @node omp_get_team_num
1211 @section @code{omp_get_team_num} -- Get team number
1212 @table @asis
1213 @item @emph{Description}:
1214 Returns the team number of the calling thread.
1215
1216 @item @emph{C/C++}:
1217 @multitable @columnfractions .20 .80
1218 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
1219 @end multitable
1220
1221 @item @emph{Fortran}:
1222 @multitable @columnfractions .20 .80
1223 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
1224 @end multitable
1225
1226 @item @emph{Reference}:
1227 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.27.
1228 @end table
1229
1230
1231
1232 @node omp_get_team_size
1233 @section @code{omp_get_team_size} -- Number of threads in a team
1234 @table @asis
1235 @item @emph{Description}:
1236 This function returns the number of threads in a thread team to which
1237 either the current thread or its ancestor belongs.  For values of @var{level}
1238 outside zero to @code{omp_get_level}, -1 is returned; if @var{level} is zero,
1239 1 is returned, and for @code{omp_get_level}, the result is identical
1240 to @code{omp_get_num_threads}.
1241
1242 @item @emph{C/C++}:
1243 @multitable @columnfractions .20 .80
1244 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
1245 @end multitable
1246
1247 @item @emph{Fortran}:
1248 @multitable @columnfractions .20 .80
1249 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
1250 @item                   @tab @code{integer level}
1251 @end multitable
1252
1253 @item @emph{See also}:
1254 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
1255
1256 @item @emph{Reference}:
1257 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.19.
1258 @end table
1259
1260
1261
1262 @node omp_get_thread_limit
1263 @section @code{omp_get_thread_limit} -- Maximum number of threads
1264 @table @asis
1265 @item @emph{Description}:
1266 Return the maximum number of threads of the program.
1267
1268 @item @emph{C/C++}:
1269 @multitable @columnfractions .20 .80
1270 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
1271 @end multitable
1272
1273 @item @emph{Fortran}:
1274 @multitable @columnfractions .20 .80
1275 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
1276 @end multitable
1277
1278 @item @emph{See also}:
1279 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
1280
1281 @item @emph{Reference}:
1282 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.14.
1283 @end table
1284
1285
1286
1287 @node omp_get_thread_num
1288 @section @code{omp_get_thread_num} -- Current thread ID
1289 @table @asis
1290 @item @emph{Description}:
1291 Returns a unique thread identification number within the current team.
1292 In a sequential parts of the program, @code{omp_get_thread_num}
1293 always returns 0.  In parallel regions the return value varies
1294 from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
1295 value of the master thread of a team is always 0.
1296
1297 @item @emph{C/C++}:
1298 @multitable @columnfractions .20 .80
1299 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
1300 @end multitable
1301
1302 @item @emph{Fortran}:
1303 @multitable @columnfractions .20 .80
1304 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
1305 @end multitable
1306
1307 @item @emph{See also}:
1308 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
1309
1310 @item @emph{Reference}:
1311 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.4.
1312 @end table
1313
1314
1315
1316 @node omp_in_parallel
1317 @section @code{omp_in_parallel} -- Whether a parallel region is active
1318 @table @asis
1319 @item @emph{Description}:
1320 This function returns @code{true} if currently running in parallel,
1321 @code{false} otherwise.  Here, @code{true} and @code{false} represent
1322 their language-specific counterparts.
1323
1324 @item @emph{C/C++}:
1325 @multitable @columnfractions .20 .80
1326 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
1327 @end multitable
1328
1329 @item @emph{Fortran}:
1330 @multitable @columnfractions .20 .80
1331 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
1332 @end multitable
1333
1334 @item @emph{Reference}:
1335 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.6.
1336 @end table
1337
1338
1339 @node omp_in_final
1340 @section @code{omp_in_final} -- Whether in final or included task region
1341 @table @asis
1342 @item @emph{Description}:
1343 This function returns @code{true} if currently running in a final
1344 or included task region, @code{false} otherwise.  Here, @code{true}
1345 and @code{false} represent their language-specific counterparts.
1346
1347 @item @emph{C/C++}:
1348 @multitable @columnfractions .20 .80
1349 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
1350 @end multitable
1351
1352 @item @emph{Fortran}:
1353 @multitable @columnfractions .20 .80
1354 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
1355 @end multitable
1356
1357 @item @emph{Reference}:
1358 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.21.
1359 @end table
1360
1361
1362
1363 @node omp_is_initial_device
1364 @section @code{omp_is_initial_device} -- Whether executing on the host device
1365 @table @asis
1366 @item @emph{Description}:
1367 This function returns @code{true} if currently running on the host device,
1368 @code{false} otherwise.  Here, @code{true} and @code{false} represent
1369 their language-specific counterparts.
1370
1371 @item @emph{C/C++}:
1372 @multitable @columnfractions .20 .80
1373 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
1374 @end multitable
1375
1376 @item @emph{Fortran}:
1377 @multitable @columnfractions .20 .80
1378 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
1379 @end multitable
1380
1381 @item @emph{Reference}:
1382 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.28.
1383 @end table
1384
1385
1386
1387 @node omp_set_default_device
1388 @section @code{omp_set_default_device} -- Set the default device for target regions
1389 @table @asis
1390 @item @emph{Description}:
1391 Set the default device for target regions without device clause.  The argument
1392 shall be a nonnegative device number.
1393
1394 @item @emph{C/C++}:
1395 @multitable @columnfractions .20 .80
1396 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
1397 @end multitable
1398
1399 @item @emph{Fortran}:
1400 @multitable @columnfractions .20 .80
1401 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
1402 @item                   @tab @code{integer device_num}
1403 @end multitable
1404
1405 @item @emph{See also}:
1406 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
1407
1408 @item @emph{Reference}:
1409 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.23.
1410 @end table
1411
1412
1413
1414 @node omp_set_dynamic
1415 @section @code{omp_set_dynamic} -- Enable/disable dynamic teams
1416 @table @asis
1417 @item @emph{Description}:
1418 Enable or disable the dynamic adjustment of the number of threads
1419 within a team.  The function takes the language-specific equivalent
1420 of @code{true} and @code{false}, where @code{true} enables dynamic
1421 adjustment of team sizes and @code{false} disables it.
1422
1423 @item @emph{C/C++}:
1424 @multitable @columnfractions .20 .80
1425 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
1426 @end multitable
1427
1428 @item @emph{Fortran}:
1429 @multitable @columnfractions .20 .80
1430 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
1431 @item                   @tab @code{logical, intent(in) :: dynamic_threads}
1432 @end multitable
1433
1434 @item @emph{See also}:
1435 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
1436
1437 @item @emph{Reference}:
1438 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.7.
1439 @end table
1440
1441
1442
1443 @node omp_set_max_active_levels
1444 @section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
1445 @table @asis
1446 @item @emph{Description}:
1447 This function limits the maximum allowed number of nested, active
1448 parallel regions.
1449
1450 @item @emph{C/C++}
1451 @multitable @columnfractions .20 .80
1452 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
1453 @end multitable
1454
1455 @item @emph{Fortran}:
1456 @multitable @columnfractions .20 .80
1457 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
1458 @item                   @tab @code{integer max_levels}
1459 @end multitable
1460
1461 @item @emph{See also}:
1462 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
1463
1464 @item @emph{Reference}:
1465 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.15.
1466 @end table
1467
1468
1469
1470 @node omp_set_nested
1471 @section @code{omp_set_nested} -- Enable/disable nested parallel regions
1472 @table @asis
1473 @item @emph{Description}:
1474 Enable or disable nested parallel regions, i.e., whether team members
1475 are allowed to create new teams.  The function takes the language-specific
1476 equivalent of @code{true} and @code{false}, where @code{true} enables
1477 dynamic adjustment of team sizes and @code{false} disables it.
1478
1479 @item @emph{C/C++}:
1480 @multitable @columnfractions .20 .80
1481 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
1482 @end multitable
1483
1484 @item @emph{Fortran}:
1485 @multitable @columnfractions .20 .80
1486 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
1487 @item                   @tab @code{logical, intent(in) :: nested}
1488 @end multitable
1489
1490 @item @emph{See also}:
1491 @ref{OMP_NESTED}, @ref{omp_get_nested}
1492
1493 @item @emph{Reference}:
1494 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.10.
1495 @end table
1496
1497
1498
1499 @node omp_set_num_threads
1500 @section @code{omp_set_num_threads} -- Set upper team size limit
1501 @table @asis
1502 @item @emph{Description}:
1503 Specifies the number of threads used by default in subsequent parallel
1504 sections, if those do not specify a @code{num_threads} clause.  The
1505 argument of @code{omp_set_num_threads} shall be a positive integer.
1506
1507 @item @emph{C/C++}:
1508 @multitable @columnfractions .20 .80
1509 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
1510 @end multitable
1511
1512 @item @emph{Fortran}:
1513 @multitable @columnfractions .20 .80
1514 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
1515 @item                   @tab @code{integer, intent(in) :: num_threads}
1516 @end multitable
1517
1518 @item @emph{See also}:
1519 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
1520
1521 @item @emph{Reference}:
1522 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.1.
1523 @end table
1524
1525
1526
1527 @node omp_set_schedule
1528 @section @code{omp_set_schedule} -- Set the runtime scheduling method
1529 @table @asis
1530 @item @emph{Description}:
1531 Sets the runtime scheduling method.  The @var{kind} argument can have the
1532 value @code{omp_sched_static}, @code{omp_sched_dynamic},
1533 @code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
1534 @code{omp_sched_auto}, the chunk size is set to the value of
1535 @var{modifier} if positive, or to the default value if zero or negative.
1536 For @code{omp_sched_auto} the @var{modifier} argument is ignored.
1537
1538 @item @emph{C/C++}
1539 @multitable @columnfractions .20 .80
1540 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int modifier);}
1541 @end multitable
1542
1543 @item @emph{Fortran}:
1544 @multitable @columnfractions .20 .80
1545 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, modifier)}
1546 @item                   @tab @code{integer(kind=omp_sched_kind) kind}
1547 @item                   @tab @code{integer modifier}
1548 @end multitable
1549
1550 @item @emph{See also}:
1551 @ref{omp_get_schedule}
1552 @ref{OMP_SCHEDULE}
1553
1554 @item @emph{Reference}:
1555 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.12.
1556 @end table
1557
1558
1559
1560 @node omp_init_lock
1561 @section @code{omp_init_lock} -- Initialize simple lock
1562 @table @asis
1563 @item @emph{Description}:
1564 Initialize a simple lock.  After initialization, the lock is in
1565 an unlocked state.
1566
1567 @item @emph{C/C++}:
1568 @multitable @columnfractions .20 .80
1569 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
1570 @end multitable
1571
1572 @item @emph{Fortran}:
1573 @multitable @columnfractions .20 .80
1574 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
1575 @item                   @tab @code{integer(omp_lock_kind), intent(out) :: svar}
1576 @end multitable
1577
1578 @item @emph{See also}:
1579 @ref{omp_destroy_lock}
1580
1581 @item @emph{Reference}:
1582 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
1583 @end table
1584
1585
1586
1587 @node omp_set_lock
1588 @section @code{omp_set_lock} -- Wait for and set simple lock
1589 @table @asis
1590 @item @emph{Description}:
1591 Before setting a simple lock, the lock variable must be initialized by
1592 @code{omp_init_lock}.  The calling thread is blocked until the lock
1593 is available.  If the lock is already held by the current thread,
1594 a deadlock occurs.
1595
1596 @item @emph{C/C++}:
1597 @multitable @columnfractions .20 .80
1598 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
1599 @end multitable
1600
1601 @item @emph{Fortran}:
1602 @multitable @columnfractions .20 .80
1603 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
1604 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1605 @end multitable
1606
1607 @item @emph{See also}:
1608 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
1609
1610 @item @emph{Reference}:
1611 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
1612 @end table
1613
1614
1615
1616 @node omp_test_lock
1617 @section @code{omp_test_lock} -- Test and set simple lock if available
1618 @table @asis
1619 @item @emph{Description}:
1620 Before setting a simple lock, the lock variable must be initialized by
1621 @code{omp_init_lock}.  Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1622 does not block if the lock is not available.  This function returns
1623 @code{true} upon success, @code{false} otherwise.  Here, @code{true} and
1624 @code{false} represent their language-specific counterparts.
1625
1626 @item @emph{C/C++}:
1627 @multitable @columnfractions .20 .80
1628 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1629 @end multitable
1630
1631 @item @emph{Fortran}:
1632 @multitable @columnfractions .20 .80
1633 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1634 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1635 @end multitable
1636
1637 @item @emph{See also}:
1638 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
1639
1640 @item @emph{Reference}:
1641 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
1642 @end table
1643
1644
1645
1646 @node omp_unset_lock
1647 @section @code{omp_unset_lock} -- Unset simple lock
1648 @table @asis
1649 @item @emph{Description}:
1650 A simple lock about to be unset must have been locked by @code{omp_set_lock}
1651 or @code{omp_test_lock} before.  In addition, the lock must be held by the
1652 thread calling @code{omp_unset_lock}.  Then, the lock becomes unlocked.  If one
1653 or more threads attempted to set the lock before, one of them is chosen to,
1654 again, set the lock to itself.
1655
1656 @item @emph{C/C++}:
1657 @multitable @columnfractions .20 .80
1658 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1659 @end multitable
1660
1661 @item @emph{Fortran}:
1662 @multitable @columnfractions .20 .80
1663 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1664 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1665 @end multitable
1666
1667 @item @emph{See also}:
1668 @ref{omp_set_lock}, @ref{omp_test_lock}
1669
1670 @item @emph{Reference}:
1671 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
1672 @end table
1673
1674
1675
1676 @node omp_destroy_lock
1677 @section @code{omp_destroy_lock} -- Destroy simple lock
1678 @table @asis
1679 @item @emph{Description}:
1680 Destroy a simple lock.  In order to be destroyed, a simple lock must be
1681 in the unlocked state.
1682
1683 @item @emph{C/C++}:
1684 @multitable @columnfractions .20 .80
1685 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1686 @end multitable
1687
1688 @item @emph{Fortran}:
1689 @multitable @columnfractions .20 .80
1690 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1691 @item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1692 @end multitable
1693
1694 @item @emph{See also}:
1695 @ref{omp_init_lock}
1696
1697 @item @emph{Reference}:
1698 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
1699 @end table
1700
1701
1702
1703 @node omp_init_nest_lock
1704 @section @code{omp_init_nest_lock} -- Initialize nested lock
1705 @table @asis
1706 @item @emph{Description}:
1707 Initialize a nested lock.  After initialization, the lock is in
1708 an unlocked state and the nesting count is set to zero.
1709
1710 @item @emph{C/C++}:
1711 @multitable @columnfractions .20 .80
1712 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1713 @end multitable
1714
1715 @item @emph{Fortran}:
1716 @multitable @columnfractions .20 .80
1717 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1718 @item                   @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1719 @end multitable
1720
1721 @item @emph{See also}:
1722 @ref{omp_destroy_nest_lock}
1723
1724 @item @emph{Reference}:
1725 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
1726 @end table
1727
1728
1729 @node omp_set_nest_lock
1730 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1731 @table @asis
1732 @item @emph{Description}:
1733 Before setting a nested lock, the lock variable must be initialized by
1734 @code{omp_init_nest_lock}.  The calling thread is blocked until the lock
1735 is available.  If the lock is already held by the current thread, the
1736 nesting count for the lock is incremented.
1737
1738 @item @emph{C/C++}:
1739 @multitable @columnfractions .20 .80
1740 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1741 @end multitable
1742
1743 @item @emph{Fortran}:
1744 @multitable @columnfractions .20 .80
1745 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1746 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1747 @end multitable
1748
1749 @item @emph{See also}:
1750 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1751
1752 @item @emph{Reference}:
1753 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
1754 @end table
1755
1756
1757
1758 @node omp_test_nest_lock
1759 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1760 @table @asis
1761 @item @emph{Description}:
1762 Before setting a nested lock, the lock variable must be initialized by
1763 @code{omp_init_nest_lock}.  Contrary to @code{omp_set_nest_lock},
1764 @code{omp_test_nest_lock} does not block if the lock is not available.
1765 If the lock is already held by the current thread, the new nesting count
1766 is returned.  Otherwise, the return value equals zero.
1767
1768 @item @emph{C/C++}:
1769 @multitable @columnfractions .20 .80
1770 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1771 @end multitable
1772
1773 @item @emph{Fortran}:
1774 @multitable @columnfractions .20 .80
1775 @item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
1776 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1777 @end multitable
1778
1779
1780 @item @emph{See also}:
1781 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
1782
1783 @item @emph{Reference}:
1784 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
1785 @end table
1786
1787
1788
1789 @node omp_unset_nest_lock
1790 @section @code{omp_unset_nest_lock} -- Unset nested lock
1791 @table @asis
1792 @item @emph{Description}:
1793 A nested lock about to be unset must have been locked by @code{omp_set_nested_lock}
1794 or @code{omp_test_nested_lock} before.  In addition, the lock must be held by the
1795 thread calling @code{omp_unset_nested_lock}.  If the nesting count drops to zero, the
1796 lock becomes unlocked.  If one ore more threads attempted to set the lock before,
1797 one of them is chosen to, again, set the lock to itself.
1798
1799 @item @emph{C/C++}:
1800 @multitable @columnfractions .20 .80
1801 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1802 @end multitable
1803
1804 @item @emph{Fortran}:
1805 @multitable @columnfractions .20 .80
1806 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1807 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1808 @end multitable
1809
1810 @item @emph{See also}:
1811 @ref{omp_set_nest_lock}
1812
1813 @item @emph{Reference}:
1814 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
1815 @end table
1816
1817
1818
1819 @node omp_destroy_nest_lock
1820 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1821 @table @asis
1822 @item @emph{Description}:
1823 Destroy a nested lock.  In order to be destroyed, a nested lock must be
1824 in the unlocked state and its nesting count must equal zero.
1825
1826 @item @emph{C/C++}:
1827 @multitable @columnfractions .20 .80
1828 @item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *);}
1829 @end multitable
1830
1831 @item @emph{Fortran}:
1832 @multitable @columnfractions .20 .80
1833 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1834 @item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1835 @end multitable
1836
1837 @item @emph{See also}:
1838 @ref{omp_init_lock}
1839
1840 @item @emph{Reference}:
1841 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
1842 @end table
1843
1844
1845
1846 @node omp_get_wtick
1847 @section @code{omp_get_wtick} -- Get timer precision
1848 @table @asis
1849 @item @emph{Description}:
1850 Gets the timer precision, i.e., the number of seconds between two
1851 successive clock ticks.
1852
1853 @item @emph{C/C++}:
1854 @multitable @columnfractions .20 .80
1855 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1856 @end multitable
1857
1858 @item @emph{Fortran}:
1859 @multitable @columnfractions .20 .80
1860 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1861 @end multitable
1862
1863 @item @emph{See also}:
1864 @ref{omp_get_wtime}
1865
1866 @item @emph{Reference}:
1867 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.2.
1868 @end table
1869
1870
1871
1872 @node omp_get_wtime
1873 @section @code{omp_get_wtime} -- Elapsed wall clock time
1874 @table @asis
1875 @item @emph{Description}:
1876 Elapsed wall clock time in seconds.  The time is measured per thread, no
1877 guarantee can be made that two distinct threads measure the same time.
1878 Time is measured from some "time in the past", which is an arbitrary time
1879 guaranteed not to change during the execution of the program.
1880
1881 @item @emph{C/C++}:
1882 @multitable @columnfractions .20 .80
1883 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1884 @end multitable
1885
1886 @item @emph{Fortran}:
1887 @multitable @columnfractions .20 .80
1888 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1889 @end multitable
1890
1891 @item @emph{See also}:
1892 @ref{omp_get_wtick}
1893
1894 @item @emph{Reference}:
1895 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.1.
1896 @end table
1897
1898
1899
1900 @c ---------------------------------------------------------------------
1901 @c OpenMP Environment Variables
1902 @c ---------------------------------------------------------------------
1903
1904 @node Environment Variables
1905 @chapter OpenMP Environment Variables
1906
1907 The environment variables which beginning with @env{OMP_} are defined by
1908 section 4 of the OpenMP specification in version 4.0, while those
1909 beginning with @env{GOMP_} are GNU extensions.
1910
1911 @menu
1912 * OMP_CANCELLATION::      Set whether cancellation is activated
1913 * OMP_DISPLAY_ENV::       Show OpenMP version and environment variables
1914 * OMP_DEFAULT_DEVICE::    Set the device used in target regions
1915 * OMP_DYNAMIC::           Dynamic adjustment of threads
1916 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1917 * OMP_NESTED::            Nested parallel regions
1918 * OMP_NUM_THREADS::       Specifies the number of threads to use
1919 * OMP_PROC_BIND::         Whether theads may be moved between CPUs
1920 * OMP_PLACES::            Specifies on which CPUs the theads should be placed
1921 * OMP_STACKSIZE::         Set default thread stack size
1922 * OMP_SCHEDULE::          How threads are scheduled
1923 * OMP_THREAD_LIMIT::      Set the maximum number of threads
1924 * OMP_WAIT_POLICY::       How waiting threads are handled
1925 * GOMP_CPU_AFFINITY::     Bind threads to specific CPUs
1926 * GOMP_DEBUG::            Enable debugging output
1927 * GOMP_STACKSIZE::        Set default thread stack size
1928 * GOMP_SPINCOUNT::        Set the busy-wait spin count
1929 @end menu
1930
1931
1932 @node OMP_CANCELLATION
1933 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1934 @cindex Environment Variable
1935 @table @asis
1936 @item @emph{Description}:
1937 If set to @code{TRUE}, the cancellation is activated.  If set to @code{FALSE} or
1938 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1939
1940 @item @emph{See also}:
1941 @ref{omp_get_cancellation}
1942
1943 @item @emph{Reference}:
1944 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
1945 @end table
1946
1947
1948
1949 @node OMP_DISPLAY_ENV
1950 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1951 @cindex Environment Variable
1952 @table @asis
1953 @item @emph{Description}:
1954 If set to @code{TRUE}, the OpenMP version number and the values
1955 associated with the OpenMP environment variables are printed to @code{stderr}.
1956 If set to @code{VERBOSE}, it additionally shows the value of the environment
1957 variables which are GNU extensions.  If undefined or set to @code{FALSE},
1958 this information will not be shown.
1959
1960
1961 @item @emph{Reference}:
1962 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.12
1963 @end table
1964
1965
1966
1967 @node OMP_DEFAULT_DEVICE
1968 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1969 @cindex Environment Variable
1970 @table @asis
1971 @item @emph{Description}:
1972 Set to choose the device which is used in a @code{target} region, unless the
1973 value is overridden by @code{omp_set_default_device} or by a @code{device}
1974 clause.  The value shall be the nonnegative device number. If no device with
1975 the given device number exists, the code is executed on the host.  If unset,
1976 device number 0 will be used.
1977
1978
1979 @item @emph{See also}:
1980 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1981
1982 @item @emph{Reference}:
1983 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
1984 @end table
1985
1986
1987
1988 @node OMP_DYNAMIC
1989 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1990 @cindex Environment Variable
1991 @table @asis
1992 @item @emph{Description}:
1993 Enable or disable the dynamic adjustment of the number of threads
1994 within a team.  The value of this environment variable shall be
1995 @code{TRUE} or @code{FALSE}.  If undefined, dynamic adjustment is
1996 disabled by default.
1997
1998 @item @emph{See also}:
1999 @ref{omp_set_dynamic}
2000
2001 @item @emph{Reference}:
2002 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.3
2003 @end table
2004
2005
2006
2007 @node OMP_MAX_ACTIVE_LEVELS
2008 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
2009 @cindex Environment Variable
2010 @table @asis
2011 @item @emph{Description}:
2012 Specifies the initial value for the maximum number of nested parallel
2013 regions.  The value of this variable shall be a positive integer.
2014 If undefined, the number of active levels is unlimited.
2015
2016 @item @emph{See also}:
2017 @ref{omp_set_max_active_levels}
2018
2019 @item @emph{Reference}:
2020 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.9
2021 @end table
2022
2023
2024
2025 @node OMP_NESTED
2026 @section @env{OMP_NESTED} -- Nested parallel regions
2027 @cindex Environment Variable
2028 @cindex Implementation specific setting
2029 @table @asis
2030 @item @emph{Description}:
2031 Enable or disable nested parallel regions, i.e., whether team members
2032 are allowed to create new teams.  The value of this environment variable
2033 shall be @code{TRUE} or @code{FALSE}.  If undefined, nested parallel
2034 regions are disabled by default.
2035
2036 @item @emph{See also}:
2037 @ref{omp_set_nested}
2038
2039 @item @emph{Reference}:
2040 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.6
2041 @end table
2042
2043
2044
2045 @node OMP_NUM_THREADS
2046 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
2047 @cindex Environment Variable
2048 @cindex Implementation specific setting
2049 @table @asis
2050 @item @emph{Description}:
2051 Specifies the default number of threads to use in parallel regions.  The
2052 value of this variable shall be a comma-separated list of positive integers;
2053 the value specified the number of threads to use for the corresponding nested
2054 level.  If undefined one thread per CPU is used.
2055
2056 @item @emph{See also}:
2057 @ref{omp_set_num_threads}
2058
2059 @item @emph{Reference}:
2060 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.2
2061 @end table
2062
2063
2064
2065 @node OMP_PROC_BIND
2066 @section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
2067 @cindex Environment Variable
2068 @table @asis
2069 @item @emph{Description}:
2070 Specifies whether threads may be moved between processors.  If set to
2071 @code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
2072 they may be moved.  Alternatively, a comma separated list with the
2073 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
2074 the thread affinity policy for the corresponding nesting level.  With
2075 @code{MASTER} the worker threads are in the same place partition as the
2076 master thread.  With @code{CLOSE} those are kept close to the master thread
2077 in contiguous place partitions.  And with @code{SPREAD} a sparse distribution
2078 across the place partitions is used.
2079
2080 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
2081 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
2082
2083 @item @emph{See also}:
2084 @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
2085
2086 @item @emph{Reference}:
2087 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.4
2088 @end table
2089
2090
2091
2092 @node OMP_PLACES
2093 @section @env{OMP_PLACES} -- Specifies on which CPUs the theads should be placed
2094 @cindex Environment Variable
2095 @table @asis
2096 @item @emph{Description}:
2097 The thread placement can be either specified using an abstract name or by an
2098 explicit list of the places.  The abstract names @code{threads}, @code{cores}
2099 and @code{sockets} can be optionally followed by a positive number in
2100 parentheses, which denotes the how many places shall be created.  With
2101 @code{threads} each place corresponds to a single hardware thread; @code{cores}
2102 to a single core with the corresponding number of hardware threads; and with
2103 @code{sockets} the place corresponds to a single socket.  The resulting
2104 placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
2105 variable.
2106
2107 Alternatively, the placement can be specified explicitly as comma-separated
2108 list of places.  A place is specified by set of nonnegative numbers in curly
2109 braces, denoting the denoting the hardware threads.  The hardware threads
2110 belonging to a place can either be specified as comma-separated list of
2111 nonnegative thread numbers or using an interval.  Multiple places can also be
2112 either specified by a comma-separated list of places or by an interval.  To
2113 specify an interval, a colon followed by the count is placed after after
2114 the hardware thread number or the place.  Optionally, the length can be
2115 followed by a colon and the stride number -- otherwise a unit stride is
2116 assumed.  For instance, the following specifies the same places list:
2117 @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
2118 @code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.
2119
2120 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
2121 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
2122 between CPUs following no placement policy.
2123
2124 @item @emph{See also}:
2125 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
2126 @ref{OMP_DISPLAY_ENV}
2127
2128 @item @emph{Reference}:
2129 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.5
2130 @end table
2131
2132
2133
2134 @node OMP_STACKSIZE
2135 @section @env{OMP_STACKSIZE} -- Set default thread stack size
2136 @cindex Environment Variable
2137 @table @asis
2138 @item @emph{Description}:
2139 Set the default thread stack size in kilobytes, unless the number
2140 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
2141 case the size is, respectively, in bytes, kilobytes, megabytes
2142 or gigabytes.  This is different from @code{pthread_attr_setstacksize}
2143 which gets the number of bytes as an argument.  If the stack size cannot
2144 be set due to system constraints, an error is reported and the initial
2145 stack size is left unchanged.  If undefined, the stack size is system
2146 dependent.
2147
2148 @item @emph{Reference}:
2149 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.7
2150 @end table
2151
2152
2153
2154 @node OMP_SCHEDULE
2155 @section @env{OMP_SCHEDULE} -- How threads are scheduled
2156 @cindex Environment Variable
2157 @cindex Implementation specific setting
2158 @table @asis
2159 @item @emph{Description}:
2160 Allows to specify @code{schedule type} and @code{chunk size}.
2161 The value of the variable shall have the form: @code{type[,chunk]} where
2162 @code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}
2163 The optional @code{chunk} size shall be a positive integer.  If undefined,
2164 dynamic scheduling and a chunk size of 1 is used.
2165
2166 @item @emph{See also}:
2167 @ref{omp_set_schedule}
2168
2169 @item @emph{Reference}:
2170 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Sections 2.7.1 and 4.1
2171 @end table
2172
2173
2174
2175 @node OMP_THREAD_LIMIT
2176 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
2177 @cindex Environment Variable
2178 @table @asis
2179 @item @emph{Description}:
2180 Specifies the number of threads to use for the whole program.  The
2181 value of this variable shall be a positive integer.  If undefined,
2182 the number of threads is not limited.
2183
2184 @item @emph{See also}:
2185 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
2186
2187 @item @emph{Reference}:
2188 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.10
2189 @end table
2190
2191
2192
2193 @node OMP_WAIT_POLICY
2194 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
2195 @cindex Environment Variable
2196 @table @asis
2197 @item @emph{Description}:
2198 Specifies whether waiting threads should be active or passive.  If
2199 the value is @code{PASSIVE}, waiting threads should not consume CPU
2200 power while waiting; while the value is @code{ACTIVE} specifies that
2201 they should.  If undefined, threads wait actively for a short time
2202 before waiting passively.
2203
2204 @item @emph{See also}:
2205 @ref{GOMP_SPINCOUNT}
2206
2207 @item @emph{Reference}:
2208 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.8
2209 @end table
2210
2211
2212
2213 @node GOMP_CPU_AFFINITY
2214 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
2215 @cindex Environment Variable
2216 @table @asis
2217 @item @emph{Description}:
2218 Binds threads to specific CPUs.  The variable should contain a space-separated
2219 or comma-separated list of CPUs.  This list may contain different kinds of
2220 entries: either single CPU numbers in any order, a range of CPUs (M-N)
2221 or a range with some stride (M-N:S).  CPU numbers are zero based.  For example,
2222 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
2223 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
2224 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
2225 and 14 respectively and then start assigning back from the beginning of
2226 the list.  @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
2227
2228 There is no libgomp library routine to determine whether a CPU affinity
2229 specification is in effect.  As a workaround, language-specific library
2230 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
2231 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
2232 environment variable.  A defined CPU affinity on startup cannot be changed
2233 or disabled during the runtime of the application.
2234
2235 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
2236 @env{OMP_PROC_BIND} has a higher precedence.  If neither has been set and
2237 @env{OMP_PROC_BIND} is unset, or when @env{OMP_PROC_BIND} is set to
2238 @code{FALSE}, the host system will handle the assignment of threads to CPUs.
2239
2240 @item @emph{See also}:
2241 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
2242 @end table
2243
2244
2245
2246 @node GOMP_DEBUG
2247 @section @env{GOMP_DEBUG} -- Enable debugging output
2248 @cindex Environment Variable
2249 @table @asis
2250 @item @emph{Description}:
2251 Enable debugging output.  The variable should be set to @code{0}
2252 (disabled, also the default if not set), or @code{1} (enabled).
2253
2254 If enabled, some debugging output will be printed during execution.
2255 This is currently not specified in more detail, and subject to change.
2256 @end table
2257
2258
2259
2260 @node GOMP_STACKSIZE
2261 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
2262 @cindex Environment Variable
2263 @cindex Implementation specific setting
2264 @table @asis
2265 @item @emph{Description}:
2266 Set the default thread stack size in kilobytes.  This is different from
2267 @code{pthread_attr_setstacksize} which gets the number of bytes as an
2268 argument.  If the stack size cannot be set due to system constraints, an
2269 error is reported and the initial stack size is left unchanged.  If undefined,
2270 the stack size is system dependent.
2271
2272 @item @emph{See also}:
2273 @ref{OMP_STACKSIZE}
2274
2275 @item @emph{Reference}:
2276 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
2277 GCC Patches Mailinglist},
2278 @uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
2279 GCC Patches Mailinglist}
2280 @end table
2281
2282
2283
2284 @node GOMP_SPINCOUNT
2285 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
2286 @cindex Environment Variable
2287 @cindex Implementation specific setting
2288 @table @asis
2289 @item @emph{Description}:
2290 Determines how long a threads waits actively with consuming CPU power
2291 before waiting passively without consuming CPU power.  The value may be
2292 either @code{INFINITE}, @code{INFINITY} to always wait actively or an
2293 integer which gives the number of spins of the busy-wait loop.  The
2294 integer may optionally be followed by the following suffixes acting
2295 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
2296 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
2297 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
2298 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
2299 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
2300 If there are more OpenMP threads than available CPUs, 1000 and 100
2301 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
2302 undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
2303 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
2304
2305 @item @emph{See also}:
2306 @ref{OMP_WAIT_POLICY}
2307 @end table
2308
2309
2310
2311 @c ---------------------------------------------------------------------
2312 @c The libgomp ABI
2313 @c ---------------------------------------------------------------------
2314
2315 @node The libgomp ABI
2316 @chapter The libgomp ABI
2317
2318 The following sections present notes on the external ABI as
2319 presented by libgomp.  Only maintainers should need them.
2320
2321 @menu
2322 * Implementing MASTER construct::
2323 * Implementing CRITICAL construct::
2324 * Implementing ATOMIC construct::
2325 * Implementing FLUSH construct::
2326 * Implementing BARRIER construct::
2327 * Implementing THREADPRIVATE construct::
2328 * Implementing PRIVATE clause::
2329 * Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
2330 * Implementing REDUCTION clause::
2331 * Implementing PARALLEL construct::
2332 * Implementing FOR construct::
2333 * Implementing ORDERED construct::
2334 * Implementing SECTIONS construct::
2335 * Implementing SINGLE construct::
2336 * Implementing OpenACC's PARALLEL construct::
2337 @end menu
2338
2339
2340 @node Implementing MASTER construct
2341 @section Implementing MASTER construct
2342
2343 @smallexample
2344 if (omp_get_thread_num () == 0)
2345   block
2346 @end smallexample
2347
2348 Alternately, we generate two copies of the parallel subfunction
2349 and only include this in the version run by the master thread.
2350 Surely this is not worthwhile though...
2351
2352
2353
2354 @node Implementing CRITICAL construct
2355 @section Implementing CRITICAL construct
2356
2357 Without a specified name,
2358
2359 @smallexample
2360   void GOMP_critical_start (void);
2361   void GOMP_critical_end (void);
2362 @end smallexample
2363
2364 so that we don't get COPY relocations from libgomp to the main
2365 application.
2366
2367 With a specified name, use omp_set_lock and omp_unset_lock with
2368 name being transformed into a variable declared like
2369
2370 @smallexample
2371   omp_lock_t gomp_critical_user_<name> __attribute__((common))
2372 @end smallexample
2373
2374 Ideally the ABI would specify that all zero is a valid unlocked
2375 state, and so we wouldn't need to initialize this at
2376 startup.
2377
2378
2379
2380 @node Implementing ATOMIC construct
2381 @section Implementing ATOMIC construct
2382
2383 The target should implement the @code{__sync} builtins.
2384
2385 Failing that we could add
2386
2387 @smallexample
2388   void GOMP_atomic_enter (void)
2389   void GOMP_atomic_exit (void)
2390 @end smallexample
2391
2392 which reuses the regular lock code, but with yet another lock
2393 object private to the library.
2394
2395
2396
2397 @node Implementing FLUSH construct
2398 @section Implementing FLUSH construct
2399
2400 Expands to the @code{__sync_synchronize} builtin.
2401
2402
2403
2404 @node Implementing BARRIER construct
2405 @section Implementing BARRIER construct
2406
2407 @smallexample
2408   void GOMP_barrier (void)
2409 @end smallexample
2410
2411
2412 @node Implementing THREADPRIVATE construct
2413 @section Implementing THREADPRIVATE construct
2414
2415 In _most_ cases we can map this directly to @code{__thread}.  Except
2416 that OMP allows constructors for C++ objects.  We can either
2417 refuse to support this (how often is it used?) or we can
2418 implement something akin to .ctors.
2419
2420 Even more ideally, this ctor feature is handled by extensions
2421 to the main pthreads library.  Failing that, we can have a set
2422 of entry points to register ctor functions to be called.
2423
2424
2425
2426 @node Implementing PRIVATE clause
2427 @section Implementing PRIVATE clause
2428
2429 In association with a PARALLEL, or within the lexical extent
2430 of a PARALLEL block, the variable becomes a local variable in
2431 the parallel subfunction.
2432
2433 In association with FOR or SECTIONS blocks, create a new
2434 automatic variable within the current function.  This preserves
2435 the semantic of new variable creation.
2436
2437
2438
2439 @node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
2440 @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
2441
2442 This seems simple enough for PARALLEL blocks.  Create a private
2443 struct for communicating between the parent and subfunction.
2444 In the parent, copy in values for scalar and "small" structs;
2445 copy in addresses for others TREE_ADDRESSABLE types.  In the
2446 subfunction, copy the value into the local variable.
2447
2448 It is not clear what to do with bare FOR or SECTION blocks.
2449 The only thing I can figure is that we do something like:
2450
2451 @smallexample
2452 #pragma omp for firstprivate(x) lastprivate(y)
2453 for (int i = 0; i < n; ++i)
2454   body;
2455 @end smallexample
2456
2457 which becomes
2458
2459 @smallexample
2460 @{
2461   int x = x, y;
2462
2463   // for stuff
2464
2465   if (i == n)
2466     y = y;
2467 @}
2468 @end smallexample
2469
2470 where the "x=x" and "y=y" assignments actually have different
2471 uids for the two variables, i.e. not something you could write
2472 directly in C.  Presumably this only makes sense if the "outer"
2473 x and y are global variables.
2474
2475 COPYPRIVATE would work the same way, except the structure
2476 broadcast would have to happen via SINGLE machinery instead.
2477
2478
2479
2480 @node Implementing REDUCTION clause
2481 @section Implementing REDUCTION clause
2482
2483 The private struct mentioned in the previous section should have
2484 a pointer to an array of the type of the variable, indexed by the
2485 thread's @var{team_id}.  The thread stores its final value into the
2486 array, and after the barrier, the master thread iterates over the
2487 array to collect the values.
2488
2489
2490 @node Implementing PARALLEL construct
2491 @section Implementing PARALLEL construct
2492
2493 @smallexample
2494   #pragma omp parallel
2495   @{
2496     body;
2497   @}
2498 @end smallexample
2499
2500 becomes
2501
2502 @smallexample
2503   void subfunction (void *data)
2504   @{
2505     use data;
2506     body;
2507   @}
2508
2509   setup data;
2510   GOMP_parallel_start (subfunction, &data, num_threads);
2511   subfunction (&data);
2512   GOMP_parallel_end ();
2513 @end smallexample
2514
2515 @smallexample
2516   void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
2517 @end smallexample
2518
2519 The @var{FN} argument is the subfunction to be run in parallel.
2520
2521 The @var{DATA} argument is a pointer to a structure used to
2522 communicate data in and out of the subfunction, as discussed
2523 above with respect to FIRSTPRIVATE et al.
2524
2525 The @var{NUM_THREADS} argument is 1 if an IF clause is present
2526 and false, or the value of the NUM_THREADS clause, if
2527 present, or 0.
2528
2529 The function needs to create the appropriate number of
2530 threads and/or launch them from the dock.  It needs to
2531 create the team structure and assign team ids.
2532
2533 @smallexample
2534   void GOMP_parallel_end (void)
2535 @end smallexample
2536
2537 Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
2538
2539
2540
2541 @node Implementing FOR construct
2542 @section Implementing FOR construct
2543
2544 @smallexample
2545   #pragma omp parallel for
2546   for (i = lb; i <= ub; i++)
2547     body;
2548 @end smallexample
2549
2550 becomes
2551
2552 @smallexample
2553   void subfunction (void *data)
2554   @{
2555     long _s0, _e0;
2556     while (GOMP_loop_static_next (&_s0, &_e0))
2557     @{
2558       long _e1 = _e0, i;
2559       for (i = _s0; i < _e1; i++)
2560         body;
2561     @}
2562     GOMP_loop_end_nowait ();
2563   @}
2564
2565   GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
2566   subfunction (NULL);
2567   GOMP_parallel_end ();
2568 @end smallexample
2569
2570 @smallexample
2571   #pragma omp for schedule(runtime)
2572   for (i = 0; i < n; i++)
2573     body;
2574 @end smallexample
2575
2576 becomes
2577
2578 @smallexample
2579   @{
2580     long i, _s0, _e0;
2581     if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
2582       do @{
2583         long _e1 = _e0;
2584         for (i = _s0, i < _e0; i++)
2585           body;
2586       @} while (GOMP_loop_runtime_next (&_s0, _&e0));
2587     GOMP_loop_end ();
2588   @}
2589 @end smallexample
2590
2591 Note that while it looks like there is trickiness to propagating
2592 a non-constant STEP, there isn't really.  We're explicitly allowed
2593 to evaluate it as many times as we want, and any variables involved
2594 should automatically be handled as PRIVATE or SHARED like any other
2595 variables.  So the expression should remain evaluable in the
2596 subfunction.  We can also pull it into a local variable if we like,
2597 but since its supposed to remain unchanged, we can also not if we like.
2598
2599 If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
2600 able to get away with no work-sharing context at all, since we can
2601 simply perform the arithmetic directly in each thread to divide up
2602 the iterations.  Which would mean that we wouldn't need to call any
2603 of these routines.
2604
2605 There are separate routines for handling loops with an ORDERED
2606 clause.  Bookkeeping for that is non-trivial...
2607
2608
2609
2610 @node Implementing ORDERED construct
2611 @section Implementing ORDERED construct
2612
2613 @smallexample
2614   void GOMP_ordered_start (void)
2615   void GOMP_ordered_end (void)
2616 @end smallexample
2617
2618
2619
2620 @node Implementing SECTIONS construct
2621 @section Implementing SECTIONS construct
2622
2623 A block as
2624
2625 @smallexample
2626   #pragma omp sections
2627   @{
2628     #pragma omp section
2629     stmt1;
2630     #pragma omp section
2631     stmt2;
2632     #pragma omp section
2633     stmt3;
2634   @}
2635 @end smallexample
2636
2637 becomes
2638
2639 @smallexample
2640   for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
2641     switch (i)
2642       @{
2643       case 1:
2644         stmt1;
2645         break;
2646       case 2:
2647         stmt2;
2648         break;
2649       case 3:
2650         stmt3;
2651         break;
2652       @}
2653   GOMP_barrier ();
2654 @end smallexample
2655
2656
2657 @node Implementing SINGLE construct
2658 @section Implementing SINGLE construct
2659
2660 A block like
2661
2662 @smallexample
2663   #pragma omp single
2664   @{
2665     body;
2666   @}
2667 @end smallexample
2668
2669 becomes
2670
2671 @smallexample
2672   if (GOMP_single_start ())
2673     body;
2674   GOMP_barrier ();
2675 @end smallexample
2676
2677 while
2678
2679 @smallexample
2680   #pragma omp single copyprivate(x)
2681     body;
2682 @end smallexample
2683
2684 becomes
2685
2686 @smallexample
2687   datap = GOMP_single_copy_start ();
2688   if (datap == NULL)
2689     @{
2690       body;
2691       data.x = x;
2692       GOMP_single_copy_end (&data);
2693     @}
2694   else
2695     x = datap->x;
2696   GOMP_barrier ();
2697 @end smallexample
2698
2699
2700
2701 @node Implementing OpenACC's PARALLEL construct
2702 @section Implementing OpenACC's PARALLEL construct
2703
2704 @smallexample
2705   void GOACC_parallel ()
2706 @end smallexample
2707
2708
2709
2710 @c ---------------------------------------------------------------------
2711 @c Reporting Bugs
2712 @c ---------------------------------------------------------------------
2713
2714 @node Reporting Bugs
2715 @chapter Reporting Bugs
2716
2717 Bugs in the GNU Offloading and Multi Processing Runtime Library should
2718 be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
2719 "openacc", or "openmp", or both to the keywords field in the bug
2720 report, as appropriate.
2721
2722
2723
2724 @c ---------------------------------------------------------------------
2725 @c GNU General Public License
2726 @c ---------------------------------------------------------------------
2727
2728 @include gpl_v3.texi
2729
2730
2731
2732 @c ---------------------------------------------------------------------
2733 @c GNU Free Documentation License
2734 @c ---------------------------------------------------------------------
2735
2736 @include fdl.texi
2737
2738
2739
2740 @c ---------------------------------------------------------------------
2741 @c Funding Free Software
2742 @c ---------------------------------------------------------------------
2743
2744 @include funding.texi
2745
2746 @c ---------------------------------------------------------------------
2747 @c Index
2748 @c ---------------------------------------------------------------------
2749
2750 @node Library Index
2751 @unnumbered Library Index
2752
2753 @printindex cp
2754
2755 @bye