1 \input texinfo @c -*-texinfo-*-
4 @setfilename libgomp.info
10 Copyright @copyright{} 2006-2014 Free Software Foundation, Inc.
12 Permission is granted to copy, distribute and/or modify this document
13 under the terms of the GNU Free Documentation License, Version 1.3 or
14 any later version published by the Free Software Foundation; with the
15 Invariant Sections being ``Funding Free Software'', the Front-Cover
16 texts being (a) (see below), and with the Back-Cover Texts being (b)
17 (see below). A copy of the license is included in the section entitled
18 ``GNU Free Documentation License''.
20 (a) The FSF's Front-Cover Text is:
24 (b) The FSF's Back-Cover Text is:
26 You have freedom to copy and modify this GNU Manual, like GNU
27 software. Copies published by the Free Software Foundation raise
28 funds for GNU development.
32 @dircategory GNU Libraries
34 * libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library
37 This manual documents libgomp, the GNU Offloading and Multi Processing
38 Runtime library. This is the GNU implementation of the OpenMP and
39 OpenACC APIs for parallel and accelerator programming in C/C++ and Fortran.
42 Published by the Free Software Foundation
43 51 Franklin Street, Fifth Floor
44 Boston, MA 02110-1301 USA
50 @setchapternewpage odd
53 @title The GNU OpenACC and OpenMP Implementation
55 @vskip 0pt plus 1filll
56 @comment For the @value{version-GCC} Version*
58 Published by the Free Software Foundation @*
59 51 Franklin Street, Fifth Floor@*
60 Boston, MA 02110-1301, USA@*
74 This manual documents the usage of libgomp, the GNU Offloading and
75 Multi Processing Runtime Library. This includes the GNU
76 implementation of the @uref{http://www.openmp.org, OpenMP} Application
77 Programming Interface (API) for multi-platform shared-memory parallel
78 programming in C/C++ and Fortran, and the GNU implementation of the
79 @uref{http://www.openacc.org/, OpenACC} Application Programming
80 Interface (API) for offloading of code to accelerator devices in C/C++ and Fortran.
83 Originally, libgomp implemented the GNU OpenMP Runtime Library. Based
84 on this, support for OpenACC and offloading (both OpenACC and OpenMP
85 4's target construct) has been added later on, and the library's name
86 changed to GNU Offloading and Multi Processing Runtime Library.
90 @comment When you add a new menu item, please keep the right hand
91 @comment aligned to the same column. Do not use tabs. This provides
92 @comment better formatting.
95 * Enabling OpenACC:: How to enable OpenACC for your applications.
97 * OpenACC Runtime Library Routines:: The OpenACC runtime application
98 programming interface.
99 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
100 environment variables.
101 * OpenACC Library Interoperability:: OpenACC library interoperability with the
102 NVIDIA CUBLAS library.
103 * Enabling OpenMP:: How to enable OpenMP for your applications.
105 * OpenMP Runtime Library Routines: Runtime Library Routines.
106 The OpenMP runtime application programming interface.
108 * OpenMP Environment Variables: Environment Variables.
109 Influencing OpenMP runtime behavior with
110 environment variables.
111 * The libgomp ABI:: Notes on the external libgomp ABI.
112 * Reporting Bugs:: How to report bugs.
113 * Copying:: GNU general public license says how you
114 can copy and share libgomp.
115 * GNU Free Documentation License:: How you can copy and share this manual.
116 * Funding:: How to help assure continued work for free software.
118 * Library Index:: Index of this documentation.
123 @c ---------------------------------------------------------------------
125 @c ---------------------------------------------------------------------
127 @node Enabling OpenACC
128 @chapter Enabling OpenACC
130 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
131 flag @command{-fopenacc} must be specified. This enables the OpenACC directive
132 @code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
133 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
134 @code{!$} conditional compilation sentinels in free form and @code{c$},
135 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
136 arranges for automatic linking of the OpenACC runtime library
137 (@ref{OpenACC Runtime Library Routines}).
139 A complete description of all OpenACC directives accepted may be found in
140 the @uref{http://www.openacc.org/, OpenACC Application Programming
141 Interface} manual, version 2.0.
144 @c ---------------------------------------------------------------------
145 @c OpenACC Runtime Library Routines
146 @c ---------------------------------------------------------------------
148 @node OpenACC Runtime Library Routines
149 @chapter OpenACC Runtime Library Routines
151 The runtime routines described here are defined by section 3 of the OpenACC
152 specifications in version 2.0.
153 They have C linkage, and do not throw exceptions.
154 Generally, they are available only for the host, with the exception of
155 @code{acc_on_device}, which is available for both the host and the acceleration device.
159 * acc_get_num_devices:: Get number of devices for the given device type
160 * acc_set_device_type::
161 * acc_get_device_type::
162 * acc_set_device_num::
163 * acc_get_device_num::
166 * acc_on_device:: Whether executing on a particular device
170 * acc_present_or_copyin::
172 * acc_present_or_create::
175 * acc_update_device::
182 * acc_memcpy_to_device::
183 * acc_memcpy_from_device::
185 API routines for target platforms.
187 * acc_get_current_cuda_device::
188 * acc_get_current_cuda_context::
189 * acc_get_cuda_stream::
190 * acc_set_cuda_stream::
195 @node acc_get_num_devices
196 @section @code{acc_get_num_devices} -- Get number of devices for given device type
198 @item @emph{Description}
199 This routine returns a value indicating the
200 number of devices available for the given device type. It determines
201 the number of devices in a @emph{passive} manner. In other words, it
202 does not alter the state within the runtime environment aside from
203 possibly initializing an uninitialized device. This aspect allows
204 the routine to be called without concern for altering the interaction
205 with an attached accelerator device.
207 @item @emph{Reference}:
208 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
214 @node acc_set_device_type
215 @section @code{acc_set_device_type}
217 @item @emph{Reference}:
218 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
224 @node acc_get_device_type
225 @section @code{acc_get_device_type}
227 @item @emph{Reference}:
228 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
234 @node acc_set_device_num
235 @section @code{acc_set_device_num}
237 @item @emph{Reference}:
238 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
244 @node acc_get_device_num
245 @section @code{acc_get_device_num}
247 @item @emph{Reference}:
248 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
255 @section @code{acc_init}
257 @item @emph{Reference}:
258 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
265 @section @code{acc_shutdown}
267 @item @emph{Reference}:
268 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
275 @section @code{acc_on_device} -- Whether executing on a particular device
277 @item @emph{Description}:
278 This routine tells the program whether it is executing on a particular
279 device. Based on the argument passed, GCC tries to evaluate this to a
280 constant at compile time, but library functions are also provided, for
281 both the host and the acceleration device.
283 @item @emph{Reference}:
284 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
291 @section @code{acc_malloc}
293 @item @emph{Reference}:
294 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
301 @section @code{acc_free}
303 @item @emph{Reference}:
304 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
311 @section @code{acc_copyin}
313 @item @emph{Reference}:
314 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
320 @node acc_present_or_copyin
321 @section @code{acc_present_or_copyin}
323 @item @emph{Reference}:
324 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
331 @section @code{acc_create}
333 @item @emph{Reference}:
334 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
340 @node acc_present_or_create
341 @section @code{acc_present_or_create}
343 @item @emph{Reference}:
344 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
351 @section @code{acc_copyout}
353 @item @emph{Reference}:
354 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
361 @section @code{acc_delete}
363 @item @emph{Reference}:
364 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
370 @node acc_update_device
371 @section @code{acc_update_device}
373 @item @emph{Reference}:
374 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
380 @node acc_update_self
381 @section @code{acc_update_self}
383 @item @emph{Reference}:
384 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
391 @section @code{acc_map_data}
393 @item @emph{Reference}:
394 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
401 @section @code{acc_unmap_data}
403 @item @emph{Reference}:
404 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
411 @section @code{acc_deviceptr}
413 @item @emph{Reference}:
414 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
421 @section @code{acc_hostptr}
423 @item @emph{Reference}:
424 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
431 @section @code{acc_is_present}
433 @item @emph{Reference}:
434 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
440 @node acc_memcpy_to_device
441 @section @code{acc_memcpy_to_device}
443 @item @emph{Reference}:
444 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
450 @node acc_memcpy_from_device
451 @section @code{acc_memcpy_from_device}
453 @item @emph{Reference}:
454 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
460 @node acc_get_current_cuda_device
461 @section @code{acc_get_current_cuda_device}
463 @item @emph{Reference}:
464 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
470 @node acc_get_current_cuda_context
471 @section @code{acc_get_current_cuda_context}
473 @item @emph{Reference}:
474 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
480 @node acc_get_cuda_stream
481 @section @code{acc_get_cuda_stream}
483 @item @emph{Reference}:
484 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
490 @node acc_set_cuda_stream
491 @section @code{acc_set_cuda_stream}
493 @item @emph{Reference}:
494 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
500 @c ---------------------------------------------------------------------
501 @c OpenACC Environment Variables
502 @c ---------------------------------------------------------------------
504 @node OpenACC Environment Variables
505 @chapter OpenACC Environment Variables
507 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
508 are defined by section 4 of the OpenACC specification in version 2.0.
509 The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
519 @node ACC_DEVICE_TYPE
520 @section @code{ACC_DEVICE_TYPE}
522 @item @emph{Reference}:
523 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
530 @section @code{ACC_DEVICE_NUM}
532 @item @emph{Reference}:
533 @uref{http://www.openacc.org/, OpenACC specification v2.0}, section
540 @section @code{GCC_ACC_NOTIFY}
542 @item @emph{Description}:
543 Print debug information pertaining to the accelerator.
547 @c ---------------------------------------------------------------------
548 @c OpenACC Library Interoperability
549 @c ---------------------------------------------------------------------
551 @node OpenACC Library Interoperability
552 @chapter OpenACC Library Interoperability
554 @section Introduction
556 As the OpenACC library is built using the CUDA Driver API, the question
557 arises as to what impact using the OpenACC library has on a program that
558 uses the Runtime library, or a library based on the Runtime library, e.g.,
559 CUBLAS@footnote{See section 2.26, ``Interactions with the CUDA Driver API'' in
560 ``CUDA Runtime API'', Version 5.5, July 2013 and section 2.27, ``VDPAU
561 Interoperability'', in ``CUDA Driver API'', TRM-06703-001, Version 5.5,
562 July 2013, for additional information on library interoperability.}.
563 This chapter describes the use cases and the changes
564 required in order to use both the OpenACC library and the CUBLAS and Runtime
565 libraries within a program.
567 @section First invocation: NVIDIA CUBLAS library API
569 In this first use case (see below), a function in the CUBLAS library is called
570 prior to any of the functions in the OpenACC library. More specifically, the
571 function @code{cublasCreate()}.
573 When invoked, the function will initialize the library and allocate the
574 hardware resources on the host and the device on behalf of the caller. Once
575 the initialization and allocation have completed, a handle is returned to the
576 caller. The OpenACC library also requires initialization and allocation of
577 hardware resources. Since the CUBLAS library has already allocated the
578 hardware resources for the device, all that is left to do is to initialize
579 the OpenACC library and acquire the hardware resources on the host.
581 Prior to calling the OpenACC function that will initialize the library and
582 allocate the host hardware resources, one needs to acquire the device number
583 that was allocated during the call to @code{cublasCreate()}. Invoking the
584 runtime library function @code{cudaGetDevice()} accomplishes this. Once
585 acquired, the device number is passed along with the device type as
586 parameters to the OpenACC library function @code{acc_set_device_num()}.
588 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
589 library will be using the context that was created during the call to
590 @code{cublasCreate()}. In other words, both libraries will be sharing the same context.
594 /* Create the handle */
595 s = cublasCreate(&h);
596 if (s != CUBLAS_STATUS_SUCCESS)
598 fprintf(stderr, "cublasCreate failed %d\n", s);
602 /* Get the device number */
603 e = cudaGetDevice(&dev);
604 if (e != cudaSuccess)
606 fprintf(stderr, "cudaGetDevice failed %d\n", e);
610 /* Initialize OpenACC library and use device 'dev' */
611 acc_set_device_num(dev, acc_device_nvidia);
616 @section First invocation: OpenACC library API
618 In this second use case (see below), a function in the OpenACC library is
619 called prior to any of the functions in the CUBLAS library. More specifically,
620 the function @code{acc_set_device_num()}.
622 In the use case presented here, the function @code{acc_set_device_num()}
623 is used to both initialize the OpenACC library and allocate the hardware
624 resources on the host and the device. In the call to the function, the
625 call parameters specify which device to use, i.e., 'dev', and what device
626 type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
627 is but one method to initialize the OpenACC library and allocate the
628 appropriate hardware resources. Other methods are available through the
629 use of environment variables and these will be discussed in the next section.
631 Once the call to @code{acc_set_device_num()} has completed, other OpenACC
632 functions can be called as seen with multiple calls being made to
633 @code{acc_copyin()}. In addition, calls can be made to functions in the
634 CUBLAS library. In the use case a call to @code{cublasCreate()} is made
635 subsequent to the calls to @code{acc_copyin()}.
636 As seen in the previous use case, a call to @code{cublasCreate()} will
637 initialize the CUBLAS library and allocate the hardware resources on the
638 host and the device. However, since the device has already been allocated,
639 @code{cublasCreate()} will only initialize the CUBLAS library and allocate
640 the appropriate hardware resources on the host. The context that was created
641 as part of the OpenACC initialization will be shared with the CUBLAS library,
642 similarly to the first use case.
647 acc_set_device_num(dev, acc_device_nvidia);
649 /* Copy the first set to the device */
650 d_X = acc_copyin(&h_X[0], N * sizeof (float));
653 fprintf(stderr, "copyin error h_X\n");
657 /* Copy the second set to the device */
658 d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
661 fprintf(stderr, "copyin error h_Y1\n");
665 /* Create the handle */
666 s = cublasCreate(&h);
667 if (s != CUBLAS_STATUS_SUCCESS)
669 fprintf(stderr, "cublasCreate failed %d\n", s);
673 /* Perform saxpy using CUBLAS library function */
674 s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
675 if (s != CUBLAS_STATUS_SUCCESS)
677 fprintf(stderr, "cublasSaxpy failed %d\n", s);
681 /* Copy the results from the device */
682 acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
688 @section OpenACC library and environment variables
690 There are two environment variables associated with the OpenACC library that
691 may be used to control the device type and device number.
692 Namely, @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}. In the second
693 use case, the device type and device number were specified using
694 @code{acc_set_device_num()}. However, @env{ACC_DEVICE_TYPE} and
695 @env{ACC_DEVICE_NUM} could have been defined and the call to
696 @code{acc_set_device_num()} would not be required. At the time of the
697 call to @code{acc_copyin()}, these two environment variables would be
698 sampled and their values used.
700 The use of the environment variables is only relevant when an OpenACC function
701 is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
702 is called prior to a call to an OpenACC function, then a call to
703 @code{acc_set_device_num()} must be done@footnote{More complete information
704 about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
705 sections 4.1 and 4.2 of ``The OpenACC
706 Application Programming Interface'', Version 2.0, June, 2013.}.
710 @c ---------------------------------------------------------------------
712 @c ---------------------------------------------------------------------
714 @node Enabling OpenMP
715 @chapter Enabling OpenMP
717 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
718 flag @command{-fopenmp} must be specified. This enables the OpenMP directive
719 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form,
720 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form,
721 @code{!$} conditional compilation sentinels in free form and @code{c$},
722 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
723 arranges for automatic linking of the OpenMP runtime library
724 (@ref{Runtime Library Routines}).
726 A complete description of all OpenMP directives accepted may be found in
727 the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
731 @c ---------------------------------------------------------------------
732 @c OpenMP Runtime Library Routines
733 @c ---------------------------------------------------------------------
735 @node Runtime Library Routines
736 @chapter OpenMP Runtime Library Routines
738 The runtime routines described here are defined by Section 3 of the OpenMP
739 specification in version 4.0. The routines are structured in the following three groups:
743 Control threads, processors and the parallel environment. They have C
744 linkage, and do not throw exceptions.
746 * omp_get_active_level:: Number of active parallel regions
747 * omp_get_ancestor_thread_num:: Ancestor thread ID
748 * omp_get_cancellation:: Whether cancellation support is enabled
749 * omp_get_default_device:: Get the default device for target regions
750 * omp_get_dynamic:: Dynamic teams setting
751 * omp_get_level:: Number of parallel regions
752 * omp_get_max_active_levels:: Maximum number of active regions
753 * omp_get_max_threads:: Maximum number of threads of parallel region
754 * omp_get_nested:: Nested parallel regions
755 * omp_get_num_devices:: Number of target devices
756 * omp_get_num_procs:: Number of processors online
757 * omp_get_num_teams:: Number of teams
758 * omp_get_num_threads:: Size of the active team
759 * omp_get_proc_bind:: Whether threads may be moved between CPUs
760 * omp_get_schedule:: Obtain the runtime scheduling method
761 * omp_get_team_num:: Get team number
762 * omp_get_team_size:: Number of threads in a team
763 * omp_get_thread_limit:: Maximum number of threads
764 * omp_get_thread_num:: Current thread ID
765 * omp_in_parallel:: Whether a parallel region is active
766 * omp_in_final:: Whether in final or included task region
767 * omp_is_initial_device:: Whether executing on the host device
768 * omp_set_default_device:: Set the default device for target regions
769 * omp_set_dynamic:: Enable/disable dynamic teams
770 * omp_set_max_active_levels:: Limits the number of active parallel regions
771 * omp_set_nested:: Enable/disable nested parallel regions
772 * omp_set_num_threads:: Set upper team size limit
773 * omp_set_schedule:: Set the runtime scheduling method
775 Initialize, set, test, unset and destroy simple and nested locks.
777 * omp_init_lock:: Initialize simple lock
778 * omp_set_lock:: Wait for and set simple lock
779 * omp_test_lock:: Test and set simple lock if available
780 * omp_unset_lock:: Unset simple lock
781 * omp_destroy_lock:: Destroy simple lock
782 * omp_init_nest_lock:: Initialize nested lock
783 * omp_set_nest_lock:: Wait for and set nested lock
784 * omp_test_nest_lock:: Test and set nested lock if available
785 * omp_unset_nest_lock:: Unset nested lock
786 * omp_destroy_nest_lock:: Destroy nested lock
788 Portable, thread-based, wall clock timer.
790 * omp_get_wtick:: Get timer precision.
791 * omp_get_wtime:: Elapsed wall clock time.
796 @node omp_get_active_level
797 @section @code{omp_get_active_level} -- Number of active parallel regions
799 @item @emph{Description}:
800 This function returns the nesting level of the active parallel blocks
801 that enclose the calling routine.
804 @multitable @columnfractions .20 .80
805 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
808 @item @emph{Fortran}:
809 @multitable @columnfractions .20 .80
810 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
813 @item @emph{See also}:
814 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
816 @item @emph{Reference}:
817 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.20.
822 @node omp_get_ancestor_thread_num
823 @section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
825 @item @emph{Description}:
826 This function returns the thread identification number for the given
827 nesting level of the current thread. For values of @var{level} outside
828 the range 0 to @code{omp_get_level}, -1 is returned; if @var{level} is
829 @code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.
832 @multitable @columnfractions .20 .80
833 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
836 @item @emph{Fortran}:
837 @multitable @columnfractions .20 .80
838 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
839 @item @tab @code{integer level}
842 @item @emph{See also}:
843 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
845 @item @emph{Reference}:
846 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.18.
851 @node omp_get_cancellation
852 @section @code{omp_get_cancellation} -- Whether cancellation support is enabled
854 @item @emph{Description}:
855 This function returns @code{true} if cancellation is activated, @code{false}
856 otherwise. Here, @code{true} and @code{false} represent their language-specific
857 counterparts. Unless @env{OMP_CANCELLATION} is set true, cancellations are deactivated.
861 @multitable @columnfractions .20 .80
862 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
865 @item @emph{Fortran}:
866 @multitable @columnfractions .20 .80
867 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
870 @item @emph{See also}:
871 @ref{OMP_CANCELLATION}
873 @item @emph{Reference}:
874 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.9.
879 @node omp_get_default_device
880 @section @code{omp_get_default_device} -- Get the default device for target regions
882 @item @emph{Description}:
883 Get the default device for target regions without device clause.
886 @multitable @columnfractions .20 .80
887 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
890 @item @emph{Fortran}:
891 @multitable @columnfractions .20 .80
892 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
895 @item @emph{See also}:
896 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
898 @item @emph{Reference}:
899 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.24.
904 @node omp_get_dynamic
905 @section @code{omp_get_dynamic} -- Dynamic teams setting
907 @item @emph{Description}:
908 This function returns @code{true} if enabled, @code{false} otherwise.
909 Here, @code{true} and @code{false} represent their language-specific counterparts.
912 The dynamic team setting may be initialized at startup by the
913 @env{OMP_DYNAMIC} environment variable or at runtime using
914 @code{omp_set_dynamic}. If undefined, dynamic adjustment is disabled by default.
918 @multitable @columnfractions .20 .80
919 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
922 @item @emph{Fortran}:
923 @multitable @columnfractions .20 .80
924 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
927 @item @emph{See also}:
928 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
930 @item @emph{Reference}:
931 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.8.
937 @section @code{omp_get_level} -- Obtain the current nesting level
939 @item @emph{Description}:
940 This function returns the nesting level of the parallel blocks
941 that enclose the calling routine.
944 @multitable @columnfractions .20 .80
945 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
948 @item @emph{Fortran}:
949 @multitable @columnfractions .20 .80
950 @item @emph{Interface}: @tab @code{integer function omp_get_level()}
953 @item @emph{See also}:
954 @ref{omp_get_active_level}
956 @item @emph{Reference}:
957 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.17.
962 @node omp_get_max_active_levels
963 @section @code{omp_get_max_active_levels} -- Maximum number of active regions
965 @item @emph{Description}:
966 This function obtains the maximum allowed number of nested, active parallel regions.
969 @multitable @columnfractions .20 .80
970 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
973 @item @emph{Fortran}:
974 @multitable @columnfractions .20 .80
975 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
978 @item @emph{See also}:
979 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
981 @item @emph{Reference}:
982 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.16.
987 @node omp_get_max_threads
988 @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
990 @item @emph{Description}:
991 Return the maximum number of threads that will be used for a parallel region
992 that does not use the clause @code{num_threads}.
995 @multitable @columnfractions .20 .80
996 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
999 @item @emph{Fortran}:
1000 @multitable @columnfractions .20 .80
1001 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
1004 @item @emph{See also}:
1005 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
1007 @item @emph{Reference}:
1008 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.3.
1013 @node omp_get_nested
1014 @section @code{omp_get_nested} -- Nested parallel regions
1016 @item @emph{Description}:
1017 This function returns @code{true} if nested parallel regions are
1018 enabled, @code{false} otherwise. Here, @code{true} and @code{false}
1019 represent their language-specific counterparts.
1021 Nested parallel regions may be initialized at startup by the
1022 @env{OMP_NESTED} environment variable or at runtime using
1023 @code{omp_set_nested}. If undefined, nested parallel regions are
1024 disabled by default.
1027 @multitable @columnfractions .20 .80
1028 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
1031 @item @emph{Fortran}:
1032 @multitable @columnfractions .20 .80
1033 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
1036 @item @emph{See also}:
1037 @ref{omp_set_nested}, @ref{OMP_NESTED}
1039 @item @emph{Reference}:
1040 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.11.
1045 @node omp_get_num_devices
1046 @section @code{omp_get_num_devices} -- Number of target devices
1048 @item @emph{Description}:
1049 Returns the number of target devices.
1052 @multitable @columnfractions .20 .80
1053 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
1056 @item @emph{Fortran}:
1057 @multitable @columnfractions .20 .80
1058 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
1061 @item @emph{Reference}:
1062 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.25.
1067 @node omp_get_num_procs
1068 @section @code{omp_get_num_procs} -- Number of processors online
1070 @item @emph{Description}:
1071 Returns the number of processors online on the current device.
1074 @multitable @columnfractions .20 .80
1075 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
1078 @item @emph{Fortran}:
1079 @multitable @columnfractions .20 .80
1080 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
1083 @item @emph{Reference}:
1084 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.5.
1089 @node omp_get_num_teams
1090 @section @code{omp_get_num_teams} -- Number of teams
1092 @item @emph{Description}:
1093 Returns the number of teams in the current teams region.
1096 @multitable @columnfractions .20 .80
1097 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
1100 @item @emph{Fortran}:
1101 @multitable @columnfractions .20 .80
1102 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
1105 @item @emph{Reference}:
1106 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.26.
1111 @node omp_get_num_threads
1112 @section @code{omp_get_num_threads} -- Size of the active team
1114 @item @emph{Description}:
1115 Returns the number of threads in the current team. In a sequential section of
1116 the program @code{omp_get_num_threads} returns 1.
1118 The default team size may be initialized at startup by the
1119 @env{OMP_NUM_THREADS} environment variable. At runtime, the size
1120 of the current team may be set either by the @code{num_threads}
1121 clause or by @code{omp_set_num_threads}. If none of the above were
1122 used to define a specific value and @env{OMP_DYNAMIC} is disabled,
1123 one thread per CPU online is used.
1126 @multitable @columnfractions .20 .80
1127 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
1130 @item @emph{Fortran}:
1131 @multitable @columnfractions .20 .80
1132 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
1135 @item @emph{See also}:
1136 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
1138 @item @emph{Reference}:
1139 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.2.
1144 @node omp_get_proc_bind
1145 @section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
1147 @item @emph{Description}:
1148 This function returns the currently active thread affinity policy, which is
1149 set via @env{OMP_PROC_BIND}. Possible values are @code{omp_proc_bind_false},
1150 @code{omp_proc_bind_true}, @code{omp_proc_bind_master},
1151 @code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.
1154 @multitable @columnfractions .20 .80
1155 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
1158 @item @emph{Fortran}:
1159 @multitable @columnfractions .20 .80
1160 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
1163 @item @emph{See also}:
1164 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
1166 @item @emph{Reference}:
1167 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.22.
1172 @node omp_get_schedule
1173 @section @code{omp_get_schedule} -- Obtain the runtime scheduling method
1175 @item @emph{Description}:
1176 Obtain the runtime scheduling method. The @var{kind} argument will be
1177 set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
1178 @code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
1179 @var{modifier}, is set to the chunk size.
1182 @multitable @columnfractions .20 .80
1183 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *modifier);}
1186 @item @emph{Fortran}:
1187 @multitable @columnfractions .20 .80
1188 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, modifier)}
1189 @item @tab @code{integer(kind=omp_sched_kind) kind}
1190 @item @tab @code{integer modifier}
1193 @item @emph{See also}:
1194 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
1196 @item @emph{Reference}:
1197 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.13.
1202 @node omp_get_team_num
1203 @section @code{omp_get_team_num} -- Get team number
1205 @item @emph{Description}:
1206 Returns the team number of the calling thread.
1209 @multitable @columnfractions .20 .80
1210 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
1213 @item @emph{Fortran}:
1214 @multitable @columnfractions .20 .80
1215 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
1218 @item @emph{Reference}:
1219 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.27.
1224 @node omp_get_team_size
1225 @section @code{omp_get_team_size} -- Number of threads in a team
1227 @item @emph{Description}:
1228 This function returns the number of threads in a thread team to which
1229 either the current thread or its ancestor belongs. For values of @var{level}
1230 outside zero to @code{omp_get_level}, -1 is returned; if @var{level} is zero,
1231 1 is returned, and for @code{omp_get_level}, the result is identical
1232 to @code{omp_get_num_threads}.
1235 @multitable @columnfractions .20 .80
1236 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
1239 @item @emph{Fortran}:
1240 @multitable @columnfractions .20 .80
1241 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
1242 @item @tab @code{integer level}
1245 @item @emph{See also}:
1246 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
1248 @item @emph{Reference}:
1249 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.19.
1254 @node omp_get_thread_limit
1255 @section @code{omp_get_thread_limit} -- Maximum number of threads
1257 @item @emph{Description}:
1258 Return the maximum number of threads of the program.
1261 @multitable @columnfractions .20 .80
1262 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
1265 @item @emph{Fortran}:
1266 @multitable @columnfractions .20 .80
1267 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
1270 @item @emph{See also}:
1271 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
1273 @item @emph{Reference}:
1274 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.14.
1279 @node omp_get_thread_num
1280 @section @code{omp_get_thread_num} -- Current thread ID
1282 @item @emph{Description}:
1283 Returns a unique thread identification number within the current team.
1284 In sequential parts of the program, @code{omp_get_thread_num}
1285 always returns 0. In parallel regions the return value varies
1286 from 0 to @code{omp_get_num_threads}-1 inclusive. The return
1287 value of the master thread of a team is always 0.
1290 @multitable @columnfractions .20 .80
1291 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
1294 @item @emph{Fortran}:
1295 @multitable @columnfractions .20 .80
1296 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
1299 @item @emph{See also}:
1300 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
1302 @item @emph{Reference}:
1303 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.4.
1308 @node omp_in_parallel
1309 @section @code{omp_in_parallel} -- Whether a parallel region is active
1311 @item @emph{Description}:
1312 This function returns @code{true} if currently running in parallel,
1313 @code{false} otherwise. Here, @code{true} and @code{false} represent
1314 their language-specific counterparts.
1317 @multitable @columnfractions .20 .80
1318 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
1321 @item @emph{Fortran}:
1322 @multitable @columnfractions .20 .80
1323 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
1326 @item @emph{Reference}:
1327 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.6.
1332 @section @code{omp_in_final} -- Whether in final or included task region
1334 @item @emph{Description}:
1335 This function returns @code{true} if currently running in a final
1336 or included task region, @code{false} otherwise. Here, @code{true}
1337 and @code{false} represent their language-specific counterparts.
1340 @multitable @columnfractions .20 .80
1341 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
1344 @item @emph{Fortran}:
1345 @multitable @columnfractions .20 .80
1346 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
1349 @item @emph{Reference}:
1350 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.21.
1355 @node omp_is_initial_device
1356 @section @code{omp_is_initial_device} -- Whether executing on the host device
1358 @item @emph{Description}:
1359 This function returns @code{true} if currently running on the host device,
1360 @code{false} otherwise. Here, @code{true} and @code{false} represent
1361 their language-specific counterparts.
1364 @multitable @columnfractions .20 .80
1365 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
1368 @item @emph{Fortran}:
1369 @multitable @columnfractions .20 .80
1370 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
1373 @item @emph{Reference}:
1374 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.28.
1379 @node omp_set_default_device
1380 @section @code{omp_set_default_device} -- Set the default device for target regions
1382 @item @emph{Description}:
1383 Set the default device for target regions without device clause. The argument
1384 shall be a nonnegative device number.
1387 @multitable @columnfractions .20 .80
1388 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
1391 @item @emph{Fortran}:
1392 @multitable @columnfractions .20 .80
1393 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
1394 @item @tab @code{integer device_num}
1397 @item @emph{See also}:
1398 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
1400 @item @emph{Reference}:
1401 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.23.
1406 @node omp_set_dynamic
1407 @section @code{omp_set_dynamic} -- Enable/disable dynamic teams
1409 @item @emph{Description}:
1410 Enable or disable the dynamic adjustment of the number of threads
1411 within a team. The function takes the language-specific equivalent
1412 of @code{true} and @code{false}, where @code{true} enables dynamic
1413 adjustment of team sizes and @code{false} disables it.
1416 @multitable @columnfractions .20 .80
1417 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
1420 @item @emph{Fortran}:
1421 @multitable @columnfractions .20 .80
1422 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
1423 @item @tab @code{logical, intent(in) :: dynamic_threads}
1426 @item @emph{See also}:
1427 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
1429 @item @emph{Reference}:
1430 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.7.
1435 @node omp_set_max_active_levels
1436 @section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
1438 @item @emph{Description}:
1439 This function limits the maximum allowed number of nested, active
1443 @multitable @columnfractions .20 .80
1444 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
1447 @item @emph{Fortran}:
1448 @multitable @columnfractions .20 .80
1449 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
1450 @item @tab @code{integer max_levels}
1453 @item @emph{See also}:
1454 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
1456 @item @emph{Reference}:
1457 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.15.
1462 @node omp_set_nested
1463 @section @code{omp_set_nested} -- Enable/disable nested parallel regions
1465 @item @emph{Description}:
1466 Enable or disable nested parallel regions, i.e., whether team members
1467 are allowed to create new teams. The function takes the language-specific
1468 equivalent of @code{true} and @code{false}, where @code{true} enables
1469 nested parallel regions and @code{false} disables them.
1472 @multitable @columnfractions .20 .80
1473 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
1476 @item @emph{Fortran}:
1477 @multitable @columnfractions .20 .80
1478 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
1479 @item @tab @code{logical, intent(in) :: nested}
1482 @item @emph{See also}:
1483 @ref{OMP_NESTED}, @ref{omp_get_nested}
1485 @item @emph{Reference}:
1486 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.10.
1491 @node omp_set_num_threads
1492 @section @code{omp_set_num_threads} -- Set upper team size limit
1494 @item @emph{Description}:
1495 Specifies the number of threads used by default in subsequent parallel
1496 sections, if those do not specify a @code{num_threads} clause. The
1497 argument of @code{omp_set_num_threads} shall be a positive integer.
1500 @multitable @columnfractions .20 .80
1501 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
1504 @item @emph{Fortran}:
1505 @multitable @columnfractions .20 .80
1506 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
1507 @item @tab @code{integer, intent(in) :: num_threads}
1510 @item @emph{See also}:
1511 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
1513 @item @emph{Reference}:
1514 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.1.
1519 @node omp_set_schedule
1520 @section @code{omp_set_schedule} -- Set the runtime scheduling method
1522 @item @emph{Description}:
1523 Sets the runtime scheduling method. The @var{kind} argument can have the
1524 value @code{omp_sched_static}, @code{omp_sched_dynamic},
1525 @code{omp_sched_guided} or @code{omp_sched_auto}. Except for
1526 @code{omp_sched_auto}, the chunk size is set to the value of
1527 @var{modifier} if positive, or to the default value if zero or negative.
1528 For @code{omp_sched_auto} the @var{modifier} argument is ignored.
1531 @multitable @columnfractions .20 .80
1532 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int modifier);}
1535 @item @emph{Fortran}:
1536 @multitable @columnfractions .20 .80
1537 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, modifier)}
1538 @item @tab @code{integer(kind=omp_sched_kind) kind}
1539 @item @tab @code{integer modifier}
1542 @item @emph{See also}:
1543 @ref{omp_get_schedule}
1546 @item @emph{Reference}:
1547 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.12.
1553 @section @code{omp_init_lock} -- Initialize simple lock
1555 @item @emph{Description}:
1556 Initialize a simple lock. After initialization, the lock is in
1560 @multitable @columnfractions .20 .80
1561 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
1564 @item @emph{Fortran}:
1565 @multitable @columnfractions .20 .80
1566 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
1567 @item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
1570 @item @emph{See also}:
1571 @ref{omp_destroy_lock}
1573 @item @emph{Reference}:
1574 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
1580 @section @code{omp_set_lock} -- Wait for and set simple lock
1582 @item @emph{Description}:
1583 Before setting a simple lock, the lock variable must be initialized by
1584 @code{omp_init_lock}. The calling thread is blocked until the lock
1585 is available. If the lock is already held by the current thread,
1589 @multitable @columnfractions .20 .80
1590 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
1593 @item @emph{Fortran}:
1594 @multitable @columnfractions .20 .80
1595 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
1596 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1599 @item @emph{See also}:
1600 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
1602 @item @emph{Reference}:
1603 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
1609 @section @code{omp_test_lock} -- Test and set simple lock if available
1611 @item @emph{Description}:
1612 Before setting a simple lock, the lock variable must be initialized by
1613 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1614 does not block if the lock is not available. This function returns
1615 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
1616 @code{false} represent their language-specific counterparts.
1619 @multitable @columnfractions .20 .80
1620 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1623 @item @emph{Fortran}:
1624 @multitable @columnfractions .20 .80
1625 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1626 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1629 @item @emph{See also}:
1630 @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
1632 @item @emph{Reference}:
1633 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
1638 @node omp_unset_lock
1639 @section @code{omp_unset_lock} -- Unset simple lock
1641 @item @emph{Description}:
1642 A simple lock about to be unset must have been locked by @code{omp_set_lock}
1643 or @code{omp_test_lock} before. In addition, the lock must be held by the
1644 thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
1645 or more threads attempted to set the lock before, one of them is chosen to
1646 acquire the lock.
1649 @multitable @columnfractions .20 .80
1650 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1653 @item @emph{Fortran}:
1654 @multitable @columnfractions .20 .80
1655 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1656 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1659 @item @emph{See also}:
1660 @ref{omp_set_lock}, @ref{omp_test_lock}
1662 @item @emph{Reference}:
1663 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
1668 @node omp_destroy_lock
1669 @section @code{omp_destroy_lock} -- Destroy simple lock
1671 @item @emph{Description}:
1672 Destroy a simple lock. In order to be destroyed, a simple lock must be
1673 in the unlocked state.
1676 @multitable @columnfractions .20 .80
1677 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1680 @item @emph{Fortran}:
1681 @multitable @columnfractions .20 .80
1682 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1683 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1686 @item @emph{See also}:
1689 @item @emph{Reference}:
1690 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
1695 @node omp_init_nest_lock
1696 @section @code{omp_init_nest_lock} -- Initialize nested lock
1698 @item @emph{Description}:
1699 Initialize a nested lock. After initialization, the lock is in
1700 an unlocked state and the nesting count is set to zero.
1703 @multitable @columnfractions .20 .80
1704 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1707 @item @emph{Fortran}:
1708 @multitable @columnfractions .20 .80
1709 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1710 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1713 @item @emph{See also}:
1714 @ref{omp_destroy_nest_lock}
1716 @item @emph{Reference}:
1717 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
1721 @node omp_set_nest_lock
1722 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1724 @item @emph{Description}:
1725 Before setting a nested lock, the lock variable must be initialized by
1726 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
1727 is available. If the lock is already held by the current thread, the
1728 nesting count for the lock is incremented.
1731 @multitable @columnfractions .20 .80
1732 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1735 @item @emph{Fortran}:
1736 @multitable @columnfractions .20 .80
1737 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1738 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1741 @item @emph{See also}:
1742 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1744 @item @emph{Reference}:
1745 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
1750 @node omp_test_nest_lock
1751 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1753 @item @emph{Description}:
1754 Before setting a nested lock, the lock variable must be initialized by
1755 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
1756 @code{omp_test_nest_lock} does not block if the lock is not available.
1757 If the lock is already held by the current thread, the new nesting count
1758 is returned. Otherwise, the return value equals zero.
1761 @multitable @columnfractions .20 .80
1762 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1765 @item @emph{Fortran}:
1766 @multitable @columnfractions .20 .80
1767 @item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
1768 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1772 @item @emph{See also}:
1773 @ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
1775 @item @emph{Reference}:
1776 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
1781 @node omp_unset_nest_lock
1782 @section @code{omp_unset_nest_lock} -- Unset nested lock
1784 @item @emph{Description}:
1785 A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
1786 or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
1787 thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
1788 lock becomes unlocked. If one or more threads attempted to set the lock before,
1789 one of them is chosen to acquire the lock.
1792 @multitable @columnfractions .20 .80
1793 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1796 @item @emph{Fortran}:
1797 @multitable @columnfractions .20 .80
1798 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1799 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1802 @item @emph{See also}:
1803 @ref{omp_set_nest_lock}
1805 @item @emph{Reference}:
1806 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
1811 @node omp_destroy_nest_lock
1812 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1814 @item @emph{Description}:
1815 Destroy a nested lock. In order to be destroyed, a nested lock must be
1816 in the unlocked state and its nesting count must equal zero.
1819 @multitable @columnfractions .20 .80
1820 @item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
1823 @item @emph{Fortran}:
1824 @multitable @columnfractions .20 .80
1825 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1826 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1829 @item @emph{See also}:
1832 @item @emph{Reference}:
1833 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
1839 @section @code{omp_get_wtick} -- Get timer precision
1841 @item @emph{Description}:
1842 Gets the timer precision, i.e., the number of seconds between two
1843 successive clock ticks.
1846 @multitable @columnfractions .20 .80
1847 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1850 @item @emph{Fortran}:
1851 @multitable @columnfractions .20 .80
1852 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1855 @item @emph{See also}:
1858 @item @emph{Reference}:
1859 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.2.
1865 @section @code{omp_get_wtime} -- Elapsed wall clock time
1867 @item @emph{Description}:
1868 Elapsed wall clock time in seconds. The time is measured per thread; no
1869 guarantee can be made that two distinct threads measure the same time.
1870 Time is measured from ``some time in the past'', which is an arbitrary time
1871 guaranteed not to change during the execution of the program.
1874 @multitable @columnfractions .20 .80
1875 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1878 @item @emph{Fortran}:
1879 @multitable @columnfractions .20 .80
1880 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1883 @item @emph{See also}:
1886 @item @emph{Reference}:
1887 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.1.
1892 @c ---------------------------------------------------------------------
1893 @c OpenMP Environment Variables
1894 @c ---------------------------------------------------------------------
1896 @node Environment Variables
1897 @chapter OpenMP Environment Variables
1899 The environment variables beginning with @env{OMP_} are defined by
1900 section 4 of the OpenMP specification in version 4.0, while those
1901 beginning with @env{GOMP_} are GNU extensions.
1904 * OMP_CANCELLATION:: Set whether cancellation is activated
1905 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
1906 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
1907 * OMP_DYNAMIC:: Dynamic adjustment of threads
1908 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1909 * OMP_NESTED:: Nested parallel regions
1910 * OMP_NUM_THREADS:: Specifies the number of threads to use
1911 * OMP_PROC_BIND:: Whether threads may be moved between CPUs
1912 * OMP_PLACES:: Specifies on which CPUs the threads should be placed
1913 * OMP_STACKSIZE:: Set default thread stack size
1914 * OMP_SCHEDULE:: How threads are scheduled
1915 * OMP_THREAD_LIMIT:: Set the maximum number of threads
1916 * OMP_WAIT_POLICY:: How waiting threads are handled
1917 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
1918 * GOMP_DEBUG:: Enable debugging output
1919 * GOMP_STACKSIZE:: Set default thread stack size
1920 * GOMP_SPINCOUNT:: Set the busy-wait spin count
1924 @node OMP_CANCELLATION
1925 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1926 @cindex Environment Variable
1928 @item @emph{Description}:
1929 If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
1930 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1932 @item @emph{See also}:
1933 @ref{omp_get_cancellation}
1935 @item @emph{Reference}:
1936 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
1941 @node OMP_DISPLAY_ENV
1942 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1943 @cindex Environment Variable
1945 @item @emph{Description}:
1946 If set to @code{TRUE}, the OpenMP version number and the values
1947 associated with the OpenMP environment variables are printed to @code{stderr}.
1948 If set to @code{VERBOSE}, it additionally shows the value of the environment
1949 variables which are GNU extensions. If undefined or set to @code{FALSE},
1950 this information will not be shown.
1953 @item @emph{Reference}:
1954 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.12
1959 @node OMP_DEFAULT_DEVICE
1960 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1961 @cindex Environment Variable
1963 @item @emph{Description}:
1964 Set to choose the device which is used in a @code{target} region, unless the
1965 value is overridden by @code{omp_set_default_device} or by a @code{device}
1966 clause. The value shall be a nonnegative device number. If no device with
1967 the given device number exists, the code is executed on the host. If unset,
1968 device number 0 will be used.
1971 @item @emph{See also}:
1972 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1974 @item @emph{Reference}:
1975 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
1981 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1982 @cindex Environment Variable
1984 @item @emph{Description}:
1985 Enable or disable the dynamic adjustment of the number of threads
1986 within a team. The value of this environment variable shall be
1987 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
1988 disabled by default.
1990 @item @emph{See also}:
1991 @ref{omp_set_dynamic}
1993 @item @emph{Reference}:
1994 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.3
1999 @node OMP_MAX_ACTIVE_LEVELS
2000 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
2001 @cindex Environment Variable
2003 @item @emph{Description}:
2004 Specifies the initial value for the maximum number of nested parallel
2005 regions. The value of this variable shall be a positive integer.
2006 If undefined, the number of active levels is unlimited.
2008 @item @emph{See also}:
2009 @ref{omp_set_max_active_levels}
2011 @item @emph{Reference}:
2012 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.9
2018 @section @env{OMP_NESTED} -- Nested parallel regions
2019 @cindex Environment Variable
2020 @cindex Implementation specific setting
2022 @item @emph{Description}:
2023 Enable or disable nested parallel regions, i.e., whether team members
2024 are allowed to create new teams. The value of this environment variable
2025 shall be @code{TRUE} or @code{FALSE}. If undefined, nested parallel
2026 regions are disabled by default.
2028 @item @emph{See also}:
2029 @ref{omp_set_nested}
2031 @item @emph{Reference}:
2032 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.6
2037 @node OMP_NUM_THREADS
2038 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
2039 @cindex Environment Variable
2040 @cindex Implementation specific setting
2042 @item @emph{Description}:
2043 Specifies the default number of threads to use in parallel regions. The
2044 value of this variable shall be a comma-separated list of positive integers;
2045 each value specifies the number of threads to use for the corresponding nested
2046 level. If undefined, one thread per CPU is used.
2048 @item @emph{See also}:
2049 @ref{omp_set_num_threads}
2051 @item @emph{Reference}:
2052 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.2
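For instance, the nesting-level list can be set at program start like this (@code{./my_program} is a hypothetical executable):

```shell
# 4 threads at the outermost level, 2 threads for a nested
# parallel region one level deeper.
OMP_NUM_THREADS=4,2 ./my_program
```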
2058 @section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
2059 @cindex Environment Variable
2061 @item @emph{Description}:
2062 Specifies whether threads may be moved between processors. If set to
2063 @code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
2064 they may be moved. Alternatively, a comma separated list with the
2065 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
2066 the thread affinity policy for the corresponding nesting level. With
2067 @code{MASTER} the worker threads are in the same place partition as the
2068 master thread. With @code{CLOSE} those are kept close to the master thread
2069 in contiguous place partitions. And with @code{SPREAD} a sparse distribution
2070 across the place partitions is used.
2072 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
2073 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
2075 @item @emph{See also}:
2076 @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
2078 @item @emph{Reference}:
2079 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.4
2085 @section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
2086 @cindex Environment Variable
2088 @item @emph{Description}:
2089 The thread placement can be either specified using an abstract name or by an
2090 explicit list of the places. The abstract names @code{threads}, @code{cores}
2091 and @code{sockets} can be optionally followed by a positive number in
2092 parentheses, which denotes how many places shall be created. With
2093 @code{threads} each place corresponds to a single hardware thread; @code{cores}
2094 to a single core with the corresponding number of hardware threads; and with
2095 @code{sockets} the place corresponds to a single socket. The resulting
2096 placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
2099 Alternatively, the placement can be specified explicitly as a comma-separated
2100 list of places. A place is specified by a set of nonnegative numbers in curly
2101 braces, denoting the hardware threads. The hardware threads
2102 belonging to a place can either be specified as a comma-separated list of
2103 nonnegative thread numbers or using an interval. Multiple places can also be
2104 either specified by a comma-separated list of places or by an interval. To
2105 specify an interval, a colon followed by the count is placed after
2106 the hardware thread number or the place. Optionally, the length can be
2107 followed by a colon and the stride number -- otherwise a unit stride is
2108 assumed. For instance, the following specify the same places list:
2109 @code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
2110 @code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
2112 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
2113 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
2114 between CPUs following no placement policy.
2116 @item @emph{See also}:
2117 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
2118 @ref{OMP_DISPLAY_ENV}
2120 @item @emph{Reference}:
2121 @uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.5
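The interval syntax described above can be tried from a shell (@code{./my_program} is a hypothetical executable; the machine is assumed to have at least 12 hardware threads):

```shell
# Three equivalent ways of specifying the same four places,
# each covering three consecutive hardware threads:
OMP_PLACES="{0,1,2},{3,4,5},{6,7,8},{9,10,11}" ./my_program
OMP_PLACES="{0:3},{3:3},{6:3},{9:3}"           ./my_program
OMP_PLACES="{0:3}:4:3"                         ./my_program

# Or use an abstract name: one place per physical core.
OMP_PLACES=cores ./my_program
```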
@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable

@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes. This is different from @code{pthread_attr_setstacksize},
which takes the number of bytes as an argument. If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.
@item @emph{Reference}:
@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.7
@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting

@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}. The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{http://www.openmp.org/, OpenMP specification v4.0}, Sections 2.7.1 and 4.1
@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable

@item @emph{Description}:
Specifies the number of threads to use for the whole program. The
value of this variable shall be a positive integer. If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.10
@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable

@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should. If undefined, threads wait actively for a short time
before waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.8
@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable

@item @emph{Description}:
Binds threads to specific CPUs. The variable should contain a space-separated
or comma-separated list of CPUs. This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S). CPU numbers are zero based. For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
There is no GNU OpenMP library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable. A CPU affinity that is defined at startup cannot be
changed or disabled during the runtime of the application.
If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence. If @env{GOMP_CPU_AFFINITY} is
unset and @env{OMP_PROC_BIND} is either unset or set to @code{FALSE}, the
host system will handle the assignment of threads to CPUs.
@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable

@item @emph{Description}:
Enable debugging output. The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting

@item @emph{Description}:
Set the default thread stack size in kilobytes. This is different from
@code{pthread_attr_setstacksize}, which takes the number of bytes as an
argument. If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged. If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting

@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
integer which gives the number of spins of the busy-wait loop. The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined, and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp. Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu
@node Implementing MASTER construct
@section Implementing MASTER construct

if (omp_get_thread_num () == 0)

Alternately, we generate two copies of the parallel subfunction
and only include this in the version run by the master thread.
Surely this is not worthwhile though...
@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

void GOMP_critical_start (void);
void GOMP_critical_end (void);

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

omp_lock_t gomp_critical_user_<name> __attribute__((common))

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.
@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

void GOMP_atomic_enter (void)
void GOMP_atomic_exit (void)

which reuses the regular lock code, but with yet another lock
object private to the library.
@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.

@node Implementing BARRIER construct
@section Implementing BARRIER construct

void GOMP_barrier (void)
@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In _most_ cases we can map this directly to @code{__thread}. Except
that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library. Failing that, we can have a set
of entry points to register ctor functions to be called.
@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantics of new variable creation.
@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks. Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.
It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.
@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}. The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.
@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

#pragma omp parallel

void subfunction (void *data)

GOMP_parallel_start (subfunction, &data, num_threads);
subfunction (&data);
GOMP_parallel_end ();

void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if present,
or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

void GOMP_parallel_end (void)

Tears down the team and returns us to the previous
@code{omp_in_parallel()} state.
@node Implementing FOR construct
@section Implementing FOR construct

#pragma omp parallel for
for (i = lb; i <= ub; i++)

void subfunction (void *data)

while (GOMP_loop_static_next (&_s0, &_e0))

for (i = _s0; i < _e0; i++)

GOMP_loop_end_nowait ();

GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
GOMP_parallel_end ();

#pragma omp for schedule(runtime)
for (i = 0; i < n; i++)

if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))

for (i = _s0; i < _e0; i++)

@} while (GOMP_loop_runtime_next (&_s0, &_e0));

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we need not do so.
If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. Which would mean that we wouldn't need to call any
of these routines.
There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...
@node Implementing ORDERED construct
@section Implementing ORDERED construct

void GOMP_ordered_start (void)
void GOMP_ordered_end (void)

@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

#pragma omp sections

for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
@node Implementing SINGLE construct
@section Implementing SINGLE construct

if (GOMP_single_start ())

#pragma omp single copyprivate(x)

datap = GOMP_single_copy_start ();

GOMP_single_copy_end (&data);
@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

void GOACC_parallel ()
@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU OpenACC or OpenMP implementation should be reported via
@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}. As appropriate, please
add "openacc", or "openmp", or both to the keywords field in the bug
report.
@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi

@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@unnumbered Library Index