libitm/libitm.texi

   1 \input texinfo @c -*-texinfo-*-
   2
   3 @c %**start of header
   4 @setfilename libitm.info
   5 @settitle GNU libitm
   6 @c %**end of header
   7
   8
   9 @copying
  10 Copyright @copyright{} 2011-2023 Free Software Foundation, Inc.
  11
  12 Permission is granted to copy, distribute and/or modify this document
  13 under the terms of the GNU Free Documentation License, Version 1.2 or
  14 any later version published by the Free Software Foundation; with no
  15 Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  16 A copy of the license is included in the section entitled ``GNU
  17 Free Documentation License''.
  18 @end copying
  19
  20 @ifinfo
  21 @dircategory GNU Libraries
  22 @direntry
  23 * libitm: (libitm).                    GNU Transactional Memory Library
  24 @end direntry
  25
  26 This manual documents the GNU Transactional Memory Library.
  27
  28 @insertcopying
  29 @end ifinfo
  30
  31
  32 @setchapternewpage odd
  33
  34 @titlepage
  35 @title The GNU Transactional Memory Library
  36 @page
  37 @vskip 0pt plus 1filll
  38 @comment For the @value{version-GCC} Version*
  39 @sp 1
  40 @insertcopying
  41 @end titlepage
  42
  43 @summarycontents
  44 @contents
  45 @page
  46
  47
  48 @node Top
  49 @top Introduction
  50 @cindex Introduction
  51
  52 This manual documents the usage and internals of libitm, the GNU Transactional
  53 Memory Library. It provides transaction support for accesses to a process'
  54 memory, enabling easy-to-use synchronization of accesses to shared memory by
  55 several threads.
  56
  57
  58 @comment
  59 @comment  When you add a new menu item, please keep the right hand
  60 @comment  aligned to the same column.  Do not use tabs.  This provides
  61 @comment  better formatting.
  62 @comment
  63 @menu
  64 * Enabling libitm::            How to enable libitm for your applications.
  65 * C/C++ Language Constructs for TM::
  66                                Notes on the language-level interface supported
  67                                by gcc.
  68 * The libitm ABI::             Notes on the external ABI provided by libitm.
  69 * Internals::                  Notes on libitm's internal synchronization.
  70 * GNU Free Documentation License::
  71                                How you can copy and share this manual.
  72 * Library Index::              Index of this documentation.
  73 @end menu
  74
  75
  76 @c ---------------------------------------------------------------------
  77 @c Enabling libitm
  78 @c ---------------------------------------------------------------------
  79
  80 @node Enabling libitm
  81 @chapter Enabling libitm
  82
  83 To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm}
  84 must be specified. This enables TM language-level constructs such as
  85 transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++
  86 Language Constructs for TM} for details).
  87
  88 @c ---------------------------------------------------------------------
  89 @c C/C++ Language Constructs for TM
  90 @c ---------------------------------------------------------------------
  91
  92 @node C/C++ Language Constructs for TM
  93 @chapter C/C++ Language Constructs for TM
  94
  95 Transactions are supported in C++ and C in the form of transaction statements,
  96 transaction expressions, and function transactions. In the following example,
  97 both @code{a} and @code{b} will be read and the difference will be written to
  98 @code{c}, all atomically and isolated from other transactions:
  99
 100 @example
 101 __transaction_atomic @{ c = a - b; @}
 102 @end example
 103
 104 Therefore, another thread can use the following code to concurrently update
 105 @code{b} without ever causing @code{c} to hold a negative value (and without
 106 having to use other synchronization constructs such as locks or C++11
 107 atomics):
 108
 109 @example
 110 __transaction_atomic @{ if (a > b) b++; @}
 111 @end example
 112
 113 GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft
 114 Specification of Transactional Language Constructs for C++ (v1.1)} in its
 115 implementation of transactions.
 116
 117 The precise semantics of transactions are defined in terms of the C++11/C11
 118 memory model (see the specification). Roughly, transactions provide
 119 synchronization guarantees that are similar to what would be guaranteed when
 120 using a single global lock as a guard for all transactions. Note that like
 121 other synchronization constructs in C/C++, transactions rely on a
 122 data-race-free program (e.g., a nontransactional write that is concurrent
 123 with a transactional read to the same memory location is a data race).
 124
 125 @c ---------------------------------------------------------------------
 126 @c The libitm ABI
 127 @c ---------------------------------------------------------------------
 128
 129 @node The libitm ABI
 130 @chapter The libitm ABI
 131
 132 The ABI provided by libitm is basically equal to the Linux variant of Intel's
 133 current TM ABI specification document (Revision 1.1, May 6 2009) but with the
 134 differences listed in this chapter. It would be good if these changes would
 135 eventually be merged into a future version of this specification. To ease
 136 look-up, the following subsections mirror the structure of this specification.
 137
 138 @section [No changes] Objectives
 139 @section [No changes] Non-objectives
 140
 141 @section Library design principles
 142 @subsection [No changes] Calling conventions
 143 @subsection [No changes] TM library algorithms
 144 @subsection [No changes] Optimized load and store routines
 145 @subsection [No changes] Aligned load and store routines
 146
 147 @subsection Data logging functions
 148
 149 The memory locations accessed with transactional loads and stores and the
 150 memory locations whose values are logged must not overlap. This required
 151 separation only extends to the scope of the execution of one transaction
 152 including all the executions of all nested transactions.
 153
 154 The compiler must be consistent (within the scope of a single transaction)
 155 about which memory locations are shared and which are not shared with other
 156 threads (i.e., data must be accessed either transactionally or
 157 nontransactionally). Otherwise, non-write-through TM algorithms would not work.
 158
 159 For memory locations on the stack, this requirement extends to only the
 160 lifetime of the stack frame that the memory location belongs to (or the
 161 lifetime of the transaction, whichever is shorter).  Thus, memory that is
 162 reused for several stack frames could be target of both data logging and
 163 transactional accesses; however, this is harmless because these stack frames'
 164 lifetimes will end before the transaction finishes.
 165
 166 @subsection [No changes] Scatter/gather calls
 167 @subsection [No changes] Serial and irrevocable mode
 168 @subsection [No changes] Transaction descriptor
 169 @subsection Store allocation
 170
 171 There is no @code{getTransaction} function.
 172
 173 @subsection [No changes] Naming conventions
 174
 175 @subsection Function pointer encryption
 176
 177 Currently, this is not implemented.
 178
 179
 180 @section Types and macros list
 181
 182 @code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a
 183 transaction}.
 184 @code{_ITM_srcLocation} is not used.
 185
 186
 187 @section Function list
 188
 189 @subsection Initialization and finalization functions
 190 These functions are not part of the ABI.
 191
 192 @subsection [No changes] Version checking
 193 @subsection [No changes] Error reporting
 194 @subsection [No changes] inTransaction call
 195
 196 @subsection State manipulation functions
 197 There is no @code{getTransaction} function. Transaction identifiers for
 198 nested transactions will be ordered but not necessarily sequential (i.e., for
 199 a nested transaction's identifier @var{IN} and its enclosing transaction's
 200 identifier @var{IE}, it is guaranteed that @math{IN >= IE}).
 201
 202 @subsection [No changes] Source locations
 203
 204 @subsection Starting a transaction
 205
 206 @subsubsection Transaction code properties
 207
 208 @anchor{txn-code-properties}
 209 The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}.
 210 Iff it is set, vector register save/restore is not necessary for any target
 211 machine.
 212
 213 The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating
 214 point register save/restore is not necessary for any target machine.
 215
 216 @code{undoLogCode} is not supported and a fatal runtime error will be raised
 217 if this bit is set. It is not properly defined in the ABI why barriers
 218 other than undo logging are not present; Are they not necessary (e.g., a
 219 transaction operating purely on thread-local data) or have they been omitted by
 220 the compiler because it thinks that some kind of global synchronization
 221 (e.g., serial mode) might perform better? The specification suggests that the
 222 latter might be the case, but the former seems to be more useful.
 223
 224 The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic
 225 scope?
 226
 227 @code{hasNoRetry} is not supported. If this bit is not set, but
 228 @code{hasNoAbort} is set, the library can assume that transaction
 229 rollback will not be requested.
 230
 231 It would be useful if the absence of externally-triggered rollbacks would be
 232 reported for the dynamic scope as well, not just for the lexical scope
 233 (@code{hasNoAbort}). Without this, a library cannot exploit this together
 234 with flat nesting.
 235
 236 @code{exceptionBlock} is not supported because exception blocks are not used.
 237
 238 @subsubsection [No changes] Windows exception state
 239 @subsubsection [No changes] Other machine state
 240
 241 @subsubsection [No changes] Results from beginTransaction
 242
 243 @subsection Aborting a transaction
 244
 245 @code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction}
 246 is supported but the abort reasons @code{exceptionBlockAbort},
 247 @code{TMConflict}, and @code{userRetry} are not supported. There are no
 248 exception blocks in general, so the related cases also do not have to be
 249 considered. To encode @code{__transaction_cancel [[outer]]}, compilers must
 250 set the new @code{outerAbort} bit (@code{0x10}) additionally to the
 251 @code{userAbort} bit in the abort reason.
 252
 253 @subsection Committing a transaction
 254
 255 The exception handling (EH) scheme is different. The Intel ABI requires the
 256 @code{_ITM_tryCommitTransaction} function that will return even when the
 257 commit failed and will have to be matched with calls to either
 258 @code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast,
 259 gcc relies on transactional wrappers for the functions of the Exception
 260 Handling ABI and on one additional commit function (shown below). This allows
 261 the TM to keep track of EH internally and thus it does not have to embed the
 262 cleanup of EH state into the existing EH code in the program.
 263 @code{_ITM_tryCommitTransaction} is not supported.
 264 @code{_ITM_commitTransactionToId} is also not supported because the
 265 propagation of thrown exceptions will not bypass commits of nested
 266 transactions.
 267
 268 @example
 269 void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM;
 270 void *_ITM_cxa_allocate_exception (size_t);
 271 void _ITM_cxa_free_exception (void *exc_ptr);
 272 void _ITM_cxa_throw (void *obj, void *tinfo, void (*dest) (void *));
 273 void *_ITM_cxa_begin_catch (void *exc_ptr);
 274 void _ITM_cxa_end_catch (void);
 275 @end example
 276
 277 The EH scheme changed in version 6 of GCC.  Previously, the compiler
 278 added a call to @code{_ITM_commitTransactionEH} to commit a transaction if
 279 an exception could be in flight at this position in the code; @code{exc_ptr} is
 280 the address of the current exception and must be non-zero.  Now, the
 281 compiler must catch all exceptions that are about to be thrown out of a
 282 transaction and call @code{_ITM_commitTransactionEH} from the catch clause,
 283 with @code{exc_ptr} being zero.
 284
 285 Note that the old EH scheme never worked completely in GCC's implementation;
 286 libitm currently does not try to be compatible with the old scheme.
 287
 288 The @code{_ITM_cxa...} functions are transactional wrappers for the respective
 289 @code{__cxa...} functions and must be called instead of these in transactional
 290 code.  @code{_ITM_cxa_free_exception} is new in GCC 6.
 291
 292 To support this EH scheme, libstdc++ needs to provide one additional function
 293 (@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception
 294 handling state while rolling back a transaction:
 295
 296 @example
 297 void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc,
 298                        unsigned int caught_count);
 299 @end example
 300
 301 Since GCC 6, @code{unthrown_obj} is not used anymore and always null;
 302 prior to that, @code{unthrown_obj} is non-null if the program called
 303 @code{__cxa_allocate_exception} for this exception but did not yet called
 304 @code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is
 305 currently processing a cleanup along an exception path but has not caught this
 306 exception yet. @code{caught_count} is the nesting depth of
 307 @code{__cxa_begin_catch} within the transaction (which can be counted by the TM
 308 using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch});
 309 @code{__cxa_tm_cleanup} then performs rollback by essentially performing
 310 @code{__cxa_end_catch} that many times.
 311
 312
 313
 314 @subsection Exception handling support
 315
 316 Currently, there is no support for functionality like
 317 @code{__transaction_cancel throw} as described in the C++ TM specification.
 318 Supporting this should be possible with the EH scheme explained previously
 319 because via the transactional wrappers for the EH ABI, the TM is able to
 320 observe and intercept EH.
 321
 322
 323 @subsection [No changes] Transition to serial--irrevocable mode
 324 @subsection [No changes] Data transfer functions
 325 @subsection [No changes] Transactional memory copies
 326
 327 @subsection Transactional versions of memmove
 328
 329 If either the source or destination memory region is to be accessed
 330 nontransactionally, then source and destination regions must not be
 331 overlapping. The respective @code{_ITM_memmove} functions are still
 332 available but a fatal runtime error will be raised if such regions do overlap.
 333 To support this functionality, the ABI would have to specify how the
 334 intersection of the regions has to be accessed (i.e., transactionally or
 335 nontransactionally).
 336
 337 @subsection [No changes] Transactional versions of memset
 338 @subsection [No changes] Logging functions
 339
 340 @subsection User-registered commit and undo actions
 341
 342 Commit actions will get executed in the same order in which the respective
 343 calls to @code{_ITM_addUserCommitAction} happened. Only
 344 @code{_ITM_noTransactionId} is allowed as value for the
 345 @code{resumingTransactionId} argument. Commit actions get executed after
 346 privatization safety has been ensured.
 347
 348 Undo actions will get executed in reverse order compared to the order in which
 349 the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of
 350 undo actions w.r.t. the roll-back of other actions (e.g., data transfers or
 351 memory allocations) is undefined.
 352
 353 @code{_ITM_getThreadnum} is not supported currently because its only purpose
 354 is to provide a thread ID that matches some assumed performance tuning output,
 355 but this output is not part of the ABI nor further defined by it.
 356
 357 @code{_ITM_dropReferences} is not supported currently because its semantics and
 358 the intention behind it is not entirely clear. The
 359 specification suggests that this function is necessary because of certain
 360 orderings of data transfer undos and the releasing of memory regions (i.e.,
 361 privatization). However, this ordering is never defined, nor is the ordering of
 362 dropping references w.r.t. other events.
 363
 364 @subsection [New] Transactional indirect calls
 365
 366 Indirect calls (i.e., calls through a function pointer) within transactions
 367 should execute the transactional clone of the original function (i.e., a clone
 368 of the original that has been fully instrumented to use the TM runtime), if
 369 such a clone is available. The runtime provides two functions to
 370 register/deregister clone tables:
 371
 372 @example
 373 struct clone_entry
 374 @{
 375   void *orig, *clone;
 376 @};
 377
 378 void _ITM_registerTMCloneTable (clone_entry *table, size_t entries);
 379 void _ITM_deregisterTMCloneTable (clone_entry *table);
 380 @end example
 381
 382 Registered tables must be writable by the TM runtime, and must be live
 383 throughout the life-time of the TM runtime.
 384
 385 @strong{TODO} The intention was always to drop the registration functions
 386 entirely, and create a new ELF Phdr describing the linker-sorted table.  Much
 387 like what currently happens for @code{PT_GNU_EH_FRAME}.
 388 This work kept getting bogged down in how to represent the @var{N} different
 389 code generation variants.  We clearly needed at least two---SW and HW
 390 transactional clones---but there was always a suggestion of more variants for
 391 different TM assumptions/invariants.
 392
 393 The compiler can then use two TM runtime functions to perform indirect calls in
 394 transactions:
 395 @example
 396 void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM;
 397 void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM;
 398 @end example
 399
 400 If there is a registered clone for supplied function, both will return a
 401 pointer to the clone. If not, the first runtime function will attempt to switch
 402 to serial--irrevocable mode and return the original pointer, whereas the second
 403 will raise a fatal runtime error.
 404
 405 @subsection [New] Transactional dynamic memory management
 406
 407 @example
 408 void *_ITM_malloc (size_t)
 409        __attribute__((__malloc__)) ITM_PURE;
 410 void *_ITM_calloc (size_t, size_t)
 411        __attribute__((__malloc__)) ITM_PURE;
 412 void _ITM_free (void *) ITM_PURE;
 413 @end example
 414
 415 These functions are essentially transactional wrappers for @code{malloc},
 416 @code{calloc}, and @code{free}. Within transactions, the compiler should
 417 replace calls to the original functions with calls to the wrapper functions.
 418
 419 libitm also provides transactional clones of C++ memory management functions
 420 such as global operator new and delete.  They are part of libitm for historic
 421 reasons but do not need to be part of this ABI.
 422
 423
 424 @section [No changes] Future Enhancements to the ABI
 425
 426 @section Sample code
 427
 428 The code examples might not be correct w.r.t. the current version of the ABI,
 429 especially everything related to exception handling.
 430
 431
 432 @section [New] Memory model
 433
 434 The ABI should define a memory model and the ordering that is guaranteed for
 435 data transfers and commit/undo actions, or at least refer to another memory
 436 model that needs to be preserved. Without that, the compiler cannot ensure the
 437 memory model specified on the level of the programming language (e.g., by the
 438 C++ TM specification).
 439
 440 For example, if a transactional load is ordered before another load/store, then
 441 the TM runtime must also ensure this ordering when accessing shared state. If
 442 not, this might break the kind of publication safety used in the C++ TM
 443 specification. Likewise, the TM runtime must ensure privatization safety.
 444
 445
 446
 447 @c ---------------------------------------------------------------------
 448 @c Internals
 449 @c ---------------------------------------------------------------------
 450
 451 @node Internals
 452 @chapter Internals
 453
 454 @section TM methods and method groups
 455
 456 libitm supports several ways of synchronizing transactions with each other.
 457 These TM methods (or TM algorithms) are implemented in the form of
 458 subclasses of @code{abi_dispatch}, which provide methods for
 459 transactional loads and stores as well as callbacks for rollback and commit.
 460 All methods that are compatible with each other (i.e., that let concurrently
 461 running transactions still synchronize correctly even if different methods
 462 are used) belong to the same TM method group. Pointers to TM methods can be
 463 obtained using the factory methods prefixed with @code{dispatch_} in
 464 @file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and
 465 @code{dispatch_serialirr}, that are compatible with all methods because they
 466 run transactions completely in serial mode.
 467
 468 @subsection TM method life cycle
 469
 470 The state of TM methods does not change after construction, but they do alter
 471 the state of transactions that use this method. However, because
 472 per-transaction data gets used by several methods, @code{gtm_thread} is
 473 responsible for setting an initial state that is useful for all methods.
 474 After that, methods are responsible for resetting/clearing this state on each
 475 rollback or commit (of outermost transactions), so that the transaction
 476 executed next is not affected by the previous transaction.
 477
 478 There is also global state associated with each method group, which is
 479 initialized and shut down (@code{method_group::init()} and @code{fini()})
 480 when switching between method groups (see @file{retry.cc}).
 481
 482 @subsection Selecting the default method
 483
 484 The default method that libitm uses for freshly started transactions (but
 485 not necessarily for restarted transactions) can be set via an environment
 486 variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name
 487 of one of the factory methods returning abi_dispatch subclasses but without
 488 the "dispatch_" prefix (e.g., "serialirr" instead of
 489 @code{GTM::dispatch_serialirr()}).
 490
 491 Note that this environment variable is only a hint for libitm and might not
 492 be supported in the future.
 493
 494
 495 @section Nesting: flat vs. closed
 496
 497 We support two different kinds of nesting of transactions. In the case of
 498 @emph{flat nesting}, the nesting structure is flattened and all nested
 499 transactions are subsumed by the enclosing transaction. In contrast,
 500 with @emph{closed nesting}, nested transactions that have not yet committed
 501 can be rolled back separately from the enclosing transactions; when they
 502 commit, they are subsumed by the enclosing transaction, and their effects
 503 will be finally committed when the outermost transaction commits.
 504 @emph{Open nesting} (where nested transactions can commit independently of the
 505 enclosing transactions) are not supported.
 506
 507 Flat nesting is the default nesting mode, but closed nesting is supported and
 508 used when transactions contain user-controlled aborts
 509 (@code{__transaction_cancel} statements). We assume that user-controlled
 510 aborts are rare in typical code and used mostly in exceptional situations.
 511 Thus, it makes more sense to use flat nesting by default to avoid the
 512 performance overhead of the additional checkpoints required for closed
 513 nesting. User-controlled aborts will correctly abort the innermost enclosing
 514 transaction, whereas the whole (i.e., outermost) transaction will be restarted
 515 otherwise (e.g., when a transaction encounters data conflicts during
 516 optimistic execution).
 517
 518
 519 @section Locking conventions
 520
 521 This section documents the locking scheme and rules for all uses of locking
 522 in libitm. We have to support serial(-irrevocable) mode, which is implemented
 523 using a global lock as explained next (called the @emph{serial lock}). To
 524 simplify the overall design, we use the same lock as catch-all locking
 525 mechanism for other infrequent tasks such as (de)registering clone tables or
 526 threads. Besides the serial lock, there are @emph{per-method-group locks} that
 527 are managed by specific method groups (i.e., groups of similar TM concurrency
 528 control algorithms), and lock-like constructs for quiescence-based operations
 529 such as ensuring privatization safety.
 530
 531 Thus, the actions that participate in the libitm-internal locking are either
 532 @emph{active transactions} that do not run in serial mode, @emph{serial
 533 transactions} (which (are about to) run in serial mode), and management tasks
 534 that do not execute within a transaction but have acquired the serial mode
 535 like a serial transaction would do (e.g., to be able to register threads with
 536 libitm). Transactions become active as soon as they have successfully used the
 537 serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock
 538 implementation}). Likewise, transactions become serial transactions as soon as
 539 they have acquired the exclusive rights provided by the serial lock (i.e.,
 540 serial mode, which also means that there are no other concurrent active or
 541 serial transactions). Note that active transactions can become serial
 542 transactions when they enter serial mode during the runtime of the
 543 transaction.
 544
 545 @subsection State-to-lock mapping
 546
 547 Application data is protected by the serial lock if there is a serial
 548 transaction and no concurrently running active transaction (i.e., non-serial).
 549 Otherwise, application data is protected by the currently selected method
 550 group, which might use per-method-group locks or other mechanisms. Also note
 551 that application data that is about to be privatized might not be allowed to be
 552 accessed by nontransactional code until privatization safety has been ensured;
 553 the details of this are handled by the current method group.
 554
 555 libitm-internal state is either protected by the serial lock or accessed
 556 through custom concurrent code. The latter applies to the public/shared part
 557 of a transaction object and most typical method-group-specific state.
 558
 559 The former category (protected by the serial lock) includes:
 560 @itemize @bullet
 561 @item The list of active threads that have used transactions.
 562 @item The tables that map functions to their transactional clones.
 563 @item The current selection of which method group to use.
 564 @item Some method-group-specific data, or invariants of this data. For example,
 565 resetting a method group to its initial state is handled by switching to the
 566 same method group, so the serial lock protects such resetting as well.
 567 @end itemize
 568 In general, such state is immutable whenever there exists an active
 569 (non-serial) transaction. If there is no active transaction, a serial
 570 transaction (or a thread that is not currently executing a transaction but has
 571 acquired the serial lock) is allowed to modify this state (but must of course
 572 be careful to not surprise the current method group's implementation with such
 573 modifications).
 574
 575 @subsection Lock acquisition order
 576
 577 To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
 578 order. Note that this applies to other forms of blocking too, but does not
 579 necessarily apply to lock acquisitions that do not block (e.g., trylock()
 580 calls that do not get retried forever). Note that serial transactions are
 581 never return back to active transactions until the transaction has committed.
 582 Likewise, active transactions stay active until they have committed.
 583 Per-method-group locks are typically also not released before commit.
 584
 585 Lock acquisition / blocking rules:
 586 @itemize @bullet
 587
 588 @item Transactions must become active or serial before they are allowed to
 589 use method-group-specific locks or blocking (i.e., the serial lock must be
 590 acquired before those other locks, either in serial or nonserial mode).
 591
 592 @item Any number of threads that do not currently run active transactions can
 593 block while trying to get the serial lock in exclusive mode. Note that active
 594 transactions must not block when trying to upgrade to serial mode unless there
 595 is no other transaction that is trying that (the latter is ensured by the
 596 serial lock implementation.
 597
 598 @item Method groups must prevent deadlocks on their locks. In particular, they
 599 must also be prepared for another active transaction that has acquired
 600 method-group-specific locks but is blocked during an attempt to upgrade to
 601 being a serial transaction. See below for details.
 602
 603 @item Serial transactions can acquire method-group-specific locks because there
 604 will be no other active nor serial transaction.
 605
 606 @end itemize
 607
 608 There is no single rule for per-method-group blocking because this depends on
 609 when a TM method might acquire locks. If no active transaction can upgrade to
 610 being a serial transaction after it has acquired per-method-group locks (e.g.,
 611 when those locks are only acquired during an attempt to commit), then the TM
 612 method does not need to consider a potential deadlock due to serial mode.
 613
 614 If there can be upgrades to serial mode after the acquisition of
 615 per-method-group locks, then TM methods need to avoid those deadlocks:
 616 @itemize @bullet
 617 @item When upgrading to a serial transaction, after acquiring exclusive rights
 618 to the serial lock but before waiting for concurrent active transactions to
 619 finish (@pxref{serial-lock-impl,,Serial lock implementation} for details),
 620 we have to wake up all active transactions waiting on the upgrader's
 621 per-method-group locks.
 622 @item Active transactions blocking on per-method-group locks need to check the
 623 serial lock and abort if there is a pending serial transaction.
 624 @item Lost wake-ups have to be prevented (e.g., by changing a bit in each
 625 per-method-group lock before doing the wake-up, and only blocking on this lock
 626 using a futex if this bit is not group).
 627 @end itemize
 628
 629 @strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make
 630 sense to introduce further complexity in the serial lock? For gl-*, we can
 631 really only avoid an abort if we do -wb and -vbv.
 632
 633
 634 @subsection Serial lock implementation
 635 @anchor{serial-lock-impl}
 636
 637 The serial lock implementation is optimized towards assuming that serial
 638 transactions are infrequent and not the common case. However, the performance
 639 of entering serial mode can matter because when only few transactions are run
 640 concurrently or if there are few threads, then it can be efficient to run
 641 transactions serially.
 642
 643 The serial lock is similar to a multi-reader-single-writer lock in that there
 644 can be several active transactions but only one serial transaction. However,
 645 we do want to avoid contention (in the lock implementation) between active
 646 transactions, so we split up the reader side of the lock into per-transaction
 647 flags that are true iff the transaction is active. The exclusive writer side
 648 remains a shared single flag, which is acquired using a CAS, for example.
 649 On the fast-path, the serial lock then works similar to Dekker's algorithm but
 650 with several reader flags that a serial transaction would have to check.
 651 A serial transaction thus requires a list of all threads with potentially
 652 active transactions; we can use the serial lock itself to protect this list
 653 (i.e., only threads that have acquired the serial lock can modify this list).
 654
 655 We want starvation-freedom for the serial lock to allow for using it to ensure
 656 progress for potentially starved transactions (@pxref{progress-guarantees,,
 657 Progress Guarantees} for details). However, this is currently not enforced by
 658 the implementation of the serial lock.
 659
 660 Here is pseudo-code for the read/write fast paths of acquiring the serial
 661 lock (read-to-write upgrade is similar to write_lock:
 662 @example
 663 // read_lock:
 664 tx->shared_state |= active;
 665 __sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
 666 while (!serial_lock.exclusive)
 667   if (spinning_for_too_long) goto slowpath;
 668
 669 // write_lock:
 670 if (CAS(&serial_lock.exclusive, 0, this) != 0)
 671   goto slowpath; // writer-writer contention
 672 // need a membar here, but CAS already has full membar semantics
 673 bool need_blocking = false;
 674 for (t: all txns)
 675   @{
 676     for (;t->shared_state & active;)
 677       if (spinning_for_too_long) @{ need_blocking = true; break; @}
 678   @}
 679 if (need_blocking) goto slowpath;
 680 @end example
 681
 682 Releasing a lock in this spin-lock version then just consists of resetting
 683 @code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}.
 684
 685 However, we can't rely on a pure spinlock because we need to get the OS
 686 involved at some time (e.g., when there are more threads than CPUs to run on).
 687 Therefore, the real implementation falls back to a blocking slow path, either
 688 based on pthread mutexes or Linux futexes.
 689
 690
 691 @subsection Reentrancy
 692
 693 libitm has to consider the following cases of reentrancy:
 694 @itemize @bullet
 695
 696 @item Transaction calls unsafe code that starts a new transaction: The outer
 697 transaction will become a serial transaction before executing unsafe code.
 698 Therefore, nesting within serial transactions must work, even if the nested
 699 transaction is called from within uninstrumented code.
 700
 701 @item Transaction calls either a transactional wrapper or safe code, which in
 702 turn starts a new transaction: It is not yet defined in the specification
 703 whether this is allowed. Thus, it is undefined whether libitm supports this.
 704
 705 @item Code that starts new transactions might be called from within any part
 706 of libitm: This kind of reentrancy would likely be rather complex and can
 707 probably be avoided. Therefore, it is not supported.
 708
 709 @end itemize
 710
 711 @subsection Privatization safety
 712
 713 Privatization safety is ensured by libitm using a quiescence-based approach.
 714 Basically, a privatizing transaction waits until all concurrent active
 715 transactions will either have finished (are not active anymore) or operate on
 716 a sufficiently recent snapshot to not access the privatized data anymore. This
 717 happens after the privatizing transaction has stopped being an active
 718 transaction, so waiting for quiescence does not contribute to deadlocks.
 719
 720 In method groups that need to ensure publication safety explicitly, active
 721 transactions maintain a flag or timestamp in the public/shared part of the
 722 transaction descriptor. Before blocking, privatizers need to let the other
 723 transactions know that they should wake up the privatizer.
 724
 725 @strong{TODO} Ho to implement the waiters? Should those flags be
 726 per-transaction or at a central place? We want to avoid one wake/wait call
 727 per active transactions, so we might want to use either a tree or combining
 728 to reduce the syscall overhead, or rather spin for a long amount of time
 729 instead of doing blocking. Also, it would be good if only the last transaction
 730 that the privatizer waits for would do the wake-up.
 731
 732 @subsection Progress guarantees
 733 @anchor{progress-guarantees}
 734
 735 Transactions that do not make progress when using the current TM method will
 736 eventually try to execute in serial mode. Thus, the serial lock's progress
 737 guarantees determine the progress guarantees of the whole TM. Obviously, we at
 738 least need deadlock-freedom for the serial lock, but it would also be good to
 739 provide starvation-freedom (informally, all threads will finish executing a
 740 transaction eventually iff they get enough cycles).
 741
 742 However, the scheduling of transactions (e.g., thread scheduling by the OS)
 743 also affects the handling of progress guarantees by the TM. First, the TM
 744 can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
 745 low-priority threads can starve if they do not get scheduled when other
 746 high-priority threads get those cycles instead.
 747
 748 If all threads get scheduled eventually, correct lock implementations will
 749 provide deadlock-freedom, but might not provide starvation-freedom. We can
 750 either enforce the latter in the TM's lock implementation, or assume that
 751 the scheduling is sufficiently random to yield a probabilistic guarantee that
 752 no thread will starve (because eventually, a transaction will encounter a
 753 scheduling that will allow it to run). This can indeed work well in practice
 754 but is not necessarily guaranteed to work (e.g., simple spin locks can be
 755 pretty efficient).
 756
 757 Because enforcing stronger progress guarantees in the TM has a higher runtime
 758 overhead, we focus on deadlock-freedom right now and assume that the threads
 759 will get scheduled eventually by the OS (but don't consider threads with
 760 different priorities). We should support starvation-freedom for serial
 761 transactions in the future. Everything beyond that is highly related to proper
 762 contention management across all of the TM (including with TM method to
 763 choose), and is future work.
 764
 765 @strong{TODO} Handling thread priorities: We want to avoid priority inversion
 766 but it's unclear how often that actually matters in practice. Workloads that
 767 have threads with different priorities will likely also require lower latency
 768 or higher throughput for high-priority threads. Therefore, it probably makes
 769 not that much sense (except for eventual progress guarantees) to use
 770 priority inheritance until the TM has priority-aware contention management.
 771
 772
 773 @c ---------------------------------------------------------------------
 774 @c GNU Free Documentation License
 775 @c ---------------------------------------------------------------------
 776
 777 @include fdl.texi
 778
 779 @c ---------------------------------------------------------------------
 780 @c Index
 781 @c ---------------------------------------------------------------------
 782
 783 @node Library Index
 784 @unnumbered Library Index
 785
 786 @printindex cp
 787
 788 @bye