doc/charm++/sdag.tex

   1 \charmpp\ is based on the message-driven parallel programming paradigm. In
   2 contrast to many other approaches, \charmpp\ programmers encode entry points to
   3 their parallel objects, but do not explicitly wait (i.e. block) on the runtime
   4 to indicate completion of posted `receive' requests. Thus, a \charmpp\ object's
   5 overall flow of control can end up fragmented across a number of separate
   6 methods, obscuring the sequence in which code is expected to
   7 execute. Furthermore, there are often constraints on when different pieces of
   8 code should execute relative to one another, related to data and
   9 synchronization dependencies.
  10
  11 Consider one way of expressing these constraints using flags, buffers, and
  12 counters, as in the following example:
  13 %
  14 %\begin{figure}[ht]
  15 \begin{center}
  16 \begin{alltt}
  17 // in .ci file
  18 chare ComputeObject \{
  19   entry void ComputeObject();
  20   entry void startStep();
  21   entry void firstInput(Input i);
  22   entry void secondInput(Input j);
  23 \};
  24
  25 // in C++ file
  26 class ComputeObject : public CBase_ComputeObject \{
  27   int   expectedMessageCount;
  28   Input first, second;
  29
  30 public:
  31   ComputeObject() \{
  32     startStep();
  33   \}
  34   void startStep() \{
  35     expectedMessageCount = 2;
  36   \}
  37
  38   void firstInput(Input i) \{
  39     first = i;
  40     if (\verb|--expectedMessageCount == 0|)
  41       computeInteractions(first, second);
  42     \}
  43   void recv_second(Input j) \{
  44     second = j;
  45     if (\verb|--expectedMessageCount == 0|)
  46       computeInteractions(first, second);
  47   \}
  48
  49   void computeInteractions(Input a, Input b) \{
  50     // do computations using a and b
  51     . . .
  52     // send off results
  53     . . .
  54     // reset for next step
  55     startStep();
  56   \}
  57 \};
  58 \end{alltt}
  59 \end{center}
  60 %\caption{Compute Object in a Molecular Dynamics Application}
  61 %\label{figchareexample}
  62 %\end{figure}
  63 In each step, this object expects pairs of messages, and waits to process the
  64 incoming data until it has both of them. This sequencing is encoded across 4
  65 different functions, which in real code could be much larger and more numerous,
  66 resulting in a spaghetti-code mess.
  67
  68 Instead, it would be preferable to express this flow of control using
  69 structured constructs, such as loops. \charmpp\ provides such constructs for
  70 structured control flow across an object's entry methods in a notation called
  71 Structured Dagger. The basic constructs of Structured Dagger (SDAG) provide for
  72 \emph{program-order execution} of the entry methods and code blocks that they
  73 define. These definitions appear in the {\tt .ci} file definition of the
  74 enclosing chare class as a `body' of an entry method following its signature.
  75
  76 The most basic construct in SDAG is the {\tt serial} (aka the {\tt atomic}) block.
  77 Serial blocks
  78 contain sequential \CC code.  They're also called atomic because the code within
  79 them executes without returning control to the \charmpp\ runtime scheduler, and
  80 thus avoiding interruption from incoming messages. The keywords atomic and serial
  81 are synonymous, and you can find example programs that use atomic. However, we
  82 recommend the use of serial and are considering the deprecation of the atomic keyword.
  83 Typically serial blocks hold
  84 the code that actually deals with incoming messages in a {\tt when} statement,
  85 or to do local operations before a message is sent or after it's received. The
  86 earlier example can be adapted to use serial blocks as follows:
  87 \begin{center}
  88 \begin{alltt}
  89 // in .ci file
  90 chare ComputeObject \{
  91   entry void ComputeObject();
  92   entry void startStep();
  93   entry void firstInput(Input i) \{
  94     serial \{
  95       first = i;
  96       if (\verb|--expectedMessageCount == 0|)
  97         computeInteractions(first, second);
  98     \}
  99   \};
 100   entry void secondInput(Input j) \{
 101     serial \{
 102       second = j;
 103       if (\verb|--expectedMessageCount == 0|)
 104         computeInteractions(first, second);
 105     \}
 106   \};
 107 \};
 108
 109 // in C++ file
 110 class ComputeObject : public CBase\_ComputeObject \{
 111   ComputeObject\_SDAG\_CODE
 112   int   expectedMessageCount;
 113   Input first, second;
 114
 115 public:
 116   ComputeObject() \{
 117     startStep();
 118   \}
 119   void startStep() \{
 120     expectedMessageCount = 2;
 121   \}
 122
 123   void computeInteractions(Input a, Input b) \{
 124     // do computations using a and b
 125     . . .
 126     // send off results
 127     . . .
 128     // reset for next step
 129     startStep();
 130   \}
 131 \};
 132 \end{alltt}
 133 \end{center}
 134 Note that chare classes containing SDAG code must include a few additional declarations
 135 in addition to inheriting from their {\tt CBase\_Foo} class, by incorporating the
 136 {\tt Foo\_SDAG\_CODE} generated-code macro in the class.
 137
 138 Serial blocks can also specify a textual `label' that will appear in traces, as
 139 follows:
 140 \begin{center}
 141 \begin{alltt}
 142   entry void firstInput(Input i) \{
 143     serial "process first" \{
 144       first = i;
 145       if (\verb|--expectedMessageCount == 0|)
 146         computeInteractions(first, second);
 147     \}
 148   \};
 149 \end{alltt}
 150 \end{center}
 151
 152 In order to control the sequence in which entry methods are processed, SDAG
 153 provides the {\tt when} construct. These statements, also called triggers,
 154 indicate that we expect an incoming message of a particular type, and provide
 155 code to handle that message when it arrives. From the perspective of a chare
 156 object reaching a {\tt when} statement, it is effectively a `blocking
 157 receive.'
 158
 159 Entry methods defined by a {\tt when} are
 160 not executed immediately when a message targeting them is delivered, but
 161 instead are held until control flow in the chare reaches a corresponding {\tt
 162   when} clause. Conversely, when control flow reaches a {\tt when} clause, the
 163 generated code checks whether a corresponding message has arrived: if one has
 164 arrived, it is processed; otherwise, control is returned to the
 165 \charmpp\ scheduler.
 166
 167 The use of {\tt when} substantially simplifies the example from above:
 168 \begin{center}
 169 \begin{alltt}
 170 // in .ci file
 171 chare ComputeObject \{
 172   entry void ComputeObject();
 173   entry void startStep() \{
 174     when firstInput(Input first)
 175       when secondInput(Input second)
 176         serial \{
 177           computeInteractions(first, second);
 178         \}
 179   \};
 180   entry void firstInput(Input i);
 181   entry void secondInput(Input j);
 182 \};
 183
 184 // in C++ file
 185 class ComputeObject : public CBase_ComputeObject \{
 186   ComputeObject_SDAG_CODE
 187
 188 public:
 189   ComputeObject() \{
 190     startStep();
 191   \}
 192
 193   void computeInteractions(Input a, Input b) \{
 194     // do computations using a and b
 195     . . .
 196     // send off results
 197     . . .
 198     // reset for next step
 199     startStep();
 200   \}
 201 \};
 202 \end{alltt}
 203 \end{center}
 204 Like an {\tt if} or {\tt while} in C code, each {\tt when} clause has a body
 205 made up of the statement or block following it. The variables declared as
 206 arguments to the entry method triggering the when are available in the scope of
 207 the body. By using the sequenced execution of SDAG code and the availability of
 208 parameters to when-defined entry methods in their bodies, the counter {\tt
 209   expectedMessageCount} and the intermediate copies of the received input are
 210 eliminated. Note that the entry methods {\tt firstInput} and {\tt secondInput}
 211 are still declared in the {\tt .ci} file, but their definition is in the SDAG
 212 code. The interface translator generates code to handle buffering and
 213 triggering them appropriately.
 214
 215 For simplicity, {\tt when} constructs can also specify multiple expected entry
 216 methods that all feed into a single body, by separating their prototypes with
 217 commas:
 218 \begin{center}
 219 \begin{alltt}
 220 entry void startStep() \{
 221   when firstInput(Input first),
 222        secondInput(Input second)
 223     serial \{
 224       computeInteractions(first, second);
 225     \}
 226 \};
 227 \end{alltt}
 228 \end{center}
 229
 230 A single entry method is allowed to appear in more than one {\tt when} statement.
 231 If only one of those {\tt when} statements has been triggered when the runtime
 232 delivers a message to that entry method, that {\tt when} statement is guaranteed
 233 to process it. If there is no trigger waiting for that entry method, then the
 234 next corresponding {\tt when} to be reached will receive that message. If there is
 235 more than one {\tt when} waiting on that method, which one will receive it is not
 236 specified, and should not be relied upon. For an example of multiple {\tt when}
 237 statements handling the same entry method without reaching the unspecified case,
 238 see the CharmLU benchmark.
 239
 240 To more finely control the correspondence between incoming messages and {\tt when}
 241 clauses associated with the target entry method, SDAG supports \emph{matching} on
 242 reference numbers. Matching is typically used to denote an iteration of a program
 243 that executes asynchronously (without any sort of barrier or other synchronization
 244 between steps) or a particular piece of the problem being solved.
 245 Matching is requested by placing an expression denoting the desired reference
 246 number in square brackets between the entry method name and its parameter list.
 247 For parameter marshalled entry methods, the reference number expression will be
 248 compared for equality with the entry method's first argument. For entry methods
 249 that accept an explicit message (\S~\ref{messages}), the reference number on
 250 the message can be set by calling the function
 251 \verb|CkSetRefNum(void *msg, CMK_REFNUM_TYPE ref)|.
 252 Matching is used in the loop example below, and in
 253 \examplereffile{jacobi2d-sdag/jacobi2d.ci}. Multiple {\tt when} triggers for
 254 an entry method with different matching reference numbers will not conflict - each
 255 will receive only corresponding messages.
 256
 257 SDAG supports the {\tt for} and {\tt while} loop constructs mostly as if they
 258 appeared in plain C or C++ code. In the running example, {\tt
 259   computeInteractions()} calls {\tt startStep()} when it is finished to start
 260 the next step. Instead of this arrangement, the loop structure can be made
 261 explicit:
 262 \begin{center}
 263 \begin{alltt}
 264 // in .ci file
 265 chare ComputeObject \{
 266   entry void ComputeObject();
 267   entry void runForever() \{
 268     while(true) \{
 269       when firstInput(Input first),
 270            secondInput(Input second) serial \{
 271           computeInteractions(first, second);
 272       \}
 273     \}
 274   \};
 275   entry void firstInput(Input i);
 276   entry void secondInput(Input j);
 277 \};
 278
 279 // in C++ file
 280 class ComputeObject : public CBase_ComputeObject \{
 281   ComputeObject_SDAG_CODE
 282
 283 public:
 284   ComputeObject() \{
 285     runForever();
 286   \}
 287
 288   void computeInteractions(Input a, Input b) \{
 289     // do computations using a and b
 290     . . .
 291     // send off results
 292     . . .
 293   \}
 294 \};
 295 \end{alltt}
 296 \end{center}
 297 If this code should instead run for a fixed number of iterations, we can
 298 instead use a for loop:
 299 \begin{center}
 300 \begin{alltt}
 301 // in .ci file
 302 chare ComputeObject \{
 303   entry void ComputeObject();
 304   entry void runForever() \{
 305     for(iter = 0; iter < n; ++iter) \{
 306       // Match to only accept inputs for the current iteration
 307       when firstInput[iter](int a, Input first),
 308            secondInput[iter](int b, Input second) serial \{
 309         computeInteractions(first, second);
 310       \}
 311     \}
 312   \};
 313   entry void firstInput(int a, Input i);
 314   entry void secondInput(int b, Input j);
 315 \};
 316
 317 // in C++ file
 318 class ComputeObject : public CBase_ComputeObject \{
 319   ComputeObject_SDAG_CODE
 320   int n, iter;
 321
 322 public:
 323   ComputeObject() \{
 324     n = 10;
 325     runForever();
 326   \}
 327
 328   void computeInteractions(Input a, Input b) \{
 329     // do computations using a and b
 330     . . .
 331     // send off results
 332     . . .
 333   \}
 334 \};
 335 \end{alltt}
 336 \end{center}
 337 Note that {\tt int iter;} is declared in the chare's class definition and not
 338 in the {\tt .ci} file. This is necessary because the \charmpp\ interface
 339 translator does not fully parse the declarations in the {\tt for} loop header,
 340 because of the inherent complexities of C++.
 341
 342 SDAG also supports conditional execution of statements and blocks with {\tt if}
 343 statements. The syntax of SDAG {\tt if} statements matches that of C and
 344 C++. However, if one encounters a syntax error on correct-looking code in a
 345 loop or conditional statement, try assigning the condition expression to a
 346 boolean variable in a serial block preceding the statement and then testing that
 347 boolean's value. This can be necessary because of the complexity of parsing C++
 348 code.
 349
 350 In cases where multiple tasks must be processed before execution continues, but
 351 with no dependencies or interactions among them, SDAG provides the {\tt
 352   overlap} construct. Overlap blocks contain a series of SDAG statements within
 353 them which can occur in any order. Commonly these blocks are used to hold a
 354 series of {\tt when} triggers which can be received and processed in any
 355 order. Flow of control doesn't leave the overlap block until all the statements
 356 within it have been processed.
 357
 358 In the running example, suppose each input needs to be preprocessed independently
 359 before the call to {\tt computeInteractions}. Since we don't care which order
 360 they get processed in, and want it to happen as soon as possible, we can apply
 361 {\tt overlap}:
 362 \begin{center}
 363 \begin{alltt}
 364 // in .ci file
 365 chare ComputeObject \{
 366   entry void ComputeObject();
 367   entry void startStep() \{
 368     overlap \{
 369       when firstInput(Input i)
 370         serial \{ first = preprocess(i); \}
 371       when secondInput(Input j)
 372         serial \{ second = preprocess(j); \}
 373      \}
 374      serial \{
 375        computeInteractions(first, second);
 376      \}
 377   \};
 378   entry void firstInput(Input i);
 379   entry void secondInput(Input j);
 380 \};
 381
 382 // in C++ file
 383 class ComputeObject : public CBase_ComputeObject \{
 384   ComputeObject_SDAG_CODE
 385
 386 public:
 387   ComputeObject() \{
 388     startStep();
 389   \}
 390
 391   void computeInteractions(Input a, Input b) \{
 392     // do computations using a and b
 393     . . .
 394     // send off results
 395     . . .
 396     // reset for next step
 397     startStep();
 398   \}
 399 \};
 400 \end{alltt}
 401 \end{center}
 402
 403 Another construct offered by SDAG is the {\tt forall} loop. These loops are
 404 used when the iterations of a loop can be performed independently and in any
 405 order. This is in contrast to a regular {\tt for} loop, in which each iteration
 406 is executed sequentially. The {\tt forall} loop can be seen as an overlap with
 407 an indexed set of otherwise identical statements in the body. Its syntax is
 408 \begin{center}
 409 \begin{alltt}
 410 forall [IDENT] (MIN:MAX,STRIDE) BODY
 411 \end{alltt}
 412 \end{center}
 413 The range from MIN to MAX is inclusive. Its use is demonstrated through
 414 distributed parallel matrix-matrix multiply shown in
 415 \examplereffile{matmul/matmul.ci}
 416
 417 \subsection{The \texttt{case} Statement}
 418
 419 The \texttt{case} statement in SDAG expresses a disjunction over a set of
 420 \texttt{when} clauses. In other words, if it is known that one dependency out
 421 of a set will be satisfied, but which one is not known, this statement allows
 422 the set to be specified and will execute the corresponding block based on which
 423 dependency ends up being fulfilled.
 424
 425 The following is a basic example of the \texttt{case} statement. Note that the
 426 trigger \texttt{b(), d()} will only be fulfilled if both \texttt{b()} and
 427 \texttt{d()} arrive. If only one arrives, then it will partially match, and the
 428 runtime will not ``commit'' to this branch until the second arrives. If another
 429 dependency fully matches, the partial match will be ignored and can be used to
 430 trigger another \texttt{when} later in the execution.
 431
 432 \begin{verbatim}
 433 case {
 434   when a() { }
 435   when b(), d() { }
 436   when c() { }
 437 }
 438 \end{verbatim}
 439
 440 A full example of the \texttt{case} statement can be found
 441 \testreffile{sdag/case/caseTest.ci}.
 442
 443 \section{Usage Notes}
 444
 445 If you've added \sdag\ code to your class, you must link in the code
 446 by adding ``{\it className}\_SDAG\_CODE'' inside the class declaration
 447 in the .h file. This macro defines the entry points and support code
 448 used by \sdag{}. Forgetting this results in a compile error (undefined
 449 SDAG entry methods referenced from the .def.h file).
 450
 451 For example, an array named ``Foo'' that uses sdag code might contain:
 452 \begin{center}
 453 \begin{alltt}
 454 class Foo : public CBase_Foo \{
 455 public:
 456     Foo_SDAG_CODE
 457     Foo(...) \{
 458        ...
 459     \}
 460     Foo(CkMigrateMessage *m) \{ \}
 461
 462     void pup(PUP::er &p) \{
 463        ...
 464     \}
 465     . . .
 466 \};
 467 \end{alltt}
 468 \end{center}
 469
 470 \zap{
 471 \section{Relationship to Threads}
 472
 473 Threads are typically used to perform the abovementioned sequencing.
 474 Lets us code our previous example using threads.
 475
 476 %\begin{figure}[ht]
 477 \begin{center}
 478 \begin{alltt}
 479 void compute_thread(int first_index, int second_index)
 480 \{
 481     getPatch(first_index);
 482     getPatch(second_index);
 483     threadId[0] = createThread(recvFirst);
 484     threadId[1] = createThread(recvSecond);
 485     threadJoin(2, threadId);
 486     computeInteractions(first, second);
 487   \}
 488   void recvFirst(void)
 489   \{
 490     recv(first, sizeof(Patch), ANY_PE, FIRST_TAG);
 491     filter(first);
 492   \}
 493   void recvSecond(void)
 494   \{
 495     recv(second, sizeof(Patch), ANY_PE, SECOND_TAG);
 496     filter(second);
 497   \}
 498 \end{alltt}
 499 \end{center}
 500 %\caption{Compute Thread in a Molecular Dynamics Application}
 501 %\label{figthreadexample}
 502 %\end{figure}
 503
 504 Contrast the compute chare-object example in figure~\ref{figchareexample} with
 505 a thread-based implementation of the same scheme in
 506 figure~\ref{figthreadexample}. Functions \uw{getFirst}, and \uw{getSecond} send
 507 messages asynchronously to the PatchManager, requesting that the specified
 508 patches be sent to them, and return immediately. Since these messages with
 509 patches could arrive in any order, two threads, \uw{recvFirst} and
 510 \uw{recvSecond}, are created. These threads block, waiting for messages to
 511 arrive. After each message arrives, each thread performs the filtering
 512 operation. The main thread waits for these two threads to complete, and then
 513 computes the pairwise interactions. Though the programming complexity of
 514 buffering the messages and maintaining the counters has been eliminated in this
 515 implementation, considerable overhead in the form of thread creation, and
 516 synchronization in the form of {\em join} has been added. Let us now code the
 517 same example in \sdag. It reduces the parallel programming complexity without
 518 adding any significant overhead.
 519
 520 %\begin{figure}[ht]
 521 \begin{center}
 522 \begin{alltt}
 523   array[1D] compute_object \{
 524     entry void recv_first(Patch *first);
 525     entry void recv_second(Patch *first);
 526     entry void compute_object(MSG *msg)\{
 527       serial \{
 528          PatchManager->Get(msg->first_index,\dots);
 529          PatchManager->Get(msg->second_index,\dots);
 530       \}
 531       overlap \{
 532         when recv_first(Patch *first) serial \{ filter(first); \}
 533         when recv_second(Patch *second) serial \{ filter(second); \}
 534       \}
 535       serial \{ computeInteractions(first, second); \}
 536     \}
 537   \}
 538 \end{alltt}
 539 \end{center}
 540 %\caption{\sdag\ Implementation of the Compute Object}
 541 %\label{figsdagexample}
 542 %\end{figure}
 543 }
 544
 545
 546 \zap{
 547 \section{Grammar}
 548
 549 \paragraph{Tokens}
 550
 551 \begin{alltt}
 552   <ident> = Valid \CC{} identifier
 553   <int-expr> = Valid \CC{} integer expression
 554   <\CC{}-code> = Valid \CC{} code
 555 \end{alltt}
 556
 557 \paragraph{Grammar in EBNF Form}
 558
 559 \begin{alltt}
 560 <sdag> := <class-decl> <sdagentry>+
 561
 562 <class-decl> := "class" <ident>
 563
 564 <sdagentry> := "sdagentry" <ident> "(" <ident> "*" <ident> ")" <body>
 565
 566 <body> := <stmt>
 567         | "\{" <stmt>+ "\}"
 568
 569 <stmt> := <overlap-stmt>
 570         | <when-stmt>
 571         | <atomic-stmt>
 572         | <if-stmt>
 573         | <while-stmt>
 574         | <for-stmt>
 575         | <forall-stmt>
 576
 577 <overlap-stmt> := "overlap" <body>
 578
 579 <atomic-stmt> := "serial" "\{" <\CC-code> "\}"
 580
 581 <if-stmt> := "if" "(" <int-expr> ")" <body> [<else-stmt>]
 582
 583 <else-stmt> := "else" <body>
 584
 585 <while-stmt> := "while" "(" <int-expr> ")" <body>
 586
 587 <for-stmt> := "for" "(" <c++-code> ";" <int-expr> ";" <c++-code> ")" <body>
 588
 589 <forall-stmt> := "forall" "[" <ident> "]" "(" <range-stride> ")" <body>
 590
 591 <range-stride> := <int-expr> ":" <int-expr> "," <int-expr>
 592
 593 <when-stmt> := "when" <entry-list>  <body>
 594
 595 <entry-list> := <entry>
 596               | <entry> [ "," <entry-list> ]
 597
 598 <entry> := <ident> [ "[" <int-expr> "]" ] "(" <ident> "*" <ident> ")"
 599
 600 \end{alltt}
 601 }
 602
 603 \zap{
 604 \sdag\ is a coordination language built on top of \charmpp\ that supports the
 605 sequencing mentioned above, while overcoming limitations of thread-based
 606 languages, and facilitating a clear expression of flow of control within the
 607 object without losing the performance benefits of adaptive message-driven
 608 execution.  In other words, \sdag\ is a structured notation for specifying
 609 intra-process control dependences in message-driven programs. It combines the
 610 efficiency of message-driven execution with the explicitness of control
 611 specification. \sdag\ allows easy expression of dependences among messages and
 612 computations and also among computations within the same object using
 613 when-blocks and various structured constructs.  \sdag\ is adequate for
 614 expressing control-dependencies that form a series-parallel control-flow graph.
 615 \sdag\ has been developed on top of \charmpp\. \sdag\ allows \charmpp\ entry
 616 methods (in chares, groups or arrays) to specify code (a when-block body) to be
 617 executed upon occurrence of certain events.  These events (or guards of a
 618 when-block) are entry methods of the object that can be invoked remotely. While
 619 writing a \sdag\ program, one has to declare these entries in \charmpp\
 620 interface file. The implementation of the entry methods that contain the
 621 when-block is written using the \sdag\ language. Grammar of \sdag\ is given in
 622 the EBNF form below.
 623
 624 \subsubsection{Usage}
 625
 626 You can use SDAG to implement entry methods for any chare, chare array, group,
 627 or nodegroup. Any entry method implemented using SDAG must be implemented in the
 628 interface (.ci) file for its class. An SDAG entry method consists of a series of
 629 SDAG constructs of the following kinds:
 630
 631 \begin{itemize}
 632     \item {\tt forall} loops:
 633     \item {\tt if}, {\tt for}, and {\tt while} statements: these statements have
 634         the same meaning as the normal {\tt if}, {\tt for}, and {\tt while}
 635         loops in sequential \CC programs. This allows the programmer to use
 636         common control flow constructs outside the context of serial blocks.
 637 \end{itemize}
 638
 639 \sdag{} code can be inserted into the .ci file for any array, group, or chare's entry methods.
 640
 641 For more details regarding \sdag{}, look at \examplerefdir{hello/sdag}
 642 }
 643