\section{Basic \charmpp{} Programming}
\subsection{What's the basic programming model for Charm++?}

Parallel objects using "Asynchronous Remote Method Invocation":

\begin{description}
\item[Asynchronous] in that you {\em do not block} until the method returns--the
caller continues immediately.

\item[Remote] in that the two objects may be separated by a network.

\item[Method Invocation] in that it's just C++ classes calling each other's
methods.
\end{description}
\subsection{What is an "entry method"?}

Entry methods are the methods of a chare that other chares can invoke remotely,
i.e., the methods to which messages can be sent.
They are declared in the .ci file, and they must be defined as public methods
of the C++ class representing the chare.
\subsection{When I invoke a remote method, do I block until that method returns?}

No! This is one of the biggest differences between Charm++ and most
other "remote procedure call" systems, like Java RMI or RPC.
"Invoke an asynchronous method" and "send a message" have exactly the same
semantics and implementation.
Since the invoking method does not wait for the remote method to terminate, it
normally cannot receive any return value. (See below for ways to return values.)
\subsection{Why does Charm++ use asynchronous methods?}

Asynchronous method invocation is more efficient because it can be
implemented as a single message send. Unlike with synchronous methods,
no thread blocking and unblocking and no return message are needed.

Another big advantage of asynchronous methods is that it's easy to make
things run in parallel. If I execute:
\begin{alltt}
a->foo();
b->bar();
\end{alltt}
then foo and bar can run at the same time; there's no reason bar has
to wait for foo.
\subsection{Can I make a method synchronous? Can I then return a value?}

Yes. If you want synchronous methods, so that the caller blocks, use the {\tt [sync]}
keyword before the method in the .ci file. This requires the caller to be a threaded
entry method, as it will be suspended until the callee finishes.
Sync entry methods are allowed to return values to the caller.
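
For illustration, here is a minimal sketch of declaring and calling a {\tt [sync]}
entry method (the {\em Worker} chare and the {\em WorkMsg}/{\em ResultMsg} message
types are hypothetical). A sync entry method that returns data typically does so by
returning a message, and the caller must itself be running inside a {\tt [threaded]}
entry method:
\begin{alltt}
// In the .ci file:
chare Worker \{
  entry Worker(void);
  entry [sync] ResultMsg *compute(WorkMsg *m);
\};

// In the .C file, inside a [threaded] entry method of the caller:
ResultMsg *result = workerProxy.compute(new WorkMsg(...));
... use result, then delete it ...
\end{alltt}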
\subsection{What is a threaded entry method? How does one make an entry method threaded?}

A threaded entry method is an entry method for a chare that executes
in a separate user-level thread. It is useful when the entry method wants
to suspend itself (for example, to wait for more data). Note that
threaded entry methods have nothing to do with kernel-level threads or
pthreads; they run in user-level threads that are scheduled by Charm++
itself.

In order to make an entry method threaded, one should add the keyword
{\em threaded} within square brackets after the {\em entry} keyword in the
interface file:
\begin{alltt}
module M \{
  chare X \{
    entry [threaded] E1(void);
  \};
\};
\end{alltt}
\subsection{If I don't want to use threads, how can an asynchronous method return a value?}

The usual way to get data back to
your caller is via another invocation in the opposite direction:
\begin{alltt}
void A::start(void) \{
  b->giveMeSomeData();
\}
void B::giveMeSomeData(void) \{
  a->hereIsTheData(data);
\}
void A::hereIsTheData(myclass_t data) \{
  ...use data somehow...
\}
\end{alltt}
This is contorted, but it exactly matches what the machine has to do.
The difficulty of accessing remote data encourages programmers to use local
data, bundle outgoing requests, and develop higher-level abstractions,
which leads to good performance and good code.
\subsection{Isn't there a better way to send data back to whoever called me?}

The above example is very non-modular, because {\em b} has to know
that {\em a} called it, and which method of {\em a} to call back. For
this kind of request/response code, you can abstract away the ``where to
return the data'' with a {\em CkCallback} object:
\begin{alltt}
void A::start(void) \{
  b->giveMeSomeData(CkCallback(CkIndex_A::hereIsTheData,thisProxy));
\}
void B::giveMeSomeData(CkCallback returnDataHere) \{
  returnDataHere.send(data);
\}
void A::hereIsTheData(myclass_t data) \{
  ...use data somehow...
\}
\end{alltt}
Now {\em b} can be called from several different places in {\em a},
or from several different modules.
\subsection{Why should I prefer the callback way to return data rather than using {\tt [sync]} entry methods?}

There are a few reasons for this:

\begin{itemize}

\item
The caller needs to be threaded, which implies some overhead in creating the
thread. Moreover, the threaded entry method will suspend waiting for the data,
preventing any code after the remote method invocation from proceeding in parallel.

\item
Threaded entry methods are still methods of an object. While they are suspended,
other entry methods of the same object (or even another instance of the same threaded
entry method) can be invoked. This can cause problems if the suspending method
leaves the object in an inconsistent state.

\item
Finally, and probably most importantly, {\tt [sync]} entry methods can only be
used to return a value that can be computed by a single chare. When more
flexibility is needed, such as cases where the resulting value needs the
contribution of multiple objects, the callback methodology is the only one
available. The caller could, for example, send a broadcast to a chare array, which
will use a reduction to collect the results after they have been computed (see the
sketch after this list).

\end{itemize}
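
As an illustration of that last point, here is a minimal sketch (the chare names,
the {\em arrayProxy} variable, and the use of a sum reduction are assumptions, not
part of any particular program) in which {\em A} broadcasts a request to a chare
array and receives the combined result through a reduction delivered to a
{\em CkCallback}:
\begin{alltt}
void A::start(void) \{
  // broadcast to every element of the (hypothetical) array of B's
  arrayProxy.giveMeSomeData(CkCallback(CkIndex_A::hereIsTheData(NULL), thisProxy));
\}
void B::giveMeSomeData(CkCallback returnDataHere) \{   // B is an array element
  // each element contributes its local value to a sum reduction
  contribute(sizeof(double), &localValue, CkReduction::sum_double, returnDataHere);
\}
void A::hereIsTheData(CkReductionMsg *msg) \{
  double total = *(double *)msg->getData();
  ...use total somehow...
  delete msg;
\}
\end{alltt}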
\subsection{How does initialization in Charm++ work?}

Each processor executes the following operations strictly in order:
\begin{enumerate}
\item All methods registered as {\em initnode};
\item All methods registered as {\em initproc};
\item On processor zero, the constructors of all {\em mainchares} are invoked (the ones taking a {\tt CkArgMsg*});
\item The read-onlies are propagated from processor zero to all other processors;
\item The nodegroups are created;
\item The groups are created. During this phase, for all chare arrays that have been created with block allocation, the corresponding array elements are instantiated;
\item Initialization terminates and all messages become available for processing, including the messages responsible for the instantiation of manually inserted array elements.
\end{enumerate}

This implies that you can assume that the previous steps have completely finished
before the next one starts, and that any side effects from all the previous steps
have been committed (and can therefore be used).

Inside a single step there is no order guarantee. This implies that, for example,
two groups allocated from the mainchare can be instantiated in any order. The only
exception to this is processor zero, where chare objects are instantiated
immediately when allocated in the mainchare, i.e.\ if two groups are allocated,
their order is fixed by the allocation order in the mainchare constructing them.
Again, this is only valid for processor zero; on no other processor should this
assumption be made.

Note that if array elements are allocated in a block (by specifying the
number of elements at the end of the {\tt ckNew} function), they are all
instantiated before normal execution is resumed; if manual insertion is used,
each element can be constructed at any time on its home processor, and not
necessarily before other regular communication messages have been delivered to
other chares (including other array elements that are part of the same array).
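
For reference, here is a minimal sketch of how {\em initnode} and {\em initproc}
methods are declared in an interface file (the module, function, and readonly names
are hypothetical); the corresponding C++ functions are ordinary functions (or static
methods) defined in the .C file:
\begin{alltt}
module M \{
  initnode void registerMyReducers(void);   // runs once per node, before anything else
  initproc void initPerProcessorData(void); // runs once per processor, after initnodes
  readonly int numElements;
  mainchare Main \{
    entry Main(CkArgMsg *m);
  \};
\};
\end{alltt}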
\subsection{Does Charm++ support C and Fortran?}

C and Fortran routines can be called from Charm++ using the usual API conventions for accessing them from C++. AMPI supports Fortran directly, but direct use
of Charm++ semantics from Fortran is at an immature stage; contact us at \htmladdnormallink{charm AT cs.illinois.edu}{mailto:charm AT cs.illinois.edu} if you are interested in pursuing this further.
\subsection{What is a proxy?}

A proxy is a local C++ class that represents a remote C++ class. When
you invoke a method on a proxy, it sends the request across the network
to the real object it represents. In Charm++, all communication is
done using proxies.

A proxy class for each of your classes is generated based on the methods
you list in the .ci file.
\subsection{What are the different ways one can create proxies?}

Proxies can be:
\begin{itemize}
\item
Created using ckNew. This is the only method that actually creates a new
parallel object. "CProxy\_A::ckNew(...)" returns a proxy, as described in
the \htmladdnormallink{manual}{http://charm.cs.uiuc.edu/manuals/html/charm++/}.

\item
Copied from an existing proxy. This happens when you assign two proxies
or send a proxy in a message.

\item
Created from a "handle". This happens when you say "CProxy\_A p=thishandle;".

\item
Created uninitialized. This is the default when you say "CProxy\_A p;".
You'll get a runtime error "proxy has not been initialized" if you try
to use an uninitialized proxy.
\end{itemize}
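
To make this concrete, here is a short sketch (assuming a chare class {\em A}
declared in the .ci file with a parameter-less entry constructor; the variable
names are hypothetical) showing the four cases:
\begin{alltt}
CProxy_A a1 = CProxy_A::ckNew();  // creates a new chare and returns a proxy to it
CProxy_A a2 = a1;                 // copies an existing proxy
CProxy_A a3 = thishandle;         // inside one of A's methods: proxy from this chare's handle
CProxy_A a4;                      // uninitialized; using it is a runtime error
\end{alltt}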
\subsection{What is wrong if I do {\tt A *ap = new CProxy\_A(handle)}?}

This will not compile, because a {\em CProxy\_A} is not an {\em A}.
What you want is {\em CProxy\_A *ap = new CProxy\_A(handle)}.
\subsection{Why is the {\em def.h} usually included at the end? Is it
necessary or can I just include it at the beginning?}

You can include the {\em def.h} file once you've actually declared
everything it will reference--all your chares and readonly variables.
If your chares and readonlies are in your own header files, it is legal
to include the {\em def.h} right away.

However, if the class declaration for a chare isn't visible when you
include the {\em def.h} file, you'll get a confusing compiler error.
This is why we recommend including the {\em def.h} file at the end.
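
As a sketch (the module name {\em hello} and the chare class are hypothetical),
the usual layout of a .C file is:
\begin{alltt}
#include "hello.decl.h"   // generated declarations: include near the top

class Hello : public CBase_Hello \{
 public:
  Hello(void) \{ ... \}
  void sayHi(void) \{ ... \}
\};

#include "hello.def.h"    // generated definitions: include after the chare classes
\end{alltt}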
\subsection{How can I use a global variable across different processors?}

Make the global variable "readonly" by declaring it in the .ci file.
Remember also that read-onlies can be safely set only in the mainchare
constructor. Any change after the mainchare constructor has finished will be
local to the processor that made the change. To change a global variable later
in the program, every processor must modify it accordingly (e.g., by using a chare
group; note that chare arrays are not guaranteed to cover all processors).
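
A minimal sketch (the variable and mainchare names are hypothetical):
\begin{alltt}
// In the .ci file:
readonly int numElements;

// In the .C file:
int numElements;                   // the actual global definition

Main::Main(CkArgMsg *m) \{
  numElements = atoi(m->argv[1]);  // safe: set in the mainchare constructor
  ...
\}
\end{alltt}
After the mainchare constructor finishes, {\em numElements} is propagated to
every processor and can be read (but should not be written) anywhere.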
\subsection{Can I have a class static read-only variable?}

One can have class-static variables as read-onlies. Inside a chare,
group or array declaration in the {\em .ci} file, one can have a readonly
variable declaration. Thus:
\begin{alltt}
chare someChare \{
  ...
  readonly CkGroupID someGroup;
  ...
\};
\end{alltt}
is fine. In the {\em .h} declaration for {\em class someChare},
you will have to put {\em someGroup} as a public static variable,
and you are done.

You then refer to the variable in your program as {\em someChare::someGroup}.
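
A sketch of the corresponding C++ side (the rest of the class body is elided):
\begin{alltt}
// someChare.h
class someChare : ... \{
 public:
  static CkGroupID someGroup;    // matches the readonly declared in the .ci file
  ...
\};

// someChare.C
CkGroupID someChare::someGroup;  // out-of-class definition of the static member
\end{alltt}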
\subsection{How do I measure the time taken by a program or operation?}

You can use {\tt CkWallTimer()} to determine the time on some particular
processor. To time some parallel computation, you need to call CkWallTimer
on some processor, do the parallel computation, then call CkWallTimer again
on the same processor and subtract.
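
For example (a minimal sketch; {\tt CkWallTimer()} returns the wall-clock time
in seconds as a {\tt double}):
\begin{alltt}
double start = CkWallTimer();           // on some processor
... do the parallel computation ...
double elapsed = CkWallTimer() - start; // elapsed wall-clock seconds, same processor
\end{alltt}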
\subsection{What do {\tt CmiAssert} and
{\tt CkAssert} do?}

These are just like the standard C++ {\em assert} calls in {\em \textless assert.h\textgreater}--
they call abort if the condition passed to them is false.

We use our own version rather than the standard version because we have
to call {\em CkAbort}, and because we can turn our asserts off
when {\em --with-production} is used on the build line. These
assertions are specifically controlled by {\em --enable-error-checking}
or {\em --disable-error-checking}. The {\em --with-production} flag
implies {\em --disable-error-checking}, but it can still be explicitly
enabled with {\em --enable-error-checking}.
\subsection{Can I know how many messages are being sent to a chare?}

There is no nice library to solve this problem, as some messages might be queued
on the receiving processor, some on the sender, and some on the network. You can
still:
\begin{itemize}
\item Send a return-receipt message to the sender, and wait until all the
receipts for the messages sent have arrived, then go to a barrier;
\item Do all the sends, then wait for quiescence.
\end{itemize}
\subsection{What is "quiescence"? How does it work?}

Quiescence is when nothing is happening anywhere on the parallel machine.

A low-level background task counts sent and received messages.
When, across the machine, all the messages that have been sent have been
received, and nothing is being processed, quiescence is triggered.
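
Quiescence detection is typically started with {\tt CkStartQD}, which takes a
callback to invoke once quiescence is reached. A minimal sketch (the {\em Main}
chare and its {\em quiescenceReached} entry method are hypothetical):
\begin{alltt}
// ask to be notified when the whole machine goes quiet
CkStartQD(CkCallback(CkIndex_Main::quiescenceReached(), mainProxy));
\end{alltt}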
\subsection{Should I use quiescence detection?}

Probably not.

See the \htmladdnormallink{Completion Detection}{http://charm.cs.illinois.edu/manuals/html/charm++/12.html\#SECTION02340000000000000000} section of the manual for instructions on a more local inactivity detection scheme.

In some ways, quiescence is a very strong property (it guarantees {\em nothing}
is happening {\em anywhere}), so if some other library is doing something,
you won't reach quiescence. In other ways, quiescence is a very weak property,
since it doesn't guarantee anything about the state of your application
like a reduction does, only that nothing is happening. Because quiescence
detection is on the one hand so strong that it breaks modularity, and on the
other hand too weak to guarantee anything useful, it's often better
to use something else.

Often global properties can be replaced by much easier-to-compute local
properties. For example, my object could wait until all {\em its} neighbors
have sent it messages (a local property my object can easily detect by
counting message arrivals), rather than waiting until {\em all} neighbor
messages across the whole machine have been sent (a global property that's
difficult to determine). Sometimes a simple reduction is needed instead
of quiescence, which has the benefits of being activated explicitly (each
element of a chare array or chare group has to call contribute) and of
allowing some data to be collected at the same time. A reduction is also a
few times faster than quiescence detection. Finally, there are a few
situations, such as some tree-search problems, where quiescence detection
is actually the most sensible, efficient solution.