doc/charm++/loadb.tex

   1 Load balancing in \charmpp{} is enabled by its ability to place, or
   2 migrate, chares or chare array elements.  Typical application usage to
   3 exploit this feature will construct many more chares than processors, and
   4 enable their runtime migration.
   5
   6 Iterative applications, which are commonplace in physical simulations,
   7 are the most suitable target for \charmpp{}'s measurement based load
   8 balancing techniques.  Such applications may contain a series of
   9 time-steps, and/or iterative solvers that run to convergence. For such
  10 computations, typically, the heuristic principle that we call
  11 ``principle of persistence'' holds: the computational loads and
  12 communication patterns between objects (chares) tend to persist over
  13 multiple iterations, even in dynamic applications. In such cases,
  14 the recent past is a good predictor of the near future. Measurement-based
  15 chare migration strategies are useful in this context. Currently these
  16 apply to chare-array elements, but they may be extended to chares in
  17 the future.
  18
  19 For applications without such iterative structure, or with iterative
  20 structure, but without predictability (i.e. where the principle of
  21 persistence does not apply), Charm++ supports ``seed balancers'' that
  22 move ``seeds'' for new chares among processors (possibly repeatedly)
  23 to achieve load balance. These strategies are currently available for
  24 both chares and chare-arrays.  Seed balancers were the original load
  25 balancers provided in Charm since the late 80's. They are extremely
  26 useful for state-space search applications, and are also useful in
  27 other computations, as well as in conjunction with migration
  28 strategies.
  29
  30 For iterative computations when there is a correlation between iterations/steps,
  31 but either it is not strong, or the machine environment is not predictable
  32 (due to noise from OS interrupts on small time steps, or time-shared desktop
  33 machines), one can use a combination of the two kinds of strategies. The
  34 baseline load balancing is provided by migration strategies, but in each
  35 iteration one also spawns off work in the form of chares that can run on any
  36 processor. The seed balancer will handle such work as it arises.
  37
  38 Examples are in \examplerefdir{load\_balancing} and
  39 \testrefdir{load\_balancing}
  40
  41 \section{Measurement-based Object Migration Strategies}
  42 \label{lbFramework}
  43 \label{migrationlb}
  44
  45 In \charmpp{}, objects (except groups, nodegroups) can migrate from
  46 processor to processor at runtime. Object migration can potentially
  47 improve the performance of the parallel program by migrating objects from
  48 overloaded processors to underloaded ones.
  49
  50 %However, it is not
  51 %trivial to decide which objects to move and where to move them in
  52 %order to achieve load balance in a fashion without the knowledge about the
  53 %application. The strategy used in \charmpp{} load balancing framework
  54 %is a measurement-based one.
  55
  56  \charmpp{} implements a generic, measurement-based load balancing framework
  57 which automatically instruments all \charmpp{} objects, collects computation
  58 load and communication structure during execution and stores them into a
  59 \kw{load balancing database}. \charmpp{} then provides a collection of \kw{load
  60 balancing strategies} whose job it is to decide on a new mapping of objects to
  61 processors based on the information from the database.  Such measurement based
  62 strategies are efficient when we can reasonably assume that objects in a
  63 \charmpp{} application tend to exhibit temporal correlation in their
  64 computation and communication patterns, i.e. future can be to some extent
  65 predicted using the historical measurement data, allowing effective
  66 measurement-based load balancing without application-specific knowledge.
  67
  68 Two key terms in the \charmpp{} load balancing framework are:
  69 \begin{itemize}
  70 %
  71 \item \kw{Load balancing database} provides the interface of almost all load
  72 balancing calls. On each processor, it stores the load balancing instrumented
  73 data and coordinates the load balancing manager and balancer. It is implemented
  74 as a Chare Group called \kw{LBDatabase}.
  75 %
  76 \item \kw{Load balancer or strategy} takes the load balancing database and
  77 produces the new mapping of the objects. In \charmpp{}, it is implemented as
  78 Chare Group inherited from BaseLB. Three kinds of schemes are implemented: (a)
  79 centralized load balancers, (b) fully distributed load balancers and (c)
  80 hierarchical load balancers.
  81 %
  82 \end{itemize}
  83
  84 \section{Available Load Balancing Strategies}
  85 \label{lbStrategy}
  86
  87 Load balancing can be performed in either a centralized, a fully distributed,
  88 or an hierarchical fashion.
  89
  90 In centralized approaches, the entire machine's load and communication
  91 structure are accumulated to a single point, typically processor 0, followed by
  92 a decision making process to determine the new distribution of \charmpp
  93 objects. Centralized load balancing requires synchronization which may incur an
  94 overhead and delay. However, due to the fact that the decision process has a
  95 high degree of the knowledge about the entire platform, it tends to be more
  96 accurate.
  97
  98 In distributed approaches, load data is only exchanged among
  99 neighboring processors. There is no global synchronization. However,
 100 they will not, in general, provide an immediate restoration for load balance -
 101 the process is iterated until the load balance can be achieved.
 102
 103 In hierarchical approaches, processors are divided into independent autonomous
 104 sets of processor groups and these groups are organized in hierarchies,
 105 thereby decentralizing the load balancing task. Different strategies can be
 106 used to balance the load on processors inside each processor group, and
 107 processors across groups in a hierarchical fashion.
 108
 109 Listed below are some of the available non-trivial centralized load balancers
 110 and their brief descriptions:
 111 \begin{itemize}
 112 \item {\bf RandCentLB}:   Randomly assigns objects to processors;
 113 %\item {\bf RecBisectBfLB}:        Recursively partition with Breadth first enumeration;
 114 \item {\bf MetisLB}:      Uses METIS\texttrademark\hspace{0mm} to partitioning object communication graph.
 115 \item {\bf GreedyLB}:   Uses a greedy algorithm that always assigns the heaviest object to the least loaded processor.
 116 \item {\bf GreedyCommLB}:  Extends the greedy algorithm to take the communication graph into account.
 117 \item {\bf TopoCentLB}:    Extends the greedy algorithm to take processor topology into account.
 118 \item {\bf RefineLB}:     Moves objects away from the most overloaded processors to reach average, limits the number of objects migrated.
 119 \item {\bf RefineSwapLB}:     Moves objects away from the most overloaded processors
 120 to reach average. In case it cannot migrate an object from an overloaded
 121 processor to an underloaded processor, it swaps objects to reduce the load on
 122 the overloaded processor. This strategy limits the number of objects migrated.
 123 \item {\bf RefineCommLB}:     Same idea as in RefineLB, but takes communication into account.
 124 \item {\bf RefineTopoLB}:       Same idea as in RefineLB, but takes processor topology into account.
 125 \item {\bf BlockLB}: This strategy does a blocked distribution of objects to
 126 processors.
 127 \item {\bf ComboCentLB}:  A special load balancer that can be used to combine any number of centralized load balancers mentioned above.
 128 \end{itemize}
 129
 130 Listed below are the distributed load balancers:
 131 \begin{itemize}
 132 \item {\bf NeighborLB}:   A neighborhood load balancer in which each processor tries to average out its load only among its neighbors.
 133 \item {\bf WSLB}:   A load balancer for workstation clusters, which can detect load changes on desktops (and other timeshared processors) and adjust load without interfering with other's use of the desktop.
 134 \item {\bf DistributedLB}: A load balancer which uses partial information about
 135 underloaded and overloaded processors in the system to do probabilistic transfer
 136 of load. This is a refinement based strategy.
 137 \end{itemize}
 138
 139 An example of a hierarchical strategy can be found in:
 140 \begin{itemize}
 141 \item {\bf HybridLB}: This calls GreedyLB at the lower level and RefineLB at
 142 the root.
 143 \end{itemize}
 144
 145 Users can choose any load balancing strategy they think is appropriate for their
 146 application. The compiler and runtime options are described in
 147 section~\ref{lbOption}.
 148 \linebreak
 149
 150 {\bf Metabalancer to automatically schedule load balancing}
 151
 152 Metabalancer can be invoked to automatically decide when to
 153 invoke the load balancer, given the load-balancing strategy.
 154 Metabalancer uses a linear prediction model
 155 to set the load balancing period based on observed load imbalance.
 156
 157 The runtime option {\em +MetaLB} can be used to invoke this feature to automatically invoke
 158 the load balancing strategy based on the imbalance observed.
 159 This option needs to be specified alongside the {\em +balancer} option to specify the
 160 load balancing strategy to use. Metabalancer relies on the AtSync() calls specified
 161 in Section~\ref{lbarray} to collect load statistics.
 162
 163 {\em +MetaLBModelDir} \verb|<path-to-model>|
 164 can be used to invoke the Metabalancer feature
 165 to automatically decide which load balancing strategy to invoke. A model trained on a
 166 generic representative load imbalance benchmark can be found in
 167  \verb|charm/src/ck-ldb/rf_model|.
 168 Metabalancer makes a decision on which load balancing strategy
 169 to invoke out of a subset of strategies, namely GreedyLB, RefineLB,
 170 HybridLB, DistributedLB, MetisLB and ScotchLB.
 171 For using the model based prediction in Metabalancer, \charmpp{} needs to be built with
 172 all the above load balancing strategies, including ScotchLB that relies
 173 on the external partitioning library SCOTCH specified
 174 in the Section~\ref{lbOption}.
 175
 176
 177 %In some cases, one may need to create and invoke multiple load balancing
 178 %strategies/algorithms at the different phases. \charmpp{} now supports
 179 %multiple load balancers created at runtime. For example, one can use
 180 %an aggressive load balancer such as GreedyRefLB in the first load balancing
 181 %step, and use RefineLB for the later load balancing steps.
 182
 183 \section{Load Balancing Chare Arrays}
 184 \label{lbarray}
 185
 186 The load balancing framework is well integrated with chare array implementation
 187 -- when a chare array is created, it automatically registers its elements with
 188 the load balancing framework. The instrumentation of compute time (WALL/CPU
 189 time) and communication pattern is done automatically and APIs are provided
 190 for users to trigger the load balancing.  To use the load balancer, you must
 191 make your array elements migratable (see migration section above) and choose a
 192 \kw{load balancing strategy} (see the section \ref{lbStrategy} for a
 193 description of available load balancing strategies).
 194
 195 There are three different ways to use load balancing for chare arrays to meet
 196 different needs of the applications. These methods are different in how and
 197 when a load balancing phase starts. The three methods are: {\bf periodic load
 198 balancing mode}, {\bf at sync mode} and {\bf manual mode}.
 199
 200 In {\em periodic load balancing mode}, a user specifies only how often
 201 load balancing is to occur, using +LBPeriod runtime option to specify
 202 the time interval.
 203
 204 In {\em at sync mode}, the application invokes the load balancer
 205 explicitly at appropriate (generally at a pre-existing synchronization
 206 boundary) to trigger load balancing by inserting a function call
 207 (AtSync) in the application source code.
 208
 209 In the prior two load balancing modes, users do not need to worry
 210 about how to start load balancing.  However, in one scenario, those
 211 automatic load balancers will fail to work - when array elements are
 212 created by dynamic insertion.  This is because the above two load
 213 balancing modes require an application to have fixed the number of
 214 objects at the time of load balancing.  The array manager needs to
 215 maintain a head count of local array elements for the local barrier.
 216 In this case, the application must use the {\em manual mode} to
 217 trigger load balancer.
 218
 219 The detailed APIs of these three methods are described as follows:
 220 %
 221 \begin{enumerate}
 222 %
 223 \item {\bf Periodical load balancing mode}: In the default setting, load
 224 balancing happens whenever the array elements are ready, with an interval of 1
 225 second. It is desirable for the application to set a larger interval using
 226 +LBPeriod runtime option. For example ``+LBPeriod 5.0'' can be used to start load
 227 balancing roughly every 5 seconds. By default, array elements may be asked to
 228 migrate at any time, provided that they are not in the middle of executing an
 229 entry method. The array element's variable \kw{usesAtSync} being false
 230 attributes to this default behavior.
 231 %
 232 \item {\bf At sync mode}: Using this method, elements can be migrated only at
 233 certain points in the execution when the application invokes \kw{AtSync()}. In order to use the at
 234 sync mode, one should set \kw{usesAtSync} to true in the array element
 235 constructor.  When an element is ready to migrate, call
 236 \kw{AtSync()}~\footnote{AtSync() is a member function of class ArrayElement}.
 237 When all local elements call \kw{AtSync}, the load balancer is triggered.  Once
 238 all migrations are completed, the load balancer calls the virtual function
 239 \kw{ArrayElement::ResumeFromSync()} on each of the array elements. This
 240 function can be redefined in the application.
 241
 242 Note that the minimum time for \kw{AtSync()} load balancing to occur
 243 is controlled by the LBPeriod.  Unusually high frequency load
 244 balancing (more frequent than 500ms) will perform better if this value
 245 is set via +LBPeriod or \kw{SetLBPeriod()} to a number shorter than your load
 246 balancing interval.
 247
 248 Note that {\em AtSync()} is not a blocking call, it just gives a hint to load
 249 balancing that it is time for load balancing. During the time between {\em
 250 AtSync} and {\em ResumeFromSync}, the object may be migrated. One can choose
 251 to let objects continue working with incoming messages, however keep in mind
 252 the object may suddenly show up in another processor and make sure no
 253 operations that could possibly prevent migration be performed. This is
 254 the automatic way of doing load balancing where the application does not need to define ResumeFromSync().
 255
 256 The more commonly used approach is to force the object to be idle until load
 257 balancing finishes. The user places an AtSync call at the end of some iteration
 258 and when all elements reach that call load balancing is triggered. The objects
 259 can start executing again when \kw{ResumeFromSync()} is called. In this case,
 260 the user redefines ResumeFromSync() to trigger the next iteration of the
 261 application. This manual way of using the at sync mode results in a barrier at
 262 load balancing (see example here~\ref{lbexample}).
 263 %
 264 \item {\bf Manual mode}: The load balancer can be programmed to be started
 265 manually. To switch to the manual mode, the application calls {\em TurnManualLBOn()}
 266 on every processor to prevent the load balancer from starting automatically. {\em
 267 TurnManualLBOn()} should be called as early as possible in the program. It
 268 could be called at the initialization part of the program, for example from a
 269 global variable constructor, or in an initproc call (Section~\ref{initproc}).  It can also be
 270 called in the constructor of a static array or before the {\em
 271 doneInserting} call for a dynamic array.  It can be called multiple times on
 272 one processor, but only the last one takes effect.
 273
 274 The function call {\em CkStartLB()} starts load balancing immediately. This call
 275 should be made at only one place on only one processor. This function is
 276 not blocking, the object will continue to process messages and the load
 277 balancing, when triggered, happens in the background.
 278
 279 {\em TurnManualLBOff()} turns off manual load balancing and switches back to
 280 the automatic load balancing mode.
 281 %
 282 \end{enumerate}
 283
 284 \section{Migrating objects}
 285 \label{lbmigobj}
 286
 287 Load balancers migrate objects automatically.
 288 For an array element to migrate, user can refer to Section~\ref{arraymigratable}
 289 for how to write a ``pup'' for an array element.
 290
 291 In general one needs to pack the whole snapshot of the member data in an
 292 array element in the pup subroutine. This is because the migration of
 293 the object may happen at any time. In certain load balancing schemes where
 294  the user explicitly controls when load balancing occurs, the user may choose
 295 to pack only a part of the data and may skip temporary data.
 296
 297 An array element can migrate by calling the \kw{migrateMe}(\uw{destination
 298 processor}) member function-- this call must be the last action
 299 in an element entry method.  The system can also migrate array elements
 300 for load balancing (see the section~\ref{lbarray}).
 301
 302 To migrate your array element to another processor, the \charmpp{}
 303 runtime will:
 304
 305 \begin{itemize}
 306 \item Call your \kw{ckAboutToMigrate} method
 307 \item Call your \uw{pup} method with a sizing \kw{PUP::er} to determine how
 308 big a message it needs to hold your element.
 309 \item Call your \uw{pup} method again with a packing \kw{PUP::er} to pack
 310 your element into a message.
 311 \item Call your element's destructor (deleting the old copy).
 312 \item Send the message (containing your element) across the network.
 313 \item Call your element's migration constructor on the new processor.
 314 \item Call your \uw{pup} method on with an unpacking \kw{PUP::er} to unpack
 315 the element.
 316 \item Call your \kw{ckJustMigrated} method
 317 \end{itemize}
 318
 319 Migration constructors, then, are normally empty-- all the unpacking
 320 and allocation of the data items is done in the element's \uw{pup} routine.
 321 Deallocation is done in the element destructor as usual.
 322
 323
 324 \section{Other utility functions}
 325
 326 There are several utility functions that can be called in applications to
 327 configure the load balancer, etc. These functions are:
 328
 329 \begin{itemize}
 330 \item {\bf LBTurnInstrumentOn()} and {\bf LBTurnInstrumentOff()}: are plain C
 331       functions to control the load balancing statistics instrumentation
 332       on or off on the calling processor. No implicit broadcast or
 333       synchronization exists in these functions.
 334       Fortran interface: {\bf FLBTURNINSTRUMENTON()} and {\bf FLBTURNINSTRUMENTOFF()}.
 335 \item {\bf setMigratable(bool migratable)}: is a member function of array
 336       element. This function can be called
 337       in an array element constructor to tell the load balancer whether this object
 338       is migratable or not\footnote{Currently not all load balancers
 339       recognize this setting though.}.
 340 \item {\bf LBSetPeriod(double s)}: this function can be called
 341       anywhere (even in Charm++ initnodes or initprocs) to specify
 342       the load balancing period time in seconds.
 343       It tells load balancer not to start next
 344       load balancing in less than $s$ seconds. This can be used to prevent
 345       load balancing from occurring too often in
 346       {\em automatic without sync mode}. Here is how to use it:
 347       \begin{alltt}
 348 // if used in an array element
 349 LBDatabase *lbdb = getLBDB();
 350 lbdb->SetLBPeriod(5.0);
 351
 352 // if used outside of an array element
 353 LBSetPeriod(5.0);
 354 \end{alltt}
 355       Alternatively, one can specify +LBPeriod \{seconds\} at command line.
 356 \end{itemize}
 357
 358 \section{Compiler and runtime options to use load balancing module}
 359 \label{lbOption}
 360
 361 Load balancing strategies are implemented as libraries in \charmpp{}. This
 362 allows programmers to easily experiment with different existing strategies
 363 by simply linking a pool of strategy modules and choosing
 364 one to use at runtime via a command line option.
 365
 366 {\bf Note:} linking a load balancing module is different from activating it:
 367 \begin{itemize}
 368 \item link an LB module: is to link a Load Balancer module(library) at
 369    compile time. You can link against multiple LB libraries as candidates.
 370 \item activate an LB: is to actually ask the runtime to create an LB strategy and
 371    start it. You can only activate load balancers that have been linked at
 372    compile time.
 373 \end{itemize}
 374
 375
 376 Below are the descriptions about the compiler and runtime options:
 377
 378 \begin{enumerate}
 379 \item {\bf compile time options:}
 380
 381 \begin{itemize}
 382 \item {\em -module NeighborLB -module GreedyCommLB ...}  \\
 383   links the modules NeighborLB, GreedyCommLB etc into an application, but these
 384 load balancers will remain inactive at execution time unless overridden by other
 385 runtime options.
 386 \item {\em -module CommonLBs} \\
 387   links a special module CommonLBs which includes some commonly used \charmpp{}
 388 built-in load balancers. The commonly used load balancers include {\tt
 389 DummyLB, GreedyLB, CommLB, RandCentLB, RefineLB, RefineCommLB, RotateLB, DistributedLB, HybridLB, ComboCentLB, RefineSwapLB, NeighborLB, OrbLB, BlockLB, GreedyCommLB}
 390 \item {\em -balancer GreedyCommLB} \\
 391   links the load balancer GreedyCommLB and invokes it at runtime.
 392 \item {\em -balancer GreedyCommLB -balancer RefineLB} \\
 393   invokes GreedyCommLB at the first load balancing step and RefineLB in all
 394 subsequent load balancing steps.
 395 \item {\em -balancer ComboCentLB:GreedyLB,RefineLB}  \\
 396   You can create a new combination load balancer made of multiple
 397 load balancers. In the above example, GreedyLB and RefineLB strategies are
 398 applied one after the other in each load balancing step.
 399 \end{itemize}
 400
 401 The list of existing load balancers is given in Section
 402 \ref{lbStrategy}. Note: you can have multiple -module *LB options. LB
 403 modules are linked into a program, but they are not activated
 404 automatically at runtime.  Using -balancer A at compile time will
 405 activate load balancer A automatically at runtime.  Having -balancer A
 406 implies -module A, so you don't have to write -module A again,
 407 although that is not invalid.  Using CommonLBs is a convenient way to
 408 link against the commonly used existing load balancers.
 409
 410 The SCOTCH-based load balancer(s) use an external partitioning library requiring 3rd party software:
 411
 412 SCOTCH can be downloaded from:
 413 \url{http://www.labri.fr/perso/pelegrin/scotch/}
 414
 415 Use the {\em --incdir and --libdir} build time option to add your installation of any third party libraries you wish to use to the \charmpp{} search paths.
 416
 417 \item {\bf Building individual load balancers}
 418
 419 Load balancers can be built individually by changing the current working directory to the {\em tmp} subdirectory of your build and making them by name.
 420
 421 \begin{alltt}
 422  cd netlrts-linux-x86\_64/tmp
 423  make PhasebyArrayLB
 424 \end{alltt}
 425
 426 \item {\bf Write and use your own load balancer}
 427
 428 Refer Section~\ref{lbWriteNewLB} for writing a new load balancer. Compile it in
 429 the form of library and name it {\em libmoduleFooLB.a} where {\em FooLB} is the
 430 new load balancer. Add the path to the library and link the load balancer into
 431 an application using {\em -module FooLB}.
 432
 433 You can create a library by modifying the Makefile in the following way. This
 434 will create {\em libmoduleFooLB.a}.
 435 \begin{alltt}
 436 libmoduleFooLB.a: FooLB.o
 437   $(CHARMC) -o libmoduleFooLB.a FooLB.o
 438 \end{alltt}
 439
 440 To include this balancer in your application, the Makefile can be changed in the
 441 following way
 442 \begin{alltt}
 443 $(TARGET): $(OBJECTS)
 444   $(CHARMC) -o $(TARGET) -L/path-to-the-lib $(OBJS) -module FooLB
 445 \end{alltt}
 446
 447
 448 \item {\bf runtime options:}
 449
 450 Runtime balancer selection options are similar to the compile time
 451 options as described above, but they can be used to override those
 452 compile time options.
 453
 454 \begin{itemize}
 455 \item {\em +balancer help} \\
 456   displays all available balancers that have been linked in.
 457 \item {\em +balancer GreedyCommLB} \\
 458   invokes GreedyCommLB
 459 \item {\em +balancer GreedyCommLB +balancer RefineLB} \\
 460   invokes GreedyCommLB at the first load balancing step and RefineLB in all
 461 subsequent load balancing steps.
 462 \item {\em +balancer ComboCentLB:GreedyLB,RefineLB}  \\
 463   same as the example in the -balancer compile time option.
 464 \end{itemize}
 465
 466 Note: +balancer option works only if you have already linked the corresponding
 467 load balancers module at compile time.
 468 Giving +balancer with a wrong LB name will result in a runtime error.
 469 When you have used -balancer A as compile time option, you do not need to use
 470 +balancer A again to activate it at runtime. However, you can
 471 use +balancer B to override the compile time option and choose to
 472 activate B instead of A.
 473
 474 \item {\bf Handling the case that no load balancer is activated by users}
 475
 476 When no balancer is linked by users,
 477 but the program counts on a load balancer because it used {\em AtSync()}
 478 and expect {\em ResumeFromSync()} to be called to continue,
 479 a special load balancer called {\em NullLB} will be
 480 automatically created to run the program.
 481 This default load balancer calls {\em ResumeFromSync()} after {\em AtSync()}.
 482 It keeps a program from hanging after calling {\em AtSync()}.
 483 {\em NullLB} will be suppressed if another load balancer is created.
 484
 485 \item {\bf Other useful runtime options}
 486
 487 There are a few other runtime options for load balancing that may be useful:
 488
 489 \begin{itemize}
 490 \item {\em +LBDebug \{verbose level\}} \\
 491      \{verbose level\} can be any positive integer number. 0 is to turn off the verbose.
 492      This option asks load balancer to output load balancing information to stdout.
 493      The bigger the verbose level is, the more verbose the output is.
 494 \item {\em +LBPeriod \{seconds\}} \\
 495      \{Seconds\} can be any float number. This option sets the minimum period time in
 496 seconds between two consecutive load balancing steps. The default value is
 497 1 second. That is to say that a load balancing step will not happen until
 498 1 second after the last load balancing step.
 499 \item {\em +LBSameCpus} \\
 500      This option simply tells load balancer that all processors are of same speed.
 501      The load balancer will then skip the measurement of CPU speed at runtime. This is the default.
 502 \item {\em +LBTestPESpeed} \\
 503      This option tells the load balancer to test the speed of all processors at runtime.
 504      The load balancer may use this measurement to perform speed-aware load balancing.
 505 \item {\em +LBObjOnly} \\
 506      This tells load balancer to ignore processor background load when making migration decisions.
 507 \item {\em +LBSyncResume} \\
 508      After load balancing step, normally a processor can resume computation
 509 once all objects are received on that processor, even when other processors
 510 are still working on migrations.  If this turns out to be a problem,
 511 that is when some processors start working on computation while the other
 512 processors are still busy migrating objects, then this option can be used to force
 513 a global barrier on all processors to make sure that processors can only resume
 514 computation after migrations are completed on all processors.
 515 \item {\em +LBOff} \\
 516      This option turns off load balancing instrumentation
 517      of both CPU and communication usage at startup time.
 518 \item {\em +LBCommOff} \\
 519      This option turns off load balancing instrumentation of communication at startup time.
 520      The instrument of CPU usage is left on.
 521 \end{itemize}
 522
 523 \end{enumerate}
 524
 525 \section{Seed load balancers - load balancing Chares at creation time}
 526 \label{seedlb}
 527
 528 Seed load balancing involves the movement of object creation messages, or
 529 "seeds", to create a balance of work across a set of processors.
 530 This seed load balancing scheme is used to balance chares  at creation time.
 531 After the chare constructor is executed on a processor, the seed balancer does not
 532 migrate it.
 533 %the seed load balancer. The measurement based load balancer described in
 534 %previous subsection perform the task of moving chares during work to achieve
 535 %load balance.
 536 Depending on the movement strategy, several seed load balancers are available now.
 537 Examples can be found \examplerefdir{NQueen}.
 538 \begin{enumerate}
 539 \item {\em random}\\
 540  A strategy that places seeds randomly when they are created and does
 541 no movement of seeds thereafter. This is used as the default seed
 542 load balancer.
 543 \item {\em neighbor}\\
 544  A strategy which imposes a virtual topology on the processors,
 545  load exchange happens among neighbors only. The overloaded processors
 546  initiate the load balancing and send work to its neighbors
 547  when it becomes overloaded. The default topology is mesh2D, one can use
 548  command line option to choose other topology such as ring, mesh3D and
 549  dense graph.
 550 \item {\em spray}\\
 551  A strategy which imposes a spanning tree organization on the processors,
 552  results in communication via global reduction among all processors
 553  to compute global average load via periodic reduction.
 554  It uses averaging of loads to determine how seeds should be
 555 distributed.
 556 \item  {\em workstealing} \\
 557  A strategy that the idle processor requests a random processor and steal
 558  chares.
 559 \end{enumerate}
 560
 561 Other strategies can also be explored by following the simple API of the
 562 seed load balancer.
 563 \linebreak
 564
 565 \zap{
 566 {\bf Seed load balancers for Chares:}
 567
 568 Seed load balancers can be directly used for load balancing Chares.
 569 The default seed load balancer which is always linked is the random seed load balancer.
 570 Users can choose another strategy listed above and link as a plugin
 571 module into binary as described below.
 572
 573 {\bf Seed load balancers for Array Elements:}
 574
 575 Seed load balancers can also be used for array elements in the same way
 576 as they are used for individual chares.
 577 Chare array is a collection of individual Chares in Charm++.
 578 Since Chare Array has its internal strategy of static mapping of individual
 579 array elements to processors using {\em CkArrayMap}~\ref{array map}~\footnote{by default it always distributed array elements to processors in Round-Robin fashion unless a different CkArrayMap is used},
 580 a special CkArrayMap called {\em CldMap} must be created and passed into
 581 array creation calls to interface with seed load balancer.
 582
 583 For creating an empty array and then inserting chares into it, the API is as follows:
 584
 585 \begin{alltt}
 586   CkArrayOptions opt;
 587   CkGroupID cldmapID = CProxy_CldMap::ckNew();
 588   opt.setMap(cldmapID);
 589   CProxy_WorkUnit arr = CProxy_WorkUnit::ckNew(param, opt);
 590   for (int i=0; i<numChares; i++)
 591     arr[i].insert(param);
 592 \end{alltt}
 593
 594 For initially populating the array with chares at time of creation the API is as follows:
 595 \begin{alltt}
 596   CkArrayOptions opt(numChares);
 597   CkGroupID cldmapID = CProxy_CldMap::ckNew();
 598   opt.setMap(cldmapID);
 599   CProxy_WorkUnit arr = CProxy_WorkUnit::ckNew(param, opt);
 600 \end{alltt}
 601
 602 The details about array creation are explained in section~\ref{advanced arrays} of the manual.
 603
 604 } % end zap
 605
 606
 607 {\bf Compile and run time options for seed load balancers}
 608
 609
 610 To choose a seed load balancer other than the default {\em rand} strategy,
 611 use link time command line option {\bf -balance foo}.
 612
 613 When using {\rm neighbor} seed load balancer, one can also specify
 614 the virtual topology at runtime. Use {\bf +LBTopo topo}, where {\em topo}
 615 can be one of: (a) ring, (b) mesh2d, (c) mesh3d and (d) graph.
 616
 617 To write a seed load balancer, name your file as {\em cldb.foo.c},
 618 where {\em foo} is the strategy name.  Compile it in the form of library
 619 under charm/lib, named as {\em libcldb-foo.a}, where {\em foo} is the strategy
 620 name used above. Now one can use {\bf -balance foo} as compile time option
 621 to {\bf charmc} to link with the {\em foo} seed load balancer.
 622
 623 \section{Simple Load Balancer Usage Example - Automatic with Sync LB}
 624 \label{lbexample}
 625
 626 A simple example of how to use a load balancer in sync mode in one's
 627 application is presented below.
 628
 629 \begin{alltt}
 630 /*** lbexample.ci ***/
 631 mainmodule lbexample \{
 632   readonly CProxy_Main mainProxy;
 633   readonly int nElements;
 634
 635   mainchare Main \{
 636     entry Main(CkArgMsg *m);
 637     entry void done(void);
 638   \};
 639
 640   array [1D] LBExample \{
 641     entry LBExample(void);
 642     entry void doWork();
 643   \};
 644 \};
 645 \end{alltt}
 646
 647 --------------------------------------------------------------------------------
 648
 649 \begin{alltt}
 650 /*** lbexample.C ***/
 651 #include <stdio.h>
 652 #include "lbexample.decl.h"
 653
 654 /*readonly*/ CProxy_Main mainProxy;
 655 /*readonly*/ int nElements;
 656
 657 #define MAX_WORK_CNT 50
 658 #define LB_INTERVAL 5
 659
 660 /*mainchare*/
 661 class Main : public CBase_Main
 662 \{
 663 private:
 664   int count;
 665 public:
 666   Main(CkArgMsg* m)
 667   \{
 668     /*....Initialization....*/
 669     mainProxy = thisProxy;
 670     CProxy_LBExample arr = CProxy_LBExample::ckNew(nElements);
 671     arr.doWork();
 672   \};
 673
 674   void done(void)
 675   \{
 676     count++;
 677     if(count==nElements)\{
 678       CkPrintf("All done");
 679       CkExit();
 680     \}
 681   \};
 682 \};
 683
 684 /*array [1D]*/
 685 class LBExample : public CBase_LBExample
 686 \{
 687 private:
 688   int workcnt;
 689 public:
 690   LBExample()
 691   \{
 692     workcnt=0;
 693     /* May initialize some variables to be used in doWork */
 694     //Must be set to true to make AtSync work
 695     usesAtSync = true;
 696   \}
 697
 698   LBExample(CkMigrateMessage *m) \{ /* Migration constructor -- invoked when chare migrates */ \}
 699
 700   /* Must be written for migration to succeed */
 701   void pup(PUP::er &p)\{
 702     p|workcnt;
 703     /* There may be some more variables used in doWork */
 704   \}
 705
 706   void doWork()
 707   \{
 708     /* Do work proportional to the chare index to see the effects of LB */
 709
 710     workcnt++;
 711     if(workcnt==MAX_WORK_CNT)
 712       mainProxy.done();
 713
 714     if(workcnt\%LB_INTERVAL==0)
 715       AtSync();
 716     else
 717       doWork();
 718   \}
 719
 720   void ResumeFromSync()\{
 721     doWork();
 722   \}
 723 \};
 724
 725 #include "lbexample.def.h"
 726 \end{alltt}