Bug #1267: Integrate METIS graph partitioning library
[charm.git] / doc / charm++ / loadb.tex
blobee9c0172315c38e0a53b262383bcf4cf244eb2b6
1 Load balancing in \charmpp{} is enabled by its ability to place, or
2 migrate, chares or chare array elements. Typical application usage to
3 exploit this feature will construct many more chares than processors, and
4 enable their runtime migration.
6 Iterative applications, which are commonplace in physical simulations,
7 are the most suitable target for \charmpp{}'s measurement based load
8 balancing techniques. Such applications may contain a series of
9 time-steps, and/or iterative solvers that run to convergence. For such
10 computations, typically, the heuristic principle that we call
11 ``principle of persistence'' holds: the computational loads and
12 communication patterns between objects (chares) tend to persist over
13 multiple iterations, even in dynamic applications. In such cases,
14 the recent past is a good predictor of the near future. Measurement-based
15 chare migration strategies are useful in this context. Currently these
16 apply to chare-array elements, but they may be extended to chares in
17 the future.
19 For applications without such iterative structure, or with iterative
20 structure, but without predictability (i.e. where the principle of
21 persistence does not apply), Charm++ supports ``seed balancers'' that
22 move ``seeds'' for new chares among processors (possibly repeatedly)
23 to achieve load balance. These strategies are currently available for
24 both chares and chare-arrays. Seed balancers were the original load
25 balancers provided in Charm since the late 80's. They are extremely
26 useful for state-space search applications, and are also useful in
27 other computations, as well as in conjunction with migration
28 strategies.
30 For iterative computations when there is a correlation between iterations/steps,
31 but either it is not strong, or the machine environment is not predictable
32 (due to noise from OS interrupts on small time steps, or time-shared desktop
33 machines), one can use a combination of the two kinds of strategies. The
34 baseline load balancing is provided by migration strategies, but in each
35 iteration one also spawns off work in the form of chares that can run on any
36 processor. The seed balancer will handle such work as it arises.
38 Examples are in \examplerefdir{load\_balancing} and
39 \testrefdir{load\_balancing}
41 \section{Measurement-based Object Migration Strategies}
42 \label{lbFramework}
43 \label{migrationlb}
45 In \charmpp{}, objects (except groups, nodegroups) can migrate from
46 processor to processor at runtime. Object migration can potentially
47 improve the performance of the parallel program by migrating objects from
48 overloaded processors to underloaded ones.
50 %However, it is not
51 %trivial to decide which objects to move and where to move them in
52 %order to achieve load balance in a fashion without the knowledge about the
53 %application. The strategy used in \charmpp{} load balancing framework
54 %is a measurement-based one.
56 \charmpp{} implements a generic, measurement-based load balancing framework
57 which automatically instruments all \charmpp{} objects, collects computation
58 load and communication structure during execution and stores them into a
59 \kw{load balancing database}. \charmpp{} then provides a collection of \kw{load
60 balancing strategies} whose job it is to decide on a new mapping of objects to
61 processors based on the information from the database. Such measurement based
62 strategies are efficient when we can reasonably assume that objects in a
63 \charmpp{} application tend to exhibit temporal correlation in their
64 computation and communication patterns, i.e. future can be to some extent
65 predicted using the historical measurement data, allowing effective
66 measurement-based load balancing without application-specific knowledge.
68 Two key terms in the \charmpp{} load balancing framework are:
69 \begin{itemize}
71 \item \kw{Load balancing database} provides the interface of almost all load
72 balancing calls. On each processor, it stores the load balancing instrumented
73 data and coordinates the load balancing manager and balancer. It is implemented
74 as a Chare Group called \kw{LBDatabase}.
76 \item \kw{Load balancer or strategy} takes the load balancing database and
77 produces the new mapping of the objects. In \charmpp{}, it is implemented as
78 Chare Group inherited from BaseLB. Three kinds of schemes are implemented: (a)
79 centralized load balancers, (b) fully distributed load balancers and (c)
80 hierarchical load balancers.
82 \end{itemize}
84 \section{Available Load Balancing Strategies}
85 \label{lbStrategy}
87 Load balancing can be performed in either a centralized, a fully distributed,
88 or an hierarchical fashion.
90 In centralized approaches, the entire machine's load and communication
91 structure are accumulated to a single point, typically processor 0, followed by
92 a decision making process to determine the new distribution of \charmpp
93 objects. Centralized load balancing requires synchronization which may incur an
94 overhead and delay. However, due to the fact that the decision process has a
95 high degree of the knowledge about the entire platform, it tends to be more
96 accurate.
98 In distributed approaches, load data is only exchanged among
99 neighboring processors. There is no global synchronization. However,
100 they will not, in general, provide an immediate restoration for load balance -
101 the process is iterated until the load balance can be achieved.
103 In hierarchical approaches, processors are divided into independent autonomous
104 sets of processor groups and these groups are organized in hierarchies,
105 thereby decentralizing the load balancing task. Different strategies can be
106 used to balance the load on processors inside each processor group, and
107 processors across groups in a hierarchical fashion.
109 Listed below are some of the available non-trivial centralized load balancers
110 and their brief descriptions:
111 \begin{itemize}
112 \item {\bf RandCentLB}: Randomly assigns objects to processors;
113 %\item {\bf RecBisectBfLB}: Recursively partition with Breadth first enumeration;
114 \item {\bf MetisLB}: Uses METIS\texttrademark\hspace{0mm} to partitioning object communication graph.
115 \item {\bf GreedyLB}: Uses a greedy algorithm that always assigns the heaviest object to the least loaded processor.
116 \item {\bf GreedyCommLB}: Extends the greedy algorithm to take the communication graph into account.
117 \item {\bf TopoCentLB}: Extends the greedy algorithm to take processor topology into account.
118 \item {\bf RefineLB}: Moves objects away from the most overloaded processors to reach average, limits the number of objects migrated.
119 \item {\bf RefineSwapLB}: Moves objects away from the most overloaded processors
120 to reach average. In case it cannot migrate an object from an overloaded
121 processor to an underloaded processor, it swaps objects to reduce the load on
122 the overloaded processor. This strategy limits the number of objects migrated.
123 \item {\bf RefineCommLB}: Same idea as in RefineLB, but takes communication into account.
124 \item {\bf RefineTopoLB}: Same idea as in RefineLB, but takes processor topology into account.
125 \item {\bf BlockLB}: This strategy does a blocked distribution of objects to
126 processors.
127 \item {\bf ComboCentLB}: A special load balancer that can be used to combine any number of centralized load balancers mentioned above.
128 \end{itemize}
130 Listed below are the distributed load balancers:
131 \begin{itemize}
132 \item {\bf NeighborLB}: A neighborhood load balancer in which each processor tries to average out its load only among its neighbors.
133 \item {\bf WSLB}: A load balancer for workstation clusters, which can detect load changes on desktops (and other timeshared processors) and adjust load without interfering with other's use of the desktop.
134 \item {\bf DistributedLB}: A load balancer which uses partial information about
135 underloaded and overloaded processors in the system to do probabilistic transfer
136 of load. This is a refinement based strategy.
137 \end{itemize}
139 An example of a hierarchical strategy can be found in:
140 \begin{itemize}
141 \item {\bf HybridLB}: This calls GreedyLB at the lower level and RefineLB at
142 the root.
143 \end{itemize}
145 Users can choose any load balancing strategy they think is appropriate for their
146 application. The compiler and runtime options are described in
147 section~\ref{lbOption}.
148 \linebreak
150 {\bf Metabalancer to automatically schedule load balancing}
152 Metabalancer can be invoked to automatically decide when to
153 invoke the load balancer, given the load-balancing strategy.
154 Metabalancer uses a linear prediction model
155 to set the load balancing period based on observed load imbalance.
157 The runtime option {\em +MetaLB} can be used to invoke this feature to automatically invoke
158 the load balancing strategy based on the imbalance observed.
159 This option needs to be specified alongside the {\em +balancer} option to specify the
160 load balancing strategy to use. Metabalancer relies on the AtSync() calls specified
161 in Section~\ref{lbarray} to collect load statistics.
163 {\em +MetaLBModelDir} \verb|<path-to-model>|
164 can be used to invoke the Metabalancer feature
165 to automatically decide which load balancing strategy to invoke. A model trained on a
166 generic representative load imbalance benchmark can be found in
167 \verb|charm/src/ck-ldb/rf_model|.
168 Metabalancer makes a decision on which load balancing strategy
169 to invoke out of a subset of strategies, namely GreedyLB, RefineLB,
170 HybridLB, DistributedLB, MetisLB and ScotchLB.
171 For using the model based prediction in Metabalancer, \charmpp{} needs to be built with
172 all the above load balancing strategies, including ScotchLB that relies
173 on the external partitioning library SCOTCH specified
174 in the Section~\ref{lbOption}.
177 %In some cases, one may need to create and invoke multiple load balancing
178 %strategies/algorithms at the different phases. \charmpp{} now supports
179 %multiple load balancers created at runtime. For example, one can use
180 %an aggressive load balancer such as GreedyRefLB in the first load balancing
181 %step, and use RefineLB for the later load balancing steps.
183 \section{Load Balancing Chare Arrays}
184 \label{lbarray}
186 The load balancing framework is well integrated with chare array implementation
187 -- when a chare array is created, it automatically registers its elements with
188 the load balancing framework. The instrumentation of compute time (WALL/CPU
189 time) and communication pattern is done automatically and APIs are provided
190 for users to trigger the load balancing. To use the load balancer, you must
191 make your array elements migratable (see migration section above) and choose a
192 \kw{load balancing strategy} (see the section \ref{lbStrategy} for a
193 description of available load balancing strategies).
195 There are three different ways to use load balancing for chare arrays to meet
196 different needs of the applications. These methods are different in how and
197 when a load balancing phase starts. The three methods are: {\bf periodic load
198 balancing mode}, {\bf at sync mode} and {\bf manual mode}.
200 In {\em periodic load balancing mode}, a user specifies only how often
201 load balancing is to occur, using +LBPeriod runtime option to specify
202 the time interval.
204 In {\em at sync mode}, the application invokes the load balancer
205 explicitly at appropriate (generally at a pre-existing synchronization
206 boundary) to trigger load balancing by inserting a function call
207 (AtSync) in the application source code.
209 In the prior two load balancing modes, users do not need to worry
210 about how to start load balancing. However, in one scenario, those
211 automatic load balancers will fail to work - when array elements are
212 created by dynamic insertion. This is because the above two load
213 balancing modes require an application to have fixed the number of
214 objects at the time of load balancing. The array manager needs to
215 maintain a head count of local array elements for the local barrier.
216 In this case, the application must use the {\em manual mode} to
217 trigger load balancer.
219 The detailed APIs of these three methods are described as follows:
221 \begin{enumerate}
223 \item {\bf Periodical load balancing mode}: In the default setting, load
224 balancing happens whenever the array elements are ready, with an interval of 1
225 second. It is desirable for the application to set a larger interval using
226 +LBPeriod runtime option. For example ``+LBPeriod 5.0'' can be used to start load
227 balancing roughly every 5 seconds. By default, array elements may be asked to
228 migrate at any time, provided that they are not in the middle of executing an
229 entry method. The array element's variable \kw{usesAtSync} being false
230 attributes to this default behavior.
232 \item {\bf At sync mode}: Using this method, elements can be migrated only at
233 certain points in the execution when the application invokes \kw{AtSync()}. In order to use the at
234 sync mode, one should set \kw{usesAtSync} to true in the array element
235 constructor. When an element is ready to migrate, call
236 \kw{AtSync()}~\footnote{AtSync() is a member function of class ArrayElement}.
237 When all local elements call \kw{AtSync}, the load balancer is triggered. Once
238 all migrations are completed, the load balancer calls the virtual function
239 \kw{ArrayElement::ResumeFromSync()} on each of the array elements. This
240 function can be redefined in the application.
242 Note that the minimum time for \kw{AtSync()} load balancing to occur
243 is controlled by the LBPeriod. Unusually high frequency load
244 balancing (more frequent than 500ms) will perform better if this value
245 is set via +LBPeriod or \kw{SetLBPeriod()} to a number shorter than your load
246 balancing interval.
248 Note that {\em AtSync()} is not a blocking call, it just gives a hint to load
249 balancing that it is time for load balancing. During the time between {\em
250 AtSync} and {\em ResumeFromSync}, the object may be migrated. One can choose
251 to let objects continue working with incoming messages, however keep in mind
252 the object may suddenly show up in another processor and make sure no
253 operations that could possibly prevent migration be performed. This is
254 the automatic way of doing load balancing where the application does not need to define ResumeFromSync().
256 The more commonly used approach is to force the object to be idle until load
257 balancing finishes. The user places an AtSync call at the end of some iteration
258 and when all elements reach that call load balancing is triggered. The objects
259 can start executing again when \kw{ResumeFromSync()} is called. In this case,
260 the user redefines ResumeFromSync() to trigger the next iteration of the
261 application. This manual way of using the at sync mode results in a barrier at
262 load balancing (see example here~\ref{lbexample}).
264 \item {\bf Manual mode}: The load balancer can be programmed to be started
265 manually. To switch to the manual mode, the application calls {\em TurnManualLBOn()}
266 on every processor to prevent the load balancer from starting automatically. {\em
267 TurnManualLBOn()} should be called as early as possible in the program. It
268 could be called at the initialization part of the program, for example from a
269 global variable constructor, or in an initproc call (Section~\ref{initproc}). It can also be
270 called in the constructor of a static array or before the {\em
271 doneInserting} call for a dynamic array. It can be called multiple times on
272 one processor, but only the last one takes effect.
274 The function call {\em CkStartLB()} starts load balancing immediately. This call
275 should be made at only one place on only one processor. This function is
276 not blocking, the object will continue to process messages and the load
277 balancing, when triggered, happens in the background.
279 {\em TurnManualLBOff()} turns off manual load balancing and switches back to
280 the automatic load balancing mode.
282 \end{enumerate}
284 \section{Migrating objects}
285 \label{lbmigobj}
287 Load balancers migrate objects automatically.
288 For an array element to migrate, user can refer to Section~\ref{arraymigratable}
289 for how to write a ``pup'' for an array element.
291 In general one needs to pack the whole snapshot of the member data in an
292 array element in the pup subroutine. This is because the migration of
293 the object may happen at any time. In certain load balancing schemes where
294 the user explicitly controls when load balancing occurs, the user may choose
295 to pack only a part of the data and may skip temporary data.
297 An array element can migrate by calling the \kw{migrateMe}(\uw{destination
298 processor}) member function-- this call must be the last action
299 in an element entry method. The system can also migrate array elements
300 for load balancing (see the section~\ref{lbarray}).
302 To migrate your array element to another processor, the \charmpp{}
303 runtime will:
305 \begin{itemize}
306 \item Call your \kw{ckAboutToMigrate} method
307 \item Call your \uw{pup} method with a sizing \kw{PUP::er} to determine how
308 big a message it needs to hold your element.
309 \item Call your \uw{pup} method again with a packing \kw{PUP::er} to pack
310 your element into a message.
311 \item Call your element's destructor (deleting the old copy).
312 \item Send the message (containing your element) across the network.
313 \item Call your element's migration constructor on the new processor.
314 \item Call your \uw{pup} method on with an unpacking \kw{PUP::er} to unpack
315 the element.
316 \item Call your \kw{ckJustMigrated} method
317 \end{itemize}
319 Migration constructors, then, are normally empty-- all the unpacking
320 and allocation of the data items is done in the element's \uw{pup} routine.
321 Deallocation is done in the element destructor as usual.
324 \section{Other utility functions}
326 There are several utility functions that can be called in applications to
327 configure the load balancer, etc. These functions are:
329 \begin{itemize}
330 \item {\bf LBTurnInstrumentOn()} and {\bf LBTurnInstrumentOff()}: are plain C
331 functions to control the load balancing statistics instrumentation
332 on or off on the calling processor. No implicit broadcast or
333 synchronization exists in these functions.
334 Fortran interface: {\bf FLBTURNINSTRUMENTON()} and {\bf FLBTURNINSTRUMENTOFF()}.
335 \item {\bf setMigratable(bool migratable)}: is a member function of array
336 element. This function can be called
337 in an array element constructor to tell the load balancer whether this object
338 is migratable or not\footnote{Currently not all load balancers
339 recognize this setting though.}.
340 \item {\bf LBSetPeriod(double s)}: this function can be called
341 anywhere (even in Charm++ initnodes or initprocs) to specify
342 the load balancing period time in seconds.
343 It tells load balancer not to start next
344 load balancing in less than $s$ seconds. This can be used to prevent
345 load balancing from occurring too often in
346 {\em automatic without sync mode}. Here is how to use it:
347 \begin{alltt}
348 // if used in an array element
349 LBDatabase *lbdb = getLBDB();
350 lbdb->SetLBPeriod(5.0);
352 // if used outside of an array element
353 LBSetPeriod(5.0);
354 \end{alltt}
355 Alternatively, one can specify +LBPeriod \{seconds\} at command line.
356 \end{itemize}
358 \section{Compiler and runtime options to use load balancing module}
359 \label{lbOption}
361 Load balancing strategies are implemented as libraries in \charmpp{}. This
362 allows programmers to easily experiment with different existing strategies
363 by simply linking a pool of strategy modules and choosing
364 one to use at runtime via a command line option.
366 {\bf Note:} linking a load balancing module is different from activating it:
367 \begin{itemize}
368 \item link an LB module: is to link a Load Balancer module(library) at
369 compile time. You can link against multiple LB libraries as candidates.
370 \item activate an LB: is to actually ask the runtime to create an LB strategy and
371 start it. You can only activate load balancers that have been linked at
372 compile time.
373 \end{itemize}
376 Below are the descriptions about the compiler and runtime options:
378 \begin{enumerate}
379 \item {\bf compile time options:}
381 \begin{itemize}
382 \item {\em -module NeighborLB -module GreedyCommLB ...} \\
383 links the modules NeighborLB, GreedyCommLB etc into an application, but these
384 load balancers will remain inactive at execution time unless overridden by other
385 runtime options.
386 \item {\em -module CommonLBs} \\
387 links a special module CommonLBs which includes some commonly used \charmpp{}
388 built-in load balancers. The commonly used load balancers include {\tt
389 DummyLB, GreedyLB, CommLB, RandCentLB, RefineLB, RefineCommLB, RotateLB, DistributedLB, HybridLB, ComboCentLB, RefineSwapLB, NeighborLB, OrbLB, BlockLB, GreedyCommLB}
390 \item {\em -balancer GreedyCommLB} \\
391 links the load balancer GreedyCommLB and invokes it at runtime.
392 \item {\em -balancer GreedyCommLB -balancer RefineLB} \\
393 invokes GreedyCommLB at the first load balancing step and RefineLB in all
394 subsequent load balancing steps.
395 \item {\em -balancer ComboCentLB:GreedyLB,RefineLB} \\
396 You can create a new combination load balancer made of multiple
397 load balancers. In the above example, GreedyLB and RefineLB strategies are
398 applied one after the other in each load balancing step.
399 \end{itemize}
401 The list of existing load balancers is given in Section
402 \ref{lbStrategy}. Note: you can have multiple -module *LB options. LB
403 modules are linked into a program, but they are not activated
404 automatically at runtime. Using -balancer A at compile time will
405 activate load balancer A automatically at runtime. Having -balancer A
406 implies -module A, so you don't have to write -module A again,
407 although that is not invalid. Using CommonLBs is a convenient way to
408 link against the commonly used existing load balancers.
410 The SCOTCH-based load balancer(s) use an external partitioning library requiring 3rd party software:
412 SCOTCH can be downloaded from:
413 \url{http://www.labri.fr/perso/pelegrin/scotch/}
415 Use the {\em --incdir and --libdir} build time option to add your installation of any third party libraries you wish to use to the \charmpp{} search paths.
417 \item {\bf Building individual load balancers}
419 Load balancers can be built individually by changing the current working directory to the {\em tmp} subdirectory of your build and making them by name.
421 \begin{alltt}
422 cd netlrts-linux-x86\_64/tmp
423 make PhasebyArrayLB
424 \end{alltt}
426 \item {\bf Write and use your own load balancer}
428 Refer Section~\ref{lbWriteNewLB} for writing a new load balancer. Compile it in
429 the form of library and name it {\em libmoduleFooLB.a} where {\em FooLB} is the
430 new load balancer. Add the path to the library and link the load balancer into
431 an application using {\em -module FooLB}.
433 You can create a library by modifying the Makefile in the following way. This
434 will create {\em libmoduleFooLB.a}.
435 \begin{alltt}
436 libmoduleFooLB.a: FooLB.o
437 $(CHARMC) -o libmoduleFooLB.a FooLB.o
438 \end{alltt}
440 To include this balancer in your application, the Makefile can be changed in the
441 following way
442 \begin{alltt}
443 $(TARGET): $(OBJECTS)
444 $(CHARMC) -o $(TARGET) -L/path-to-the-lib $(OBJS) -module FooLB
445 \end{alltt}
448 \item {\bf runtime options:}
450 Runtime balancer selection options are similar to the compile time
451 options as described above, but they can be used to override those
452 compile time options.
454 \begin{itemize}
455 \item {\em +balancer help} \\
456 displays all available balancers that have been linked in.
457 \item {\em +balancer GreedyCommLB} \\
458 invokes GreedyCommLB
459 \item {\em +balancer GreedyCommLB +balancer RefineLB} \\
460 invokes GreedyCommLB at the first load balancing step and RefineLB in all
461 subsequent load balancing steps.
462 \item {\em +balancer ComboCentLB:GreedyLB,RefineLB} \\
463 same as the example in the -balancer compile time option.
464 \end{itemize}
466 Note: +balancer option works only if you have already linked the corresponding
467 load balancers module at compile time.
468 Giving +balancer with a wrong LB name will result in a runtime error.
469 When you have used -balancer A as compile time option, you do not need to use
470 +balancer A again to activate it at runtime. However, you can
471 use +balancer B to override the compile time option and choose to
472 activate B instead of A.
474 \item {\bf Handling the case that no load balancer is activated by users}
476 When no balancer is linked by users,
477 but the program counts on a load balancer because it used {\em AtSync()}
478 and expect {\em ResumeFromSync()} to be called to continue,
479 a special load balancer called {\em NullLB} will be
480 automatically created to run the program.
481 This default load balancer calls {\em ResumeFromSync()} after {\em AtSync()}.
482 It keeps a program from hanging after calling {\em AtSync()}.
483 {\em NullLB} will be suppressed if another load balancer is created.
485 \item {\bf Other useful runtime options}
487 There are a few other runtime options for load balancing that may be useful:
489 \begin{itemize}
490 \item {\em +LBDebug \{verbose level\}} \\
491 \{verbose level\} can be any positive integer number. 0 is to turn off the verbose.
492 This option asks load balancer to output load balancing information to stdout.
493 The bigger the verbose level is, the more verbose the output is.
494 \item {\em +LBPeriod \{seconds\}} \\
495 \{Seconds\} can be any float number. This option sets the minimum period time in
496 seconds between two consecutive load balancing steps. The default value is
497 1 second. That is to say that a load balancing step will not happen until
498 1 second after the last load balancing step.
499 \item {\em +LBSameCpus} \\
500 This option simply tells load balancer that all processors are of same speed.
501 The load balancer will then skip the measurement of CPU speed at runtime. This is the default.
502 \item {\em +LBTestPESpeed} \\
503 This option tells the load balancer to test the speed of all processors at runtime.
504 The load balancer may use this measurement to perform speed-aware load balancing.
505 \item {\em +LBObjOnly} \\
506 This tells load balancer to ignore processor background load when making migration decisions.
507 \item {\em +LBSyncResume} \\
508 After load balancing step, normally a processor can resume computation
509 once all objects are received on that processor, even when other processors
510 are still working on migrations. If this turns out to be a problem,
511 that is when some processors start working on computation while the other
512 processors are still busy migrating objects, then this option can be used to force
513 a global barrier on all processors to make sure that processors can only resume
514 computation after migrations are completed on all processors.
515 \item {\em +LBOff} \\
516 This option turns off load balancing instrumentation
517 of both CPU and communication usage at startup time.
518 \item {\em +LBCommOff} \\
519 This option turns off load balancing instrumentation of communication at startup time.
520 The instrument of CPU usage is left on.
521 \end{itemize}
523 \end{enumerate}
525 \section{Seed load balancers - load balancing Chares at creation time}
526 \label{seedlb}
528 Seed load balancing involves the movement of object creation messages, or
529 "seeds", to create a balance of work across a set of processors.
530 This seed load balancing scheme is used to balance chares at creation time.
531 After the chare constructor is executed on a processor, the seed balancer does not
532 migrate it.
533 %the seed load balancer. The measurement based load balancer described in
534 %previous subsection perform the task of moving chares during work to achieve
535 %load balance.
536 Depending on the movement strategy, several seed load balancers are available now.
537 Examples can be found \examplerefdir{NQueen}.
538 \begin{enumerate}
539 \item {\em random}\\
540 A strategy that places seeds randomly when they are created and does
541 no movement of seeds thereafter. This is used as the default seed
542 load balancer.
543 \item {\em neighbor}\\
544 A strategy which imposes a virtual topology on the processors,
545 load exchange happens among neighbors only. The overloaded processors
546 initiate the load balancing and send work to its neighbors
547 when it becomes overloaded. The default topology is mesh2D, one can use
548 command line option to choose other topology such as ring, mesh3D and
549 dense graph.
550 \item {\em spray}\\
551 A strategy which imposes a spanning tree organization on the processors,
552 results in communication via global reduction among all processors
553 to compute global average load via periodic reduction.
554 It uses averaging of loads to determine how seeds should be
555 distributed.
556 \item {\em workstealing} \\
557 A strategy that the idle processor requests a random processor and steal
558 chares.
559 \end{enumerate}
561 Other strategies can also be explored by following the simple API of the
562 seed load balancer.
563 \linebreak
565 \zap{
566 {\bf Seed load balancers for Chares:}
568 Seed load balancers can be directly used for load balancing Chares.
569 The default seed load balancer which is always linked is the random seed load balancer.
570 Users can choose another strategy listed above and link as a plugin
571 module into binary as described below.
573 {\bf Seed load balancers for Array Elements:}
575 Seed load balancers can also be used for array elements in the same way
576 as they are used for individual chares.
577 Chare array is a collection of individual Chares in Charm++.
578 Since Chare Array has its internal strategy of static mapping of individual
579 array elements to processors using {\em CkArrayMap}~\ref{array map}~\footnote{by default it always distributed array elements to processors in Round-Robin fashion unless a different CkArrayMap is used},
580 a special CkArrayMap called {\em CldMap} must be created and passed into
581 array creation calls to interface with seed load balancer.
583 For creating an empty array and then inserting chares into it, the API is as follows:
585 \begin{alltt}
586 CkArrayOptions opt;
587 CkGroupID cldmapID = CProxy_CldMap::ckNew();
588 opt.setMap(cldmapID);
589 CProxy_WorkUnit arr = CProxy_WorkUnit::ckNew(param, opt);
590 for (int i=0; i<numChares; i++)
591 arr[i].insert(param);
592 \end{alltt}
594 For initially populating the array with chares at time of creation the API is as follows:
595 \begin{alltt}
596 CkArrayOptions opt(numChares);
597 CkGroupID cldmapID = CProxy_CldMap::ckNew();
598 opt.setMap(cldmapID);
599 CProxy_WorkUnit arr = CProxy_WorkUnit::ckNew(param, opt);
600 \end{alltt}
602 The details about array creation are explained in section~\ref{advanced arrays} of the manual.
604 } % end zap
607 {\bf Compile and run time options for seed load balancers}
610 To choose a seed load balancer other than the default {\em rand} strategy,
611 use link time command line option {\bf -balance foo}.
613 When using {\rm neighbor} seed load balancer, one can also specify
614 the virtual topology at runtime. Use {\bf +LBTopo topo}, where {\em topo}
615 can be one of: (a) ring, (b) mesh2d, (c) mesh3d and (d) graph.
617 To write a seed load balancer, name your file as {\em cldb.foo.c},
618 where {\em foo} is the strategy name. Compile it in the form of library
619 under charm/lib, named as {\em libcldb-foo.a}, where {\em foo} is the strategy
620 name used above. Now one can use {\bf -balance foo} as compile time option
621 to {\bf charmc} to link with the {\em foo} seed load balancer.
623 \section{Simple Load Balancer Usage Example - Automatic with Sync LB}
624 \label{lbexample}
626 A simple example of how to use a load balancer in sync mode in one's
627 application is presented below.
629 \begin{alltt}
630 /*** lbexample.ci ***/
631 mainmodule lbexample \{
632 readonly CProxy_Main mainProxy;
633 readonly int nElements;
635 mainchare Main \{
636 entry Main(CkArgMsg *m);
637 entry void done(void);
640 array [1D] LBExample \{
641 entry LBExample(void);
642 entry void doWork();
645 \end{alltt}
647 --------------------------------------------------------------------------------
649 \begin{alltt}
650 /*** lbexample.C ***/
651 #include <stdio.h>
652 #include "lbexample.decl.h"
654 /*readonly*/ CProxy_Main mainProxy;
655 /*readonly*/ int nElements;
657 #define MAX_WORK_CNT 50
658 #define LB_INTERVAL 5
660 /*mainchare*/
661 class Main : public CBase_Main
663 private:
664 int count;
665 public:
666 Main(CkArgMsg* m)
668 /*....Initialization....*/
669 mainProxy = thisProxy;
670 CProxy_LBExample arr = CProxy_LBExample::ckNew(nElements);
671 arr.doWork();
674 void done(void)
676 count++;
677 if(count==nElements)\{
678 CkPrintf("All done");
679 CkExit();
684 /*array [1D]*/
685 class LBExample : public CBase_LBExample
687 private:
688 int workcnt;
689 public:
690 LBExample()
692 workcnt=0;
693 /* May initialize some variables to be used in doWork */
694 //Must be set to true to make AtSync work
695 usesAtSync = true;
698 LBExample(CkMigrateMessage *m) \{ /* Migration constructor -- invoked when chare migrates */ \}
700 /* Must be written for migration to succeed */
701 void pup(PUP::er &p)\{
702 p|workcnt;
703 /* There may be some more variables used in doWork */
706 void doWork()
708 /* Do work proportional to the chare index to see the effects of LB */
710 workcnt++;
711 if(workcnt==MAX_WORK_CNT)
712 mainProxy.done();
714 if(workcnt\%LB_INTERVAL==0)
715 AtSync();
716 else
717 doWork();
720 void ResumeFromSync()\{
721 doWork();
725 #include "lbexample.def.h"
726 \end{alltt}