doc/charm++/machineModel.tex

\section{Machine Model}
\label{machineModel}
\label{sec:machine}
At its most basic level, the \charmpp{} machine model is very simple: think of
each chare as a separate processor by itself. The methods of each
chare can access its own instance variables (which are all private, at
this level), and any global variables declared as {\em readonly}. A chare
also has access to the names of all other chares (the ``global object
space''), but all it can do with those names is send asynchronous
remote method invocations to other chare objects. (Of course, the
instance variables can include any number of regular C++ objects that
the chare ``has'', but no chare objects; a chare can only hold references to
other chare objects.)

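The idea of a chare as an object with private state that is reached only
through asynchronous method invocations can be sketched in plain C++. This is
a toy model, not \charmpp{} code: \texttt{ToyScheduler} and \texttt{Counter}
are hypothetical names introduced for this example, and a single in-process
queue stands in for the runtime system's message delivery.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for a message pool (hypothetical, not Charm++ API).
struct ToyScheduler {
    std::queue<std::function<void()>> pool;

    // Asynchronous "send": enqueue the invocation and return immediately.
    void send(std::function<void()> msg) { pool.push(std::move(msg)); }

    // Deliver queued invocations one at a time until none remain.
    void run() {
        while (!pool.empty()) {
            std::function<void()> msg = std::move(pool.front());
            pool.pop();
            msg();
        }
    }
};

// Toy "chare": its state is private and changes only through its methods.
class Counter {
    int total_ = 0;
public:
    void add(int x) { total_ += x; }
    int total() const { return total_; }
};
```

In this sketch, calling \texttt{send} does not run the method; the target
object's state changes only later, when \texttt{run} delivers the queued
invocation.
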
In accordance with this vision, the first part of the manual (up to
and including the chapter on load balancing) makes almost no mention of
entities with physical meaning (cores, nodes, etc.). The runtime
system is responsible for the magic of keeping closely communicating
objects in nearby physical locations, and for optimizing communication
between chares on the same node or core by exploiting the physically
available shared memory. The programmer does not have to deal with
this at all. The only exceptions to this pure model in the basic part
are the functions used for finding out which ``processor'' an object
is running on, and how many processors there are in total.

However, to implement lower-level libraries and certain optimizations,
programmers need to be aware of processors. In any case, it is useful
to understand how the \charmpp{} implementation works under the hood. We
therefore describe the machine model and some associated terminology here.

In terms of physical resources, we assume the parallel machine
consists of one or more {\em nodes}, where a node is the largest unit
over which cache-coherent shared memory is feasible (and therefore
the maximal set of cores on which a single process {\em can} run).
Each node may include one or more processor chips, with shared or
private caches between them. Each chip may contain multiple cores, and
each core may support multiple hardware threads (SMT, for example).

\charmpp{} recognizes two logical entities: a PE (processing element) and
a logical node, or simply ``node''. In a \charmpp{} program, a PE is a
unit of mapping and scheduling: each PE has a scheduler with an
associated pool of messages. Each chare is assumed to reside on one PE
at a time. A logical node is implemented as an OS process. In non-SMP mode
there is no distinction between a PE and a logical node. In SMP mode, a PE takes
the form of an OS thread, and a logical node may contain one or more PEs.
A physical node may be partitioned into one or more logical nodes. Since PEs
within a logical node share the same memory address space, the \charmpp{}
runtime system optimizes communication between them by using shared memory.
Depending on the runtime command-line parameters, a PE may optionally
be associated with a subset of cores or hardware threads.

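As a rough illustration of the PE and logical-node terminology, the following
plain C++ sketch models a launch in which every logical node holds the same
number of PEs and PEs are numbered contiguously within each node. Both the
uniform layout and the names (\texttt{Layout}, \texttt{numPes},
\texttt{nodeOf}, \texttt{rankOf}, \texttt{sameLogicalNode}) are assumptions
made for this example; they are not \charmpp{} API.

```cpp
#include <cassert>

// Hypothetical model of a launch: numNodes logical nodes (OS processes),
// each holding nodeSize PEs. Assumes a uniform, contiguous PE numbering.
struct Layout {
    int numNodes;  // logical nodes in the whole job
    int nodeSize;  // PEs per logical node (1 models non-SMP mode)
};

// Total number of PEs in the job.
int numPes(const Layout& l) { return l.numNodes * l.nodeSize; }

// Logical node that holds a given PE.
int nodeOf(const Layout& l, int pe) { return pe / l.nodeSize; }

// Position of a PE within its logical node (0 .. nodeSize-1).
int rankOf(const Layout& l, int pe) { return pe % l.nodeSize; }

// True when two PEs live in the same process and therefore share an
// address space, the case in which shared-memory communication is possible.
bool sameLogicalNode(const Layout& l, int a, int b) {
    return nodeOf(l, a) == nodeOf(l, b);
}
```

Under these assumptions, two logical nodes of three PEs each place PEs 0 to 2
on node 0 and PEs 3 to 5 on node 1. Setting \texttt{nodeSize} to 1 collapses
the distinction between a PE and a logical node, as in non-SMP mode.
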
A \charmpp{} program can be launched with one or more
(logical) nodes per physical node. For example, on a machine with a four-core
processor, where each core has two hardware threads, common configurations in
non-SMP mode would be one node per core (four nodes/PEs total) or one node per
hardware thread (eight nodes/PEs total). In SMP mode, the most common choice to
fully subscribe the physical node would be one logical node containing
{\em seven} PEs, because one OS thread is set aside per process for network
communication. (When built in the ``multicore'' mode that lacks network
support, a comm thread is unnecessary, and eight PEs can be used in this case.
A comm thread is also omitted when using some high-performance network layers
such as PAMI.)
Alternatively, one can choose to partition the physical node into multiple
logical nodes, each containing multiple PEs. One example would be {\em three}
PEs per logical node and two logical nodes per physical node, again reserving
a comm thread per logical node.

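The PE counts in the example above follow from simple arithmetic, which the
short C++ sketch below makes explicit. The helpers
\texttt{pesPerLogicalNode} and \texttt{totalPes} are hypothetical names
introduced here, not part of \charmpp{}, and the sketch assumes that the
hardware threads divide evenly among the logical nodes.

```cpp
#include <cassert>

// Hypothetical helper (not Charm++ API): worker PEs per logical node when
// hwThreads hardware threads are split evenly among logicalNodes processes,
// optionally reserving one thread per process for communication.
int pesPerLogicalNode(int hwThreads, int logicalNodes, bool commThread) {
    int threadsPerNode = hwThreads / logicalNodes;  // assumes even division
    return commThread ? threadsPerNode - 1 : threadsPerNode;
}

// Total worker PEs on the physical node for the same configuration.
int totalPes(int hwThreads, int logicalNodes, bool commThread) {
    return logicalNodes * pesPerLogicalNode(hwThreads, logicalNodes, commThread);
}
```

For the four-core, two-threads-per-core machine (eight hardware threads),
\texttt{pesPerLogicalNode(8, 1, true)} gives the seven PEs of the single
logical node, \texttt{pesPerLogicalNode(8, 2, true)} gives three PEs per
logical node, and \texttt{pesPerLogicalNode(8, 1, false)} gives the eight PEs
available when no comm thread is needed.
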
It is not general practice in \charmpp{} to oversubscribe the underlying
physical cores or hardware threads on each node. In other words, a
\charmpp{} program is usually not launched with more PEs than there
are physical cores or hardware threads allocated to it. More information about
these launch-time options is provided in Appendix~\ref{sec:run}.
Utility functions that retrieve information about these
\charmpp{} logical machine entities in user programs are described
in Section~\ref{basic utility fns}.