CHANGES

   1 This file describes the most significant changes. For more detail, use
   2 'git log' on a clone of the charm repository.
   3
   4 ================================================================================
   5 What's new in Charm++ 6.8.2
   6 ================================================================================
   7
   8 This is a minor release containing only the following changes on top of 6.8.1:
   9
  10 - Fix for a crash in memory deregistration on the OFI communication layer in SMP mode.
  11
  12 - Tuned eager/rendezvous messaging thresholds for the PAMI communication layer
  13   on POWER8 systems.
  14
  15 ================================================================================
  16 What's new in Charm++ 6.8.1
  17 ================================================================================
  18
  19 This is a backwards-compatible patch/bug-fix release. Roughly 100 bug
  20 fixes, improvements, and cleanups have been applied across the entire
  21 system. Notable changes are described below:
  22
  23 General System Improvements
  24
  25 - Enable network- and node-topology-aware trees for group and chare
  26   array reductions and broadcasts
  27
  28 - Add a message receive 'fast path' for quicker array element lookup
  29
  30 - Feature #1434: Optimize degenerate CkLoop cases
  31
  32 - Fix a rare race condition in Quiescence Detection that could allow
  33   it to fire prematurely (bug #1658)
  34   * Thanks to Nikhil Jain (LLNL) and Karthik Senthil for isolating
  35     this in the Quicksilver proxy application
  36
  37 - Fix various LB bugs
  38   * Fix RefineSwapLB to properly handle non-migratable objects
  39   * GreedyRefine: improvements for concurrent=false and HybridLB integration
  40   * Bug #1649: NullLB shouldnt wait for LB period
  41
  42 - Fix Projections tracing bug #1437: CkLoop work traces to the
  43   previous entry on the PE rather than to the caller
  44
  45 - Modify [aggregate] entry method (TRAM) support to only deliver
  46   PE-local messages inline for [inline]-annotated methods. This avoids
  47   the potential for excessively deep recursion that could overrun
  48   thread stacks.
  49
  50 - Fix various compilation warnings
  51
  52 Platform Support
  53
  54 - Improve experimental support for PAMI network layer on POWER8 Linux platforms
  55   * Thanks to Sameer Kumar of IBM for contributing these patches
  56
  57 - Add an experimental 'ofi' network layer to run on Intel Omni-Path
  58   hardware using libfabric
  59   * Thanks to Yohann Burette and Mikhail Shiryaev of Intel for
  60     contributing this new network layer
  61
  62 - The GNI network layer (used on Cray XC/XK/XE systems) now respects
  63   the ++quiet command line argument during startup
  64
  65 AMPI Improvements
  66
  67 - Support for MPI_IN_PLACE in all collectives and for persistent requests
  68
  69 - Improved Alltoall(v,w) implementations
  70
  71 - AMPI now passes all MPICH-3.2 tests for groups, virtual topologies, and infos
  72
  73 - Fixed Isomalloc to not leave behind mapped memory when migrating off a PE
  74
  75 ================================================================================
  76 What's new in Charm++ 6.8.0
  77 ================================================================================
  78
  79 Over 900 bug fixes, improvements, and cleanups have been applied
  80 across the entire system. Major changes are described below:
  81
  82 Charm++ Features
  83
  84 - Calls to entry methods taking a single fixed-size parameter can now
  85   automatically be aggregated and routed through the TRAM library by
  86   marking them with the [aggregate] attribute.
  87
  88 - Calls to parameter-marshalled entry methods with large array
  89   arguments can ask for asynchronous zero-copy send behavior with an
  90   `nocopy' tag in the parameter's declaration.
  91
  92 - The runtime system now integrates an OpenMP runtime library so that
  93   code using OpenMP parallelism will dispatch work to idle worker
  94   threads within the Charm++ process.
  95
  96 - Applications can ask the runtime system to perform automatic
  97   high-level end-of-run performance analysis by linking with the
  98   `-tracemode perfReport' option.
  99
 100 - Added a new dynamic remapping/load-balancing strategy,
 101   GreedyRefineLB, that offers high result quality and well bounded
 102   execution time.
 103
 104 - Improved and expanded topology-aware spanning tree generation
 105   strategies, including support for runs on a torus with holes, such
 106   as Blue Waters and other Cray XE/XK systems.
 107
 108 - Charm++ programs can now define their own main() function, rather
 109   than using a generated implementation from a mainmodule/mainchare
 110   combination. This extends the existing Charm++/MPI interoperation
 111   feature.
 112
 113 - Improvements to Sections:
 114
 115   * Array sections API has been simplified, with array sections being
 116   automatically delegated to CkMulticastMgr (the most efficient implementation
 117   in Charm++). Changes are reflected in Chapter 14 of the manual.
 118
 119   * Group sections can now be delegated to CkMulticastMgr (improved performance
 120   compared to default implementation). Note that they have to be manually
 121   delegated. Documentation is in Chapter 14 of Charm++ manual.
 122
 123   * Group section reductions are now supported for delegated sections
 124   via CkMulticastMgr.
 125
 126   * Improved performance of section creation in CkMulticastMgr.
 127
 128   * CkMulticastMgr uses the improved spanning tree strategies. See above.
 129
 130 - GPU manager now creates one instance per OS process and scales the
 131   pre-allocated memory pool size according to the GPU memory size and
 132   number of GPU manager instances on a physical node.
 133
 134 - Several GPU Manager API changes including:
 135
 136   * Replaced references to global variables in the GPU manager API with calls to
 137   functions.
 138
 139   * The user is no longer required to specify a bufferID in dataInfo struct.
 140
 141   * Replaced calls to kernelSelect with direct invocation of functions passed
 142   via the work request object (allows CUDA to be built with all programs).
 143
 144 - Added support for malleable jobs that can dynamically shrink and
 145   expand the set of compute nodes hosting Charm++ processes.
 146
 147 - Greatly expanded and improved reduction operations:
 148
 149   * Added built-in reductions for all logical and bitwise operations
 150   on integer and boolean input.
 151
 152   * Reductions over groups and chare arrays that apply commutative,
 153   associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now
 154   processed in a streaming fashion. This reduces the memory footprint of
 155   reductions. User-defined reductions can opt into this mode as well.
 156
 157   * Added a new `Tuple' reducer that allows combining multiple reductions
 158   of different input data and operations from a common set of source
 159   objects to a single target callback.
 160
 161   * Added a new `Summary Statistics' reducer that provides count, mean,
 162   and standard deviation using a numerically-stable streaming algorithm.
 163
 164 - Added a `++quiet' option to suppress charmrun and charm++ non-error
 165   messages at startup.
 166
 167 - Calls to chare array element entry methods with the [inline] tag now
 168   avoid copying their arguments when the called method takes its
 169   parameters by const&, offering a substantial reduction in overhead in
 170   those cases.
 171
 172 - Synchronous entry methods that block until completion (marked with
 173   the [sync] attribute) can now return any type that defines a PUP
 174   method, rather than only message types.
 175
 176 - Static (non-generated) header files are now warning-free for
 177   gcc -Wall -Wextra -pedantic.
 178
 179 - Deprecated setReductionClient and CkSetReductionClient in favor of
 180   explicitly passing callbacks to contribute calls.
 181
 182 - On C++ standard library implementations with support for
 183   std::is_constructible (e.g. GCC libstdc++ >4.5), chare array
 184   elements only need to define a constructor taking CkMigrateMessage*
 185   if it will actually be migrated.
 186
 187 - The PUP serialization framework gained support for some C++11
 188   library classes, including unique_ptr and unordered_map, when the
 189   underlying types have PUP operators.
 190
 191 AMPI Features
 192
 193 - More efficient implementations of message matching infrastructure, multiple
 194   completion routines, and all varieties of reductions and gathers.
 195
 196 - Support for user-defined non-commutative reductions, MPI_BOTTOM, cancelling
 197   receive requests, MPI_THREAD_FUNNELED, PSCW synchronization for RMA, and more.
 198
 199 - Fixes to AMPI's extensions for load balancing and to Isomalloc on SMP builds.
 200
 201 - More robust derived datatype support, optimizations for truly contiguous types.
 202
 203 - ROMIO is now built on AMPI and linked in by ampicc by default.
 204
 205 - A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization
 206   is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
 207
 208 - Improved support for performance analysis and visualization with Projections.
 209
 210 Platforms and Portability
 211
 212 - The runtime system code now requires compiler support for C++11
 213   R-value references and move constructors. This is not expected to be
 214   incompatible with any currently supported compilers.
 215
 216 - The next feature release (anticipated to be 6.9.0 or 7.0) will require
 217   full C++11 support from the compiler and standard library.
 218
 219 - Added support for IBM POWER8 systems with the PAMI communication API,
 220   such as development/test platforms for the upcoming Sierra and Summit
 221   supercomputers at LLNL and ORNL. Contributed by Sameer Kumar of IBM.
 222
 223 - Mac OS (darwin) builds now default to the modern libc++ standard
 224   library instead of the older libstdc++.
 225
 226 - Blue Gene/Q build targets have been added for the `bgclang' compiler.
 227
 228 - Charm++ can now be built on Cray's CCE 8.5.4+.
 229
 230 - Charm++ will now build without custom configuration on Arch Linux
 231
 232 - Charmrun can automatically detect rank and node count from
 233   Slurm/srun environment variables.
 234
 235 - Many obsolete architecture, network, and compiler support files have
 236   been removed. These include:
 237   * IBM Blue Gene/P
 238   * Sony/Toshiba/IBM Cell (including PlayStation 3)
 239   * Cray XT
 240   * Intel IA-64 (Itanium)
 241   * Intel x86-32 for Windows, Mac OS X (darwin), and Solaris
 242   * Cygwin for Windows
 243   * Older IBM AIX/POWER configurations
 244   * GCC 3 and KAI compilers
 245   * Sun/Oracle Solaris
 246
 247 ================================================================================
 248 What's new in Charm++ 6.7.1
 249 ================================================================================
 250
 251 Changes in this release are primarily bug fixes for 6.7.0. The major exception
 252 is AMPI, which has seen changes to its extension APIs and now complies with more
 253 of the MPI standard. A brief list of changes follows:
 254
 255 Charm++ Bug Fixes
 256
 257 - Startup and exit sequences are more robust
 258
 259 - Error and warning messages are generally more informative
 260
 261 - CkMulticast's set and concat reducers work correctly
 262
 263 AMPI Features
 264
 265 - AMPI's extensions have been renamed to use the prefix AMPI_ instead of MPI_
 266   and to generally follow MPI's naming conventions
 267
 268 - AMPI_Migrate(MPI_Info) is now used for dynamic load balancing and all fault
 269   tolerance schemes (see the AMPI manual)
 270
 271 - AMPI officially supports MPI-2.2, and also implements the non-blocking
 272   collectives and neighborhood collectives from MPI-3.1
 273
 274 Platforms and Portability
 275
 276 - Cray regularpages build target has been fixed
 277
 278 - Clang compiler target for BlueGene/Q systems added
 279
 280 - Comm. thread tracing for SMP mode added
 281
 282 - AMPI's compiler wrappers are easier to use with autoconf and cmake
 283
 284 ================================================================================
 285 What's new in Charm++ 6.7.0
 286 ================================================================================
 287
 288 Over 120 bugs fixed, spanning areas across the entire system
 289
 290 Charm++ Features
 291
 292 - New API for efficient formula-based distributed sparse array creation
 293
 294 - CkLoop is now built by default
 295
 296 - CBase_Foo::pup need not be called from Foo::pup in user code anymore - runtime
 297   code handles this automatically
 298
 299 - Error reporting and recovery in .ci files is greatly improved, providing more
 300   precise line numbers and often column information
 301
 302 - Many data races occurring under shared-memory builds (smp, multicore) were
 303   fixed, facilitating use of tools like ThreadSanitizer and Helgrind
 304
 305 AMPI Enhancements
 306
 307 - Further MPI standard compliance in AMPI allows users to build and run
 308   Hypre-2.10.1 on AMPI with virtualization, migration, etc.
 309
 310 - Improved AMPI Fortran2003 PUP interface 'apup', similar to C++'s STL PUP
 311
 312 Platforms and Portability
 313
 314 - Compiling Charm++ now requires support for C++11 variadic templates. In GCC,
 315   this became available with version 4.3, released in 2008
 316
 317 - New machine target for multicore Linux ARM7: multicore-linux-arm7
 318
 319 - Preliminary support for POWER8 processors, in preparation for the upcoming
 320   Summit and Sierra supercomputers
 321
 322 - The charmrun process launcher is now much more robust in the face of slow
 323   or rate-limited connections to compute nodes
 324
 325 - PXSHM now auto-detects the node size, so the '+nodesize' is no longer needed
 326
 327 - Out-of-tree builds are now supported
 328
 329 Deprecations
 330
 331 - CommLib has been removed.
 332
 333 - CmiBool has been dropped in favor of C++'s bool
 334
 335
 336 ================================================================================
 337 What's new in Charm++ 6.6.1
 338 ================================================================================
 339
 340 Changes in this release are primarily bug fixes for 6.6.0. A concise list of
 341 affected components follows:
 342
 343 - CkIO
 344
 345 - Reductions with syncFT
 346
 347 - mpicxx based MPI builds
 348
 349 - Increased support for macros in CI file
 350
 351 - GNI + RDMA related communication
 352
 353 - MPI_STATUSES_IGNORE support for AMPIF
 354
 355 - Restart on different node count with chkpt
 356
 357 - Immediate msgs on multicore builds
 358
 359 ================================================================================
 360 What's new in Charm++ 6.6.0
 361 ================================================================================
 362
 363 - Machine target files for Cray XC systems ('gni-crayxc') have been added
 364
 365 - Interoperability with MPI code using native communication interfaces on Blue
 366   Gene Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal
 367   MPI communication interface
 368
 369 - Support for partitioned jobs on all machine types, including TCP/IP and IB
 370   Verbs networks using 'netlrts' and 'verbs' machine layers
 371
 372 - A substantially improved version of our asynchronous library, CkIO, for
 373   parallel output of large files
 374
 375 - Narrowing the circumstances in which the runtime system will send
 376   overhead-inducing ReductionStarting messages
 377
 378 - A new fully distributed load balancing strategy, DistributedLB, that produces
 379   high quality results with very low latency
 380
 381 - An API for applications to feed custom per-object data to specialized load
 382   balancing strategies (e.g. physical simulation coordinates)
 383
 384 - SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs)
 385   support tracing messages through communication threads
 386
 387 - Thread affinity mapping with +pemap now supports Intel's Hyperthreading more
 388   conveniently
 389
 390 - After restarting from a checkpoint, thread affinity will use new
 391   +pemap/+commap arguments
 392
 393 - Queue order randomization options were added to assist in debugging race
 394   conditions in application and runtime code
 395
 396 - The full runtime code and associated libraries can now compile under the C11
 397   and C++11/14 standards.
 398
 399 - Numerous bug fixes, performance enhancements, and smaller improvements in the
 400   provided runtime facilities
 401
 402 - Deprecations
 403   * The long-unsupported FEM library has been deprecated in favor of ParFUM
 404   * The CmiBool typedefs have been deleted, as C++ bool has long been universal
 405   * Future versions of the runtime system and libraries will require some degree
 406     of support for C++11 features from compilers
 407
 408 ================================================================================
 409 What's new in Charm++ 6.5.0
 410 ================================================================================
 411
 412 - The Charm++ manual has been thoroughly revised to improve its organization,
 413   comprehensiveness, and clarity, with many additional example code snippets
 414   throughout.
 415
 416 - The runtime system now includes the 'Metabalancer', which can provide
 417   substantial performance improvements for applications that exhibit dynamic
 418   load imbalance. It provides two primary benefits. First, it automatically
 419   optimizes the frequency of load balancer invocation, to avoid work stoppage
 420   when it will provide too little benefit. Second, calls to AtSync() are made
 421   less synchronous, to further reduce overhead when the load balancer doesn't
 422   need to run. To activate the Metabalancer, pass the option +MetaLB at
 423   runtime. To get the full benefits, calls to AtSync() should be made at every
 424   iteration, rather than at some arbitrary longer interval as was previously
 425   common.
 426
 427 - Many feature additions and usability improvements have been made in the
 428   interface translator that generates code from .ci files:
 429   * Charmxi now provides much better error reports, including more accurate
 430     line numbers and clearer reasons for failure, including some semantic
 431     problems that would otherwise appear when compiling the C++ code or even at
 432     runtime.
 433   * A new SDAG construct 'case' has been added that defines a disjunction over a
 434     set of 'when' clauses: only one 'when' out of a set will ever be triggered.
 435   * Entry method templates are now supported. An example program can be found
 436     in tests/charm++/method_templates/.
 437   * SDAG keyword "atomic" has been deprecated in favor of the newly supported
 438     keyword "serial". The two are synonymous, but "atomic" is now provided only
 439     for backward compatibility.
 440   * It is no longer necessary to call __sdag_init() in chares that contain SDAG
 441     code - the generated code does this automatically. The function is left as
 442     a no-op for compatibility, but may be removed in a future version.
 443   * Code generated from .ci files is now primarily in .def.h files, with only
 444     declarations in .decl.h. This improves debugging, speeds compilation,
 445     provides clearer compiler output, and enables more complete encapsulation,
 446     especially in SDAG code.
 447   * Mainchare constructors are expected to take CkArgMsg*, and always have
 448     been. However, charmxi would allow declarations with no argument, and
 449     assume the message. This is now deprecated, and generates a warning.
 450
 451 - Projections tracing has been extended and improved in various ways
 452   * The trace module can generate a record of network topology of the nodes in
 453     a run for certain platforms (including Cray), which Projections can
 454     visualize.
 455   * If the gzip library (libz) is available when Charm++ is compiled, traces
 456     are compressed by default.
 457   * If traces were flushed as a results of filled buffers during the run, a
 458     warning will be printed at exit to indicate that the user should be wary of
 459     interference that may have resulted.
 460   * In SMP builds, it is now possible to trace message progression through the
 461     communication threads. This is disabled by default to avoid overhead and
 462     potential misleading interpretation.
 463
 464 - Array elements can be block-mapped at the SMP node level instead of at the
 465   per-PE level (option "+useNodeBlkMapping").
 466
 467 - AMPI can now privatize global and static variables using TLS. This is
 468   supported in C and C++ with __thread markings on the variable declarations
 469   and definitions, and in Fortran with a patched version of the gfortran
 470   compiler. To activate this feature, append '-tls' to the '-thread' option's
 471   argument when you link your AMPI program.
 472
 473 - Charm can now be built to only support message priorities of a specific data
 474   type. This enables an optimized message queue within the the runtime
 475   system. Typical applications with medium sized compute grains may not benefit
 476   noticeably when switching to the new scheduler. However, this may permit
 477   further optimizations in later releases.
 478
 479   The new queue is enabled by specifying the data type of the message
 480   priorities while building charm using --with-prio-type=dtype. Here, dtype can
 481   be one of char, short, int, long, float, double and bitvec. Specifying bitvec
 482   will permit arbitrary-length bitvector priorities, and is the current default
 483   mode of operation. However, we may change this in a future release.
 484
 485 - Converse now provides a complete set of wrappers for
 486   fopen/fread/fwrite/fclose to handle EINTR, which is not uncommon on the
 487   increasingly-popular Lustre. They are named CmiF{open,read,write,close}, and
 488   are available from C and C++ code.
 489
 490 - The utility class 'CkEntryOptions' now permits method chaining for cleaner
 491   usage. This applies to all its set methods (setPriority, setQueueing,
 492   setGroupDepID). Example usage can be found in examples/charm++/prio/pgm.C.
 493
 494 - When creating groups or chare arrays that depend on the previous construction
 495   of another such entity on the local PE, it is now possible to declare that
 496   dependence to the runtime. Creation messages whose dependence is not yet
 497   satisfied will be buffered until it is.
 498
 499 - For any given chare class Foo and entry method Bar, the supporting class's
 500   member CkIndex_Foo::Bar() is used to lookup/specify the entry method
 501   index. This release adds a newer API for such members where the argument is a
 502   function pointer of the same signature as the entry method. Those new
 503   functions are used like CkIndex_Foo::idx_Bar(&Foo::Bar). This permits entry
 504   point index lookup without instantiating temporary variables just to feed the
 505   CkIndex_Foo::Bar() methods. In cases where Foo::Bar is overloaded, &Foo::Bar
 506   must be cast to the desired type to disambiguate it.
 507
 508 - CkReduction::reducerType now have PUP methods defined; and can hence be
 509   passed as parameter-marshalled arguments to entry methods.
 510
 511 - The runtime option +stacksize for controlling the allocation of user-level
 512   threads' stacks now accepts shorthanded annotation such as 1M.
 513
 514 - The -optimize flag to the charmc compiler wrapper now passes more aggressive
 515   options to the various underlying compilers than the previous '-O'.
 516
 517 - The charmc compiler wrapper now provides a flag -use-new-std to enable
 518   support for C11 and C++11 where available. To use this in application code,
 519   the runtime system must have been built with that flag as well.
 520
 521 - When using, CmiMemoryUsage(), the runtime can be instructed not to use the
 522   underlying mallinfo() library call, which can be inaccurate in settings where
 523   usage exceeds INT_MAX. This is accomplished by setting the environment
 524   variable "MEMORYUSAGE_NO_MALLINFO".
 525
 526 - Experimental Features
 527   * Initial implementation of a fast message-logging protocol. Use option
 528     'mlogft' to build it.
 529   * Message compression support for persistent message on Gemini machine layer.
 530   * Node-level inter-PE loop/task parallelization is now supported through
 531     CkLoop
 532   * New temperature/CPU frequency aware load balancer
 533   * Support interoperation of Charm++ and native MPI code through dynamically
 534     switching control between the two
 535   * API in centralized load balancers to get and set PE speed
 536   * A new scheme for optimization of double in-memory checkpoint/restart.
 537   * Message combining library for improved fine-grained communication
 538     performance
 539   * Support for partitioning of allocated nodes into subsets that run
 540     independent Charm++ instances but can interact with each other.
 541
 542 Platform-Specific Changes
 543 -------------------------
 544
 545 - Cray XE/XK
 546   * The gemini_gni network layer has been heavily tuned and optimized,
 547     providing substantial improvements in performance, scalability, and
 548     stability.
 549   * The gemini_gni-crayxe machine layer supports a 'hugepages' option at build
 550     time, rather than requiring manual configuration file editing.
 551   * Persistent message optimizations can be used to reduce latency and
 552     overheads
 553   * Experimental support for 'urgent' sends, which are sent ahead of any other
 554     outgoing messages queued for transmission.
 555
 556 - IBM Blue Gene Q: Experimental machine-layer support for the native PAMI
 557   interface and MPI, with and without SMP support. This supports many new
 558   systems, including LLNL's Sequoia, ALCF's Mira, and FZ Juelich's Juqueen.
 559
 560   There are three network-layer implementations for these systems: 'mpi',
 561   'pami', and 'pamilrts'. The 'mpi' layer is stable, but its performance and
 562   scalability suffers from the additional overhead of using MPI rather than
 563   driving the interconnect directly. The 'pami' layer is well tested for NAMD,
 564   but has shown instability for other applications. It is likely to be replaced
 565   by the 'pamilrts' layer, which is more generally stable and seems to provide
 566   the same performance, in the next release.
 567
 568   In addition to the common 'smp' option to build the runtime system with
 569   shared memory support, there is an 'async' option which sometimes provides
 570   better performance on SMP builds. This option passes tests on 'pamilrts', but
 571   is still experimental.
 572
 573   Note: Applications that have large number of messages may crash in default
 574   setup due to overflow in the low-level FIFOs. Environment variables
 575   MUSPI_INJFIFOSIZE and PAMI_RGETINJFIFOSIZE can be set to avoid application
 576   failures due to large number of small and large messages respectively. The
 577   default value of these variable is 65536 which is sufficient for 1000
 578   messages in flight.
 579
 580 - Infiniband Verbs: Better support for more flavors of ibverbs libraries
 581
 582 - MPI Network Layer
 583   * Experimental rendezvous protocol for better performance above some MPI
 584     implementations.
 585   * Some tuning parameters ("+dynCapSend" and "+dynCapRecv") are now
 586     configurable at job launch, rather than Charm++ compilation.
 587
 588 - PGI C++: Disable automatic 'using namespace std;'
 589
 590 - Charm++ now supports ARM, both non-smp and smp.
 591
 592 - Mac OS X: Compilation options to build and link correctly on newer versions
 593
 594
 595 ================================================================================
 596 What's new in Charm++ 6.4.0
 597 ================================================================================
 598
 599 --------------------------------------------------------------------------------
 600 Platform Support
 601 --------------------------------------------------------------------------------
 602
 603 - Cray XE and XK systems using the Gemini network via either MPI
 604   (mpi-crayxe) or the native uGNI (gemini_gni-crayxe)
 605
 606 - IBM Blue Gene Q, using MPI (mpi-bluegeneq) or PAMI (pami-bluegeneq)
 607
 608 - Clang, Cray, and Fujitsu compilers
 609
 610 - MPI-based machine layers can now run on >64k PEs
 611
 612 --------------------------------------------------------------------------------
 613 General Changes
 614 --------------------------------------------------------------------------------
 615
 616 - Added a new [reductiontarget] attribute to enable
 617   parameter-marshaled recipients of reduction messages
 618
 619 - Enabled pipelining of large messages in CkMulticast by default
 620
 621 - New load balancers added:
 622   * TreeMatch
 623   * Zoltan
 624   * Scotch graph partitioning based: ScotchLB and Refine and Topo variants
 625   * RefineSwap
 626
 627 - Load balancing improvements:
 628
 629   * Allow reduced load database size using floats instead of doubles
 630   * Improved hierarchical balancer
 631   * Periodic balancing adapts its interval dynamically
 632   * User code can request a callback when migration is complete
 633   * More balancers properly consider object migratability and PE
 634     availability and speed
 635   * Instrumentation records multicasts
 636
 637 - Chare arrays support options that can enable some optimizations
 638
 639 - New 'completion detection' library for parallel process termination
 640   detection, when the need for modularity excludes full quiescence
 641   detection
 642
 643 - New 'mesh streamer' library for fine-grain many-to-many collectives,
 644   handling message bundling and network topology
 645
 646 - Memory pooling allocator performance and resource usage improved
 647   substantially
 648
 649 - AMPI: More routines support MPI_IN_PLACE, and those that don't check
 650   for it
 651
 652 ================================================================================
 653 What's new in Charm++ 6.2.1 (since 6.2.0)
 654 ================================================================================
 655
 656 --------------------------------------------------------------------------------
 657 New Supported Platforms:
 658 --------------------------------------------------------------------------------
 659
 660 POWER7 with LAPI on Linux
 661
 662 Infiniband on PowerPC
 663
 664 --------------------------------------------------------------------------------
 665 General Changes
 666 --------------------------------------------------------------------------------
 667
 668 - Better support for multicasts on groups
 669 - Topology information gathering has been optimized
 670 - Converse (seed) load balancers have many new optimizations applied
 671 - CPU affinity can be set more easily using +pemap and +commap options
 672   instead of the older +coremap
 673 - HybridLB (hierarchical balancing for very large core-count systems)
 674   has been substantially improved
 675 - Load balancing infrastructure has further optimizations and bug fixes
 676 - Object mappings can be read from a file, to allow offline
 677   topology-aware placement
 678 - Projections logs can be spread across multiple directories, speeding
 679   up output when dealing with thousands of cores (+trace-subdirs N
 680   will divide log files evenly among N subdirectories of the trace
 681   root, named PROGNAME.projdir.K)
 682 - AMPI now implements MPI_Issend
 683 - AMPI's MPI_Alltoall uses a flooding algorithm more agressively,
 684   versus pairwise exchange
 685 - Virtualized ARMCI support has been extended to cover the functions
 686   needed by CAF
 687
 688 --------------------------------------------------------------------------------
 689 Architecture-specific changes
 690 --------------------------------------------------------------------------------
 691
 692 - LAPI SMP has many new optimizations applied
 693
 694 - Net builds support the use of clusters' mpiexec systems for job
 695   launch, via the ++mpiexec option to charmrun
 696
 697 ================================================================================
 698 What's new in Charm++ 6.2.0 (since 6.1)
 699 ================================================================================
 700
 701 --------------------------------------------------------------------------------
 702 New Supported Platforms:
 703 --------------------------------------------------------------------------------
 704
 705 64-bit MIPS, such as SiCortex, using mpi-linux-mips64
 706
 707 Windows HPC cluster, using mpi-win32/mpi-win64
 708
 709 Mac OSX 10.6, Snow Leopard (32-bit and 64-bit).
 710
 711 --------------------------------------------------------------------------------
 712 General Changes
 713 --------------------------------------------------------------------------------
 714
 715 Runtime support
 716  - Smarter build/configure scripts
 717  - A new interface for model-based load balancing
 718  - new CPU topology API
 719  - a general implementation of CmiMemoryUsage()
 720  - Bug fix: Quiescence detection (QD) works with immediate messages
 721  - New reduction functions implemented in Converse
 722  - CCS (Converse Client-Server) can deliver message to more than one processor
 723  - Added a memory-aware adaptive scheduler, which can be optionally
 724    compiled in to charm
 725  - Added preliminary support for automatic message prioritization
 726    (disabled by default)
 727
 728 Charm++
 729  - Cross-array and cross-group sections
 730  - Structured Dagger (SDAG): Support templated arguments properly
 731  - Plain chares support checkpoint/restart (both in-memory and disk-based)
 732  - Conditional packing of messages and parameters in SMP scenario
 733  - Changes to the CkArrayIndex class hierarchy
 734    -- sizeof() all CkArrayIndex* classes is now the same
 735    -- Codes using custom array indices have to use placement-new to construct
 736       their custom index. Refer example code: examples/charm++/hello/fancyarray/
 737    -- *** Backward Incompatibility ***
 738       CkArrayIndex[4D/5D/6D]::index are now of type int (instead of short)
 739       However the data is stored as shorts. Access by casting
 740       CkArrayIndexND::data() appropriately
 741    -- *** Deprecated ***
 742       The direct use of public data member
 743       CkArrayIndexND::index (N=1..6) is deprecated. We reserve the right to
 744       change/remove this variable in future releases of Charm++.
 745       Instead, please access the indices via member function:
 746       int CkArrayIndexND::data()
 747
 748 Adaptive MPI (AMPI)
 749  - Compilers renamed to avoid collision with host MPI (ampicc, ampiCC,
 750    ampif77, ampif90)
 751  - Improved MPI standard conformance, and documentation of non-conformance
 752    * Bug fixes in: MPI_Ssend, MPI_Cart_shift, MPI_Get_count
 753    * Support MPI_IN_PLACE in MPI_(All)Reduce
 754    * Define various missing constants
 755  - Return the received message's tag in response to a non-blocking
 756    wildcard receive, to support SuperLU
 757  - Improved tracing for BigSim
 758
 759 Multiphase Shared Arrays (MSA)
 760  - Typed handles to enforce phases
 761  - Split-phase synchronization to enable message-driven execution
 762  - 3D arrays
 763
 764 TCharm
 765  - Automatic tracing of API calls for simulation and analysis
 766
 767 Debugging
 768  - Wider support for architectures other than net- (in particular MPI layers)
 769  - Improved support for large scale debugging (better scalability)
 770  - Enhanced record/replay stability to handle various events, and to
 771    signal unexpected messages
 772  - New detailed record/replay: The full content of messages can be
 773    recorded, and a single processor can be re-executed outside of the
 774    parallel application
 775
 776 Performance analysis
 777  - Tracing of nested entry methods
 778
 779 Automatic Performance Tuning
 780  - Created an automatic tuning framework [still for experimental use only]
 781
 782 CkMulticast
 783  - Network-topology / node aware spanning trees used internally for and
 784    lower bytes on the network and improved performance in multicasts and
 785    reductions delegated to this library
 786
 787 Comlib
 788  - Improved OneTimeMulticastStrategy classes
 789
 790 BigSim
 791  - Out-of-core support, with prefetching capability
 792  - Detailed tracing of MPI calls
 793  - Detailed record/replay support at emulation time, capable of
 794    replaying any emulated processor after obtained recorded logs.
 795
 796 --------------------------------------------------------------------------------
 797 Architecture-specific changes
 798 --------------------------------------------------------------------------------
 799
 800 Net-*
 801  - Can run jobs with more than 1024 PEs
 802
 803 Net-Linux
 804  - New charmrun option ++no-va-randomization to disable address space
 805    randomization (ASLR). This is most useful for running AMPI with
 806    isomalloc
 807
 808 MPI
 809  - Default to using ampicxx instead of mpiCC
 810
 811 MPI-SMP
 812  - The +p option now has the same semantics as in other smp builds
 813
 814 Power 7
 815  - Support for VSX in SIMD abstraction API
 816
 817 Blue Gene/L
 818  - Compilers and options have been updated to the latest ones
 819
 820 Blue Gene/P
 821  - Added routines for measuring performance counters on BG/P.
 822  - Updated to support latest DCMF driver version. On ANL's Intrepid, you may
 823    need to set BGP_INSTALL=/bgsys/drivers/V1R4M1_460_2009-091110P/ppc in your
 824    environment. This is the default on ANL's Surveyor.
 825
 826 Cray XT
 827  - cputopology information is now available on XT3/4/5
 828
 829 Infiniband (ibverbs)
 830  - Bug fix: plug memory leaks that caused failures in long runs
 831  - Optimized to reduce startup delays
 832
 833 LAPI
 834  - Support for SMP (experimental)
 835
 836
 837 ================================================================================
 838 Note that changes from 5.9, 6.0, and 6.1 are not documented here. A partial list
 839 can be found on the charm download page, or by reading through version control
 840 logs.
 841
 842 ================================================================================
 843 What's New since Charm++ 5.4 release 1
 844 ================================================================================
 845
 846 --------------------------------------------------------------------------------
 847 New Supported Platforms:
 848 --------------------------------------------------------------------------------
 849 1. Charm++ ported to IA64 Itanium running Win2K and Linux, Charm++ also support
 850    Intel C/C++ compilers;
 851
 852 2. Charm++ ported to Power Macintosh powerpc running Darwin;
 853
 854 3. Charm++ ported to Myrinet networking with GM API;
 855
 856 --------------------------------------------------------------------------------
 857 Summary of New Features:
 858 --------------------------------------------------------------------------------
 859 1. Structure Dagger
 860    Structured Dagger is a coordination language built on top of CHARM++.
 861    Structured Dagger allows easy expression of dependences among messages and
 862    computations and also among computations within the same object using
 863    when-blocks and various structured constructs.
 864
 865 2. Entry functions support parameter marshalling
 866    Now you can declare and invoke remote entry functions using parameter
 867    marshalling instead of defining messages.
 868
 869 3. Easier running - standalone mode
 870    For net-* version running locally, you can now run Charm programs without
 871    charmrun. Running a node program directly from command line is now the
 872    same as "charmrun +p1 <program>"; for SMP version, you can also specify
 873    multiple (local) processors, as in "program +p2".
 874
 875
 876 --------------------------------------------------------------------------------
 877 Summary of Changes:
 878 --------------------------------------------------------------------------------
 879 1. "build" changed for compilation of Charm++
 880    To build Charm++ from scratch, we now take additional command line options
 881    to compile with addon features and using different compilers other than gcc.
 882    For example, to build Linux IA64 with Myrinet support, type command:
 883    ./build net-linux-ia64  gm
 884
 885
 886                 *******   Old Change histories *******
 887
 888
 889 ================================================================================
 890 What's New in Charm++ 5.4 release 1 since 5.0
 891 ================================================================================
 892
 893 --------------------------------------------------------------------------------
 894 New Supported Platforms:
 895 --------------------------------------------------------------------------------
 896
 897 1. Win9x/2000/NT:  with Visual C++ or Cygwin gcc/g++, you can compile and run
 898    Charm++ programs on all Win32 platforms.
 899
 900 2. Scyld Beowulf:  Charm++ has been ported to the Linux-based Scyld Beowulf
 901    operating system. For more information on Scyld, see <http://www.scyld.com>
 902
 903 3. MPI with VMI:   Charm++ has been ported to NCSA's Virtual Machine Interface,
 904    which is an efficient messaging library for heterogeneous cluster
 905    communication.
 906
 907
 908 --------------------------------------------------------------------------------
 909 Summary of New Features:
 910 --------------------------------------------------------------------------------
 911 1. Dynamic Load balancing:
 912    Chare migration is supported in the new release. Migration-based dynamic
 913    load balancing framework with various load balancing strategies library has
 914    been added.
 915
 916 2. Chare Array
 917    Charm++ array is supported. You can now create an array of Chare objects
 918    and use array index to refer the Charm++ array elements. A reduction
 919    library on top of Chare array has been implemented and included.
 920
 921 3. Projections
 922    Projections, a Java application for Charm++ program performance analysis and
 923    visualization, has been included and distributed in the new release. Two
 924    trace modes are available: trace-projections and trace-summary. Trace-summary
 925    is a light-weight trace library compared to trace-projections.
 926
 927 4. AMPI
 928    AMPI is a load-balancing based library for porting legacy MPI applications
 929    to Charm++. With few changes in the original MPI code to AMPI, the new
 930    legacy MPI application on Charm++ will gain from Charm++'s adptive
 931    load balancing ability.
 932
 933 5. Easier invocation
 934    "Charmrun" is now available on all platforms, with a uniform command line
 935    syntax. You can forget the difference between net-* versions and MPI versions,
 936    and run charm++ application with this same charmrun command syntax.
 937    ++local option is added in charmrun for net-* version, it provides
 938    simple local use of Charm and no longer require the ability to
 939    "rsh localhost" or a nodelist file in order to run charm only on the local
 940    machine. This is especially attractive when you run Charm++ on Windows.
 941
 942 6. New libraries:
 943    Many new libraries have been added in this release. They include:
 944    1) master-slave library: for writing manager-worker paradigm programs.
 945    2) receiver library: provide asynchronous communication mode for chare array.
 946    3) f90charm:  provides Fortran90 bindings for Charm++ Array.
 947    4) BlueGene:  a Charm++/Converse emulator for IBM proposed Blue Gene.
 948
 949 --------------------------------------------------------------------------------
 950 Summary of Changes:
 951 --------------------------------------------------------------------------------
 952 1. message declaration syntax in .ci file:
 953    The message declaration syntax for packed/varsize messages has been changed.
 954    The packed/varsize keywords are eliminated, and you can specify the actual
 955    actual varsize arrays in the interface file and have the translator generate
 956    alloc, pack and unpack.
 957
 958
 959 Here is the detailed list of Changes:
 960
 961 --------------------------------------------------------------------------------
 962 Major Features:
 963 --------------------------------------------------------------------------------
 964
 965 10/06/1999      rbrunner        Added migration-based dynamic load balancing
 966                                 framework.
 967 11/15/1999      olawlor         Added reduction support foe Charm++ arrays
 968 02/06/2000      milind          Added AMPI, an implementation of MPI with
 969                                 dynamic load balancing
 970 02/18/2000      paranjpy        New platforms supported: net-win32, and                                         net-win32-smp
 971 04/04/2000      olawlor         Added arbitrarily indexed Charm++ arrays.
 972                                 Also, added translator support for new arrays.
 973 04/15/2000      olawlor         Added "puppers" for packing and unpacking
 974                                 objects.
 975 06/14/2000      milind          Added the threaded FEM framework.
 976
 977 --------------------------------------------------------------------------------
 978 Minor Features:
 979 --------------------------------------------------------------------------------
 980
 981 10/09/1999      rbrunner        Added packlib, a library for C and C++ to
 982                                 pack-unpack data to/from Charm++ messages.
 983 10/13/1999      gzheng          New LB strategy: RefineLB
 984 10/13/1999      paranjpy        New LB Strategy: Heap
 985 10/14/1999      milind          New LB Strategy: Metis
 986 10/19/1999      olawlor         New test program for testing LB strategies.
 987 10/21/1999      gzheng          New trace mode: trace-summary
 988 10/28/1999      milind          New supported platform: net-sol-x86
 989 10/29/1999      milind          Added runtime checks for ChareID assignment.
 990 11/10/1999      rbrunner        Added Neighborhood base strategy for LB
 991                                 framework.
 992 11/15/1999      olawlor         conv-host now reads in a startup file
 993                                 ~/.conv-hostrc
 994 11/15/1999      olawlor         New test program for testing array reductions.
 995 11/16/1999      rbrunner        Added processor-speed checking functions to
 996                                 LB framework
 997 11/19/1999      milind          Mapped SIGUSR to a Ccd condtion handler
 998 11/22/1999      rbrunner        New LB strategy: WSLB
 999 11/29/1999      ruiliu          Modified Metis LB strategy to deal with
1000                                 different processor speeds
1001 12/16/1999      rbrunner        New LB strategy: GreedyRef
1002 12/16/1999      rbrunner        New LB strategy: RandRef
1003 12/21/1999      skumar2         New LB strategy: CommLB
1004 01/03/2000      rbrunner        New LB strategy: RecBisectBfLB
1005 01/08/2000      skumar2         New LB strategy: Comm1LB, with varying processor
1006                                 speeds
1007 01/18/2000      milind          Modified SM library syntax, and added a test
1008                                 program for SM.
1009 01/19/2000      gzheng          Added irecv, a library to simplify conversion
1010                                 of message-passing programs to Charm++
1011 02/20/2000      olawlor         Added preliminary broadcast support to Charm++
1012                                 arrays.
1013 02/23/2000      paranjpy        Added converse-level quiescence detection
1014 03/02/2000      milind          Added ++server-port option to pre-specify
1015                                 CCS port.
1016 03/10/2000      wilmarth        Random seed-based load balancer now uses
1017                                 bit-vector for active PEs.
1018 03/21/2000      gzheng          Added support for marking user-defined events
1019                                 in trace-summary.
1020 03/28/2000      wilmarth        Added CMK_TRUECRASH. Very helpful for
1021                                 post-mortem debugging of Charm++ programs on
1022                                 net-* versions.
1023 03/31/2000      jdesouza        Added Fortran90 support to the Charm++
1024                                 interface translator.
1025 03/09/2000      milind          Added support for -LANG and -rpath options
1026                                 in charmc for Origin2000.
1027 04/28/2000      milind          Added prioritized converse threads.
1028 05/01/2000      milind          Added test programs for TeMPO, AMPI and irecv.
1029 05/04/2000      milind          New supported platform: mpi-sp.
1030 05/04/2000      gzheng          Added irecv pingpong program.
1031 05/17/2000      olawlor         Each chare, group and array element now has to
1032                                 have migration constructor.
1033 05/24/2000      milind          Added Jacobi3D programs for irecv and AMPI both.
1034 05/24/2000      milind          Made migratable an optional attribute of
1035                                 chares, groups, and nodegroups.
1036                                 Arrays are by default migratable.
1037 05/29/2000      paranjpy        Added pup methods to arrays, reductions etc
1038                                 internal objects.
1039 06/13/2000      milind          Made CtvInitialize idempotent.  That is, it
1040                                 can be called by any number of threads now,
1041                                 only the first one will actually do
1042                                 CtvInitialize.
1043 06/20/2000      milind          Added a simple test program for the FEM
1044                                 framework.
1045 07/06/2000      milind          Imported Metis 4.0 sources in the CVS tree.
1046                                 Also added code to make metis libraries and
1047                                 executables to Makefile.
1048 07/07/2000      milind          Added more meaningfull error messages using
1049                                 perror in addition to a cryptic error codes in
1050                                 net-* versions.
1051 07/10/2000      milind          fem and femf are now recognized as "languages"
1052                                 by charmc.
1053 07/10/2000      saboo           Added the derived datatypes library.
1054 07/13/2000      milind          Added +idle_timeout functionality. It takes a
1055                                 commandline parameter denoting milliseconds of
1056                                 maximum consecutive idle time allowed per
1057                                 processor.
1058 07/14/2000      milind          Added group multicast. Added
1059                                 CkSendMsgBranchMulti, CldEnqueueMulti, and
1060                                 translator changes to support it.
1061 07/14/2000      milind          SUPER_INSTALL now takes "-*" arguments prior
1062                                 to the target, that will be passed to make as
1063                                 "makeflags". This makes it easy to suppress
1064                                 make's output of commands etc (with the -s
1065                                 flag). As a result of this, several Makefiles
1066                                 have been massaged.
1067 07/18/2000      milind          Added support for using "dbx" on suns as
1068                                 debugger.
1069 07/19/2000      milind          Added ability to tracemode projections which
1070                                 produces binary trace files. Use flag
1071                                 +binary-trace on the command line.
1072 07/26/2000      milind          Separated AMPI from TeMPO.
1073 07/28/2000      milind          Added test programs to test reduce, alltoall
1074                                 and allreduce functionality of AMPI.
1075 08/02/2000      milind          Added an option to let the user specify which
1076                                 "xterm" to use.  For example, on some systems
1077                                 (CDE), only dtterm is installed.  So, by
1078                                 putting ++xterm dtterm on the conv-host
1079                                 commandline, one can use dtterm when ++in-xterm
1080                                 option is specified on conv-host commandline.
1081 08/14/2000      milind          FEM Framework: Added capabilities to handle
1082                                 esoteric meshes to standalone offline programs.
1083                                 Makefile now produces gmap and fgmap programs,
1084                                 which are used for this purpose.  They convert
1085                                 the mesh to a graph before partitioning it
1086                                 using Metis.
1087 08/24/2000      milind          Added the 2D crack propagation program as a
1088                                 test program for FEM framework.
1089 08/25/2000      milind          Initial implementation of isomalloc-based
1090                                 threads.  This implementation uses a fixed
1091                                 stack size for all threads (can be set at
1092                                 runtime.)
1093 08/26/2000      milind          Added a macro CtvAccessOther that lets you
1094                                 get/set a Ctv variable of any thread.  It
1095                                 should be invoked as CtvAccessOther(thread,
1096                                 varname); Added CthGetData function to each of
1097                                 the threads implementation.  This function is
1098                                 used in the CtvAccessOther macro.
1099 08/27/2000      milind          FEM Framework: Separated mesh to graph
1100                                 conversion capability into a separate program.
1101                                 This way, the generated graph can be partitioned
1102                                 repeatedly.
1103 09/04/2000      milind          Added the class static readonly variables to
1104                                 ci file syntax.
1105 09/05/2000      milind          FEM Framework: A very fast O(n) algorithm for
1106                                 mesh2graph , uses more memory, but the tradeoff
1107                                 was worth it. Coded by Karthik Mahesh, minor
1108                                 optimizations by Milind.
1109 09/05/2000      milind          Added a barebones charm kernel scheduling
1110                                 overhead measurement program.
1111 09/15/2000      milind          Added pup support for AMPI and FEM framework.
1112 09/20/2000      olawlor         Added capability to have an array of base type
1113                                 where individual element could be of derived
1114                                 types.
1115 10/03/2000      gzheng          New supported platform: net-linux-axp
1116 10/05/2000      skumar2         Added program littleMD to the test suite.
1117 10/07/2000      skumar2         New job scheduler (Faucets projects).
1118 10/15/2000      milind          Improved support for Fortran90 in charmc.
1119 11/04/2000      jdesouza        Made the Faucets scheduler multi-threaded.
1120 11/05/2000      olawlor         FEM Framework: supports multiple element types,
1121                                 mesh re-assembly, etc.
1122 11/15/2000      gzheng          New platform support: net-cygwin
1123 11/18/2000      gzheng          conv-host no longer needs /bin/csh to start
1124                                 remote program.  set
1125                                 CMK_CONV_HOST_CSH_UNAVAILABLE to 1 to use
1126                                 /bin/sh instead.
1127 11/25/2000      milind          Finished experimental implementation of
1128                                 converse-threads based on co-operative pthreads.
1129 11/25/2000      milind          Added a benchmark suite of all pingpongs in
1130                                 Charm++.
1131 11/28/2000      milind          Removed deletion of _idx at the end of every
1132                                 send or doneInserting call.  Instead now it is
1133                                 in the destructor of the proxy. This allows us
1134                                 to cache proxies, when proxy creation becomes
1135                                 a bottleneck.
1136 11/28/2000      olawlor         Added "seek blocks" to puppers.  This should
1137                                 allow out-of-order pup'ing without the ugliness
1138                                 of getBuf; and in a way that works with all
1139                                 PUP::ers.
1140 11/29/2000      olawlor         Simplified and regularized command-line-argument
1141                                 handling.
1142 11/29/2000      milind          AMPI: Added multiple-communicators capability.
1143 12/05/2000      gzheng          Now /bin/sh is default shell to fork node
1144                                 program on remote machines.
1145 12/13/2000      olawlor         Added charmrun wrapper for poe on mpi-sp.
1146 12/14/2000      milind          Added bluegene emulator sources and test
1147                                 programs.  Added "bluegene" as a language known
1148                                 to charmc.  Makefile now has a target called
1149                                 bluegene.  Added preliminary bluegene
1150                                 documentation.  (copied from Arun's webpage.)
1151 12/15/2000      gzheng          f90charm addition to Makefile and charmc. Also,
1152                                 added fixed size arrays support to f90charm. A
1153                                 test program f90charm/hello is checked in.
1154 12/17/2000      milind          Added rtest test program. Contributed by jim to
1155                                 test Converse message transmission.
1156 12/20/2000      olawlor         Added charmconfig script. Enables automatic
1157                                 determination of C++ compiler properties,
1158                                 replacing the verbose and error-prone
1159                                 conv-mach.h entries for CMK_BOOL,
1160                                 CMK_STL_USE_DOT_H, CMK_CPP_CAST_OK, ...
1161 12/20/2000      olawlor         Charm++ Arrays optimizations: Key and object
1162                                 now variable-length fields, instead of pointers.
1163                                 This extra flexibility lets us save many
1164                                 dynamic allocations in the array framework.
1165 12/20/2000      olawlor         Added PUP::able support-- dynamic type
1166                                 identification, allocation, and deletion.
1167                                 Allows you to write:   p(objPtr); and
1168                                 objPointer will be properly identified,
1169                                 allocated, packed, and deallocated (depending
1170                                 on the PUP::er).  Requires you to register any
1171                                 such classes with DECLARE_PUPable and
1172                                 DEFINE_PUPable.
1173 12/20/2000      olawlor         Arrays optimizations: Made CkArrayIndex
1174                                 fixed-size.  This significantly improves
1175                                 messaging speed (7 us instead of 10 us
1176                                 roundtrip).  Move spring cleaning check into a
1177                                 CcdCallFnAfter, which gains more speed (down to
1178                                 4 us roundtrip).
1179 12/20/2000      olawlor         More optimizations: Minor speed tweaks--
1180                                 conv-ccs.c uses hashtable for handler lookup;
1181                                 conv-conds skips timer test until needed;
1182                                 convcore.c scheduler loop optmizations (no
1183                                 superfluous EndIdle calls); threads.c
1184                                 CMK_OPTIMIZE-> no mprotect.
1185 12/20/2000      olawlor         More Optimizations: Minor speed tweaks-- ck.C
1186                                 groups cldEnqueue skip; init.h defines
1187                                 CkLocalBranch inline; and supporting changes.
1188 12/22/2000      gzheng          IA64 support for Converse user level threads.
1189 01/02/2001      olawlor         CCS: Minor update-- enabled CcsProbe, cleaned
1190                                 up superflous debug messages in server, added
1191                                 Java interface (originally written for
1192                                 AppSpecter).
1193 01/09/2001      gzheng          charmconfig converted to autoconf style, need
1194                                 to change configure.in and conv-autoconfig.h.in,
1195                                 and run autoconf to get configure and copy to
1196                                 charmconfig.  added fortran subroutine name
1197                                 test and get libpthread.a
1198 01/10/2001      milind          Added telnet method of getting libpthread.a
1199                                 from charm webserver.
1200 01/11/2001      olawlor         Moved projections files here from
1201                                 CVSROOT/projections-java.  Added fast Java
1202                                 versions of the .log file input routines in
1203                                 LogReader, LogLoader, LogAnalyzer, and
1204                                 UsageCalc.  Added "U.java" user interface
1205                                 utility file, allowing times to be input in
1206                                 seconds, milliseconds, or microseconds,
1207                                 instead of just microseconds.
1208 01/15/2001      gzheng          add +trace-root to specify the directory to
1209                                 put log files in. this is need in Scyld cluster
1210                                 where there is no NFS mounting and no i/o
1211                                 access to home directory sharing on nodes.
1212 01/15/2001      milind          Made AMPI into a f90 module instead of
1213                                 'ampif.h' inclusion.  AMPI f90 bindings are
1214                                 now more inclusive.  Fixed argc,argv handling
1215                                 bugs in ArgsInfo message.  Fixed a bug in pup
1216                                 that caused thread not to be sized, but was
1217                                 packed nevertheless. Moved irecv to waitall
1218                                 instead of at in ampi_start.  Made
1219                                 AMPI_COMM_WORLD to be 0, because it clashed
1220                                 with wildcard(-1).  AMPI_COMM_UNIVERSE is now
1221                                 handled properly in the AMPI module.
1222                                 C/C++ data members are NOT visible to
1223                                 Fortran 90.
1224 01/18/2001      gzheng          New supported platform: net-linux-scyld
1225 01/20/2001      olawlor         Moved array index field from CMessage_* to the
1226                                 Ck envelope itself.  This is the right thing
1227                                 to do, because any message may be sent to/from
1228                                 an array element.  To reduce the wasted space
1229                                 in a message, a union is used to overlay the
1230                                 fields for the various possible message types.
1231 01/29/2001      olawlor         Freed charmrun on net-* version from using
1232                                 remote shell to fork off processes. One can now
1233                                 use a daemon provided in the distribution.
1234 02/07/2001      olawlor         Added debugging support to puppers.
1235 02/13/2000      gzheng          Added ++local option to charmrun to start node
1236                                 program locally without any daemon; fix the
1237                                 hang program if you type wrong pgm name in
1238                                 scyld version, and redirect all output to
1239                                 /dev/null, otherwise all node program can send
1240                                 its output to console in scyld. Also implemented                                ++local in net-win32 version.
1241 02/26/2000      milind          Changed the varsize syntax. Now one can specify
1242                                 actual varsize arrays in the interface file
1243                                 and have the translator generate alloc, pack
1244                                 and unpack.
1245
1246 --------------------------------------------------------------------------------
1247 Bug Fixes:
1248 --------------------------------------------------------------------------------
1249
1250 10/29/1999      milind          Replaced jmemcpy by memcpy in net versions, as
1251                                 it was causing a bit to flip (bug reported
1252                                 by jim.)
1253 10/29/1999      milind          Fixed multiline macros in all header files.
1254 02/05/2000      milind          Fixed linking errors by getting the order of
1255                                 libraries right from the charmc command-line.
1256 02/18/2000      paranjpy        Fixed Charm++ initialization bug on SMPs.
1257 02/21/2000      milind          Fixed a context-switching bug in mipspro version
1258                                 of QuickThreads.
1259 02/25/2000      milind          Charm++ interface translator was segfaulting
1260                                 on interface file errors. Fixed that. Also,
1261                                 added linenumbers to error messages.
1262 03/02/2000      milind          Made CCS work on SMPs.
1263 03/07/2000      milind          Made ConverseInit consistent with the manual on
1264                                 Origin2000 version.
1265 04/18/2000      milind          Fixed a bug in CkWaitFuture, which was caching
1266                                 a variable locally, while it was changed by
1267                                 another thread.
1268 05/04/2000      paranjpy        Fixed argv deletion bug on net-win32-smp.
1269 06/08/2000      milind          sp3 version: changed optimization flags, which
1270                                 where power2 processor-specific.
1271 06/20/2000      milind          mpi-* versions: Fixed ConverseExit since it was
1272                                 not obeying the following statement in the MPI
1273                                 standard: The user must ensure that all pending
1274                                 communications involving a process completes
1275                                 before the process calls MPI_FINALIZE.
1276 07/05/2000      milind          Fixed a nasty bug in charmc in the -cp option.
1277                                 It used to append the name provided to -o flag
1278                                 to the directory provided to the -cp flag.
1279                                 Thus, -o ../pgm -cp ../bin options meant that
1280                                 the pgm would be copied to ../bin/.., which is
1281                                 not the expected behavior. This fix correctly
1282                                 copies pgm to ../bin.
1283 07/07/2000      milind          Removed variable arg_myhome, as it was not
1284                                 being used anywhere, and also, setting it was
1285                                 causing problems of env var HOME was not set.
1286 07/27/2000      milind          thishandle for the arrayelement was not being
1287                                 correctly set.  Bug was reported by Neelam.
1288 08/26/2000      milind          Origin2000: Changed the page alignment to
1289                                 reflect the mmap alignment.  The mmap man page
1290                                 specifically states that it is not the same as
1291                                 page size.
1292 09/02/2000      milind          Fixed a bug in code generated for threaded
1293                                 (void) entry methods of array elements. The
1294                                 dummy message that is passed to that method in
1295                                 a thread has to be deleted before calling the
1296                                 object method, because upon object method's
1297                                 return, the thread might have migrated.
1298 09/03/2000      olawlor         Minor fix-fixes: 1.) Change to LBObjid hash
1299                                 function would fail for >4-int object indices.
1300                                 Replaced with proper function, which also
1301                                 preserves the 1-int case.  2.) Array element
1302                                 sends must go via the message queue to prevent
1303                                 stack build-up for deep single-processor call
1304                                 chains. These might happen, e.g., in a driver
1305                                 element calling itself for the main time loop.
1306                                 Messages are now properly noted as sent, then
1307                                 wait through the queue for delivery.  This
1308                                 entailed minor reorganization of the message
1309                                 delivery subsystem.
1310 09/21/2000      olawlor         Tiny SMP thread fix-- registrations of a
1311                                 thread-private variable now reserve space on
1312                                 calls after the first.  This wastes space for
1313                                 multiple CthInitialize's-- it's a quick hack to
1314                                 get threads working again on SMP versions.
1315 10/16/2000      olawlor         A few CCS fixes:   -Added split-phase reply
1316                                 (delay reply indefinitely)  -Cleaned up error
1317                                 handling -Pass user data as "void *" instead of
1318                                 "char *"
1319 11/03/2000      wilmarth        Removed 0 size array allocation in Charm++
1320                                 quiescence detection.
1321 11/20/2000      gzheng          Rewrote part of Fiber thread, including a bug
1322                                 fix for a the non thread-safe function, and a
1323                                 different fiber free strategy.
1324 11/29/2000      gzheng          The LB init procedure tried to allocate
1325                                 65536*160 as initial size, which is 10M memory
1326                                 for communication table, which is too big.
1327                                 Cut it down to roughly 1M, and it can expand
1328                                 in later code.
1329 12/05/2000      gzheng          In many cases, conv-host exits without print
1330                                 out the error message from remote shell. try
1331                                 to fix it by calling sync to flush the pipe
1332                                 before exit 1.
1333 12/10/2000      milind          net-linux: Made static linking the default
1334                                 option because dynamic linking runtime causes
1335                                 isomalloc threads to crash.
1336 12/18/2000      milind          Increased portability of isomalloc threads by
1337                                 removing dependence on alloca.
1338 12/28/2000      milind          Fixed ctrl-getone abort bug on SMP.
1339 12/28/2000      milind          Made _groupTable a pointer on which a
1340                                 constructor is explicitly called.  Since it
1341                                 was a Cpv variable, its constructor was not
1342                                 called by default in case of an SMP version.
1343 12/29/2000      olawlor         Prevent infinite copy constructor recursion on
1344                                 Origin2000.
1345 01/10/2001      olawlor         Added "explicit" keyword to remove ambiguity
1346                                 for KCC, which was confused by the private
1347                                 PUP::er(int) "cast" constructor and the operator
1348                                 |(PUP::er &p,T &t) into rejecting all operator|
1349                                 (int,int) as ambiguous.
1350 2001/01/17      gzheng          fix the charmconfig bug on paragon-red: the
1351                                 failure testing of fortran won't stop the
1352                                 compilation.
1353 01/20/2001      olawlor         Arrays reduction:  Fixed bug-- reduction may end
1354                                 because all contributors migrate away.
1355 01/29/2001      olawlor         Fix heap-corrupting bug-- call ->init() on
1356                                 nodeGroupTable, which sets the "pending"
1357                                 message queue to NULL.  This prevents a nasty
1358                                 delete-unitialized-data bug later on.  Also
1359                                 delayed queue creation until messages actually
1360                                 arrive.
1361
1362 --------------------------------------------------------------------------------
1363 Documentation Changes:
1364 --------------------------------------------------------------------------------
1365
1366 01/31/2000      milind          Installation manual: Fixed bugs pointed out by
1367                                 quantum TA
1368 02/28/2000      wilmarth        Added a new look Charm++ manual.
1369 06/20/2000      milind          Added pdflatex support to generate PDF versions
1370                                 of manuals from LaTeX sources.
1371 12/05/2000      milind          Added Orion's FEM manual. Converted from HTML.
1372 12/10/2000      milind          Added pplmanual.sty for all manuals.
1373 12/17/2000      milind          Added master-slave library documentation to
1374                                 convext manual.
1375 12/21/2000      saboo           Added DDT documentation.
1376 01/02/2001      olawlor         Updated for new CCS version.
1377
1378 --------------------------------------------------------------------------------
1379 Other Changes:
1380 --------------------------------------------------------------------------------
1381
1382 10/24/1999      olawlor         charmc is changed to Bourne shell script
1383                                 instead of csh. All conv-mach.csh are
1384                                 replaced by conv-mach.sh.
1385 10/25/1999      olawlor         SUPER_INSTALL is converted to use bourne shell.
1386 10/28/1999      milind          All Makefiles now take OPTS commandline
1387                                 arguments.
1388 01/16/2000      olawlor         Simplified Charm++ interface translator.
1389 02/23/2000      ruiliu          Changed rand() calls from all over the codes
1390                                 to the new Converse random number generator.
1391 02/26/2000      milind          Simplified the converse scheduler loop by
1392                                 combining the maxmsgs and poll modes.
1393 08/31/2000      milind          Imported system documentation into the CVS tree.
1394                                 Also added super_install target for docs with
1395                                 necessary Makefile modifications.
1396 09/08/2000      olawlor         Made soft links use relative pathnames instead
1397                                 of absolute.  This lets you move a charm++
1398                                 installation without having to recompile
1399                                 anything.
1400 09/11/2000      olawlor         Grouped commonly needed code in the new util
1401                                 directory. Also, added pup_c a C wrapper for
1402                                 puppers.
1403 09/11/2000      olawlor         Slightly reorganized header structure.  Now no
1404                                 headers should need to be listed twice (once in
1405                                 ALLHEADERS, again in CKHEADERS).  Now headers
1406                                 are soft-linked instead of copied.  This makes
1407                                 development much easier.  Added support for the
1408                                 new Common/util directory.
1409 09/21/2000      olawlor         Major reorganization of net-* codes. Now all
1410                                 the TCP socket routines are in separate files.
1411                                 Also combined windoes NT code with unix codes.
1412 09/21/2000      olawlor         Major rewrite of CCS-- underlying protocol is
1413                                 now binary (send/recv binary data everywhere);
1414                                 conv-host forwards requests to nodes; and
1415                                 source has been significantly re-arranged.
1416                                 (especially if NODE_0_IS_CONVHOST).
1417 11/22/2000      milind          Removed IDL translator from distribution.
1418 12/01/2000      olawlor         Renamed conv-host charmrun; added test for
1419                                 script conv-host. Also added charmrun for most
1420                                 other machines.
1421 12/17/2000      milind          Moved List related data structures into
1422                                 cklists.h in util. Removed most of the redundant
1423                                 list implementations.
1424 12/20/2000      gzheng          SUPER_INSTALL: format the output of list of
1425                                 versions and make the help page fit into one
1426                                 page of xterm.
1427 12/24/2000      milind          Added test-{charm,converse,ampi,fem} targets to
1428                                 super_install.
1429 12/28/2000      milind          net-sol-smp now uses pthreads.
1430 01/29/2001      olawlor         Merged windowsNT and unix build procedures by
1431                                 basing the Windows build on cygwin. Added
1432                                 scripts to deal with unix and windows
1433                                 differences.