1 This file describes the most significant changes. For more detail, use
2 'git log' on a clone of the charm repository.
4 ================================================================================
5 What's new in Charm++ 6.9.0
6 ================================================================================
8 This is a feature release, with the following major additions:
12 - Charm++ now requires C++11 and better supports use of modern C++ in applications.
14 - New "Zero Copy" messaging APIs for more efficient communication of large arrays.
16 - charm4py provides a new Python interface to Charm++, without the need for .ci files.
18 - AMPI performance, standard compliance, and usability improvements.
20 - GPU Manager improvements for asynchronous offloading and a new CUDA-like API (HAPI).
22 Charm++ Features & Fixes:
24 - Added new, more intuitive process launching commands based on hwloc support,
25 such as '++processPer{Host,Socket,Core,PU} <num>' and '++oneWthPer{Host,Socket,Core,PU}'.
26 Also added a '++autoProvision' option, which by default uses all hardware resources
29 - Added a new 'zero copy' direct API which allows users to communicate large message buffers
30 directly via RDMA on networks that support it, avoiding any intermediate buffering of data
31 between the sender and the receiver. The API is also optimized for shared memory.
33 - A new Python interface to Charm++, named charm4py, is now available for Python users.
34 More documentation on it can be found here: http://charm4py.readthedocs.io
36 - Charmxi now supports r-value references, std::array, std::tuple, the 'typename' keyword,
37 parameter packs, variadic templates, array indices with template parameters, and attributes
38 on explicit instantiations of templated entry methods.
40 - Projections traces of templated entry methods now display demangled template type names.
42 - [local] and [inline] entry method attributes now work for templated entry methods and now
43 support perfect forwarding of their arguments.
45 - Added various type traits for generic programming with Charm++ entities inside
48 - Chare array index types are now exposed as 'array_index_t'.
50 - Support for default arguments to Group entry methods.
52 - Charm++ now throws a runtime error when a user calls an SDAG entry method containing a
53 'when' clause directly, without calling it via a proxy.
55 - Users can now pass std::vector's directly to contribute() rather than passing the size and
56 buffer address separately. Cross-array section reduction contributions can now take a callback.
58 - Added a simplified STL-based interface for section creation.
60 - Added PUP support for C++ enums, for std::deque and std::forward_list, for STL containers
61 of objects with abstract base classes, and for avoiding default construction during unpacking
62 by defining a constructor that takes a value of type PUP::reconstruct.
64 - Improved performance for PUP of STL containers of arithmetic types and types
67 - Allow setting queueing type and priorities on entry methods with no parameters.
69 - Enable setting Group and Node Group dependencies on all types of entry methods and
70 constructors, as well as multiple dependencies.
72 - Support for model-based runtime load balancing strategy selection via MetaLB. This can be enabled
73 with +MetaLBModelDir <path-to-model> used alongside +MetaLB option. A trained model can be
74 found in charm/src/ck-ldb/rf_model.
76 - A new lock-free producer-consumer queue implementation has been added as a build option
77 '--enable-lockless-queue' for LRTS's within-process messaging queues in SMP mode.
79 - CkLoop now supports lambda syntax, adds a Hybrid mode that combines static scheduling with dynamic
80 work stealing, and adds Drone mode support in which chares are mapped to rank 0 on each logical
81 node so that other PEs can act as drones to execute tasks.
83 - Updated our integrated LLVM OpenMP runtime to support more OpenMP directives.
85 - Updated f90charm interface for more functionality and usability, and fixed all example programs.
87 - The Infiniband 'verbs' communication layer now automatically selects the fastest active
88 Infiniband device and port at startup.
90 - Fixed '-tracemode utilization', tracing of user-level threads, and nested local/inline methods.
92 - Fixed a performance bug introduced in v6.8.0 for dynamic location management.
94 - Added support for using Boost's lightweight uFcontext user-level threads, now the default
95 ULT implementation on most platforms.
97 - '++debug' now works using lldb on Mac (Darwin) systems.
99 - CkAbort() is now marked with the C++ attribute [[noreturn]].
101 - CkExit() now takes an optional integer argument which is returned from the program's exit.
103 - Improved error checking throughout, and fixes to race conditions during startup.
107 - Improved performance of point-to-point message matching and reduced per-rank memory footprint.
109 - Fixes to derived datatypes handling, MPI_Sendrecv_replace, MPI_(I)Alltoall{v,w},
110 MPI_(I)Scatter(v), MPI_IN_PLACE in gather collectives, MPI_Buffer_detach, MPI_Type_free,
111 MPI_Op_free, and MPI_Comm_free.
113 - Implemented support for generalized requests, MPI_Comm_create_group, keyval attribute callbacks,
114 the distributed graph virtual topology, large count routines, matched probe and recv, and
115 MPI_Comm_idup(_with_info) routines.
117 - Added support for using -tlsglobals for privatization of global/static variables
118 in shared objects. Previously -tlsglobals required static linking.
120 - '-memory os-isomalloc', which uses the system's malloc underneath, now works everywhere
121 Isomalloc does. Both versions of Isomalloc now wrap calls to posix_memalign(), and we
122 removed the need to link with '-Wl,--allow-multiple-definition' on some systems.
124 - Updated AMPI_Migrate() with built-in MPI_Info objects, such as AMPI_INFO_LB_SYNC.
126 - AMPI now only renames the user's MPI calls from MPI_* to AMPI_* if Charm++/AMPI is
127 built on top of another MPI implementation for its communication substrate.
129 - Support for compiling mpif.h in both fixed form and free form.
131 - PMPI profiling interface support added.
133 - Added an ampirun script that wraps charmrun to enable easier integration with
134 build and test scripts that take mpirun/mpiexec as an option.
138 - Enable concurrent kernel execution by removing the limit imposed by the internal
139 implementation that used only three streams.
141 - New API (Hybrid API, or HAPI) that is more similar to the CUDA API.
143 - Added NVIDIA NVTX support for profiling host-side functions.
145 - Deprecated the workRequest API. New users are now strongly recommended to use
146 the new API, or Hybrid API (HAPI).
148 Build System Changes:
150 - Charm++ now requires C++11 support, and as such defaults to using bgclang on BGQ.
151 Compilers GCC v4.8+, ICC v15.0+, XLC v13.1+, Cray CC v8.6+, MSVC v19.00.24+ and
152 Clang v3.3+ are required.
154 - Building Charm++ from the git repository now requires autoconf and automake.
156 - Support for the Flang Fortran compiler added.
158 - Users can now specify compiler versions to our top-level build script when building
161 - Windows users can now build Charm++ with GCC, Clang, or MSVC.
163 - All of Charm++ and AMPI can now be built as shared objects.
165 - Added a CMake wrapper for compiling .ci files.
167 - Charm++ is now available in Spack under the name 'charmpp'.
169 - Added {pamilrts,mpi,multicore,netlrts}-linux-ppc64le build targets for new IBM POWER systems.
171 - Added {multicore,netlrts}-linux-arm8 build targets for AArch64 / ARM64 systems.
173 ================================================================================
174 What's new in Charm++ 6.8.2
175 ================================================================================
177 This is a minor release containing only the following changes on top of 6.8.1:
179 - Fix for a crash in memory deregistration on the OFI communication layer in SMP mode.
181 - Tuned eager/rendezvous messaging thresholds for the PAMI communication layer
184 ================================================================================
185 What's new in Charm++ 6.8.1
186 ================================================================================
188 This is a backwards-compatible patch/bug-fix release. Roughly 100 bug
189 fixes, improvements, and cleanups have been applied across the entire
190 system. Notable changes are described below:
192 General System Improvements
194 - Enable network- and node-topology-aware trees for group and chare
195 array reductions and broadcasts
197 - Add a message receive 'fast path' for quicker array element lookup
199 - Feature #1434: Optimize degenerate CkLoop cases
201 - Fix a rare race condition in Quiescence Detection that could allow
202 it to fire prematurely (bug #1658)
203 * Thanks to Nikhil Jain (LLNL) and Karthik Senthil for isolating
204 this in the Quicksilver proxy application
206 - Fix various LB bugs
207 * Fix RefineSwapLB to properly handle non-migratable objects
208 * GreedyRefine: improvements for concurrent=false and HybridLB integration
209 * Bug #1649: NullLB shouldnt wait for LB period
211 - Fix Projections tracing bug #1437: CkLoop work traces to the
212 previous entry on the PE rather than to the caller
214 - Modify [aggregate] entry method (TRAM) support to only deliver
215 PE-local messages inline for [inline]-annotated methods. This avoids
216 the potential for excessively deep recursion that could overrun
219 - Fix various compilation warnings
223 - Improve experimental support for PAMI network layer on POWER8 Linux platforms
224 * Thanks to Sameer Kumar of IBM for contributing these patches
226 - Add an experimental 'ofi' network layer to run on Intel Omni-Path
227 hardware using libfabric
228 * Thanks to Yohann Burette and Mikhail Shiryaev of Intel for
229 contributing this new network layer
231 - The GNI network layer (used on Cray XC/XK/XE systems) now respects
232 the ++quiet command line argument during startup
236 - Support for MPI_IN_PLACE in all collectives and for persistent requests
238 - Improved Alltoall(v,w) implementations
240 - AMPI now passes all MPICH-3.2 tests for groups, virtual topologies, and infos
242 - Fixed Isomalloc to not leave behind mapped memory when migrating off a PE
244 ================================================================================
245 What's new in Charm++ 6.8.0
246 ================================================================================
248 Over 900 bug fixes, improvements, and cleanups have been applied
249 across the entire system. Major changes are described below:
253 - Calls to entry methods taking a single fixed-size parameter can now
254 automatically be aggregated and routed through the TRAM library by
255 marking them with the [aggregate] attribute.
257 - Calls to parameter-marshalled entry methods with large array
258 arguments can ask for asynchronous zero-copy send behavior with an
259 `nocopy' tag in the parameter's declaration.
261 - The runtime system now integrates an OpenMP runtime library so that
262 code using OpenMP parallelism will dispatch work to idle worker
263 threads within the Charm++ process.
265 - Applications can ask the runtime system to perform automatic
266 high-level end-of-run performance analysis by linking with the
267 `-tracemode perfReport' option.
269 - Added a new dynamic remapping/load-balancing strategy,
270 GreedyRefineLB, that offers high result quality and well bounded
273 - Improved and expanded topology-aware spanning tree generation
274 strategies, including support for runs on a torus with holes, such
275 as Blue Waters and other Cray XE/XK systems.
277 - Charm++ programs can now define their own main() function, rather
278 than using a generated implementation from a mainmodule/mainchare
279 combination. This extends the existing Charm++/MPI interoperation
282 - Improvements to Sections:
284 * Array sections API has been simplified, with array sections being
285 automatically delegated to CkMulticastMgr (the most efficient implementation
286 in Charm++). Changes are reflected in Chapter 14 of the manual.
288 * Group sections can now be delegated to CkMulticastMgr (improved performance
289 compared to default implementation). Note that they have to be manually
290 delegated. Documentation is in Chapter 14 of Charm++ manual.
292 * Group section reductions are now supported for delegated sections
295 * Improved performance of section creation in CkMulticastMgr.
297 * CkMulticastMgr uses the improved spanning tree strategies. See above.
299 - GPU manager now creates one instance per OS process and scales the
300 pre-allocated memory pool size according to the GPU memory size and
301 number of GPU manager instances on a physical node.
303 - Several GPU Manager API changes including:
305 * Replaced references to global variables in the GPU manager API with calls to
308 * The user is no longer required to specify a bufferID in dataInfo struct.
310 * Replaced calls to kernelSelect with direct invocation of functions passed
311 via the work request object (allows CUDA to be built with all programs).
313 - Added support for malleable jobs that can dynamically shrink and
314 expand the set of compute nodes hosting Charm++ processes.
316 - Greatly expanded and improved reduction operations:
318 * Added built-in reductions for all logical and bitwise operations
319 on integer and boolean input.
321 * Reductions over groups and chare arrays that apply commutative,
322 associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now
323 processed in a streaming fashion. This reduces the memory footprint of
324 reductions. User-defined reductions can opt into this mode as well.
326 * Added a new `Tuple' reducer that allows combining multiple reductions
327 of different input data and operations from a common set of source
328 objects to a single target callback.
330 * Added a new `Summary Statistics' reducer that provides count, mean,
331 and standard deviation using a numerically-stable streaming algorithm.
333 - Added a `++quiet' option to suppress charmrun and charm++ non-error
336 - Calls to chare array element entry methods with the [inline] tag now
337 avoid copying their arguments when the called method takes its
338 parameters by const&, offering a substantial reduction in overhead in
341 - Synchronous entry methods that block until completion (marked with
342 the [sync] attribute) can now return any type that defines a PUP
343 method, rather than only message types.
345 - Static (non-generated) header files are now warning-free for
346 gcc -Wall -Wextra -pedantic.
348 - Deprecated setReductionClient and CkSetReductionClient in favor of
349 explicitly passing callbacks to contribute calls.
351 - On C++ standard library implementations with support for
352 std::is_constructible (e.g. GCC libstdc++ >4.5), chare array
353 elements only need to define a constructor taking CkMigrateMessage*
354 if it will actually be migrated.
356 - The PUP serialization framework gained support for some C++11
357 library classes, including unique_ptr and unordered_map, when the
358 underlying types have PUP operators.
362 - More efficient implementations of message matching infrastructure, multiple
363 completion routines, and all varieties of reductions and gathers.
365 - Support for user-defined non-commutative reductions, MPI_BOTTOM, cancelling
366 receive requests, MPI_THREAD_FUNNELED, PSCW synchronization for RMA, and more.
368 - Fixes to AMPI's extensions for load balancing and to Isomalloc on SMP builds.
370 - More robust derived datatype support, optimizations for truly contiguous types.
372 - ROMIO is now built on AMPI and linked in by ampicc by default.
374 - A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization
375 is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
377 - Improved support for performance analysis and visualization with Projections.
379 Platforms and Portability
381 - The runtime system code now requires compiler support for C++11
382 R-value references and move constructors. This is not expected to be
383 incompatible with any currently supported compilers.
385 - The next feature release (anticipated to be 6.9.0 or 7.0) will require
386 full C++11 support from the compiler and standard library.
388 - Added support for IBM POWER8 systems with the PAMI communication API,
389 such as development/test platforms for the upcoming Sierra and Summit
390 supercomputers at LLNL and ORNL. Contributed by Sameer Kumar of IBM.
392 - Mac OS (darwin) builds now default to the modern libc++ standard
393 library instead of the older libstdc++.
395 - Blue Gene/Q build targets have been added for the `bgclang' compiler.
397 - Charm++ can now be built on Cray's CCE 8.5.4+.
399 - Charm++ will now build without custom configuration on Arch Linux
401 - Charmrun can automatically detect rank and node count from
402 Slurm/srun environment variables.
404 - Many obsolete architecture, network, and compiler support files have
405 been removed. These include:
407 * Sony/Toshiba/IBM Cell (including PlayStation 3)
409 * Intel IA-64 (Itanium)
410 * Intel x86-32 for Windows, Mac OS X (darwin), and Solaris
412 * Older IBM AIX/POWER configurations
413 * GCC 3 and KAI compilers
416 ================================================================================
417 What's new in Charm++ 6.7.1
418 ================================================================================
420 Changes in this release are primarily bug fixes for 6.7.0. The major exception
421 is AMPI, which has seen changes to its extension APIs and now complies with more
422 of the MPI standard. A brief list of changes follows:
426 - Startup and exit sequences are more robust
428 - Error and warning messages are generally more informative
430 - CkMulticast's set and concat reducers work correctly
434 - AMPI's extensions have been renamed to use the prefix AMPI_ instead of MPI_
435 and to generally follow MPI's naming conventions
437 - AMPI_Migrate(MPI_Info) is now used for dynamic load balancing and all fault
438 tolerance schemes (see the AMPI manual)
440 - AMPI officially supports MPI-2.2, and also implements the non-blocking
441 collectives and neighborhood collectives from MPI-3.1
443 Platforms and Portability
445 - Cray regularpages build target has been fixed
447 - Clang compiler target for BlueGene/Q systems added
449 - Comm. thread tracing for SMP mode added
451 - AMPI's compiler wrappers are easier to use with autoconf and cmake
453 ================================================================================
454 What's new in Charm++ 6.7.0
455 ================================================================================
457 Over 120 bugs fixed, spanning areas across the entire system
461 - New API for efficient formula-based distributed sparse array creation
463 - CkLoop is now built by default
465 - CBase_Foo::pup need not be called from Foo::pup in user code anymore - runtime
466 code handles this automatically
468 - Error reporting and recovery in .ci files is greatly improved, providing more
469 precise line numbers and often column information
471 - Many data races occurring under shared-memory builds (smp, multicore) were
472 fixed, facilitating use of tools like ThreadSanitizer and Helgrind
476 - Further MPI standard compliance in AMPI allows users to build and run
477 Hypre-2.10.1 on AMPI with virtualization, migration, etc.
479 - Improved AMPI Fortran2003 PUP interface 'apup', similar to C++'s STL PUP
481 Platforms and Portability
483 - Compiling Charm++ now requires support for C++11 variadic templates. In GCC,
484 this became available with version 4.3, released in 2008
486 - New machine target for multicore Linux ARM7: multicore-linux-arm7
488 - Preliminary support for POWER8 processors, in preparation for the upcoming
489 Summit and Sierra supercomputers
491 - The charmrun process launcher is now much more robust in the face of slow
492 or rate-limited connections to compute nodes
494 - PXSHM now auto-detects the node size, so the '+nodesize' is no longer needed
496 - Out-of-tree builds are now supported
500 - CommLib has been removed.
502 - CmiBool has been dropped in favor of C++'s bool
505 ================================================================================
506 What's new in Charm++ 6.6.1
507 ================================================================================
509 Changes in this release are primarily bug fixes for 6.6.0. A concise list of
510 affected components follows:
514 - Reductions with syncFT
516 - mpicxx based MPI builds
518 - Increased support for macros in CI file
520 - GNI + RDMA related communication
522 - MPI_STATUSES_IGNORE support for AMPIF
524 - Restart on different node count with chkpt
526 - Immediate msgs on multicore builds
528 ================================================================================
529 What's new in Charm++ 6.6.0
530 ================================================================================
532 - Machine target files for Cray XC systems ('gni-crayxc') have been added
534 - Interoperability with MPI code using native communication interfaces on Blue
535 Gene Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal
536 MPI communication interface
538 - Support for partitioned jobs on all machine types, including TCP/IP and IB
539 Verbs networks using 'netlrts' and 'verbs' machine layers
541 - A substantially improved version of our asynchronous library, CkIO, for
542 parallel output of large files
544 - Narrowing the circumstances in which the runtime system will send
545 overhead-inducing ReductionStarting messages
547 - A new fully distributed load balancing strategy, DistributedLB, that produces
548 high quality results with very low latency
550 - An API for applications to feed custom per-object data to specialized load
551 balancing strategies (e.g. physical simulation coordinates)
553 - SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs)
554 support tracing messages through communication threads
556 - Thread affinity mapping with +pemap now supports Intel's Hyperthreading more
559 - After restarting from a checkpoint, thread affinity will use new
560 +pemap/+commap arguments
562 - Queue order randomization options were added to assist in debugging race
563 conditions in application and runtime code
565 - The full runtime code and associated libraries can now compile under the C11
566 and C++11/14 standards.
568 - Numerous bug fixes, performance enhancements, and smaller improvements in the
569 provided runtime facilities
572 * The long-unsupported FEM library has been deprecated in favor of ParFUM
573 * The CmiBool typedefs have been deleted, as C++ bool has long been universal
574 * Future versions of the runtime system and libraries will require some degree
575 of support for C++11 features from compilers
577 ================================================================================
578 What's new in Charm++ 6.5.0
579 ================================================================================
581 - The Charm++ manual has been thoroughly revised to improve its organization,
582 comprehensiveness, and clarity, with many additional example code snippets
585 - The runtime system now includes the 'Metabalancer', which can provide
586 substantial performance improvements for applications that exhibit dynamic
587 load imbalance. It provides two primary benefits. First, it automatically
588 optimizes the frequency of load balancer invocation, to avoid work stoppage
589 when it will provide too little benefit. Second, calls to AtSync() are made
590 less synchronous, to further reduce overhead when the load balancer doesn't
591 need to run. To activate the Metabalancer, pass the option +MetaLB at
592 runtime. To get the full benefits, calls to AtSync() should be made at every
593 iteration, rather than at some arbitrary longer interval as was previously
596 - Many feature additions and usability improvements have been made in the
597 interface translator that generates code from .ci files:
598 * Charmxi now provides much better error reports, including more accurate
599 line numbers and clearer reasons for failure, including some semantic
600 problems that would otherwise appear when compiling the C++ code or even at
602 * A new SDAG construct 'case' has been added that defines a disjunction over a
603 set of 'when' clauses: only one 'when' out of a set will ever be triggered.
604 * Entry method templates are now supported. An example program can be found
605 in tests/charm++/method_templates/.
606 * SDAG keyword "atomic" has been deprecated in favor of the newly supported
607 keyword "serial". The two are synonymous, but "atomic" is now provided only
608 for backward compatibility.
609 * It is no longer necessary to call __sdag_init() in chares that contain SDAG
610 code - the generated code does this automatically. The function is left as
611 a no-op for compatibility, but may be removed in a future version.
612 * Code generated from .ci files is now primarily in .def.h files, with only
613 declarations in .decl.h. This improves debugging, speeds compilation,
614 provides clearer compiler output, and enables more complete encapsulation,
615 especially in SDAG code.
616 * Mainchare constructors are expected to take CkArgMsg*, and always have
617 been. However, charmxi would allow declarations with no argument, and
618 assume the message. This is now deprecated, and generates a warning.
620 - Projections tracing has been extended and improved in various ways
621 * The trace module can generate a record of network topology of the nodes in
622 a run for certain platforms (including Cray), which Projections can
624 * If the gzip library (libz) is available when Charm++ is compiled, traces
625 are compressed by default.
626 * If traces were flushed as a results of filled buffers during the run, a
627 warning will be printed at exit to indicate that the user should be wary of
628 interference that may have resulted.
629 * In SMP builds, it is now possible to trace message progression through the
630 communication threads. This is disabled by default to avoid overhead and
631 potential misleading interpretation.
633 - Array elements can be block-mapped at the SMP node level instead of at the
634 per-PE level (option "+useNodeBlkMapping").
636 - AMPI can now privatize global and static variables using TLS. This is
637 supported in C and C++ with __thread markings on the variable declarations
638 and definitions, and in Fortran with a patched version of the gfortran
639 compiler. To activate this feature, append '-tls' to the '-thread' option's
640 argument when you link your AMPI program.
642 - Charm can now be built to only support message priorities of a specific data
643 type. This enables an optimized message queue within the the runtime
644 system. Typical applications with medium sized compute grains may not benefit
645 noticeably when switching to the new scheduler. However, this may permit
646 further optimizations in later releases.
648 The new queue is enabled by specifying the data type of the message
649 priorities while building charm using --with-prio-type=dtype. Here, dtype can
650 be one of char, short, int, long, float, double and bitvec. Specifying bitvec
651 will permit arbitrary-length bitvector priorities, and is the current default
652 mode of operation. However, we may change this in a future release.
654 - Converse now provides a complete set of wrappers for
655 fopen/fread/fwrite/fclose to handle EINTR, which is not uncommon on the
656 increasingly-popular Lustre. They are named CmiF{open,read,write,close}, and
657 are available from C and C++ code.
659 - The utility class 'CkEntryOptions' now permits method chaining for cleaner
660 usage. This applies to all its set methods (setPriority, setQueueing,
661 setGroupDepID). Example usage can be found in examples/charm++/prio/pgm.C.
663 - When creating groups or chare arrays that depend on the previous construction
664 of another such entity on the local PE, it is now possible to declare that
665 dependence to the runtime. Creation messages whose dependence is not yet
666 satisfied will be buffered until it is.
668 - For any given chare class Foo and entry method Bar, the supporting class's
669 member CkIndex_Foo::Bar() is used to lookup/specify the entry method
670 index. This release adds a newer API for such members where the argument is a
671 function pointer of the same signature as the entry method. Those new
672 functions are used like CkIndex_Foo::idx_Bar(&Foo::Bar). This permits entry
673 point index lookup without instantiating temporary variables just to feed the
674 CkIndex_Foo::Bar() methods. In cases where Foo::Bar is overloaded, &Foo::Bar
675 must be cast to the desired type to disambiguate it.
677 - CkReduction::reducerType now have PUP methods defined; and can hence be
678 passed as parameter-marshalled arguments to entry methods.
680 - The runtime option +stacksize for controlling the allocation of user-level
681 threads' stacks now accepts shorthanded annotation such as 1M.
683 - The -optimize flag to the charmc compiler wrapper now passes more aggressive
684 options to the various underlying compilers than the previous '-O'.
686 - The charmc compiler wrapper now provides a flag -use-new-std to enable
687 support for C11 and C++11 where available. To use this in application code,
688 the runtime system must have been built with that flag as well.
690 - When using, CmiMemoryUsage(), the runtime can be instructed not to use the
691 underlying mallinfo() library call, which can be inaccurate in settings where
692 usage exceeds INT_MAX. This is accomplished by setting the environment
693 variable "MEMORYUSAGE_NO_MALLINFO".
695 - Experimental Features
696 * Initial implementation of a fast message-logging protocol. Use option
697 'mlogft' to build it.
698 * Message compression support for persistent message on Gemini machine layer.
699 * Node-level inter-PE loop/task parallelization is now supported through
701 * New temperature/CPU frequency aware load balancer
702 * Support interoperation of Charm++ and native MPI code through dynamically
703 switching control between the two
704 * API in centralized load balancers to get and set PE speed
705 * A new scheme for optimization of double in-memory checkpoint/restart.
706 * Message combining library for improved fine-grained communication
708 * Support for partitioning of allocated nodes into subsets that run
709 independent Charm++ instances but can interact with each other.
711 Platform-Specific Changes
712 -------------------------
715 * The gemini_gni network layer has been heavily tuned and optimized,
716 providing substantial improvements in performance, scalability, and
718 * The gemini_gni-crayxe machine layer supports a 'hugepages' option at build
719 time, rather than requiring manual configuration file editing.
720 * Persistent message optimizations can be used to reduce latency and
722 * Experimental support for 'urgent' sends, which are sent ahead of any other
723 outgoing messages queued for transmission.
725 - IBM Blue Gene Q: Experimental machine-layer support for the native PAMI
726 interface and MPI, with and without SMP support. This supports many new
727 systems, including LLNL's Sequoia, ALCF's Mira, and FZ Juelich's Juqueen.
729 There are three network-layer implementations for these systems: 'mpi',
730 'pami', and 'pamilrts'. The 'mpi' layer is stable, but its performance and
731 scalability suffers from the additional overhead of using MPI rather than
732 driving the interconnect directly. The 'pami' layer is well tested for NAMD,
733 but has shown instability for other applications. It is likely to be replaced
734 by the 'pamilrts' layer, which is more generally stable and seems to provide
735 the same performance, in the next release.
737 In addition to the common 'smp' option to build the runtime system with
738 shared memory support, there is an 'async' option which sometimes provides
739 better performance on SMP builds. This option passes tests on 'pamilrts', but
740 is still experimental.
742 Note: Applications that have large number of messages may crash in default
743 setup due to overflow in the low-level FIFOs. Environment variables
744 MUSPI_INJFIFOSIZE and PAMI_RGETINJFIFOSIZE can be set to avoid application
745 failures due to large number of small and large messages respectively. The
746 default value of these variable is 65536 which is sufficient for 1000
749 - Infiniband Verbs: Better support for more flavors of ibverbs libraries
752 * Experimental rendezvous protocol for better performance above some MPI
754 * Some tuning parameters ("+dynCapSend" and "+dynCapRecv") are now
755 configurable at job launch, rather than Charm++ compilation.
757 - PGI C++: Disable automatic 'using namespace std;'
759 - Charm++ now supports ARM, both non-smp and smp.
761 - Mac OS X: Compilation options to build and link correctly on newer versions
764 ================================================================================
765 What's new in Charm++ 6.4.0
766 ================================================================================
768 --------------------------------------------------------------------------------
770 --------------------------------------------------------------------------------
772 - Cray XE and XK systems using the Gemini network via either MPI
773 (mpi-crayxe) or the native uGNI (gemini_gni-crayxe)
775 - IBM Blue Gene Q, using MPI (mpi-bluegeneq) or PAMI (pami-bluegeneq)
777 - Clang, Cray, and Fujitsu compilers
779 - MPI-based machine layers can now run on >64k PEs
781 --------------------------------------------------------------------------------
783 --------------------------------------------------------------------------------
785 - Added a new [reductiontarget] attribute to enable
786 parameter-marshaled recipients of reduction messages
788 - Enabled pipelining of large messages in CkMulticast by default
790 - New load balancers added:
793 * Scotch graph partitioning based: ScotchLB and Refine and Topo variants
796 - Load balancing improvements:
798 * Allow reduced load database size using floats instead of doubles
799 * Improved hierarchical balancer
800 * Periodic balancing adapts its interval dynamically
801 * User code can request a callback when migration is complete
802 * More balancers properly consider object migratability and PE
803 availability and speed
804 * Instrumentation records multicasts
806 - Chare arrays support options that can enable some optimizations
808 - New 'completion detection' library for parallel process termination
809 detection, when the need for modularity excludes full quiescence
812 - New 'mesh streamer' library for fine-grain many-to-many collectives,
813 handling message bundling and network topology
815 - Memory pooling allocator performance and resource usage improved
818 - AMPI: More routines support MPI_IN_PLACE, and those that don't check
821 ================================================================================
822 What's new in Charm++ 6.2.1 (since 6.2.0)
823 ================================================================================
825 --------------------------------------------------------------------------------
826 New Supported Platforms:
827 --------------------------------------------------------------------------------
829 POWER7 with LAPI on Linux
831 Infiniband on PowerPC
833 --------------------------------------------------------------------------------
835 --------------------------------------------------------------------------------
837 - Better support for multicasts on groups
838 - Topology information gathering has been optimized
839 - Converse (seed) load balancers have many new optimizations applied
840 - CPU affinity can be set more easily using +pemap and +commap options
841 instead of the older +coremap
842 - HybridLB (hierarchical balancing for very large core-count systems)
843 has been substantially improved
844 - Load balancing infrastructure has further optimizations and bug fixes
845 - Object mappings can be read from a file, to allow offline
846 topology-aware placement
847 - Projections logs can be spread across multiple directories, speeding
848 up output when dealing with thousands of cores (+trace-subdirs N
849 will divide log files evenly among N subdirectories of the trace
850 root, named PROGNAME.projdir.K)
851 - AMPI now implements MPI_Issend
852 - AMPI's MPI_Alltoall uses a flooding algorithm more agressively,
853 versus pairwise exchange
854 - Virtualized ARMCI support has been extended to cover the functions
857 --------------------------------------------------------------------------------
858 Architecture-specific changes
859 --------------------------------------------------------------------------------
861 - LAPI SMP has many new optimizations applied
863 - Net builds support the use of clusters' mpiexec systems for job
864 launch, via the ++mpiexec option to charmrun
866 ================================================================================
867 What's new in Charm++ 6.2.0 (since 6.1)
868 ================================================================================
870 --------------------------------------------------------------------------------
871 New Supported Platforms:
872 --------------------------------------------------------------------------------
874 64-bit MIPS, such as SiCortex, using mpi-linux-mips64
876 Windows HPC cluster, using mpi-win32/mpi-win64
878 Mac OSX 10.6, Snow Leopard (32-bit and 64-bit).
880 --------------------------------------------------------------------------------
882 --------------------------------------------------------------------------------
885 - Smarter build/configure scripts
886 - A new interface for model-based load balancing
887 - new CPU topology API
888 - a general implementation of CmiMemoryUsage()
889 - Bug fix: Quiescence detection (QD) works with immediate messages
890 - New reduction functions implemented in Converse
891 - CCS (Converse Client-Server) can deliver message to more than one processor
892 - Added a memory-aware adaptive scheduler, which can be optionally
894 - Added preliminary support for automatic message prioritization
895 (disabled by default)
898 - Cross-array and cross-group sections
899 - Structured Dagger (SDAG): Support templated arguments properly
900 - Plain chares support checkpoint/restart (both in-memory and disk-based)
901 - Conditional packing of messages and parameters in SMP scenario
902 - Changes to the CkArrayIndex class hierarchy
903 -- sizeof() all CkArrayIndex* classes is now the same
904 -- Codes using custom array indices have to use placement-new to construct
905 their custom index. Refer example code: examples/charm++/hello/fancyarray/
906 -- *** Backward Incompatibility ***
907 CkArrayIndex[4D/5D/6D]::index are now of type int (instead of short)
908 However the data is stored as shorts. Access by casting
909 CkArrayIndexND::data() appropriately
910 -- *** Deprecated ***
911 The direct use of public data member
912 CkArrayIndexND::index (N=1..6) is deprecated. We reserve the right to
913 change/remove this variable in future releases of Charm++.
914 Instead, please access the indices via member function:
915 int CkArrayIndexND::data()
918 - Compilers renamed to avoid collision with host MPI (ampicc, ampiCC,
920 - Improved MPI standard conformance, and documentation of non-conformance
921 * Bug fixes in: MPI_Ssend, MPI_Cart_shift, MPI_Get_count
922 * Support MPI_IN_PLACE in MPI_(All)Reduce
923 * Define various missing constants
924 - Return the received message's tag in response to a non-blocking
925 wildcard receive, to support SuperLU
926 - Improved tracing for BigSim
928 Multiphase Shared Arrays (MSA)
929 - Typed handles to enforce phases
930 - Split-phase synchronization to enable message-driven execution
934 - Automatic tracing of API calls for simulation and analysis
937 - Wider support for architectures other than net- (in particular MPI layers)
938 - Improved support for large scale debugging (better scalability)
939 - Enhanced record/replay stability to handle various events, and to
940 signal unexpected messages
941 - New detailed record/replay: The full content of messages can be
942 recorded, and a single processor can be re-executed outside of the
946 - Tracing of nested entry methods
948 Automatic Performance Tuning
949 - Created an automatic tuning framework [still for experimental use only]
952 - Network-topology / node aware spanning trees used internally for and
953 lower bytes on the network and improved performance in multicasts and
954 reductions delegated to this library
957 - Improved OneTimeMulticastStrategy classes
960 - Out-of-core support, with prefetching capability
961 - Detailed tracing of MPI calls
962 - Detailed record/replay support at emulation time, capable of
963 replaying any emulated processor after obtained recorded logs.
965 --------------------------------------------------------------------------------
966 Architecture-specific changes
967 --------------------------------------------------------------------------------
970 - Can run jobs with more than 1024 PEs
973 - New charmrun option ++no-va-randomization to disable address space
974 randomization (ASLR). This is most useful for running AMPI with
978 - Default to using ampicxx instead of mpiCC
981 - The +p option now has the same semantics as in other smp builds
984 - Support for VSX in SIMD abstraction API
987 - Compilers and options have been updated to the latest ones
990 - Added routines for measuring performance counters on BG/P.
991 - Updated to support latest DCMF driver version. On ANL's Intrepid, you may
992 need to set BGP_INSTALL=/bgsys/drivers/V1R4M1_460_2009-091110P/ppc in your
993 environment. This is the default on ANL's Surveyor.
996 - cputopology information is now available on XT3/4/5
999 - Bug fix: plug memory leaks that caused failures in long runs
1000 - Optimized to reduce startup delays
1003 - Support for SMP (experimental)
1006 ================================================================================
1007 Note that changes from 5.9, 6.0, and 6.1 are not documented here. A partial list
1008 can be found on the charm download page, or by reading through version control
1011 ================================================================================
1012 What's New since Charm++ 5.4 release 1
1013 ================================================================================
1015 --------------------------------------------------------------------------------
1016 New Supported Platforms:
1017 --------------------------------------------------------------------------------
1018 1. Charm++ ported to IA64 Itanium running Win2K and Linux, Charm++ also support
1019 Intel C/C++ compilers;
1021 2. Charm++ ported to Power Macintosh powerpc running Darwin;
1023 3. Charm++ ported to Myrinet networking with GM API;
1025 --------------------------------------------------------------------------------
1026 Summary of New Features:
1027 --------------------------------------------------------------------------------
1029 Structured Dagger is a coordination language built on top of CHARM++.
1030 Structured Dagger allows easy expression of dependences among messages and
1031 computations and also among computations within the same object using
1032 when-blocks and various structured constructs.
1034 2. Entry functions support parameter marshalling
1035 Now you can declare and invoke remote entry functions using parameter
1036 marshalling instead of defining messages.
1038 3. Easier running - standalone mode
1039 For net-* version running locally, you can now run Charm programs without
1040 charmrun. Running a node program directly from command line is now the
1041 same as "charmrun +p1 <program>"; for SMP version, you can also specify
1042 multiple (local) processors, as in "program +p2".
1045 --------------------------------------------------------------------------------
1047 --------------------------------------------------------------------------------
1048 1. "build" changed for compilation of Charm++
1049 To build Charm++ from scratch, we now take additional command line options
1050 to compile with addon features and using different compilers other than gcc.
1051 For example, to build Linux IA64 with Myrinet support, type command:
1052 ./build net-linux-ia64 gm
1055 ******* Old Change histories *******
1058 ================================================================================
1059 What's New in Charm++ 5.4 release 1 since 5.0
1060 ================================================================================
1062 --------------------------------------------------------------------------------
1063 New Supported Platforms:
1064 --------------------------------------------------------------------------------
1066 1. Win9x/2000/NT: with Visual C++ or Cygwin gcc/g++, you can compile and run
1067 Charm++ programs on all Win32 platforms.
1069 2. Scyld Beowulf: Charm++ has been ported to the Linux-based Scyld Beowulf
1070 operating system. For more information on Scyld, see <http://www.scyld.com>
1072 3. MPI with VMI: Charm++ has been ported to NCSA's Virtual Machine Interface,
1073 which is an efficient messaging library for heterogeneous cluster
1077 --------------------------------------------------------------------------------
1078 Summary of New Features:
1079 --------------------------------------------------------------------------------
1080 1. Dynamic Load balancing:
1081 Chare migration is supported in the new release. Migration-based dynamic
1082 load balancing framework with various load balancing strategies library has
1086 Charm++ array is supported. You can now create an array of Chare objects
1087 and use array index to refer the Charm++ array elements. A reduction
1088 library on top of Chare array has been implemented and included.
1091 Projections, a Java application for Charm++ program performance analysis and
1092 visualization, has been included and distributed in the new release. Two
1093 trace modes are available: trace-projections and trace-summary. Trace-summary
1094 is a light-weight trace library compared to trace-projections.
1097 AMPI is a load-balancing based library for porting legacy MPI applications
1098 to Charm++. With few changes in the original MPI code to AMPI, the new
1099 legacy MPI application on Charm++ will gain from Charm++'s adptive
1100 load balancing ability.
1102 5. Easier invocation
1103 "Charmrun" is now available on all platforms, with a uniform command line
1104 syntax. You can forget the difference between net-* versions and MPI versions,
1105 and run charm++ application with this same charmrun command syntax.
1106 ++local option is added in charmrun for net-* version, it provides
1107 simple local use of Charm and no longer require the ability to
1108 "rsh localhost" or a nodelist file in order to run charm only on the local
1109 machine. This is especially attractive when you run Charm++ on Windows.
1112 Many new libraries have been added in this release. They include:
1113 1) master-slave library: for writing manager-worker paradigm programs.
1114 2) receiver library: provide asynchronous communication mode for chare array.
1115 3) f90charm: provides Fortran90 bindings for Charm++ Array.
1116 4) BlueGene: a Charm++/Converse emulator for IBM proposed Blue Gene.
1118 --------------------------------------------------------------------------------
1120 --------------------------------------------------------------------------------
1121 1. message declaration syntax in .ci file:
1122 The message declaration syntax for packed/varsize messages has been changed.
1123 The packed/varsize keywords are eliminated, and you can specify the actual
1124 actual varsize arrays in the interface file and have the translator generate
1125 alloc, pack and unpack.
1128 Here is the detailed list of Changes:
1130 --------------------------------------------------------------------------------
1132 --------------------------------------------------------------------------------
1134 10/06/1999 rbrunner Added migration-based dynamic load balancing
1136 11/15/1999 olawlor Added reduction support foe Charm++ arrays
1137 02/06/2000 milind Added AMPI, an implementation of MPI with
1138 dynamic load balancing
1139 02/18/2000 paranjpy New platforms supported: net-win32, and net-win32-smp
1140 04/04/2000 olawlor Added arbitrarily indexed Charm++ arrays.
1141 Also, added translator support for new arrays.
1142 04/15/2000 olawlor Added "puppers" for packing and unpacking
1144 06/14/2000 milind Added the threaded FEM framework.
1146 --------------------------------------------------------------------------------
1148 --------------------------------------------------------------------------------
1150 10/09/1999 rbrunner Added packlib, a library for C and C++ to
1151 pack-unpack data to/from Charm++ messages.
1152 10/13/1999 gzheng New LB strategy: RefineLB
1153 10/13/1999 paranjpy New LB Strategy: Heap
1154 10/14/1999 milind New LB Strategy: Metis
1155 10/19/1999 olawlor New test program for testing LB strategies.
1156 10/21/1999 gzheng New trace mode: trace-summary
1157 10/28/1999 milind New supported platform: net-sol-x86
1158 10/29/1999 milind Added runtime checks for ChareID assignment.
1159 11/10/1999 rbrunner Added Neighborhood base strategy for LB
1161 11/15/1999 olawlor conv-host now reads in a startup file
1163 11/15/1999 olawlor New test program for testing array reductions.
1164 11/16/1999 rbrunner Added processor-speed checking functions to
1166 11/19/1999 milind Mapped SIGUSR to a Ccd condtion handler
1167 11/22/1999 rbrunner New LB strategy: WSLB
1168 11/29/1999 ruiliu Modified Metis LB strategy to deal with
1169 different processor speeds
1170 12/16/1999 rbrunner New LB strategy: GreedyRef
1171 12/16/1999 rbrunner New LB strategy: RandRef
1172 12/21/1999 skumar2 New LB strategy: CommLB
1173 01/03/2000 rbrunner New LB strategy: RecBisectBfLB
1174 01/08/2000 skumar2 New LB strategy: Comm1LB, with varying processor
1176 01/18/2000 milind Modified SM library syntax, and added a test
1178 01/19/2000 gzheng Added irecv, a library to simplify conversion
1179 of message-passing programs to Charm++
1180 02/20/2000 olawlor Added preliminary broadcast support to Charm++
1182 02/23/2000 paranjpy Added converse-level quiescence detection
1183 03/02/2000 milind Added ++server-port option to pre-specify
1185 03/10/2000 wilmarth Random seed-based load balancer now uses
1186 bit-vector for active PEs.
1187 03/21/2000 gzheng Added support for marking user-defined events
1189 03/28/2000 wilmarth Added CMK_TRUECRASH. Very helpful for
1190 post-mortem debugging of Charm++ programs on
1192 03/31/2000 jdesouza Added Fortran90 support to the Charm++
1193 interface translator.
1194 03/09/2000 milind Added support for -LANG and -rpath options
1195 in charmc for Origin2000.
1196 04/28/2000 milind Added prioritized converse threads.
1197 05/01/2000 milind Added test programs for TeMPO, AMPI and irecv.
1198 05/04/2000 milind New supported platform: mpi-sp.
1199 05/04/2000 gzheng Added irecv pingpong program.
1200 05/17/2000 olawlor Each chare, group and array element now has to
1201 have migration constructor.
1202 05/24/2000 milind Added Jacobi3D programs for irecv and AMPI both.
1203 05/24/2000 milind Made migratable an optional attribute of
1204 chares, groups, and nodegroups.
1205 Arrays are by default migratable.
1206 05/29/2000 paranjpy Added pup methods to arrays, reductions etc
1208 06/13/2000 milind Made CtvInitialize idempotent. That is, it
1209 can be called by any number of threads now,
1210 only the first one will actually do
1212 06/20/2000 milind Added a simple test program for the FEM
1214 07/06/2000 milind Imported Metis 4.0 sources in the CVS tree.
1215 Also added code to make metis libraries and
1216 executables to Makefile.
1217 07/07/2000 milind Added more meaningfull error messages using
1218 perror in addition to a cryptic error codes in
1220 07/10/2000 milind fem and femf are now recognized as "languages"
1222 07/10/2000 saboo Added the derived datatypes library.
1223 07/13/2000 milind Added +idle_timeout functionality. It takes a
1224 commandline parameter denoting milliseconds of
1225 maximum consecutive idle time allowed per
1227 07/14/2000 milind Added group multicast. Added
1228 CkSendMsgBranchMulti, CldEnqueueMulti, and
1229 translator changes to support it.
1230 07/14/2000 milind SUPER_INSTALL now takes "-*" arguments prior
1231 to the target, that will be passed to make as
1232 "makeflags". This makes it easy to suppress
1233 make's output of commands etc (with the -s
1234 flag). As a result of this, several Makefiles
1236 07/18/2000 milind Added support for using "dbx" on suns as
1238 07/19/2000 milind Added ability to tracemode projections which
1239 produces binary trace files. Use flag
1240 +binary-trace on the command line.
1241 07/26/2000 milind Separated AMPI from TeMPO.
1242 07/28/2000 milind Added test programs to test reduce, alltoall
1243 and allreduce functionality of AMPI.
1244 08/02/2000 milind Added an option to let the user specify which
1245 "xterm" to use. For example, on some systems
1246 (CDE), only dtterm is installed. So, by
1247 putting ++xterm dtterm on the conv-host
1248 commandline, one can use dtterm when ++in-xterm
1249 option is specified on conv-host commandline.
1250 08/14/2000 milind FEM Framework: Added capabilities to handle
1251 esoteric meshes to standalone offline programs.
1252 Makefile now produces gmap and fgmap programs,
1253 which are used for this purpose. They convert
1254 the mesh to a graph before partitioning it
1256 08/24/2000 milind Added the 2D crack propagation program as a
1257 test program for FEM framework.
1258 08/25/2000 milind Initial implementation of isomalloc-based
1259 threads. This implementation uses a fixed
1260 stack size for all threads (can be set at
1262 08/26/2000 milind Added a macro CtvAccessOther that lets you
1263 get/set a Ctv variable of any thread. It
1264 should be invoked as CtvAccessOther(thread,
1265 varname); Added CthGetData function to each of
1266 the threads implementation. This function is
1267 used in the CtvAccessOther macro.
1268 08/27/2000 milind FEM Framework: Separated mesh to graph
1269 conversion capability into a separate program.
1270 This way, the generated graph can be partitioned
1272 09/04/2000 milind Added the class static readonly variables to
1274 09/05/2000 milind FEM Framework: A very fast O(n) algorithm for
1275 mesh2graph , uses more memory, but the tradeoff
1276 was worth it. Coded by Karthik Mahesh, minor
1277 optimizations by Milind.
1278 09/05/2000 milind Added a barebones charm kernel scheduling
1279 overhead measurement program.
1280 09/15/2000 milind Added pup support for AMPI and FEM framework.
1281 09/20/2000 olawlor Added capability to have an array of base type
1282 where individual element could be of derived
1284 10/03/2000 gzheng New supported platform: net-linux-axp
1285 10/05/2000 skumar2 Added program littleMD to the test suite.
1286 10/07/2000 skumar2 New job scheduler (Faucets projects).
1287 10/15/2000 milind Improved support for Fortran90 in charmc.
1288 11/04/2000 jdesouza Made the Faucets scheduler multi-threaded.
1289 11/05/2000 olawlor FEM Framework: supports multiple element types,
1290 mesh re-assembly, etc.
1291 11/15/2000 gzheng New platform support: net-cygwin
1292 11/18/2000 gzheng conv-host no longer needs /bin/csh to start
1294 CMK_CONV_HOST_CSH_UNAVAILABLE to 1 to use
1296 11/25/2000 milind Finished experimental implementation of
1297 converse-threads based on co-operative pthreads.
1298 11/25/2000 milind Added a benchmark suite of all pingpongs in
1300 11/28/2000 milind Removed deletion of _idx at the end of every
1301 send or doneInserting call. Instead now it is
1302 in the destructor of the proxy. This allows us
1303 to cache proxies, when proxy creation becomes
1305 11/28/2000 olawlor Added "seek blocks" to puppers. This should
1306 allow out-of-order pup'ing without the ugliness
1307 of getBuf; and in a way that works with all
1309 11/29/2000 olawlor Simplified and regularized command-line-argument
1311 11/29/2000 milind AMPI: Added multiple-communicators capability.
1312 12/05/2000 gzheng Now /bin/sh is default shell to fork node
1313 program on remote machines.
1314 12/13/2000 olawlor Added charmrun wrapper for poe on mpi-sp.
1315 12/14/2000 milind Added bluegene emulator sources and test
1316 programs. Added "bluegene" as a language known
1317 to charmc. Makefile now has a target called
1318 bluegene. Added preliminary bluegene
1319 documentation. (copied from Arun's webpage.)
1320 12/15/2000 gzheng f90charm addition to Makefile and charmc. Also,
1321 added fixed size arrays support to f90charm. A
1322 test program f90charm/hello is checked in.
1323 12/17/2000 milind Added rtest test program. Contributed by jim to
1324 test Converse message transmission.
1325 12/20/2000 olawlor Added charmconfig script. Enables automatic
1326 determination of C++ compiler properties,
1327 replacing the verbose and error-prone
1328 conv-mach.h entries for CMK_BOOL,
1329 CMK_STL_USE_DOT_H, CMK_CPP_CAST_OK, ...
1330 12/20/2000 olawlor Charm++ Arrays optimizations: Key and object
1331 now variable-length fields, instead of pointers.
1332 This extra flexibility lets us save many
1333 dynamic allocations in the array framework.
1334 12/20/2000 olawlor Added PUP::able support-- dynamic type
1335 identification, allocation, and deletion.
1336 Allows you to write: p(objPtr); and
1337 objPointer will be properly identified,
1338 allocated, packed, and deallocated (depending
1339 on the PUP::er). Requires you to register any
1340 such classes with DECLARE_PUPable and
1342 12/20/2000 olawlor Arrays optimizations: Made CkArrayIndex
1343 fixed-size. This significantly improves
1344 messaging speed (7 us instead of 10 us
1345 roundtrip). Move spring cleaning check into a
1346 CcdCallFnAfter, which gains more speed (down to
1348 12/20/2000 olawlor More optimizations: Minor speed tweaks--
1349 conv-ccs.c uses hashtable for handler lookup;
1350 conv-conds skips timer test until needed;
1351 convcore.c scheduler loop optmizations (no
1352 superfluous EndIdle calls); threads.c
1353 CMK_OPTIMIZE-> no mprotect.
1354 12/20/2000 olawlor More Optimizations: Minor speed tweaks-- ck.C
1355 groups cldEnqueue skip; init.h defines
1356 CkLocalBranch inline; and supporting changes.
1357 12/22/2000 gzheng IA64 support for Converse user level threads.
1358 01/02/2001 olawlor CCS: Minor update-- enabled CcsProbe, cleaned
1359 up superflous debug messages in server, added
1360 Java interface (originally written for
1362 01/09/2001 gzheng charmconfig converted to autoconf style, need
1363 to change configure.in and conv-autoconfig.h.in,
1364 and run autoconf to get configure and copy to
1365 charmconfig. added fortran subroutine name
1366 test and get libpthread.a
1367 01/10/2001 milind Added telnet method of getting libpthread.a
1368 from charm webserver.
1369 01/11/2001 olawlor Moved projections files here from
1370 CVSROOT/projections-java. Added fast Java
1371 versions of the .log file input routines in
1372 LogReader, LogLoader, LogAnalyzer, and
1373 UsageCalc. Added "U.java" user interface
1374 utility file, allowing times to be input in
1375 seconds, milliseconds, or microseconds,
1376 instead of just microseconds.
1377 01/15/2001 gzheng add +trace-root to specify the directory to
1378 put log files in. this is need in Scyld cluster
1379 where there is no NFS mounting and no i/o
1380 access to home directory sharing on nodes.
1381 01/15/2001 milind Made AMPI into a f90 module instead of
1382 'ampif.h' inclusion. AMPI f90 bindings are
1383 now more inclusive. Fixed argc,argv handling
1384 bugs in ArgsInfo message. Fixed a bug in pup
1385 that caused thread not to be sized, but was
1386 packed nevertheless. Moved irecv to waitall
1387 instead of at in ampi_start. Made
1388 AMPI_COMM_WORLD to be 0, because it clashed
1389 with wildcard(-1). AMPI_COMM_UNIVERSE is now
1390 handled properly in the AMPI module.
1391 C/C++ data members are NOT visible to
1393 01/18/2001 gzheng New supported platform: net-linux-scyld
1394 01/20/2001 olawlor Moved array index field from CMessage_* to the
1395 Ck envelope itself. This is the right thing
1396 to do, because any message may be sent to/from
1397 an array element. To reduce the wasted space
1398 in a message, a union is used to overlay the
1399 fields for the various possible message types.
1400 01/29/2001 olawlor Freed charmrun on net-* version from using
1401 remote shell to fork off processes. One can now
1402 use a daemon provided in the distribution.
1403 02/07/2001 olawlor Added debugging support to puppers.
1404 02/13/2000 gzheng Added ++local option to charmrun to start node
1405 program locally without any daemon; fix the
1406 hang program if you type wrong pgm name in
1407 scyld version, and redirect all output to
1408 /dev/null, otherwise all node program can send
1409 its output to console in scyld. Also implemented ++local in net-win32 version.
1410 02/26/2000 milind Changed the varsize syntax. Now one can specify
1411 actual varsize arrays in the interface file
1412 and have the translator generate alloc, pack
1415 --------------------------------------------------------------------------------
1417 --------------------------------------------------------------------------------
1419 10/29/1999 milind Replaced jmemcpy by memcpy in net versions, as
1420 it was causing a bit to flip (bug reported
1422 10/29/1999 milind Fixed multiline macros in all header files.
1423 02/05/2000 milind Fixed linking errors by getting the order of
1424 libraries right from the charmc command-line.
1425 02/18/2000 paranjpy Fixed Charm++ initialization bug on SMPs.
1426 02/21/2000 milind Fixed a context-switching bug in mipspro version
1428 02/25/2000 milind Charm++ interface translator was segfaulting
1429 on interface file errors. Fixed that. Also,
1430 added linenumbers to error messages.
1431 03/02/2000 milind Made CCS work on SMPs.
1432 03/07/2000 milind Made ConverseInit consistent with the manual on
1434 04/18/2000 milind Fixed a bug in CkWaitFuture, which was caching
1435 a variable locally, while it was changed by
1437 05/04/2000 paranjpy Fixed argv deletion bug on net-win32-smp.
1438 06/08/2000 milind sp3 version: changed optimization flags, which
1439 where power2 processor-specific.
1440 06/20/2000 milind mpi-* versions: Fixed ConverseExit since it was
1441 not obeying the following statement in the MPI
1442 standard: The user must ensure that all pending
1443 communications involving a process completes
1444 before the process calls MPI_FINALIZE.
1445 07/05/2000 milind Fixed a nasty bug in charmc in the -cp option.
1446 It used to append the name provided to -o flag
1447 to the directory provided to the -cp flag.
1448 Thus, -o ../pgm -cp ../bin options meant that
1449 the pgm would be copied to ../bin/.., which is
1450 not the expected behavior. This fix correctly
1451 copies pgm to ../bin.
1452 07/07/2000 milind Removed variable arg_myhome, as it was not
1453 being used anywhere, and also, setting it was
1454 causing problems of env var HOME was not set.
1455 07/27/2000 milind thishandle for the arrayelement was not being
1456 correctly set. Bug was reported by Neelam.
1457 08/26/2000 milind Origin2000: Changed the page alignment to
1458 reflect the mmap alignment. The mmap man page
1459 specifically states that it is not the same as
1461 09/02/2000 milind Fixed a bug in code generated for threaded
1462 (void) entry methods of array elements. The
1463 dummy message that is passed to that method in
1464 a thread has to be deleted before calling the
1465 object method, because upon object method's
1466 return, the thread might have migrated.
1467 09/03/2000 olawlor Minor fix-fixes: 1.) Change to LBObjid hash
1468 function would fail for >4-int object indices.
1469 Replaced with proper function, which also
1470 preserves the 1-int case. 2.) Array element
1471 sends must go via the message queue to prevent
1472 stack build-up for deep single-processor call
1473 chains. These might happen, e.g., in a driver
1474 element calling itself for the main time loop.
1475 Messages are now properly noted as sent, then
1476 wait through the queue for delivery. This
1477 entailed minor reorganization of the message
1479 09/21/2000 olawlor Tiny SMP thread fix-- registrations of a
1480 thread-private variable now reserve space on
1481 calls after the first. This wastes space for
1482 multiple CthInitialize's-- it's a quick hack to
1483 get threads working again on SMP versions.
1484 10/16/2000 olawlor A few CCS fixes: -Added split-phase reply
1485 (delay reply indefinitely) -Cleaned up error
1486 handling -Pass user data as "void *" instead of
1488 11/03/2000 wilmarth Removed 0 size array allocation in Charm++
1489 quiescence detection.
1490 11/20/2000 gzheng Rewrote part of Fiber thread, including a bug
1491 fix for a the non thread-safe function, and a
1492 different fiber free strategy.
1493 11/29/2000 gzheng The LB init procedure tried to allocate
1494 65536*160 as initial size, which is 10M memory
1495 for communication table, which is too big.
1496 Cut it down to roughly 1M, and it can expand
1498 12/05/2000 gzheng In many cases, conv-host exits without print
1499 out the error message from remote shell. try
1500 to fix it by calling sync to flush the pipe
1502 12/10/2000 milind net-linux: Made static linking the default
1503 option because dynamic linking runtime causes
1504 isomalloc threads to crash.
1505 12/18/2000 milind Increased portability of isomalloc threads by
1506 removing dependence on alloca.
1507 12/28/2000 milind Fixed ctrl-getone abort bug on SMP.
1508 12/28/2000 milind Made _groupTable a pointer on which a
1509 constructor is explicitly called. Since it
1510 was a Cpv variable, its constructor was not
1511 called by default in case of an SMP version.
1512 12/29/2000 olawlor Prevent infinite copy constructor recursion on
1514 01/10/2001 olawlor Added "explicit" keyword to remove ambiguity
1515 for KCC, which was confused by the private
1516 PUP::er(int) "cast" constructor and the operator
1517 |(PUP::er &p,T &t) into rejecting all operator|
1518 (int,int) as ambiguous.
1519 2001/01/17 gzheng fix the charmconfig bug on paragon-red: the
1520 failure testing of fortran won't stop the
1522 01/20/2001 olawlor Arrays reduction: Fixed bug-- reduction may end
1523 because all contributors migrate away.
1524 01/29/2001 olawlor Fix heap-corrupting bug-- call ->init() on
1525 nodeGroupTable, which sets the "pending"
1526 message queue to NULL. This prevents a nasty
1527 delete-unitialized-data bug later on. Also
1528 delayed queue creation until messages actually
1531 --------------------------------------------------------------------------------
1532 Documentation Changes:
1533 --------------------------------------------------------------------------------
1535 01/31/2000 milind Installation manual: Fixed bugs pointed out by
1537 02/28/2000 wilmarth Added a new look Charm++ manual.
1538 06/20/2000 milind Added pdflatex support to generate PDF versions
1539 of manuals from LaTeX sources.
1540 12/05/2000 milind Added Orion's FEM manual. Converted from HTML.
1541 12/10/2000 milind Added pplmanual.sty for all manuals.
1542 12/17/2000 milind Added master-slave library documentation to
1544 12/21/2000 saboo Added DDT documentation.
1545 01/02/2001 olawlor Updated for new CCS version.
1547 --------------------------------------------------------------------------------
1549 --------------------------------------------------------------------------------
1551 10/24/1999 olawlor charmc is changed to Bourne shell script
1552 instead of csh. All conv-mach.csh are
1553 replaced by conv-mach.sh.
1554 10/25/1999 olawlor SUPER_INSTALL is converted to use bourne shell.
1555 10/28/1999 milind All Makefiles now take OPTS commandline
1557 01/16/2000 olawlor Simplified Charm++ interface translator.
1558 02/23/2000 ruiliu Changed rand() calls from all over the codes
1559 to the new Converse random number generator.
1560 02/26/2000 milind Simplified the converse scheduler loop by
1561 combining the maxmsgs and poll modes.
1562 08/31/2000 milind Imported system documentation into the CVS tree.
1563 Also added super_install target for docs with
1564 necessary Makefile modifications.
1565 09/08/2000 olawlor Made soft links use relative pathnames instead
1566 of absolute. This lets you move a charm++
1567 installation without having to recompile
1569 09/11/2000 olawlor Grouped commonly needed code in the new util
1570 directory. Also, added pup_c a C wrapper for
1572 09/11/2000 olawlor Slightly reorganized header structure. Now no
1573 headers should need to be listed twice (once in
1574 ALLHEADERS, again in CKHEADERS). Now headers
1575 are soft-linked instead of copied. This makes
1576 development much easier. Added support for the
1577 new Common/util directory.
1578 09/21/2000 olawlor Major reorganization of net-* codes. Now all
1579 the TCP socket routines are in separate files.
1580 Also combined windoes NT code with unix codes.
1581 09/21/2000 olawlor Major rewrite of CCS-- underlying protocol is
1582 now binary (send/recv binary data everywhere);
1583 conv-host forwards requests to nodes; and
1584 source has been significantly re-arranged.
1585 (especially if NODE_0_IS_CONVHOST).
1586 11/22/2000 milind Removed IDL translator from distribution.
1587 12/01/2000 olawlor Renamed conv-host charmrun; added test for
1588 script conv-host. Also added charmrun for most
1590 12/17/2000 milind Moved List related data structures into
1591 cklists.h in util. Removed most of the redundant
1592 list implementations.
1593 12/20/2000 gzheng SUPER_INSTALL: format the output of list of
1594 versions and make the help page fit into one
1596 12/24/2000 milind Added test-{charm,converse,ampi,fem} targets to
1598 12/28/2000 milind net-sol-smp now uses pthreads.
1599 01/29/2001 olawlor Merged windowsNT and unix build procedures by
1600 basing the Windows build on cygwin. Added
1601 scripts to deal with unix and windows