This file describes the most significant changes. For more detail, use
'git log' on a clone of the charm repository.

================================================================================
What's new in Charm++ 6.7.1
================================================================================

Changes in this release are primarily bug fixes for 6.7.0. The major exception
is AMPI, which has seen changes to its extension APIs and now complies with more
of the MPI standard. A brief list of changes follows:

- Startup and exit sequences are more robust

- Error and warning messages are generally more informative

- CkMulticast's set and concat reducers work correctly

- AMPI's extensions have been renamed to use the prefix AMPI_ instead of MPI_
  and to generally follow MPI's naming conventions

- AMPI_Migrate(MPI_Info) is now used for dynamic load balancing and all fault
  tolerance schemes (see the AMPI manual; a brief sketch follows this list)

- AMPI officially supports MPI-2.2, and also implements the non-blocking
  collectives and neighborhood collectives from MPI-3.1
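
  As a rough illustration of the AMPI_Migrate(MPI_Info) call mentioned above
  (the info key and value shown here are recalled from the AMPI manual and may
  not be exact; consult the manual for the supported set):

      #include <mpi.h>

      // Call collectively, e.g. once per iteration, from every AMPI rank.
      void maybe_migrate(void)
      {
          MPI_Info hints;
          MPI_Info_create(&hints);
          // Assumed key/value pair; the AMPI manual defines the real ones.
          MPI_Info_set(hints, "ampi_load_balance", "sync");
          AMPI_Migrate(hints);   // collective; ranks may migrate here
          MPI_Info_free(&hints);
      }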

Platforms and Portability

- The Cray 'regularpages' build target has been fixed

- A Clang compiler target for BlueGene/Q systems has been added

- Communication thread tracing for SMP mode has been added

- AMPI's compiler wrappers are easier to use with autoconf and cmake

================================================================================
What's new in Charm++ 6.7.0
================================================================================

Over 120 bugs fixed, spanning areas across the entire system

- New API for efficient formula-based distributed sparse array creation

- CkLoop is now built by default

- CBase_Foo::pup need not be called from Foo::pup in user code anymore - the
  runtime code handles this automatically (see the sketch after this list)

- Error reporting and recovery in .ci files is greatly improved, providing more
  precise line numbers and often column information

- Many data races occurring under shared-memory builds (smp, multicore) were
  fixed, facilitating use of tools like ThreadSanitizer and Helgrind
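
  A minimal sketch of the CBase_Foo::pup change mentioned above (Foo, its
  fields, and the header names are illustrative; they assume a chare Foo
  declared in a .ci file so that CBase_Foo and the generated headers exist):

      #include "foo.decl.h"          // hypothetical generated declarations

      class Foo : public CBase_Foo {
          int step;
          double value;
      public:
          Foo() : step(0), value(0.0) {}
          Foo(CkMigrateMessage *m) : CBase_Foo(m) {}
          void pup(PUP::er &p) {
              // No explicit CBase_Foo::pup(p) call is needed any more; the
              // generated code takes care of the base-class PUP path.
              p | step;
              p | value;
          }
      };

      #include "foo.def.h"           // hypothetical generated definitions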

- Further MPI standard compliance in AMPI allows users to build and run
  Hypre-2.10.1 on AMPI with virtualization, migration, etc.

- Improved AMPI Fortran 2003 PUP interface 'apup', similar to C++'s STL PUP

Platforms and Portability

- Compiling Charm++ now requires support for C++11 variadic templates. In GCC,
  this became available with version 4.3, released in 2008

- New machine target for multicore Linux ARM7: multicore-linux-arm7

- Preliminary support for POWER8 processors, in preparation for the upcoming
  Summit and Sierra supercomputers

- The charmrun process launcher is now much more robust in the face of slow
  or rate-limited connections to compute nodes

- PXSHM now auto-detects the node size, so the '+nodesize' option is no longer
  needed

- Out-of-tree builds are now supported

- CommLib has been removed.

- CmiBool has been dropped in favor of C++'s bool

================================================================================
What's new in Charm++ 6.6.1
================================================================================

Changes in this release are primarily bug fixes for 6.6.0. A concise list of
affected components follows:

- Reductions with syncFT

- mpicxx-based MPI builds

- Increased support for macros in .ci files

- GNI + RDMA related communication

- MPI_STATUSES_IGNORE support for AMPIF

- Restart on a different node count from a checkpoint

- Immediate messages on multicore builds

================================================================================
What's new in Charm++ 6.6.0
================================================================================

- Machine target files for Cray XC systems ('gni-crayxc') have been added

- Interoperability with MPI code using native communication interfaces on Blue
  Gene Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal
  MPI communication interface

- Support for partitioned jobs on all machine types, including TCP/IP and IB
  Verbs networks using 'netlrts' and 'verbs' machine layers

- A substantially improved version of our asynchronous library, CkIO, for
  parallel output of large files

- Narrowing the circumstances in which the runtime system will send
  overhead-inducing ReductionStarting messages

- A new fully distributed load balancing strategy, DistributedLB, that produces
  high quality results with very low latency

- An API for applications to feed custom per-object data to specialized load
  balancing strategies (e.g. physical simulation coordinates)

- SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs)
  support tracing messages through communication threads

- Thread affinity mapping with +pemap has improved support for Intel's
  Hyperthreading

- After restarting from a checkpoint, thread affinity will use new
  +pemap/+commap arguments

- Queue order randomization options were added to assist in debugging race
  conditions in application and runtime code

- The full runtime code and associated libraries can now compile under the C11
  and C++11/14 standards.

- Numerous bug fixes, performance enhancements, and smaller improvements in the
  provided runtime facilities

* The long-unsupported FEM library has been deprecated in favor of ParFUM

* The CmiBool typedefs have been deleted, as C++ bool has long been universal

* Future versions of the runtime system and libraries will require some degree
  of support for C++11 features from compilers

================================================================================
What's new in Charm++ 6.5.0
================================================================================

- The Charm++ manual has been thoroughly revised to improve its organization,
  comprehensiveness, and clarity, with many additional example code snippets

- The runtime system now includes the 'Metabalancer', which can provide
  substantial performance improvements for applications that exhibit dynamic
  load imbalance. It provides two primary benefits. First, it automatically
  optimizes the frequency of load balancer invocation, to avoid work stoppage
  when it will provide too little benefit. Second, calls to AtSync() are made
  less synchronous, to further reduce overhead when the load balancer doesn't
  need to run. To activate the Metabalancer, pass the option +MetaLB at
  runtime. To get the full benefits, calls to AtSync() should be made at every
  iteration, rather than at some arbitrary longer interval as was previously
  recommended.
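
  A rough sketch of the per-iteration AtSync() pattern (Worker, doStep(), and
  computeOneIteration() are illustrative names, assuming a 1D chare array
  element with usesAtSync enabled in its constructor):

      void Worker::doStep() {
          computeOneIteration();          // application work (illustrative)
          AtSync();                       // call every iteration; cheap when the
                                          // runtime decides not to balance
      }

      void Worker::ResumeFromSync() {     // called by the runtime afterwards
          thisProxy[thisIndex].doStep();  // continue with the next iteration
      }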

- Many feature additions and usability improvements have been made in the
  interface translator that generates code from .ci files:
  * Charmxi now provides much better error reports, including more accurate
    line numbers and clearer reasons for failure, including some semantic
    problems that would otherwise appear when compiling the C++ code or even
    at runtime.
  * A new SDAG construct 'case' has been added that defines a disjunction over
    a set of 'when' clauses: only one 'when' out of a set will ever be
    triggered.
  * Entry method templates are now supported. An example program can be found
    in tests/charm++/method_templates/.
  * The SDAG keyword "atomic" has been deprecated in favor of the newly
    supported keyword "serial". The two are synonymous, but "atomic" is now
    provided only for backward compatibility.
  * It is no longer necessary to call __sdag_init() in chares that contain SDAG
    code - the generated code does this automatically. The function is left as
    a no-op for compatibility, but may be removed in a future version.
  * Code generated from .ci files is now primarily in .def.h files, with only
    declarations in .decl.h. This improves debugging, speeds compilation,
    provides clearer compiler output, and enables more complete encapsulation,
    especially in SDAG code.
  * Mainchare constructors are expected to take CkArgMsg*, and always have
    been. However, charmxi would allow declarations with no argument, and
    assume the message. This is now deprecated, and generates a warning.

- Projections tracing has been extended and improved in various ways
  * The trace module can generate a record of the network topology of the
    nodes in a run for certain platforms (including Cray), which Projections
    can display.
  * If the gzip library (libz) is available when Charm++ is compiled, traces
    are compressed by default.
  * If traces were flushed as a result of filled buffers during the run, a
    warning will be printed at exit to indicate that the user should be wary of
    interference that may have resulted.
  * In SMP builds, it is now possible to trace message progression through the
    communication threads. This is disabled by default to avoid overhead and
    potentially misleading interpretation.

- Array elements can be block-mapped at the SMP node level instead of at the
  per-PE level (option "+useNodeBlkMapping").

- AMPI can now privatize global and static variables using TLS. This is
  supported in C and C++ with __thread markings on the variable declarations
  and definitions, and in Fortran with a patched version of the gfortran
  compiler. To activate this feature, append '-tls' to the '-thread' option's
  argument when you link your AMPI program.
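
  For example (a sketch; the variable name and the link line are illustrative,
  and the exact thread-layer argument may differ on your build):

      // Formerly a process-wide global; with __thread each AMPI rank
      // (user-level thread) gets its own private copy.
      __thread int iteration_count = 0;

      // Link step, appending '-tls' to the '-thread' argument, e.g.:
      //   ampicc -o pgm pgm.c -thread context-tls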

- Charm++ can now be built to only support message priorities of a specific
  data type. This enables an optimized message queue within the runtime system.
  Typical applications with medium-sized compute grains may not benefit
  noticeably when switching to the new scheduler. However, this may permit
  further optimizations in later releases.

  The new queue is enabled by specifying the data type of the message
  priorities while building Charm++ using --with-prio-type=dtype. Here, dtype
  can be one of char, short, int, long, float, double and bitvec. Specifying
  bitvec will permit arbitrary-length bitvector priorities, and is the current
  default mode of operation. However, we may change this in a future release.

- Converse now provides a complete set of wrappers for
  fopen/fread/fwrite/fclose to handle EINTR, which is not uncommon on the
  increasingly popular Lustre file system. They are named
  CmiF{open,read,write,close}, and are available from C and C++ code.
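
  A hedged sketch, assuming the wrappers mirror the corresponding stdio
  signatures (the function write_history() and its arguments are made up):

      #include "converse.h"     // assumed home of the CmiF* declarations

      void write_history(const char *path, const double *data, size_t n) {
          FILE *f = CmiFopen(path, "wb");          // retries on EINTR internally
          if (f == NULL) CmiAbort("CmiFopen failed");
          CmiFwrite(data, sizeof(double), n, f);   // same argument order as fwrite
          CmiFclose(f);
      }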

- The utility class 'CkEntryOptions' now permits method chaining for cleaner
  usage. This applies to all its set methods (setPriority, setQueueing,
  setGroupDepID). Example usage can be found in examples/charm++/prio/pgm.C.
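
  For illustration (the proxy 'workers', its entry method doWork(int), and the
  priority value are invented; they assume a build with integer priorities):

      CkEntryOptions opts;
      opts.setQueueing(CK_QUEUEING_IFIFO)   // integer-priority FIFO queueing
          .setPriority(-5);                 // smaller values are served earlier
      workers[0].doWork(42, &opts);         // entry-method options go last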

- When creating groups or chare arrays that depend on the previous construction
  of another such entity on the local PE, it is now possible to declare that
  dependence to the runtime. Creation messages whose dependence is not yet
  satisfied will be buffered until it is.

- For any given chare class Foo and entry method Bar, the supporting class's
  member CkIndex_Foo::Bar() is used to look up/specify the entry method
  index. This release adds a newer API for such members where the argument is a
  function pointer of the same signature as the entry method. Those new
  functions are used like CkIndex_Foo::idx_Bar(&Foo::Bar). This permits entry
  point index lookup without instantiating temporary variables just to feed the
  CkIndex_Foo::Bar() methods. In cases where Foo::Bar is overloaded, &Foo::Bar
  must be cast to the desired type to disambiguate it.
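
  For example (Foo and its entry methods are illustrative; CkIndex_Foo is the
  class generated from Foo's .ci declaration):

      // Look up the entry point index without building temporary arguments:
      int ep = CkIndex_Foo::idx_Bar(&Foo::Bar);

      // If Foo::Bar is overloaded, cast the member-function pointer to pick
      // the desired signature:
      int ep2 = CkIndex_Foo::idx_Bar(
          static_cast<void (Foo::*)(int)>(&Foo::Bar));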

- CkReduction::reducerType now has PUP methods defined, and hence can be
  passed as a parameter-marshalled argument to entry methods.

- The runtime option +stacksize for controlling the allocation of user-level
  threads' stacks now accepts shorthand notation such as 1M.

- The -optimize flag to the charmc compiler wrapper now passes more aggressive
  options to the various underlying compilers than the previous '-O'.

- The charmc compiler wrapper now provides a flag -use-new-std to enable
  support for C11 and C++11 where available. To use this in application code,
  the runtime system must have been built with that flag as well.

- When using CmiMemoryUsage(), the runtime can be instructed not to use the
  underlying mallinfo() library call, which can be inaccurate in settings where
  usage exceeds INT_MAX. This is accomplished by setting the environment
  variable "MEMORYUSAGE_NO_MALLINFO".

- Experimental Features
  * Initial implementation of a fast message-logging protocol. Use option
    'mlogft' to build it.
  * Message compression support for persistent messages on the Gemini machine
    layer.
  * Node-level inter-PE loop/task parallelization is now supported through the
    CkLoop library.
  * New temperature/CPU frequency aware load balancer
  * Support interoperation of Charm++ and native MPI code through dynamically
    switching control between the two
  * API in centralized load balancers to get and set PE speed
  * A new scheme for optimization of double in-memory checkpoint/restart.
  * Message combining library for improved fine-grained communication
    performance
  * Support for partitioning of allocated nodes into subsets that run
    independent Charm++ instances but can interact with each other.

Platform-Specific Changes
-------------------------

- Cray XE/XK (Gemini):
  * The gemini_gni network layer has been heavily tuned and optimized,
    providing substantial improvements in performance, scalability, and
    stability.
  * The gemini_gni-crayxe machine layer supports a 'hugepages' option at build
    time, rather than requiring manual configuration file editing.
  * Persistent message optimizations can be used to reduce latency and
    overhead.
  * Experimental support for 'urgent' sends, which are sent ahead of any other
    outgoing messages queued for transmission.

- IBM Blue Gene Q: Experimental machine-layer support for the native PAMI
  interface and MPI, with and without SMP support. This supports many new
  systems, including LLNL's Sequoia, ALCF's Mira, and FZ Juelich's Juqueen.

  There are three network-layer implementations for these systems: 'mpi',
  'pami', and 'pamilrts'. The 'mpi' layer is stable, but its performance and
  scalability suffer from the additional overhead of using MPI rather than
  driving the interconnect directly. The 'pami' layer is well tested for NAMD,
  but has shown instability for other applications. It is likely to be replaced
  in the next release by the 'pamilrts' layer, which is more generally stable
  and seems to provide the same performance.

  In addition to the common 'smp' option to build the runtime system with
  shared memory support, there is an 'async' option which sometimes provides
  better performance on SMP builds. This option passes tests on 'pamilrts', but
  is still experimental.

  Note: Applications that have a large number of messages may crash in the
  default setup due to overflow in the low-level FIFOs. The environment
  variables MUSPI_INJFIFOSIZE and PAMI_RGETINJFIFOSIZE can be set to avoid
  application failures due to a large number of small and large messages,
  respectively. The default value of these variables is 65536, which is
  sufficient for about 1000 messages in flight.

- Infiniband Verbs: Better support for more flavors of ibverbs libraries
  * Experimental rendezvous protocol for better performance above some MPI
  * Some tuning parameters ("+dynCapSend" and "+dynCapRecv") are now
    configurable at job launch, rather than at Charm++ compilation time.

- PGI C++: Disable automatic 'using namespace std;'

- Charm++ now supports ARM, both non-smp and smp.

- Mac OS X: Compilation options to build and link correctly on newer versions
  of OS X

================================================================================
What's new in Charm++ 6.4.0
================================================================================

--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------

- Cray XE and XK systems using the Gemini network via either MPI
  (mpi-crayxe) or the native uGNI (gemini_gni-crayxe)

- IBM Blue Gene Q, using MPI (mpi-bluegeneq) or PAMI (pami-bluegeneq)

- Clang, Cray, and Fujitsu compilers

- MPI-based machine layers can now run on >64k PEs

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

- Added a new [reductiontarget] attribute to enable
  parameter-marshaled recipients of reduction messages

- Enabled pipelining of large messages in CkMulticast by default

- New load balancers added:
  * Scotch graph partitioning based: ScotchLB, and its Refine and Topo variants

- Load balancing improvements:
  * Allow reduced load database size using floats instead of doubles
  * Improved hierarchical balancer
  * Periodic balancing adapts its interval dynamically
  * User code can request a callback when migration is complete
  * More balancers properly consider object migratability and PE
    availability and speed
  * Instrumentation records multicasts

- Chare arrays support options that can enable some optimizations

- New 'completion detection' library for parallel process termination
  detection, when the need for modularity excludes full quiescence detection

- New 'mesh streamer' library for fine-grain many-to-many collectives,
  handling message bundling and network topology

- Memory pooling allocator performance and resource usage improved

- AMPI: More routines support MPI_IN_PLACE, and those that don't check

================================================================================
What's new in Charm++ 6.2.1 (since 6.2.0)
================================================================================

--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------

POWER7 with LAPI on Linux

Infiniband on PowerPC

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

- Better support for multicasts on groups
- Topology information gathering has been optimized
- Converse (seed) load balancers have many new optimizations applied
- CPU affinity can be set more easily using +pemap and +commap options
  instead of the older +coremap
- HybridLB (hierarchical balancing for very large core-count systems)
  has been substantially improved
- Load balancing infrastructure has further optimizations and bug fixes
- Object mappings can be read from a file, to allow offline
  topology-aware placement
- Projections logs can be spread across multiple directories, speeding
  up output when dealing with thousands of cores (+trace-subdirs N
  will divide log files evenly among N subdirectories of the trace
  root, named PROGNAME.projdir.K)
- AMPI now implements MPI_Issend
- AMPI's MPI_Alltoall uses a flooding algorithm more aggressively,
  versus pairwise exchange
- Virtualized ARMCI support has been extended to cover the functions

--------------------------------------------------------------------------------
Architecture-specific changes
--------------------------------------------------------------------------------

- LAPI SMP has many new optimizations applied

- Net builds support the use of clusters' mpiexec systems for job
  launch, via the ++mpiexec option to charmrun

================================================================================
What's new in Charm++ 6.2.0 (since 6.1)
================================================================================

--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------

64-bit MIPS, such as SiCortex, using mpi-linux-mips64

Windows HPC cluster, using mpi-win32/mpi-win64

Mac OS X 10.6, Snow Leopard (32-bit and 64-bit).

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

- Smarter build/configure scripts
- A new interface for model-based load balancing
- New CPU topology API
- A general implementation of CmiMemoryUsage()
- Bug fix: Quiescence detection (QD) works with immediate messages
- New reduction functions implemented in Converse
- CCS (Converse Client-Server) can deliver messages to more than one processor
- Added a memory-aware adaptive scheduler, which can be optionally enabled
- Added preliminary support for automatic message prioritization
  (disabled by default)

- Cross-array and cross-group sections
- Structured Dagger (SDAG): Support templated arguments properly
- Plain chares support checkpoint/restart (both in-memory and disk-based)
- Conditional packing of messages and parameters in SMP scenarios
- Changes to the CkArrayIndex class hierarchy
  -- sizeof() all CkArrayIndex* classes is now the same
  -- Codes using custom array indices have to use placement-new to construct
     their custom index. Refer to the example code:
     examples/charm++/hello/fancyarray/
  -- *** Backward Incompatibility ***
     CkArrayIndex[4D/5D/6D]::index are now of type int (instead of short).
     However, the data is stored as shorts. Access by casting
     CkArrayIndexND::data() appropriately.
  -- *** Deprecated ***
     The direct use of the public data member
     CkArrayIndexND::index (N=1..6) is deprecated. We reserve the right to
     change/remove this variable in future releases of Charm++.
     Instead, please access the indices via the member function:
     int CkArrayIndexND::data()

AMPI
- Compilers renamed to avoid collision with host MPI (ampicc, ampiCC,
- Improved MPI standard conformance, and documentation of non-conformance
  * Bug fixes in: MPI_Ssend, MPI_Cart_shift, MPI_Get_count
  * Support MPI_IN_PLACE in MPI_(All)Reduce
  * Define various missing constants
- Return the received message's tag in response to a non-blocking
  wildcard receive, to support SuperLU
- Improved tracing for BigSim

Multiphase Shared Arrays (MSA)
- Typed handles to enforce phases
- Split-phase synchronization to enable message-driven execution

- Automatic tracing of API calls for simulation and analysis

- Wider support for architectures other than net- (in particular MPI layers)
- Improved support for large scale debugging (better scalability)
- Enhanced record/replay stability to handle various events, and to
  signal unexpected messages
- New detailed record/replay: The full content of messages can be
  recorded, and a single processor can be re-executed outside of the

- Tracing of nested entry methods

Automatic Performance Tuning
- Created an automatic tuning framework [still for experimental use only]

- Network-topology / node aware spanning trees are used internally, lowering
  bytes on the network and improving performance in multicasts and
  reductions delegated to this library

- Improved OneTimeMulticastStrategy classes

- Out-of-core support, with prefetching capability
- Detailed tracing of MPI calls
- Detailed record/replay support at emulation time, capable of
  replaying any emulated processor after obtaining recorded logs.

--------------------------------------------------------------------------------
Architecture-specific changes
--------------------------------------------------------------------------------

- Can run jobs with more than 1024 PEs

- New charmrun option ++no-va-randomization to disable address space
  randomization (ASLR). This is most useful for running AMPI with

- Default to using ampicxx instead of mpiCC

- The +p option now has the same semantics as in other smp builds

- Support for VSX in the SIMD abstraction API

- Compilers and options have been updated to the latest ones

- Added routines for measuring performance counters on BG/P.
- Updated to support the latest DCMF driver version. On ANL's Intrepid, you may
  need to set BGP_INSTALL=/bgsys/drivers/V1R4M1_460_2009-091110P/ppc in your
  environment. This is the default on ANL's Surveyor.

- CPU topology information is now available on XT3/4/5

- Bug fix: plug memory leaks that caused failures in long runs
- Optimized to reduce startup delays

- Support for SMP (experimental)

================================================================================
Note that changes from 5.9, 6.0, and 6.1 are not documented here. A partial list
can be found on the charm download page, or by reading through the version
control history.

================================================================================
What's New since Charm++ 5.4 release 1
================================================================================

--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------
1. Charm++ ported to IA64 Itanium running Win2K and Linux; Charm++ also
   supports the Intel C/C++ compilers;

2. Charm++ ported to Power Macintosh powerpc running Darwin;

3. Charm++ ported to Myrinet networking with GM API;

--------------------------------------------------------------------------------
Summary of New Features:
--------------------------------------------------------------------------------
1. Structured Dagger
   Structured Dagger is a coordination language built on top of CHARM++.
   Structured Dagger allows easy expression of dependences among messages and
   computations and also among computations within the same object using
   when-blocks and various structured constructs.

2. Entry functions support parameter marshalling
   Now you can declare and invoke remote entry functions using parameter
   marshalling instead of defining messages.

3. Easier running - standalone mode
   For net-* versions running locally, you can now run Charm programs without
   charmrun. Running a node program directly from the command line is now the
   same as "charmrun +p1 <program>"; for the SMP version, you can also specify
   multiple (local) processors, as in "program +p2".

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
1. "build" changed for compilation of Charm++
   To build Charm++ from scratch, we now take additional command line options
   to compile with add-on features and to use compilers other than gcc.
   For example, to build Linux IA64 with Myrinet support, type the command:
   ./build net-linux-ia64 gm

******* Old Change histories *******

================================================================================
What's New in Charm++ 5.4 release 1 since 5.0
================================================================================

--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------

1. Win9x/2000/NT: with Visual C++ or Cygwin gcc/g++, you can compile and run
   Charm++ programs on all Win32 platforms.

2. Scyld Beowulf: Charm++ has been ported to the Linux-based Scyld Beowulf
   operating system. For more information on Scyld, see <http://www.scyld.com>

3. MPI with VMI: Charm++ has been ported to NCSA's Virtual Machine Interface,
   which is an efficient messaging library for heterogeneous clusters.

--------------------------------------------------------------------------------
Summary of New Features:
--------------------------------------------------------------------------------
1. Dynamic Load balancing:
   Chare migration is supported in the new release. A migration-based dynamic
   load balancing framework, with a library of various load balancing
   strategies, has been included.

2. Charm++ Arrays:
   Charm++ arrays are supported. You can now create an array of Chare objects
   and use an array index to refer to the Charm++ array elements. A reduction
   library on top of Chare arrays has been implemented and included.

3. Projections:
   Projections, a Java application for Charm++ program performance analysis
   and visualization, has been included and distributed in the new release.
   Two trace modes are available: trace-projections and trace-summary.
   Trace-summary is a light-weight trace library compared to trace-projections.

4. AMPI:
   AMPI is a load-balancing based library for porting legacy MPI applications
   to Charm++. With few changes to the original MPI code, a legacy MPI
   application running on Charm++ gains Charm++'s adaptive load balancing
   ability.

5. Charmrun:
   "Charmrun" is now available on all platforms, with a uniform command line
   syntax. You can forget the difference between net-* versions and MPI
   versions, and run Charm++ applications with the same charmrun command
   syntax. The ++local option has been added to charmrun for net-* versions;
   it provides simple local use of Charm++ and no longer requires the ability
   to "rsh localhost" or a nodelist file in order to run Charm++ only on the
   local machine. This is especially attractive when you run Charm++ on
   Windows.

6. New Libraries:
   Many new libraries have been added in this release. They include:
   1) master-slave library: for writing manager-worker paradigm programs.
   2) receiver library: provides an asynchronous communication mode for chare
      arrays.
   3) f90charm: provides Fortran90 bindings for Charm++ arrays.
   4) BlueGene: a Charm++/Converse emulator for IBM's proposed Blue Gene.

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
1. Message declaration syntax in .ci files:
   The message declaration syntax for packed/varsize messages has been changed.
   The packed/varsize keywords are eliminated, and you can specify the actual
   varsize arrays in the interface file and have the translator generate
   alloc, pack and unpack.

Here is the detailed list of Changes:

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

10/06/1999 rbrunner Added migration-based dynamic load balancing

11/15/1999 olawlor Added reduction support for Charm++ arrays
02/06/2000 milind Added AMPI, an implementation of MPI with
                    dynamic load balancing
02/18/2000 paranjpy New platforms supported: net-win32, and net-win32-smp
04/04/2000 olawlor Added arbitrarily indexed Charm++ arrays.
                    Also, added translator support for new arrays.
04/15/2000 olawlor Added "puppers" for packing and unpacking

06/14/2000 milind Added the threaded FEM framework.

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

10/09/1999 rbrunner Added packlib, a library for C and C++ to
                    pack-unpack data to/from Charm++ messages.
10/13/1999 gzheng New LB strategy: RefineLB
10/13/1999 paranjpy New LB Strategy: Heap
10/14/1999 milind New LB Strategy: Metis
10/19/1999 olawlor New test program for testing LB strategies.
10/21/1999 gzheng New trace mode: trace-summary
10/28/1999 milind New supported platform: net-sol-x86
10/29/1999 milind Added runtime checks for ChareID assignment.
11/10/1999 rbrunner Added Neighborhood-based strategy for LB
11/15/1999 olawlor conv-host now reads in a startup file
11/15/1999 olawlor New test program for testing array reductions.
11/16/1999 rbrunner Added processor-speed checking functions to
11/19/1999 milind Mapped SIGUSR to a Ccd condition handler
11/22/1999 rbrunner New LB strategy: WSLB
11/29/1999 ruiliu Modified Metis LB strategy to deal with
                    different processor speeds
12/16/1999 rbrunner New LB strategy: GreedyRef
12/16/1999 rbrunner New LB strategy: RandRef
12/21/1999 skumar2 New LB strategy: CommLB
01/03/2000 rbrunner New LB strategy: RecBisectBfLB
01/08/2000 skumar2 New LB strategy: Comm1LB, with varying processor
                    speeds
01/18/2000 milind Modified SM library syntax, and added a test
                    program
01/19/2000 gzheng Added irecv, a library to simplify conversion
                    of message-passing programs to Charm++
02/20/2000 olawlor Added preliminary broadcast support to Charm++
02/23/2000 paranjpy Added converse-level quiescence detection
03/02/2000 milind Added ++server-port option to pre-specify
                    the CCS server port
03/10/2000 wilmarth Random seed-based load balancer now uses a
                    bit-vector for active PEs.
03/21/2000 gzheng Added support for marking user-defined events
03/28/2000 wilmarth Added CMK_TRUECRASH. Very helpful for
                    post-mortem debugging of Charm++ programs on
03/31/2000 jdesouza Added Fortran90 support to the Charm++
                    interface translator.
03/09/2000 milind Added support for -LANG and -rpath options
                    in charmc for Origin2000.
04/28/2000 milind Added prioritized converse threads.
05/01/2000 milind Added test programs for TeMPO, AMPI and irecv.
05/04/2000 milind New supported platform: mpi-sp.
05/04/2000 gzheng Added irecv pingpong program.
05/17/2000 olawlor Each chare, group and array element now has to
                    have a migration constructor.
05/24/2000 milind Added Jacobi3D programs for both irecv and AMPI.
05/24/2000 milind Made migratable an optional attribute of
                    chares, groups, and nodegroups.
                    Arrays are by default migratable.
05/29/2000 paranjpy Added pup methods to arrays, reductions, etc.
06/13/2000 milind Made CtvInitialize idempotent. That is, it
                    can be called by any number of threads now;
                    only the first one will actually do the
                    initialization.
06/20/2000 milind Added a simple test program for the FEM
                    framework.
07/06/2000 milind Imported Metis 4.0 sources in the CVS tree.
                    Also added code to make metis libraries and
                    executables to the Makefile.
07/07/2000 milind Added more meaningful error messages using
                    perror in addition to cryptic error codes in
07/10/2000 milind fem and femf are now recognized as "languages"
07/10/2000 saboo Added the derived datatypes library.
07/13/2000 milind Added +idle_timeout functionality. It takes a
                    commandline parameter denoting milliseconds of
                    maximum consecutive idle time allowed per
                    processor.
07/14/2000 milind Added group multicast. Added
                    CkSendMsgBranchMulti, CldEnqueueMulti, and
                    translator changes to support it.
07/14/2000 milind SUPER_INSTALL now takes "-*" arguments prior
                    to the target, which will be passed to make as
                    "makeflags". This makes it easy to suppress
                    make's output of commands etc. (with the -s
                    flag). As a result of this, several Makefiles
07/18/2000 milind Added support for using "dbx" on suns as
                    the debugger.
07/19/2000 milind Added the ability for tracemode projections to
                    produce binary trace files. Use the flag
                    +binary-trace on the command line.
07/26/2000 milind Separated AMPI from TeMPO.
07/28/2000 milind Added test programs to test reduce, alltoall
                    and allreduce functionality of AMPI.
08/02/2000 milind Added an option to let the user specify which
                    "xterm" to use. For example, on some systems
                    (CDE), only dtterm is installed. So, by
                    putting ++xterm dtterm on the conv-host
                    commandline, one can use dtterm when the
                    ++in-xterm option is specified on the
                    conv-host commandline.
08/14/2000 milind FEM Framework: Added capabilities to handle
                    esoteric meshes to standalone offline programs.
                    Makefile now produces gmap and fgmap programs,
                    which are used for this purpose. They convert
                    the mesh to a graph before partitioning it.
08/24/2000 milind Added the 2D crack propagation program as a
                    test program for the FEM framework.
08/25/2000 milind Initial implementation of isomalloc-based
                    threads. This implementation uses a fixed
                    stack size for all threads (can be set at
08/26/2000 milind Added a macro CtvAccessOther that lets you
                    get/set a Ctv variable of any thread. It
                    should be invoked as CtvAccessOther(thread,
                    varname); added a CthGetData function to each
                    of the threads implementations. This function
                    is used in the CtvAccessOther macro.
08/27/2000 milind FEM Framework: Separated mesh to graph
                    conversion capability into a separate program.
                    This way, the generated graph can be
                    partitioned.
09/04/2000 milind Added the class static readonly variables to
09/05/2000 milind FEM Framework: A very fast O(n) algorithm for
                    mesh2graph; uses more memory, but the tradeoff
                    was worth it. Coded by Karthik Mahesh, minor
                    optimizations by Milind.
09/05/2000 milind Added a barebones charm kernel scheduling
                    overhead measurement program.
09/15/2000 milind Added pup support for AMPI and the FEM
                    framework.
09/20/2000 olawlor Added capability to have an array of a base
                    type where individual elements could be of
                    derived types.
10/03/2000 gzheng New supported platform: net-linux-axp
10/05/2000 skumar2 Added program littleMD to the test suite.
10/07/2000 skumar2 New job scheduler (Faucets project).
10/15/2000 milind Improved support for Fortran90 in charmc.
11/04/2000 jdesouza Made the Faucets scheduler multi-threaded.
11/05/2000 olawlor FEM Framework: supports multiple element types,
                    mesh re-assembly, etc.
11/15/2000 gzheng New platform support: net-cygwin
11/18/2000 gzheng conv-host no longer needs /bin/csh to start
                    CMK_CONV_HOST_CSH_UNAVAILABLE to 1 to use
11/25/2000 milind Finished experimental implementation of
                    converse-threads based on co-operative pthreads.
11/25/2000 milind Added a benchmark suite of all pingpongs in
11/28/2000 milind Removed deletion of _idx at the end of every
                    send or doneInserting call. Instead now it is
                    in the destructor of the proxy. This allows us
                    to cache proxies, when proxy creation becomes
11/28/2000 olawlor Added "seek blocks" to puppers. This should
                    allow out-of-order pup'ing without the ugliness
                    of getBuf, and in a way that works with all
                    PUP::ers.
11/29/2000 olawlor Simplified and regularized command-line-argument
                    parsing.
11/29/2000 milind AMPI: Added multiple-communicators capability.
12/05/2000 gzheng Now /bin/sh is the default shell to fork the
                    node program on remote machines.
12/13/2000 olawlor Added charmrun wrapper for poe on mpi-sp.
12/14/2000 milind Added bluegene emulator sources and test
                    programs. Added "bluegene" as a language known
                    to charmc. Makefile now has a target called
                    bluegene. Added preliminary bluegene
                    documentation (copied from Arun's webpage).
12/15/2000 gzheng f90charm addition to Makefile and charmc. Also,
                    added fixed-size array support to f90charm. A
                    test program f90charm/hello is checked in.
12/17/2000 milind Added rtest test program. Contributed by jim to
                    test Converse message transmission.
12/20/2000 olawlor Added charmconfig script. Enables automatic
                    determination of C++ compiler properties,
                    replacing the verbose and error-prone
                    conv-mach.h entries for CMK_BOOL,
                    CMK_STL_USE_DOT_H, CMK_CPP_CAST_OK, ...
12/20/2000 olawlor Charm++ Arrays optimizations: Key and object are
                    now variable-length fields, instead of pointers.
                    This extra flexibility lets us save many
                    dynamic allocations in the array framework.
12/20/2000 olawlor Added PUP::able support-- dynamic type
                    identification, allocation, and deletion.
                    Allows you to write: p(objPtr); and
                    objPtr will be properly identified,
                    allocated, packed, and deallocated (depending
                    on the PUP::er). Requires you to register any
                    such classes with DECLARE_PUPable and
12/20/2000 olawlor Arrays optimizations: Made CkArrayIndex
                    fixed-size. This significantly improves
                    messaging speed (7 us instead of 10 us
                    roundtrip). Moved the spring cleaning check
                    into a CcdCallFnAfter, which gains more speed
                    (down to
12/20/2000 olawlor More optimizations: Minor speed tweaks--
                    conv-ccs.c uses a hashtable for handler lookup;
                    conv-conds skips the timer test until needed;
                    convcore.c scheduler loop optimizations (no
                    superfluous EndIdle calls); threads.c
                    CMK_OPTIMIZE-> no mprotect.
12/20/2000 olawlor More optimizations: Minor speed tweaks-- ck.C
                    groups cldEnqueue skip; init.h defines
                    CkLocalBranch inline; and supporting changes.
12/22/2000 gzheng IA64 support for Converse user-level threads.
01/02/2001 olawlor CCS: Minor update-- enabled CcsProbe, cleaned
                    up superfluous debug messages in the server,
                    added a Java interface (originally written for
01/09/2001 gzheng charmconfig converted to autoconf style: need
                    to change configure.in and conv-autoconfig.h.in,
                    and run autoconf to get configure and copy it
                    to charmconfig. Added a fortran subroutine name
                    test and getting libpthread.a.
01/10/2001 milind Added a telnet method of getting libpthread.a
                    from the charm webserver.
01/11/2001 olawlor Moved projections files here from
                    CVSROOT/projections-java. Added fast Java
                    versions of the .log file input routines in
                    LogReader, LogLoader, LogAnalyzer, and
                    UsageCalc. Added "U.java" user interface
                    utility file, allowing times to be input in
                    seconds, milliseconds, or microseconds,
                    instead of just microseconds.
01/15/2001 gzheng Added +trace-root to specify the directory to
                    put log files in. This is needed on Scyld
                    clusters, where there is no NFS mounting and
                    no I/O access to a shared home directory on
                    the nodes.
01/15/2001 milind Made AMPI into an f90 module instead of an
                    'ampif.h' inclusion. AMPI f90 bindings are
                    now more inclusive. Fixed argc/argv handling
                    bugs in the ArgsInfo message. Fixed a bug in
                    pup that caused the thread not to be sized,
                    but packed nevertheless. Moved irecv to waitall
                    instead of ampi_start. Made AMPI_COMM_WORLD 0,
                    because it clashed with the wildcard (-1).
                    AMPI_COMM_UNIVERSE is now handled properly in
                    the AMPI module. C/C++ data members are NOT
                    visible to
01/18/2001 gzheng New supported platform: net-linux-scyld
01/20/2001 olawlor Moved the array index field from CMessage_* to
                    the Ck envelope itself. This is the right thing
                    to do, because any message may be sent to/from
                    an array element. To reduce the wasted space
                    in a message, a union is used to overlay the
                    fields for the various possible message types.
01/29/2001 olawlor Freed charmrun in the net-* versions from using
                    a remote shell to fork off processes. One can
                    now use a daemon provided in the distribution.
02/07/2001 olawlor Added debugging support to puppers.
02/13/2000 gzheng Added ++local option to charmrun to start the
                    node program locally without any daemon; fixed
                    the hang if you type the wrong program name in
                    the Scyld version, and redirected all output to
                    /dev/null, since otherwise every node program
                    can send its output to the console on Scyld.
                    Also implemented ++local in the net-win32
                    version.
02/26/2000 milind Changed the varsize syntax. Now one can specify
                    actual varsize arrays in the interface file
                    and have the translator generate alloc, pack
                    and unpack.

--------------------------------------------------------------------------------
Bug Fixes:
--------------------------------------------------------------------------------

10/29/1999 milind Replaced jmemcpy by memcpy in net versions, as
                    it was causing a bit to flip (bug reported
10/29/1999 milind Fixed multiline macros in all header files.
02/05/2000 milind Fixed linking errors by getting the order of
                    libraries right from the charmc command-line.
02/18/2000 paranjpy Fixed Charm++ initialization bug on SMPs.
02/21/2000 milind Fixed a context-switching bug in mipspro version
02/25/2000 milind Charm++ interface translator was segfaulting
                    on interface file errors. Fixed that. Also,
                    added line numbers to error messages.
03/02/2000 milind Made CCS work on SMPs.
03/07/2000 milind Made ConverseInit consistent with the manual on
04/18/2000 milind Fixed a bug in CkWaitFuture, which was caching
                    a variable locally, while it was changed by
                    another thread.
05/04/2000 paranjpy Fixed argv deletion bug on net-win32-smp.
06/08/2000 milind sp3 version: changed optimization flags, which
                    were power2 processor-specific.
06/20/2000 milind mpi-* versions: Fixed ConverseExit since it was
                    not obeying the following statement in the MPI
                    standard: The user must ensure that all pending
                    communications involving a process complete
                    before the process calls MPI_FINALIZE.
07/05/2000 milind Fixed a nasty bug in charmc in the -cp option.
                    It used to append the name provided to the -o
                    flag to the directory provided to the -cp flag.
                    Thus, -o ../pgm -cp ../bin options meant that
                    the pgm would be copied to ../bin/.., which is
                    not the expected behavior. This fix correctly
                    copies pgm to ../bin.
07/07/2000 milind Removed variable arg_myhome, as it was not
                    being used anywhere, and also, setting it was
                    causing problems if the env var HOME was not
                    set.
07/27/2000 milind thishandle for the array element was not being
                    correctly set. Bug was reported by Neelam.
08/26/2000 milind Origin2000: Changed the page alignment to
                    reflect the mmap alignment. The mmap man page
                    specifically states that it is not the same as
                    the page size.
09/02/2000 milind Fixed a bug in code generated for threaded
                    (void) entry methods of array elements. The
                    dummy message that is passed to that method in
                    a thread has to be deleted before calling the
                    object method, because upon the object method's
                    return, the thread might have migrated.
09/03/2000 olawlor Minor fixes: 1) The change to the LBObjid hash
                    function would fail for >4-int object indices.
                    Replaced with a proper function, which also
                    preserves the 1-int case. 2) Array element
                    sends must go via the message queue to prevent
                    stack build-up for deep single-processor call
                    chains. These might happen, e.g., in a driver
                    element calling itself for the main time loop.
                    Messages are now properly noted as sent, then
                    wait through the queue for delivery. This
                    entailed minor reorganization of the message
09/21/2000 olawlor Tiny SMP thread fix-- registrations of a
                    thread-private variable now reserve space on
                    calls after the first. This wastes space for
                    multiple CthInitialize's-- it's a quick hack to
                    get threads working again on SMP versions.
10/16/2000 olawlor A few CCS fixes: added split-phase reply
                    (delay reply indefinitely); cleaned up error
                    handling; pass user data as "void *" instead of
11/03/2000 wilmarth Removed a 0-size array allocation in Charm++
                    quiescence detection.
11/20/2000 gzheng Rewrote part of the Fiber threads implementation,
                    including a bug fix for a non-thread-safe
                    function, and a different fiber free strategy.
11/29/2000 gzheng The LB init procedure tried to allocate
                    65536*160 as the initial size, which is 10M of
                    memory for the communication table; too big.
                    Cut it down to roughly 1M, and it can expand
                    as needed.
12/05/2000 gzheng In many cases, conv-host exits without printing
                    the error message from the remote shell. Try
                    to fix it by calling sync to flush the pipe.
12/10/2000 milind net-linux: Made static linking the default
                    option because the dynamic linking runtime
                    causes isomalloc threads to crash.
12/18/2000 milind Increased portability of isomalloc threads by
                    removing dependence on alloca.
12/28/2000 milind Fixed ctrl-getone abort bug on SMP.
12/28/2000 milind Made _groupTable a pointer on which a
                    constructor is explicitly called. Since it
                    was a Cpv variable, its constructor was not
                    called by default in the case of an SMP version.
12/29/2000 olawlor Prevent infinite copy constructor recursion on
01/10/2001 olawlor Added "explicit" keyword to remove ambiguity
                    for KCC, which was confused by the private
                    PUP::er(int) "cast" constructor and the
                    operator|(PUP::er &p, T &t) into rejecting all
                    operator|(int,int) as ambiguous.
2001/01/17 gzheng Fix the charmconfig bug on paragon-red: the
                    failure testing of fortran won't stop the
01/20/2001 olawlor Arrays reduction: Fixed bug-- a reduction may
                    end because all contributors migrate away.
01/29/2001 olawlor Fix heap-corrupting bug-- call ->init() on
                    nodeGroupTable, which sets the "pending"
                    message queue to NULL. This prevents a nasty
                    delete-uninitialized-data bug later on. Also
                    delayed queue creation until messages actually
                    arrive.

--------------------------------------------------------------------------------
Documentation Changes:
--------------------------------------------------------------------------------

01/31/2000 milind Installation manual: Fixed bugs pointed out by
02/28/2000 wilmarth Added a new-look Charm++ manual.
06/20/2000 milind Added pdflatex support to generate PDF versions
                    of manuals from LaTeX sources.
12/05/2000 milind Added Orion's FEM manual. Converted from HTML.
12/10/2000 milind Added pplmanual.sty for all manuals.
12/17/2000 milind Added master-slave library documentation to
12/21/2000 saboo Added DDT documentation.
01/02/2001 olawlor Updated for the new CCS version.

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

10/24/1999 olawlor charmc is changed to a Bourne shell script
                    instead of csh. All conv-mach.csh files are
                    replaced by conv-mach.sh.
10/25/1999 olawlor SUPER_INSTALL is converted to use the Bourne
                    shell.
10/28/1999 milind All Makefiles now take an OPTS commandline
                    parameter.
01/16/2000 olawlor Simplified the Charm++ interface translator.
02/23/2000 ruiliu Changed rand() calls throughout the code
                    to the new Converse random number generator.
02/26/2000 milind Simplified the converse scheduler loop by
                    combining the maxmsgs and poll modes.
08/31/2000 milind Imported system documentation into the CVS tree.
                    Also added a super_install target for docs with
                    necessary Makefile modifications.
09/08/2000 olawlor Made soft links use relative pathnames instead
                    of absolute. This lets you move a charm++
                    installation without having to recompile.
09/11/2000 olawlor Grouped commonly needed code in the new util
                    directory. Also added pup_c, a C wrapper for
                    the PUP framework.
09/11/2000 olawlor Slightly reorganized header structure. Now no
                    headers should need to be listed twice (once in
                    ALLHEADERS, again in CKHEADERS). Now headers
                    are soft-linked instead of copied. This makes
                    development much easier. Added support for the
                    new Common/util directory.
09/21/2000 olawlor Major reorganization of net-* codes. Now all
                    the TCP socket routines are in separate files.
                    Also combined the Windows NT code with the unix
                    code.
09/21/2000 olawlor Major rewrite of CCS-- the underlying protocol
                    is now binary (send/recv binary data
                    everywhere); conv-host forwards requests to
                    nodes; and the source has been significantly
                    re-arranged (especially if NODE_0_IS_CONVHOST).
11/22/2000 milind Removed the IDL translator from the distribution.
12/01/2000 olawlor Renamed conv-host charmrun; added test for
                    script conv-host. Also added charmrun for most
12/17/2000 milind Moved List-related data structures into
                    cklists.h in util. Removed most of the redundant
                    list implementations.
12/20/2000 gzheng SUPER_INSTALL: format the output of the list of
                    versions and make the help page fit into one
                    screen.
12/24/2000 milind Added test-{charm,converse,ampi,fem} targets to
12/28/2000 milind net-sol-smp now uses pthreads.
01/29/2001 olawlor Merged the Windows NT and unix build procedures
                    by basing the Windows build on cygwin. Added
                    scripts to deal with unix and windows