doc/faq/ports.tex

   1 \section{Versions and Ports}
   2
   3 \subsection{Has Charm++ been ported to use MPI underneath? What about OpenMP?}
   4
Charm++ supports MPI and can use it as the underlying communication
library. We have tested it with MPICH, OpenMPI, and most vendor MPI
variants.  The MPI version of Charm++ also has explicit support for SMP nodes.
Charm++ has not been ported to run on top of OpenMP, but OpenMP can be
used from within Charm++ programs.
  10
  11 \subsection{How complicated is porting Charm++/Converse?}
  12
It depends. In the best case, porting only involves fixing compiler compatibility
issues.  The LRTS abstraction layer was designed to simplify this process and has been used for the
MPI, Verbs, uGNI, and PAMI layers.  User-level threads and Isomalloc may require
platform-specific support.  Otherwise, Charm++ is generally platform independent.
  17
\subsection{If the source is available, how feasible would it be for us to do ports
ourselves?}
  20
  21 The source is always available, and you're welcome to make it run anywhere.
  22 Any kind of UNIX, Windows, and MacOS machine should be straightforward: just a
  23 few modifications to {\tt charm/src/arch/.../conv-mach.h} (for compiler
  24 issues) and possibly
  25 a new {\em machine.c} (if there's a new communication system involved).
  26 However, porting to embedded hardware with a proprietary OS may be fairly difficult.
  27
\subsection{To what platforms has Charm++/Converse been ported?}

Charm++/Converse has been ported to most UNIX and Linux operating systems, Windows, and MacOS.
  31
  32 \subsection{Is it hard to port Charm++ programs to different machines?}
  33
  34 \label{porting}
Charm++ itself is fully portable, and should provide exactly
  36 the same interfaces everywhere (even if the implementations are
  37 sometimes different).  Still, it's often harder than we'd like
  38 to port user code to new machines.
  39
  40 Many parallel machines have old or weird compilers, and
  41 sometimes a strange operating system or unique set of libraries.
Hence porting code to a parallel machine can be surprisingly difficult.
  43
  44 Unless you're absolutely sure you will only run your code on a
  45 single, known machine, we recommend you be very conservative in
  46 your use of the language and libraries.  ``But it works with my gcc!''
  47 is often true, but not very useful.
  48
  49 Things that seem to work well everywhere include:
  50 \begin{itemize}
  51 \item Small, straightforward Makefiles.  gmake-specific (e.g.,
  52 ``ifeq'', filter variables) or convoluted makefiles can lead
  53 to porting problems and confusion.  Calling charmc instead
  54 of the platform-specific compiler will save you many headaches,
  55 as charmc abstracts away the platform specific flags.
\item Basically all of ANSI C and Fortran 77 work everywhere.  These seem
  57 to be old enough to now have the bugs largely worked out.
  58 %Thankfully, K\&R (no-prototype) C compilers have now died out.
  59 \item C++ classes, inheritance, virtual methods, and namespaces
  60 work without problems everywhere.  Not so uniformly supported
  61 are C++ templates, the STL, new-style C++ system headers,
  62 and the other features listed in the C++ question below.
  63 \end{itemize}
  64
  65 \subsection{How should I approach portability of C language code?}
  66
  67 Our suggestions for Charm++ developers are:
  68
  69 \begin{itemize}
\item Avoid the type ``long long'' in code that must build with older compilers:
it is not part of C89 or C++98, even though many compilers
happen to support it.  Use CMK\_INT8 or CMK\_UINT8,
from conv-config.h, which are macros for the right 8-byte integer type.
Historically, ``long long'' was missing on some 64-bit machines (where ``long''
is 64 bits) and on Windows compilers (where the 64-bit type was ``\_\_int64'').
  75 \item The ``long double'' type isn't present on all compilers.  You can protect
  76 long double code with {\em \#ifdef CMK\_LONG\_DOUBLE\_DEFINED} if it's really needed.
  77 \item Never use C++ ``//'' comments in C code, or headers included by C.
  78 This will not compile under many compilers. %including the IBM SP C compiler.
  79 \item ``bzero'' and ``bcopy'' are BSD-specific calls.
  80 Use memset and memcpy for portable programs.
  81 \end{itemize}
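
The suggestions above can be sketched as follows. This is only an
illustration: it is written in the subset shared by C and C++ (with
C-style comments), the helper names are made up for this example, and
the fallback definition of {\tt CMK\_INT8} is a stand-in assumption so
that the sketch builds without {\tt conv-config.h}.

\begin{verbatim}
/* Sketch only: valid as both C and C++. */
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* memset, memcpy: standard; bzero/bcopy are BSD-only */

/* In real Charm++ code CMK_INT8 comes from conv-config.h.  The fallback
   below is an assumption made so this sketch is self-contained. */
#ifndef CMK_INT8
#include <stdint.h>
#define CMK_INT8 int64_t
#endif

/* Zero a buffer: portable replacement for bzero(buf, n). */
void clear_bytes(void *buf, size_t n) {
    assert(buf != NULL);
    memset(buf, 0, n);
}

/* Copy a buffer: portable replacement for bcopy(src, dst, n).
   Note the argument order: memcpy takes the destination first.
   The buffers must not overlap; use memmove if they might. */
void copy_bytes(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}

/* Sum int values into an 8-byte accumulator without naming "long long". */
CMK_INT8 sum_wide(const int *vals, size_t n) {
    CMK_INT8 total = 0;
    size_t i;
    assert(vals != NULL || n == 0);
    for (i = 0; i < n; i++) total += vals[i];
    return total;
}
\end{verbatim}

A caller would use {\tt clear\_bytes(buf, sizeof buf)} wherever it
previously called {\tt bzero}, and declare wide counters as
{\tt CMK\_INT8} rather than ``long long''.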
  82
If you're writing code that is expected to compile and run on
Microsoft Windows using the Visual C++ compiler (e.g., a modification to
NAMD that you intend to submit for integration), be aware that this compiler has
limited support for the C99 standard, and that Microsoft recommends using
C++ instead.
  88
  89 Many widely-used C compilers on HPC systems have limited support for
  90 the C11 standard. If you want to use features of C11 in your code,
  91 particularly \verb|_Atomic|, we recommend writing the code in C++
  92 instead, since C++11 standard support is much more ubiquitous.
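
As a minimal sketch of that recommendation, the following C++11 code
uses {\tt std::atomic} where C11 code would use \verb|_Atomic|. The
class {\tt HitCounter} and its method names are hypothetical and are
not part of Charm++.

\begin{verbatim}
#include <atomic>
#include <cassert>

// Illustrative class (not part of Charm++): a counter that several threads
// may update concurrently. std::atomic<long> stands in for "_Atomic long".
class HitCounter {
public:
    HitCounter() : hits_(0) {}

    // Atomically add one and return the updated total.
    long record() { return hits_.fetch_add(1) + 1; }

    // Atomically read the current total.
    long total() const { return hits_.load(); }

    // Reset to zero only if the total still equals `expected`.
    // Returns true if the reset happened.
    bool reset_if(long expected) {
        assert(expected >= 0);
        return hits_.compare_exchange_strong(expected, 0);
    }

private:
    std::atomic<long> hits_;
};
\end{verbatim}

The default sequentially consistent memory ordering is used throughout;
weaker orderings are possible but should only be chosen after careful
analysis.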
  93
  94 \subsection{How should I approach portability and performance of C++ language code?}
  95
  96 The Charm++ system developers are conservative about which C++
  97 standard version is relied upon in runtime system code and what
  98 features get used to ensure maximum portability across the broad range
  99 of HPC systems and the compilers used on them. Through version 6.8.x,
 100 the system code requires only limited support for C++11 features,
 101 specifically variadic templates and R-value references. From version
 102 6.9 onwards, the system will require a compiler and standard library
 103 with at least full C++11 support.
 104
 105 A good reference for which compiler versions
 106 provide what level of standard support can be found at
\url{http://en.cppreference.com/w/cpp/compiler_support}.
 108
 109 Developers of several Charm++ applications have reported good results
 110 using features in more recent C++ standards, with the caveat of
 111 requiring that those applications be built with a sufficiently
 112 up-to-date C++ compiler.
 113
 114 The containers specified in the C++ standard library are generally
 115 designed to provide a very broad API that can be used correctly over
 116 highly-varied use cases. This often entails tradeoffs against the
 117 performance attainable for narrower use cases that some applications
may have. The most visible of these concerns is the tension between
 119 strict iterator invalidation semantics and cache-friendly memory
 120 layout. We recommend that developers whose code includes container
 121 access in performance-critical elements explore alternative
 122 implementations, such as those published by EA, Google, and Facebook,
 123 or potentially write custom implementations tailored to their
 124 application's needs.
 125
 126 In benchmarks across a range of compilers, we have found that avoiding
 127 use of exceptions (i.e. \verb+throw/catch+) and disabling support for
 128 them with compiler flags can produce higher-performance code,
 129 especially with aggressive optimization settings enabled. The runtime
 130 system does not use exceptions internally. If your goal as an
 131 application developer is to most efficiently use large-scale
 132 computational resources, we recommend alternative error-handling
 133 strategies.
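
One possible alternative, sketched below, is to report failures through
a status code and an output parameter. The enum and function names are
illustrative only and are not part of Charm++. With GCC or Clang, code
written this way can be built with {\tt -fno-exceptions}; other
compilers have their own flags.

\begin{verbatim}
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Illustrative status codes; these names are not part of Charm++.
enum ParseStatus {
    PARSE_OK,
    PARSE_EMPTY,
    PARSE_NOT_A_NUMBER,
    PARSE_OUT_OF_RANGE
};

// Parse a decimal integer without throwing. `out` is written only on
// success; the caller is expected to check the returned status.
ParseStatus parse_int(const char *text, int &out) noexcept {
    assert(text != nullptr);  // a null pointer is a programming error
    if (*text == '\0') return PARSE_EMPTY;

    char *end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);

    // No digits consumed, or trailing characters after the number.
    if (end == text || *end != '\0') return PARSE_NOT_A_NUMBER;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return PARSE_OUT_OF_RANGE;

    out = static_cast<int>(value);
    return PARSE_OK;
}
\end{verbatim}

A caller then branches on the result, for example
{\tt if (parse\_int(arg, n) != PARSE\_OK)} followed by whatever recovery
or abort path suits the application.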
 134
 135 \subsection{Why do I get a link error when mixing Fortran and C/C++?}
 136
 137 \label{f2c}
 138
 139 Fortran compilers ``mangle'' their routine names in a variety
 140 of ways.  g77 and most compilers make names all lowercase, and
 141 append an underscore, like ``foo\_''.  The IBM xlf compiler makes
 142 names all lowercase without an underscore, like ``foo''. Absoft f90
 143 makes names all uppercase, like ``FOO''.
 144
 145 If the Fortran compiler expects a routine to be named ``foo\_'',
 146 but you only define a C routine named ``foo'', you'll get a link
 147 error (``undefined symbol foo\_'').  Sometimes the UNIX command-line
 148 tool {\em nm} (list symbols in a .o or .a file) can help you see exactly what the
 149 Fortran compiler is asking for, compared to what you're providing.
 150
Charm++ automatically detects the Fortran name mangling scheme
at configure time, and provides a C/C++ macro ``FTN\_NAME'', in ``charm-api.h'',
that expands to a properly mangled Fortran routine name.
You pass the FTN\_NAME macro
two copies of the routine name: once in all uppercase, and again
in all lowercase.
The FTN\_NAME macro then picks the appropriate name and applies any
needed underscores.  ``charm-api.h'' also includes a macro ``FDECL''
that makes the symbol linkable from Fortran (in C++, this expands
to extern ``C''), so the declaration of a Fortran-callable subroutine looks like this in C or C++:
 161 \begin{alltt}
 162 FDECL void FTN\_NAME(FOO,foo)(void);
 163 \end{alltt}
 164
 165 This same syntax can be used for C/C++ routines called from
Fortran, or for calling Fortran routines from C/C++.
 167 We strongly recommend using FTN\_NAME instead of hardcoding your
 168 favorite compiler's name mangling into the C routines.
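
To make the mechanism concrete, here is a self-contained sketch of how
such macros could behave under one particular scheme. The
{\tt SKETCH\_} macros are hypothetical stand-ins that assume the
``lowercase plus trailing underscore'' convention; they are not the
definitions from ``charm-api.h'', which selects the scheme detected at
configure time.

\begin{verbatim}
#include <cassert>

// Hypothetical stand-ins for FTN_NAME and FDECL, ASSUMING the common
// "lowercase plus trailing underscore" mangling. Real code should
// include charm-api.h and use FTN_NAME / FDECL instead.
#define SKETCH_FTN_NAME(UPPER, lower) lower##_
#define SKETCH_FDECL extern "C"

// Fortran passes arguments by reference, so the C++ side takes a pointer.
// Under the assumed scheme this defines the linker symbol "add_one_",
// which Fortran code would reach with:  CALL ADD_ONE(n)
SKETCH_FDECL void SKETCH_FTN_NAME(ADD_ONE, add_one)(int *value) {
    assert(value != nullptr);
    *value += 1;
}
\end{verbatim}

Because both spellings of the name are supplied, switching to a
compiler that expects ``ADD\_ONE'' would only require a different macro
definition, not a change to the routine itself.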
 169
 170 If designing an API with the same routine names in C and
 171 Fortran, be sure to include both upper and lowercase letters
 172 in your routine names.  This way, the C name (with mixed case)
 173 will be different from all possible Fortran manglings (which
 174 all have uniform case).  For example, a routine named ``foo''
 175 will have the same name in C and Fortran when using the IBM
 176 xlf compilers, which is bad because the C and Fortran versions
 177 should take different parameters.  A routine named ``Foo'' does
not suffer from this problem, because the C version is ``Foo'',
 179 while the Fortran version is ``foo\_'', ``foo'', or ``FOO''.
 180
 181 \subsection{How does parameter passing work between Fortran and C?}
 182
 183 Fortran and C have rather different parameter-passing
 184 conventions, but it is possible to pass simple objects
 185 back and forth between Fortran and C:
 186
 187 \begin{itemize}
 188
\item The basic Fortran and C/C++ data types generally
correspond directly to one another:
 191
 192 \begin{tabular}{|l|l|}
 193 \hline
 194 \textbf{C/C++ Type} & \textbf{Fortran Type} \\
 195 \hline
 196 int & INTEGER, LOGICAL \\
 197 double & DOUBLE PRECISION, REAL*8 \\
 198 float & REAL, REAL*4 \\
 199 char & CHARACTER \\
 200 \hline
 201 \end{tabular}
 202
 203 \item Fortran internally passes everything, including
 204 constants, integers, and doubles, by passing a pointer
to the object.  Hence a Fortran ``INTEGER'' argument becomes
 206 an ``int *'' in C/C++:
 207 \begin{alltt}
 208 /* Fortran */
 209 SUBROUTINE BAR(i)
 210     INTEGER :: i
 211     x=i
 212 END SUBROUTINE
 213
 214 /* C/C++ */
 215 FDECL void FTN\_NAME(BAR,bar)(int *i) \{
 216     x=*i;
 217 \}
 218 \end{alltt}
 219
 220 \item 1D arrays are passed exactly the same in Fortran and C/C++:
 221 both languages pass the array by passing the address of the
 222 first element of the array.
Hence a Fortran ``INTEGER, DIMENSION(3)'' array is an ``int *''
 224 in C or C++.  However, Fortran programmers normally think of
 225 their array indices as starting from index 1, while in C/C++
 226 arrays always start from index 0.  This does NOT change how
 227 arrays are passed in, so x is actually the same in both
 228 these subroutines:
 229 \begin{alltt}
 230 /* Fortran */
 231 SUBROUTINE BAR(arr)
 232     INTEGER :: arr(3)
 233     x=arr(1)
 234 END SUBROUTINE
 235
 236 /* C/C++ */
 237 FDECL void FTN\_NAME(BAR,bar)(int *arr) \{
 238     x=arr[0];
 239 \}
 240 \end{alltt}
 241
 242 \item There is a subtle but important difference between the way
 243 f77 and f90 pass array arguments.  f90 will pass an array object
 244 (which is not intelligible from C/C++) instead of a simple pointer
 245 if all of the following are true:
 246 \begin{itemize}
\item An f90 ``INTERFACE'' statement is available on the call side.
 248 \item The subroutine is declared as taking an unspecified-length
 249        array (e.g., ``myArr(:)'') or POINTER variable.
 250 \end{itemize}
Because these f90 array objects can't be used from C/C++, we recommend
that C/C++ routines either provide no f90 INTERFACE or else give
all the arrays in the INTERFACE explicit lengths.
 254
\item Multidimensional arrays are stored in column-major order
in Fortran: the first (leftmost) index varies fastest in memory.  C/C++ do not support
dynamically sized multidimensional arrays, so they must fake them
using arrays of pointers or index arithmetic.
 259
 260 \begin{alltt}
 261 /* Fortran */
 262 SUBROUTINE BAR2(arr,len1,len2)
 263     INTEGER :: arr(len1,len2)
 264     INTEGER :: i,j
 265     DO j=1,len2
 266       DO i=1,len1
 267         arr(i,j)=i;
 268       END DO
 269     END DO
 270 END SUBROUTINE
 271
 272 /* C/C++ */
 273 FDECL void FTN\_NAME(BAR2,bar2)(int *arr,int *len1p,int *len2p) \{
 274     int i,j; int len1=*len1p, len2=*len2p;
 275     for (j=0;j<len2;j++)
 276     for (i=0;i<len1;i++)
        arr[i+j*len1]=i+1; /* Fortran's i runs from 1, so add 1 */
 278 \}
 279 \end{alltt}
 280
 281 \item Fortran strings are passed in a very strange fashion.
 282 A string argument is passed as a character pointer and a
 283 length, but the length field, unlike all other Fortran arguments,
 284 is passed by value, and goes after all other arguments.
 285 Hence
 286
 287 \begin{alltt}
 288 /* Fortran */
 289 SUBROUTINE CALL\_BARS(arg)
 290     INTEGER :: arg
 291     CALL BARS('some string',arg);
 292 END SUBROUTINE
 293
 294 /* C/C++ */
 295 FDECL void FTN\_NAME(BARS,bars)(char *str,int *arg,int strlen) \{
 296     char *s=(char *)malloc(strlen+1);
 297     memcpy(s,str,strlen);
 298     s[strlen]=0; /* nul-terminate string */
 299     printf("Received Fortran string '\%s' (\%d characters){\textbackslash}n",s,strlen);
 300     free(s);
 301 \}
 302 \end{alltt}
 303
 304
\item An f90 named TYPE can sometimes successfully be passed into a
 306 C/C++ struct, but this can fail if the compilers insert different
 307 amounts of padding.  There does not seem to be a portable way to
 308 pass f90 POINTER variables into C/C++, since different compilers
 309 represent POINTER variables differently.
 310
 311 \end{itemize}
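
The array and string conventions listed above can be exercised without
a Fortran compiler by calling the C++ side the way a Fortran caller
would. The sketch below makes two assumptions that are not guaranteed
everywhere: the symbol name {\tt fill2\_} hardcodes
lowercase-plus-underscore mangling (real code should spell it
{\tt FTN\_NAME(FILL2,fill2)}), and the hidden string length is taken to
be an {\tt int}. All function names are made up for this example.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <string>

// Offset of Fortran element arr(i,j) (1-based indices) in an array
// declared arr(len1,len2): the first index varies fastest.
std::size_t fortran_offset(int i, int j, int len1) {
    assert(i >= 1 && j >= 1 && len1 >= 1);
    return static_cast<std::size_t>(i - 1) +
           static_cast<std::size_t>(j - 1) * static_cast<std::size_t>(len1);
}

// Shape of a routine that Fortran would call as CALL FILL2(arr,len1,len2).
// Every argument arrives as a pointer. The name "fill2_" ASSUMES one
// particular mangling scheme.
extern "C" void fill2_(int *arr, int *len1p, int *len2p) {
    const int len1 = *len1p, len2 = *len2p;
    for (int j = 1; j <= len2; ++j)
        for (int i = 1; i <= len1; ++i)
            arr[fortran_offset(i, j, len1)] = 10 * i + j;  // arr(i,j)=10*i+j
}

// Copy a Fortran CHARACTER argument (a pointer plus a by-value length,
// with no terminating NUL) into a std::string.
std::string from_fortran_string(const char *str, int len) {
    assert(str != nullptr && len >= 0);
    return std::string(str, static_cast<std::size_t>(len));
}
\end{verbatim}

For a 2-by-3 array, {\tt fill2\_} writes the values for {\tt arr(1,1)},
{\tt arr(2,1)}, {\tt arr(1,2)}, and so on into consecutive memory
locations, which is the order a Fortran caller expects.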
 312
 313 \subsection{How do I use Charm++ on Xeon Phi?}
 314
 315 In general, no changes are required to use Charm++ on Xeon Phis. To
 316 compile code for Knights Landing, no special flags are required. To
 317 compile code for Knights Corner, one should build Charm++ with the
 318 {\tt mic} option. In terms of network layers, we currently recommend
 319 building the MPI layer ({\tt mpi-linux-x86\_64}) except for machines with
 320 custom network layers, such as Cray systems, on which we recommend
 321 building for the custom layer ({\tt gni-crayxc} for Cray XC machines,
 322 for example). To enable AVX-512 vector instructions, Charm++ can be
 323 built with {\tt -xMIC-AVX512} on Intel compilers or {\tt -mavx512f
 324   -mavx512er -mavx512cd -mavx512pf} for GNU compilers.
 325
 326 \subsection{How do I use \charm{} on GPUs?}
\charm{} users have two options for utilizing GPUs.
 328
The first is to write CUDA (or OpenCL, etc.) code directly in their \charm{}
applications. This does not take advantage of any of the special GPU-friendly
features the \charm{} runtime provides and is similar to how programmers utilize
GPUs in other parallel environments, e.g., MPI.
 333
 334 The second option is to leverage \charm's GPU library, GPU Manager. This library
 335 provides several useful features including:
 336 \begin{itemize}
 337 \item Automated data movement
 338 \item Ability to invoke callbacks at various points
\item Host-side pinned memory pooling
 340 \item Asynchronous kernel invocation
 341 \item Integrated tracing in Projections
 342 \end{itemize}
 343
To use GPU Manager, \charm{} must be built with the \texttt{cuda} option. Users must
describe their kernels using a work request struct, which includes the buffers
to be copied, callbacks to be invoked, and kernel to be executed. Additionally,
users can take advantage of a host-side pinned memory pool that is
pre-allocated by the runtime by calling \texttt{hapi\_poolMalloc}. Finally, the
user must compile this code with \texttt{nvcc} as usual.
 351
 352 More details on using GPUs in \charm{} can be found in the
 353 \htmladdnormallink{GPU Manager Library}{http://charm.cs.illinois.edu/manuals/html/libraries/6.html}
 354 entry in the larger Libraries Manual.
 355