\section{Versions and Ports}

\subsection{Has Charm++ been ported to use MPI underneath? What about OpenMP?}

Charm++ supports MPI and can use it as the underlying communication
library. We have tested it with MPICH, Open MPI, and most vendor MPI
variants. Charm++ also has explicit support for SMP nodes in the MPI
version. Charm++ hasn't been ported to use OpenMP, but OpenMP can be
used from Charm++.

\subsection{How complicated is porting Charm++/Converse?}

It depends. In the best case, porting only involves fixing compiler compatibility
issues. The LRTS abstraction layer was designed to simplify this process and has been used for the
MPI, Verbs, uGNI, and PAMI layers. User-level threads and Isomalloc support may require special
platform-specific support. Otherwise, Charm++ is generally platform independent.

\subsection{If the source is available, how feasible would it be for us to do ports
ourselves?}

The source is always available, and you're welcome to make it run anywhere.
Porting to any kind of UNIX, Windows, or macOS machine should be straightforward: just a
few modifications to {\tt charm/src/arch/.../conv-mach.h} (for compiler
issues) and possibly
a new {\em machine.c} (if there's a new communication system involved).
However, porting to embedded hardware with a proprietary OS may be fairly difficult.

\subsection{To what platforms has Charm++/Converse been ported?}

Charm++/Converse has been ported to most UNIX and Linux variants, Windows, and macOS.

\subsection{Is it hard to port Charm++ programs to different machines?}

\label{porting}
Charm++ itself is fully portable, and should provide exactly
the same interfaces everywhere (even if the implementations are
sometimes different). Still, it's often harder than we'd like
to port user code to new machines.

Many parallel machines have old or weird compilers, and
sometimes a strange operating system or unique set of libraries.
Hence porting code to a parallel machine can be surprisingly difficult.

Unless you're absolutely sure you will only run your code on a
single, known machine, we recommend you be very conservative in
your use of the language and libraries. ``But it works with my gcc!''
is often true, but not very useful.

Things that seem to work well everywhere include:
\begin{itemize}
\item Small, straightforward Makefiles. gmake-specific features (e.g.,
``ifeq'', filter variables) or convoluted makefiles can lead
to porting problems and confusion. Calling charmc instead
of the platform-specific compiler will save you many headaches,
as charmc abstracts away the platform-specific flags.
\item Basically all of ANSI C and Fortran 77 work everywhere. These seem
to be old enough to now have the bugs largely worked out.
%Thankfully, K\&R (no-prototype) C compilers have now died out.
\item C++ classes, inheritance, virtual methods, and namespaces
work without problems everywhere. Not so uniformly supported
are C++ templates, the STL, new-style C++ system headers,
and the other features listed in the C++ question below.
\end{itemize}

\subsection{How should I approach portability of C language code?}

Our suggestions for Charm++ developers are:

\begin{itemize}
\item Avoid the type ``long long'', which is not part of ANSI C, even though many compilers
happen to support it. Use CMK\_INT8 or CMK\_UINT8,
from conv-config.h, which are macros that expand to the appropriate 8-byte integer type.
``long long'' is not supported on many 64-bit machines (where ``long''
is 64 bits) or on Windows machines (where it's ``\_\_int64'').
\item The ``long double'' type isn't present on all compilers. You can protect
long double code with {\em \#ifdef CMK\_LONG\_DOUBLE\_DEFINED} if it's really needed.
\item Never use C++ ``//'' comments in C code, or in headers included by C.
They will not compile under many C compilers. %including the IBM SP C compiler.
\item ``bzero'' and ``bcopy'' are BSD-specific calls.
Use memset and memcpy for portable programs.
\end{itemize}

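As a minimal sketch of the last two items, the fragment below restricts
itself to ANSI C: it uses only {\tt /* */} comments, and {\tt memset} and
{\tt memcpy} stand in for {\tt bzero} and {\tt bcopy}. The {\tt particle}
struct and both helper names are hypothetical and exist only for this
illustration.

\begin{verbatim}
#include <assert.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    double pos[3];
    int    id;
} particle;

/* Portable replacement for bzero(p, sizeof *p). */
void particle_clear(particle *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Portable replacement for bcopy(src, dst, sizeof *dst).
   Note the argument order: memcpy takes the destination first. */
void particle_copy(particle *dst, const particle *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
\end{verbatim}

Unlike {\tt bcopy}, {\tt memcpy} does not allow the two regions to overlap;
where overlap is possible, the standard {\tt memmove} is the closer equivalent.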
If you're writing code that is expected to compile and run on
Microsoft Windows using the Visual C++ compiler (e.g., modifications to
NAMD that you intend to submit for integration), note that this compiler has
limited support for the C99 standard, and Microsoft recommends using
C++ instead.

Many widely-used C compilers on HPC systems have limited support for
the C11 standard. If you want to use features of C11 in your code,
particularly \verb|_Atomic|, we recommend writing the code in C++
instead, since support for the C++11 standard is far more widespread.

\subsection{How should I approach portability and performance of C++ language code?}

The Charm++ system developers are conservative about which C++
standard version the runtime system code relies on and which
features it uses, in order to ensure maximum portability across the broad range
of HPC systems and the compilers used on them. Through version 6.8.x,
the system code requires only limited support for C++11 features,
specifically variadic templates and R-value references. From version
6.9 onwards, the system will require a compiler and standard library
with at least full C++11 support.

A good reference for which compiler versions
provide what level of standard support can be found at
\url{http://en.cppreference.com/w/cpp/compiler_support}

Developers of several Charm++ applications have reported good results
using features in more recent C++ standards, with the caveat
that those applications must be built with a sufficiently
up-to-date C++ compiler.

The containers specified in the C++ standard library are generally
designed to provide a very broad API that can be used correctly over
highly-varied use cases. This often entails tradeoffs against the
performance attainable for narrower use cases that some applications
may have. The most visible of these concerns is the tension between
strict iterator invalidation semantics and cache-friendly memory
layout. We recommend that developers whose code includes container
access in performance-critical elements explore alternative
implementations, such as those published by EA, Google, and Facebook,
or potentially write custom implementations tailored to their
application's needs.

In benchmarks across a range of compilers, we have found that avoiding
use of exceptions (i.e., \verb+throw/catch+) and disabling support for
them with compiler flags can produce higher-performance code,
especially with aggressive optimization settings enabled. The runtime
system does not use exceptions internally. If your goal as an
application developer is to use large-scale
computational resources as efficiently as possible, we recommend alternative error-handling
strategies.

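One such strategy is to report recoverable failures through explicit status
codes and to reserve assertions for programming errors. The sketch below is
written in the common subset of C and C++, so it compiles as either; the
{\tt status} enum and {\tt checked\_get} are hypothetical names and are not
part of Charm++.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this illustration. */
typedef enum { STATUS_OK = 0, STATUS_OUT_OF_RANGE } status;

/* Reads data[i] without throwing. A null pointer is treated as a
   programming error (assert); an out-of-range index is a runtime
   condition reported through the return value. *out is written
   only on success. */
status checked_get(const int *data, size_t n, size_t i, int *out)
{
    assert(data != NULL && out != NULL);
    if (i >= n)
        return STATUS_OUT_OF_RANGE;
    *out = data[i];
    return STATUS_OK;
}
\end{verbatim}

Callers branch on the returned status instead of wrapping the call in
{\tt try}/{\tt catch}, which keeps the code usable when exception support
is disabled by compiler flags.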
\subsection{Why do I get a link error when mixing Fortran and C/C++?}

\label{f2c}

Fortran compilers ``mangle'' their routine names in a variety
of ways. g77 and most other compilers make names all lowercase, and
append an underscore, like ``foo\_''. The IBM xlf compiler makes
names all lowercase without an underscore, like ``foo''. Absoft f90
makes names all uppercase, like ``FOO''.

If the Fortran compiler expects a routine to be named ``foo\_'',
but you only define a C routine named ``foo'', you'll get a link
error (``undefined symbol foo\_''). Sometimes the UNIX command-line
tool {\em nm} (list symbols in a .o or .a file) can help you see exactly what the
Fortran compiler is asking for, compared to what you're providing.

Charm++ automatically detects the Fortran name mangling scheme
at configure time, and provides a C/C++ macro ``FTN\_NAME'', in ``charm-api.h'',
that expands to a properly mangled Fortran routine name.
You pass the FTN\_NAME macro
two copies of the routine name: once in all uppercase, and again
in all lowercase.
The FTN\_NAME macro then picks the appropriate name and applies any
needed underscores. ``charm-api.h'' also includes a macro ``FDECL''
that makes the symbol linkable from Fortran (in C++, this expands
to {\tt extern "C"}), so the complete C or C++ declaration of a
Fortran-callable subroutine looks like:
\begin{alltt}
FDECL void FTN\_NAME(FOO,foo)(void);
\end{alltt}

This same syntax can be used for C/C++ routines called from
Fortran, or for calling Fortran routines from C/C++.
We strongly recommend using FTN\_NAME instead of hardcoding your
favorite compiler's name mangling into the C routines.

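To illustrate the idea behind such a macro, here is a minimal, self-contained
sketch. It is not the definition in ``charm-api.h'': the macro
{\tt SKETCH\_FTN\_NAME} and the two selector symbols are hypothetical
stand-ins, and the lowercase-plus-underscore scheme is assumed as the default.

\begin{verbatim}
#include <assert.h>

/* Hypothetical stand-in for FTN_NAME. The real macro comes from
   charm-api.h and is chosen by Charm++ at configure time. */
#if defined(SKETCH_FTN_ALLCAPS)
#  define SKETCH_FTN_NAME(caps, nocaps) caps        /* FOO  */
#elif defined(SKETCH_FTN_NO_UNDERSCORE)
#  define SKETCH_FTN_NAME(caps, nocaps) nocaps      /* foo  */
#else
#  define SKETCH_FTN_NAME(caps, nocaps) nocaps##_   /* foo_ */
#endif

static int foo_calls = 0;

/* Defines FOO, foo, or foo_ depending on the scheme selected above. */
void SKETCH_FTN_NAME(FOO, foo)(void)
{
    foo_calls++;
}

/* Calls the routine through the macro, so the call site never
   hardcodes one compiler's mangling. Returns the running call count. */
int call_foo_twice(void)
{
    int before = foo_calls;
    SKETCH_FTN_NAME(FOO, foo)();
    SKETCH_FTN_NAME(FOO, foo)();
    assert(foo_calls == before + 2);
    return foo_calls;
}
\end{verbatim}

Because both the definition and the call go through the same macro, switching
the mangling scheme requires no change to the routine itself.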
If designing an API with the same routine names in C and
Fortran, be sure to include both upper and lowercase letters
in your routine names. This way, the C name (with mixed case)
will be different from all possible Fortran manglings (which
all have uniform case). For example, a routine named ``foo''
will have the same name in C and Fortran when using the IBM
xlf compilers, which is bad because the C and Fortran versions
should take different parameters. A routine named ``Foo'' does
not suffer from this problem, because the C version is ``Foo'',
while the Fortran version is ``foo\_'', ``foo'', or ``FOO''.

\subsection{How does parameter passing work between Fortran and C?}

Fortran and C have rather different parameter-passing
conventions, but it is possible to pass simple objects
back and forth between Fortran and C:

\begin{itemize}

\item Fortran and C/C++ data types are generally completely
interchangeable:

\begin{tabular}{|l|l|}
\hline
\textbf{C/C++ Type} & \textbf{Fortran Type} \\
\hline
int & INTEGER, LOGICAL \\
double & DOUBLE PRECISION, REAL*8 \\
float & REAL, REAL*4 \\
char & CHARACTER \\
\hline
\end{tabular}

\item Fortran internally passes everything, including
constants, integers, and doubles, by passing a pointer
to the object. Hence a Fortran ``INTEGER'' argument becomes
an ``int *'' in C/C++:
\begin{alltt}
/* Fortran */
SUBROUTINE BAR(i)
    INTEGER :: i
    x=i
END SUBROUTINE

/* C/C++ */
FDECL void FTN\_NAME(BAR,bar)(int *i) \{
    x=*i;
\}
\end{alltt}

\item 1D arrays are passed exactly the same in Fortran and C/C++:
both languages pass the array by passing the address of the
first element of the array.
Hence a Fortran ``INTEGER, DIMENSION(:)'' array is an ``int *''
in C or C++. However, Fortran programmers normally think of
their array indices as starting from index 1, while in C/C++
arrays always start from index 0. This does NOT change how
arrays are passed in, so x is actually the same in both
these subroutines:
\begin{alltt}
/* Fortran */
SUBROUTINE BAR(arr)
    INTEGER :: arr(3)
    x=arr(1)
END SUBROUTINE

/* C/C++ */
FDECL void FTN\_NAME(BAR,bar)(int *arr) \{
    x=arr[0];
\}
\end{alltt}

\item There is a subtle but important difference between the way
f77 and f90 pass array arguments. f90 will pass an array object
(which is not intelligible from C/C++) instead of a simple pointer
if all of the following are true:
\begin{itemize}
\item An f90 ``INTERFACE'' statement is available on the call side.
\item The subroutine is declared as taking an unspecified-length
array (e.g., ``myArr(:)'') or POINTER variable.
\end{itemize}
Because these f90 array objects can't be used from C/C++, we recommend
that C/C++ routines either provide no f90 INTERFACE or else give every
array in the INTERFACE an explicit length.

\item Multidimensional allocatable arrays are stored in Fortran with
the first index varying fastest (column-major order). C/C++ do not support
allocatable multidimensional arrays, so they must fake them
using arrays of pointers or index arithmetic.

\begin{alltt}
/* Fortran */
SUBROUTINE BAR2(arr,len1,len2)
    INTEGER :: arr(len1,len2)
    INTEGER :: i,j
    DO j=1,len2
        DO i=1,len1
            arr(i,j)=i
        END DO
    END DO
END SUBROUTINE

/* C/C++ */
FDECL void FTN\_NAME(BAR2,bar2)(int *arr,int *len1p,int *len2p) \{
    int i,j; int len1=*len1p, len2=*len2p;
    for (j=0;j<len2;j++)
        for (i=0;i<len1;i++)
            arr[i+j*len1]=i;
\}
\end{alltt}

\item Fortran strings are passed in a very strange fashion.
A string argument is passed as a character pointer and a
length, but the length field, unlike all other Fortran arguments,
is passed by value, and goes after all other arguments.
Hence:

\begin{alltt}
/* Fortran */
SUBROUTINE CALL\_BARS(arg)
    INTEGER :: arg
    CALL BARS('some string',arg)
END SUBROUTINE

/* C/C++ */
FDECL void FTN\_NAME(BARS,bars)(char *str,int *arg,int strlen) \{
    char *s=(char *)malloc(strlen+1);
    memcpy(s,str,strlen);
    s[strlen]=0; /* nul-terminate string */
    printf("Received Fortran string '\%s' (\%d characters){\textbackslash}n",s,strlen);
    free(s);
\}
\end{alltt}

\item An f90 named TYPE can sometimes successfully be passed into a
C/C++ struct, but this can fail if the compilers insert different
amounts of padding. There does not seem to be a portable way to
pass f90 POINTER variables into C/C++, since different compilers
represent POINTER variables differently.

\end{itemize}

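The conventions above can be exercised from plain C without a Fortran
compiler, as in the following sketch. It assumes the lowercase-plus-underscore
mangling and hardcodes the names {\tt bar2\_} and {\tt bars\_} only to stay
self-contained (real code should use FTN\_NAME); the helper
{\tt fstring\_to\_c} is a hypothetical name. The hidden string length is
declared as {\tt int} to match the text above, although some compilers pass
it as a wider integer type.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fills a len1-by-len2 Fortran array. Element (i,j), counted from
   zero, lives at arr[i + j*len1] because the first index varies
   fastest. Scalars arrive as pointers. */
void bar2_(int *arr, int *len1p, int *len2p)
{
    int i, j;
    int len1 = *len1p, len2 = *len2p;
    for (j = 0; j < len2; j++)
        for (i = 0; i < len1; i++)
            arr[i + j * len1] = i;
}

/* Copies a Fortran string (pointer plus length, no terminator) into
   a freshly allocated NUL-terminated C string. Returns NULL if the
   allocation fails; the caller must free() the result. */
char *fstring_to_c(const char *str, int len)
{
    char *s;
    assert(str != NULL && len >= 0);
    s = (char *)malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, str, (size_t)len);
    s[len] = '\0';
    return s;
}

/* Receives arguments the way Fortran passes them: the string length
   comes last and by value. Stores the length of the converted string
   in *arg, or -1 if the copy could not be allocated. */
void bars_(char *str, int *arg, int len)
{
    char *s = fstring_to_c(str, len);
    *arg = (s != NULL) ? (int)strlen(s) : -1;
    free(s);
}
\end{verbatim}

Writing the result through {\tt arg} works because Fortran passed that
INTEGER by reference, so the caller sees the updated value after the call.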
\subsection{How do I use Charm++ on Xeon Phi?}

In general, no changes are required to use Charm++ on Xeon Phis. To
compile code for Knights Landing, no special flags are required. To
compile code for Knights Corner, one should build Charm++ with the
{\tt mic} option. In terms of network layers, we currently recommend
building the MPI layer ({\tt mpi-linux-x86\_64}) except on machines with
custom network layers, such as Cray systems, on which we recommend
building for the custom layer ({\tt gni-crayxc} for Cray XC machines,
for example). To enable AVX-512 vector instructions, Charm++ can be
built with {\tt -xMIC-AVX512} with Intel compilers or {\tt -mavx512f
-mavx512er -mavx512cd -mavx512pf} with GNU compilers.

\subsection{How do I use \charm{} on GPUs?}
\charm{} users have two options when utilizing GPUs in \charm{}.

The first is to write CUDA (or OpenCL, etc.) code directly in their \charm{}
applications. This does not take advantage of any of the special GPU-friendly
features the \charm{} runtime provides and is similar to how programmers utilize
GPUs in other parallel environments, e.g., MPI.

The second option is to leverage \charm's GPU library, GPU Manager. This library
provides several useful features including:
\begin{itemize}
\item Automated data movement
\item Ability to invoke callbacks at various points
\item Host-side pinned memory pooling
\item Asynchronous kernel invocation
\item Integrated tracing in Projections
\end{itemize}

To do this, \charm{} must be built with the \texttt{cuda} option. Users must
describe their kernels using a work request struct, which includes the buffers
to be copied, callbacks to be invoked, and kernel to be executed. Additionally,
users can take advantage of a host-side pinned memory pool
pre-allocated by the runtime by invoking \texttt{hapi\_poolMalloc}. Finally, the
user must compile this code with \texttt{nvcc}, as
usual.

More details on using GPUs in \charm{} can be found in the
\htmladdnormallink{GPU Manager Library}{http://charm.cs.illinois.edu/manuals/html/libraries/6.html}
entry in the larger Libraries Manual.