%\documentclass[10pt,dvips]{article}
\documentclass[10pt]{article}
\usepackage{../pplmanual}
\usepackage[pdftex]{graphicx}
%\usepackage[dvips]{graphicx}
%\usepackage[usenames,dvipsnames]{color}
%\usepackage[pdftex]{hyperref}
\usepackage{epsfig}
\input{../pplmanual}

\ifpdf
\DeclareGraphicsExtensions{.jpg,.pdf,.mps,.png}
\else
\DeclareGraphicsExtensions{.eps}
\fi

\title{ARMCI Interface under \charmpp{}}
\version{1.0}
\credits{
Chee Wai Lee and Chao Huang
}

\begin{document}
\maketitle

\section{Introduction}
\label{sec::introduction}

This manual describes the basic features and API of the Aggregate
Remote Memory Copy Interface (ARMCI) library implemented under
\charmpp{}. It is meant for developers using ARMCI who desire the
performance features of the \charmpp{} run-time system (e.g. dynamic
load balancing, fault tolerance and scalability) applied transparently
to their libraries written using the ARMCI API.

ARMCI is a library that supports remote memory copy functionality. It
focuses on non-contiguous data transfers and is meant to be used by
other libraries rather than directly by applications. Libraries that
the original ARMCI developers have targeted include Global Arrays,
P++/Overture and the Adlib PCRC run-time system.

ARMCI remote copy operations are one-sided and complete regardless of
the actions taken by the remote process. For performance reasons,
polling can be helpful but should not be necessary to ensure
progress. Operations that reference the same remote process are
ordered. Operations issued to different processes can complete in an
arbitrary order. Both blocking and non-blocking APIs are supported.

ARMCI supports three classes of operations: data transfer using {\em
put}, {\em get} and {\em accumulate} operations; synchronization with
local and global {\em fence} operations and atomic read-modify-write;
and utility functions for memory management and error handling. {\em
Accumulate} and atomic read-modify-write operations are currently not
implemented for the \charmpp{} port.

A {\em get} operation transfers data from the remote process memory
(source) to the local memory of the calling process (destination). A
{\em put} operation transfers data from the local memory of the calling
process (source) to the memory of the remote process (destination).

This manual includes several useful \charmpp{}-specific extensions to
the ARMCI API. It also lists the functions that exist in the original
ARMCI implementation but have not yet been implemented here. Readers
of this manual are advised to refer to the original ARMCI
documentation (see Section \ref{sec::related doc}) for more complete
information and motivation for the development of this library.

\subsection{Building ARMCI Support under The \charmpp{} Runtime System}
\label{sec::charm build}

Build charm target ARMCI (instead of charm or AMPI):
\begin{verbatim}
> cd charm
> ./build ARMCI net-linux -O3
\end{verbatim}

\subsection{Writing a Simple ARMCI Program}
\label{sec::simple program}

In the following simple example, two processes each place their own
string into the global array and then fetch the other process's string
from its global address space in order to print ``hello world''.

The main function must comply with ANSI C:

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <armci.h>

#define MAX_PROCESSORS 2

int main(int argc, char * argv[]) {
  void *baseAddress[MAX_PROCESSORS];
  char *myBuffer;
  int thisImage;

  // initialize
  ARMCI_Init();
  ARMCI_Myid(&thisImage);

  // allocate data (collective operation)
  ARMCI_Malloc(baseAddress, strlen("hello")+1);

  if (thisImage == 0) {
    sprintf((char *)baseAddress[0], "%s", "hello");
  } else if (thisImage == 1) {
    sprintf((char *)baseAddress[1], "%s", "world");
  }

  // allocate space for local buffer
  myBuffer = (char *)ARMCI_Malloc_local(strlen("hello")+1);

  ARMCI_Barrier();

  if (thisImage == 0) {
    // fetch "world" from process 1
    ARMCI_Get(baseAddress[1], myBuffer, strlen("hello")+1, 1);
    printf("[%d] %s %s\n",thisImage, (char *)baseAddress[0], myBuffer);
  } else if (thisImage == 1) {
    // fetch "hello" from process 0
    ARMCI_Get(baseAddress[0], myBuffer, strlen("hello")+1, 0);
    printf("[%d] %s %s\n",thisImage, myBuffer, (char *)baseAddress[1]);
  }

  // finalize
  ARMCI_Finalize();
  return 0;
}
\end{verbatim}

\subsection{Building an ARMCI Binary and Execution}
\label{sec::armci build}

Compile the code with:
\begin{verbatim}
> charm/bin/charmc -c hello.c $(OPTS)
\end{verbatim}

\noindent
Link the program with:
\begin{verbatim}
> charm/bin/charmc hello.o -o hello -swapglobals -memory isomalloc -language armci $(OPTS)
\end{verbatim}

\noindent
Run the program:
\begin{verbatim}
> ./charmrun ./hello +p2 +vp8
\end{verbatim}

\section{ARMCI Data Structures}
\label{sec::data structures}

ARMCI provides two formats to describe non-contiguous layouts of data
in memory.

The {\em generalized I/O vector} is the most general format. It is
intended for multiple sets of equally sized data segments to be moved
between arbitrary local and remote memory locations. It uses two arrays
of pointers: one for source and one for destination addresses. The
length of each array is equal to the number of segments.

\begin{verbatim}
typedef struct {
  void *src_ptr_ar;
  void *dst_ptr_ar;
  int bytes;
  int ptr_ar_len;
} armci_giov_t;
\end{verbatim}

Currently, there is no support for {\em generalized I/O vector}
operations in the \charmpp{} implementation.
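
Although the vector operations are unavailable in this port, the layout
itself can be illustrated within a single address space. The following
is a minimal sketch, not ARMCI code: the type {\tt giov\_sketch\_t}, its
field names and the helper {\tt giov\_copy\_local} are hypothetical names
introduced here, and the two pointer arrays are assumed to hold one
address per segment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local descriptor that mirrors the idea of a generalized
 * I/O vector. It is NOT the ARMCI type; names are this sketch's own. */
typedef struct {
    void **src_ptrs;   /* one source address per segment */
    void **dst_ptrs;   /* one destination address per segment */
    int bytes;         /* size of every segment, in bytes */
    int num_segments;  /* length of both pointer arrays */
} giov_sketch_t;

/* Copy every segment described by v within one address space.
 * A real vector operation would move the data between processes. */
void giov_copy_local(const giov_sketch_t *v)
{
    int i;

    assert(v != NULL && v->bytes >= 0 && v->num_segments >= 0);
    for (i = 0; i < v->num_segments; i++) {
        memcpy(v->dst_ptrs[i], v->src_ptrs[i], (size_t)v->bytes);
    }
}
```

Because each segment carries its own source and destination address,
the segments may be scattered arbitrarily; only their size is shared.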

The {\em strided} format is an optimization of the generalized I/O
vector format. It is intended to minimize storage required to describe
sections of dense multi-dimensional arrays. Instead of including
addresses for all the segments, it specifies only an address of the
first segment in the set for source and destination. The addresses of
the other segments can be computed using the stride information.
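
As a sketch of that computation (the symbols are introduced here only
for illustration), let $a_0$ be the address of the first segment, $n$
the number of stride levels, $s_k$ the stride in bytes at level $k$, and
$i_k$ the zero-based index of a block at level $k$. The segment selected
by $(i_0,\dots,i_{n-1})$ then starts at
\[
  a(i_0,\dots,i_{n-1}) = a_0 + \sum_{k=0}^{n-1} i_k \, s_k .
\]
For example, with a single stride level and a stride of $160$ bytes (one
row of 20 eight-byte values), the segment with $i_0 = 2$ starts
$2 \times 160 = 320$ bytes after the first segment.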

\section{Application Programmer's Interface}
\label{sec::api}

The following is a list of functions supported on the \charmpp{} port
of ARMCI. The integer value returned by most ARMCI operations
represents the error code. Zero indicates success; any other value
indicates failure (see Section \ref{sec::error codes} for details).
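
Since most calls return an integer status, a caller can route every
return value through a single check. The following is a minimal sketch
that assumes only the zero-means-success convention stated above; the
helper name {\tt armci\_ok} is hypothetical and not part of the ARMCI
API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of ARMCI: report a non-zero return code.
 * rc is the value returned by an ARMCI call, what names that call.
 * Returns 1 when rc signals success (zero) and 0 otherwise. */
int armci_ok(int rc, const char *what)
{
    assert(what != NULL);
    if (rc != 0) {
        fprintf(stderr, "%s failed with code %d\n", what, rc);
        return 0;
    }
    return 1;
}
```

In a program this might be used as {\tt if (!armci\_ok(ARMCI\_Init(),
"ARMCI\_Init")) ...}, with the failure branch choosing how to shut down.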

\subsection{Startup, Cleanup and Status Functions}

\begin{verbatim}
int ARMCI_Init(void);
\end{verbatim}
Initializes the ARMCI library. This function must be called before any
ARMCI functions may be used.

\begin{verbatim}
int ARMCI_Finalize(void);
\end{verbatim}
Shuts down the ARMCI library. No ARMCI functions may be called after
this call is made. It must be used before terminating the program normally.

\begin{verbatim}
void ARMCI_Cleanup(void);
\end{verbatim}
Releases system resources that the ARMCI library might be holding. This is
intended to be used before terminating the program in case of error.

\begin{verbatim}
void ARMCI_Error(char *msg, int code);
\end{verbatim}
Combines the functionality of ARMCI\_Cleanup and \charmpp{}'s CkAbort
call. Prints to {\em stdout} and {\em stderr} {\tt msg} followed by an
integer {\tt code}.

\begin{verbatim}
int ARMCI_Procs(int *procs);
\end{verbatim}
The number of processes is stored in the address {\tt procs}.

\begin{verbatim}
int ARMCI_Myid(int *myid);
\end{verbatim}
The id of the process making this call is stored in the address {\tt myid}.

\subsection{ARMCI Memory Allocation}

\begin{verbatim}
int ARMCI_Malloc(void* ptr_arr[], int bytes);
\end{verbatim}
Collective operation to allocate memory that can be used in the
context of ARMCI copy operations. Memory of size {\tt bytes} is
allocated on each process. The pointer address of each process'
allocated memory is stored at {\tt ptr\_arr[]} indexed by the process'
id (see {\tt ARMCI\_Myid}). Each process gets a copy of {\tt ptr\_arr}.

\begin{verbatim}
int ARMCI_Free(void *ptr);
\end{verbatim}
Collective operation to free memory which was allocated by
{\tt ARMCI\_Malloc}.

\begin{verbatim}
void *ARMCI_Malloc_local(int bytes);
\end{verbatim}
Allocates local memory of size {\tt bytes}. Essentially a wrapper for
{\tt malloc}.

\begin{verbatim}
int ARMCI_Free_local(void *ptr);
\end{verbatim}
Frees the local memory pointed to by {\tt ptr}. Essentially a
wrapper for {\tt free}.

\subsection{Put and Get Communication}

\begin{verbatim}
int ARMCI_Put(void *src, void *dst, int bytes, int proc);
\end{verbatim}
Transfer contiguous data of size {\tt bytes} from the local process
memory (source) pointed to by {\tt src} into the remote memory of
process id {\tt proc} pointed to by {\tt dst} (remote memory pointer
at destination).

\begin{verbatim}
int ARMCI_NbPut(void *src, void* dst, int bytes, int proc,
                armci_hdl_t *handle);
\end{verbatim}
The non-blocking version of {\tt ARMCI\_Put}. Passing a {\tt NULL}
value to {\tt handle} makes this function perform an implicit handle
non-blocking transfer.

\begin{verbatim}
int ARMCI_PutS(void *src_ptr, int src_stride_ar[],
               void *dst_ptr, int dst_stride_ar[],
               int count[], int stride_levels, int proc);
\end{verbatim}
Transfer strided data from the local process memory (source) into
remote memory of process id {\tt proc}. {\tt src\_ptr} points to the
first memory segment in local process memory. {\tt dst\_ptr} is a
remote memory address that points to the first memory segment in the
memory of process {\tt proc}. {\tt stride\_levels} represents the
number of additional dimensions of striding beyond 1. {\tt
src\_stride\_ar} is an array of size {\tt stride\_levels} whose values
indicate the number of bytes to skip on the local process memory
layout. {\tt dst\_stride\_ar} is an array of size {\tt stride\_levels}
whose values indicate the number of bytes to skip on process {\tt
proc}'s memory layout. {\tt count} is an array of size {\tt
stride\_levels + 1}: {\tt count[0]} is the number of bytes in each
contiguous segment, and each subsequent entry is the number of blocks
to copy at the corresponding stride level.

As an example, assume two 2-dimensional C arrays residing on different
processes.

\begin{verbatim}
double A[10][20]; /* local process */
double B[20][30]; /* remote process */
\end{verbatim}

To put a block of data of 3x6 doubles starting at location (1,2) in
{\tt A} into location (3,4) in {\tt B}, the arguments to {\tt
ARMCI\_PutS} will be as follows (assuming C/C++ memory layout):

\begin{verbatim}
src_ptr = &A[0][0] + (1 * 20 + 2); /* location (1,2) */
src_stride_ar[0] = 20 * sizeof(double);
dst_ptr = &B[0][0] + (3 * 30 + 4); /* location (3,4) */
dst_stride_ar[0] = 30 * sizeof(double);
count[0] = 6 * sizeof(double); /* contiguous data */
count[1] = 3; /* number of rows of contiguous data */
stride_levels = 1;
proc = <B's id>;
\end{verbatim}
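
{\tt ARMCI\_PutS} itself needs two processes, but the argument layout can
be checked inside a single address space. The sketch below is a
hypothetical local stand-in, not ARMCI code: {\tt strided\_copy\_local}
and {\tt MAX\_STRIDE\_LEVELS} are names introduced here, and the function
assumes that {\tt count[0]} is a byte count while the remaining entries
of {\tt count} are block counts, as in the example above.

```c
#include <assert.h>
#include <string.h>

#define MAX_STRIDE_LEVELS 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical local stand-in for a strided put: copies within one
 * address space using the same argument layout as ARMCI_PutS, minus
 * the target process id. */
void strided_copy_local(const void *src_ptr, const int src_stride_ar[],
                        void *dst_ptr, const int dst_stride_ar[],
                        const int count[], int stride_levels)
{
    int index[MAX_STRIDE_LEVELS] = {0};
    const char *src = (const char *)src_ptr;
    char *dst = (char *)dst_ptr;
    int level;

    assert(stride_levels >= 0 && stride_levels <= MAX_STRIDE_LEVELS);
    for (level = 0; level <= stride_levels; level++) {
        if (count[level] <= 0) {
            return;  /* nothing to copy */
        }
    }

    for (;;) {
        size_t src_off = 0;
        size_t dst_off = 0;

        /* Byte offset of the current segment on each side. */
        for (level = 0; level < stride_levels; level++) {
            src_off += (size_t)index[level] * (size_t)src_stride_ar[level];
            dst_off += (size_t)index[level] * (size_t)dst_stride_ar[level];
        }
        memcpy(dst + dst_off, src + src_off, (size_t)count[0]);

        /* Advance the block indices like an odometer; count[level + 1]
         * is the number of blocks at stride level `level`. */
        for (level = 0; level < stride_levels; level++) {
            if (++index[level] < count[level + 1]) {
                break;
            }
            index[level] = 0;
        }
        if (level == stride_levels) {
            break;  /* every segment has been copied */
        }
    }
}
```

Called with the argument values listed above on two arrays that live in
the same process, the function copies rows 1 to 3, columns 2 to 7 of
{\tt A} into rows 3 to 5, columns 4 to 9 of {\tt B}.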

\begin{verbatim}
int ARMCI_NbPutS(void *src_ptr, int src_stride_ar[],
                 void *dst_ptr, int dst_stride_ar[],
                 int count[], int stride_levels, int proc,
                 armci_hdl_t *handle);
\end{verbatim}
The non-blocking version of {\tt ARMCI\_PutS}. Passing a {\tt NULL}
value to {\tt handle} makes this function perform an implicit handle
non-blocking transfer.

\begin{verbatim}
int ARMCI_Get(void *src, void *dst, int bytes, int proc);
\end{verbatim}
Transfer contiguous data of size {\tt bytes} from the remote process
memory at process {\tt proc} (source) pointed to by {\tt src} into the
local memory of the calling process pointed to by {\tt dst}.

\begin{verbatim}
int ARMCI_NbGet(void *src, void *dst, int bytes, int proc,
                armci_hdl_t *handle);
\end{verbatim}
The non-blocking version of {\tt ARMCI\_Get}. Passing a {\tt NULL}
value to {\tt handle} makes this function perform an implicit handle
non-blocking transfer.

\begin{verbatim}
int ARMCI_GetS(void *src_ptr, int src_stride_ar[],
               void* dst_ptr, int dst_stride_ar[],
               int count[], int stride_levels, int proc);
\end{verbatim}
Transfer strided data segments from remote process memory on process
{\tt proc} to the local memory of the calling process. The semantics
of the parameters to this function are the same as those for {\tt
ARMCI\_PutS}.

\begin{verbatim}
int ARMCI_NbGetS(void *src_ptr, int src_stride_ar[],
                 void* dst_ptr, int dst_stride_ar[],
                 int count[], int stride_levels, int proc,
                 armci_hdl_t *handle);
\end{verbatim}
The non-blocking version of {\tt ARMCI\_GetS}. Passing a {\tt NULL}
value to {\tt handle} makes this function perform an implicit handle
non-blocking transfer.

\subsection{Explicit Synchronization}

\begin{verbatim}
int ARMCI_Wait(armci_hdl_t *handle);
int ARMCI_WaitProc(int proc);
int ARMCI_WaitAll();
int ARMCI_Test(armci_hdl_t *handle);
int ARMCI_Barrier();
\end{verbatim}

\begin{verbatim}
int ARMCI_Fence(int proc);
\end{verbatim}
Blocks the calling process until all {\em put} or {\em accumulate}
operations the process issued to the remote process {\tt proc} are
completed at the destination.

\begin{verbatim}
int ARMCI_AllFence(void);
\end{verbatim}
Blocks the calling process until all outstanding {\em put} or {\em
accumulate} operations it issued are completed on all remote
destinations.

\subsection{Extensions to the Standard API}
\label{sec::extensions}

\begin{verbatim}
void ARMCI_Migrate(void);
void ARMCI_Async_Migrate(void);
void ARMCI_Checkpoint(char* dirname);
void ARMCI_MemCheckpoint(void);

int armci_notify(int proc);
int armci_notify_wait(int proc, int *pval);
\end{verbatim}

\section{List of Unimplemented Functions}

The following functions are supported in the standard ARMCI
implementation but not yet supported in the \charmpp{} port.

\begin{verbatim}
int ARMCI_GetV(...);
int ARMCI_NbGetV(...);
int ARMCI_PutV(...);
int ARMCI_NbPutV(...);
int ARMCI_AccV(...);
int ARMCI_NbAccV(...);

int ARMCI_Acc(...);
int ARMCI_NbAcc(...);
int ARMCI_AccS(...);
int ARMCI_NbAccS(...);

int ARMCI_PutValueLong(long src, void* dst, int proc);
int ARMCI_PutValueInt(int src, void* dst, int proc);
int ARMCI_PutValueFloat(float src, void* dst, int proc);
int ARMCI_PutValueDouble(double src, void* dst, int proc);
int ARMCI_NbPutValueLong(long src, void* dst, int proc, armci_hdl_t* handle);
int ARMCI_NbPutValueInt(int src, void* dst, int proc, armci_hdl_t* handle);
int ARMCI_NbPutValueFloat(float src, void* dst, int proc, armci_hdl_t* handle);
int ARMCI_NbPutValueDouble(double src, void* dst, int proc, armci_hdl_t* handle);
long ARMCI_GetValueLong(void *src, int proc);
int ARMCI_GetValueInt(void *src, int proc);
float ARMCI_GetValueFloat(void *src, int proc);
double ARMCI_GetValueDouble(void *src, int proc);

void ARMCI_SET_AGGREGATE_HANDLE (armci_hdl_t* handle);
void ARMCI_UNSET_AGGREGATE_HANDLE (armci_hdl_t* handle);

int ARMCI_Rmw(int op, int *ploc, int *prem, int extra, int proc);
int ARMCI_Create_mutexes(int num);
int ARMCI_Destroy_mutexes(void);
void ARMCI_Lock(int mutex, int proc);
void ARMCI_Unlock(int mutex, int proc);
\end{verbatim}

\section{Error Codes}
\label{sec::error codes}

As of this writing, attempts to locate the documented error codes have
failed because the release notes have not been found. Attempts are
being made to derive these from the ARMCI source directly. Currently,
the \charmpp{} implementation does not implement any error codes.

\section{Related Manuals and Documents}
\label{sec::related doc}

\noindent
ARMCI website:
\begin{verbatim}
http://www.emsl.pnl.gov/docs/parsoft/armci/index.html
\end{verbatim}

\end{document}