\section{Zero Copy Message Send API}

\label{nocopyapi}
Apart from using messages, \charmpp{} also provides a zero copy message send
API that avoids copies for entry method invocations which use parameter marshalling
instead of messages. It relies on one-sided communication over the
underlying Remote Direct Memory Access (RDMA) enabled network.
For large arrays (a few hundred KB or more), the cost of copying the data while
marshalling the message can be quite high. Using this API not only saves
the expensive copy operation but also reduces the application's memory footprint
by avoiding data duplication. Saving these costs for large arrays can be
a significant optimization in achieving faster message send times.
On the other hand, using the zero copy message send API for small arrays can lead
to a drop in performance due to the overhead associated with one-sided communication.
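
The size trade-off described above can be captured by a small helper that decides, per send, whether the zero copy path is worth using. The following is a minimal sketch in plain C++, not part of \charmpp{}: the names \texttt{kZeroCopyThresholdBytes} and \texttt{preferZeroCopy} are hypothetical, and the 256 KB cutoff is an assumed value that should be tuned by measurement on the target network.

```cpp
#include <cstddef>

// Assumed cutoff, not a Charm++ constant: the text only says that arrays of
// "a few hundred KB or more" benefit, so the exact value needs tuning.
constexpr std::size_t kZeroCopyThresholdBytes = 256 * 1024;

// Hypothetical helper: returns true when an array of `count` elements of
// `elemSize` bytes each is large enough that avoiding the marshalling copy
// is likely to outweigh the overhead of one-sided communication.
constexpr bool preferZeroCopy(std::size_t count, std::size_t elemSize) {
  return count * elemSize >= kZeroCopyThresholdBytes;
}
```

An application could use such a check to choose between a regular marshalled entry method for small arrays and a zero copy entry method for large ones.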

\vspace{0.1in}
\noindent
To send an array using the zero copy message send API, specify the array parameter
in the .ci file with the rdma specifier.

\begin{alltt}
entry void foo (int size, rdma int arr[size]);
\end{alltt}

When calling the entry method from the .C file, wrap the array, i.e., the
pointer, in an rdma wrapper.

\begin{alltt}
arrayProxy[0].foo(500000, rdma(arrPtr));
\end{alltt}

It is not safe to modify the buffer until the RDMA operation has completed.
To be notified on completion of the RDMA operation, pass an optional callback object
in the rdma wrapper associated with the specific rdma array.

\begin{alltt}
CkCallback cb(CkIndex_Foo::rdmaSent(NULL), thisProxy[thisIndex]);
arrayProxy[0].foo(500000, rdma(arrPtr, cb));
\end{alltt}

The callback is invoked on completion of the RDMA operation associated with the
corresponding array. Inside the callback, it is safe to overwrite the buffer sent
via the zero copy API. The pointer to this buffer can be recovered by dereferencing
the data of the CkDataMsg received in the callback.

\begin{alltt}
//called when RDMA operation is completed
void rdmaSent(CkDataMsg *m)
\{
  //get access to the pointer and free the allocated buffer
  //(assumes the buffer was allocated with malloc)
  void *ptr = *((void **)(m->data));
  free(ptr);
  delete m;
\}
\end{alltt}

The RDMA call is associated with an rdma array rather than with the entry method.
When multiple rdma arrays are sent, each RDMA call is independent of the others.
Hence, a callback applies only to the array it is attached to and not to all the rdma
arrays passed in an entry method invocation. On completion of the RDMA call for each
array, the corresponding callback is invoked separately.

As an example, for an entry method with two rdma array parameters, each called with the same
callback, the callback will be invoked twice: once on completing the transfer of each of the two
rdma parameters.
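
Because a shared callback fires once per array, code that must wait for the whole invocation to finish has to count completions itself. The following is a minimal sketch of such a counter in plain C++; \texttt{PendingTransfers} and \texttt{completeOne} are hypothetical names, not part of \charmpp{}, and the sketch assumes that all completion callbacks for one send are delivered to the same chare, which owns the counter.

```cpp
#include <cassert>

// Hypothetical helper (not part of Charm++): tracks how many zero copy
// transfers belonging to one entry method invocation are still in flight.
struct PendingTransfers {
  int remaining;

  // `count` is the number of rdma parameters passed in the invocation.
  explicit PendingTransfers(int count) : remaining(count) {}

  // Call exactly once from each invocation of the completion callback.
  // Returns true only when the last outstanding transfer has completed.
  bool completeOne() {
    assert(remaining > 0 && "more completions than transfers");
    --remaining;
    return remaining == 0;
  }
};
```

With two rdma parameters, the chare would construct the counter with a count of 2 before the send and call \texttt{completeOne()} in each callback invocation, continuing with work that depends on both transfers only when it returns true.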

\vspace{0.1in}
\noindent
For multiple arrays to be sent via RDMA, declare the entry method in the .ci file as:

\begin{alltt}
entry void foo (int size1, rdma int arr1[size1], int size2, rdma double arr2[size2]);
\end{alltt}

In the .C file, a different callback can be associated with each rdma array.
\begin{alltt}
CkCallback cb1(CkIndex_Foo::rdmaSent1(NULL), thisProxy[thisIndex]);
CkCallback cb2(CkIndex_Foo::rdmaSent2(NULL), thisProxy[thisIndex]);
arrayProxy[0].foo(500000, rdma(arrPtr1, cb1), 1024000, rdma(arrPtr2, cb2));
\end{alltt}

This API is demonstrated in \examplerefdir{rdma/simpleRdma} and \testrefdir{pingpong}.

\vspace{0.1in}
\noindent
Note that calls to entry methods with rdma-specified parameters are
currently supported only for point-to-point operations, not for collective operations.
Additionally, migration of chares that have pending RDMA transfer
requests is not supported.

\vspace{0.1in}
\noindent
Note also that the benefit of this API is seen only for large arrays on
RDMA enabled networks. On networks which do not support RDMA, the API is functional
but shows no performance benefit, as it behaves like a regular entry method that
copies its arguments. Currently, the benefit of the API is mostly a reduced
application memory footprint, as the API is largely unoptimized. Optimized versions of this
API are expected to be released in the future.