\documentclass[10pt]{article}
\setcounter{secnumdepth}{5}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{caption}
\usepackage{hyperref}
\usepackage{fancyheadings}
\usepackage{longtable}
\usepackage[letterpaper]{geometry}
\newcommand{\argmax}{\operatornamewithlimits{argmax}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\title{CLEM Matrix Performance}
\author{Cyrus L. Harmon}
\geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}
\setcounter{topnumber}{2}
\setcounter{bottomnumber}{2}
\setcounter{totalnumber}{4}     % 2 may work better
\setcounter{dbltopnumber}{2}    % for 2-column pages
\renewcommand{\dbltopfraction}{0.9}   % fit big float above 2-col. text
\renewcommand{\textfraction}{0.07}    % allow minimal text w. figs
% Parameters for FLOAT pages (not text pages):
\renewcommand{\floatpagefraction}{0.7}   % require fuller float pages
% N.B.: floatpagefraction MUST be less than topfraction !!
\renewcommand{\dblfloatpagefraction}{0.7}   % require fuller float pages
\setlength{\captionmargin}{10pt}

\begin{document}
\maketitle
\let\mypdfximage\pdfximage
\def\pdfximage{\immediate\mypdfximage}

\baselineskip14pt

\clearpage
\section{CLEM Performance}

\baselineskip12pt
\subsection{Introduction}
\baselineskip14pt
Common Lisp is a high-level language that is a modern-day member of the
LISP family of languages. Common Lisp can be either interpreted or
compiled, or both, depending on the implementation, and modern compilers
offer the promise of performance on the order of that achieved by C and
Fortran compilers. CLEM is a matrix math package for Common Lisp that
strives to offer high-performance matrix math routines, written in
Common Lisp. Common Lisp has a sophisticated type system, and it is a
goal of CLEM to offer matrix representations and operations that exploit
the features of this type system. Furthermore, Common Lisp has a
sophisticated object model, the Common Lisp Object System (CLOS), and
CLEM uses features of CLOS and its companion metaobject system, the
Metaobject Protocol (MOP), to define its classes, objects, and methods.
Common Lisp implementations vary greatly in their performance, but the
Common Lisp standard provides an infrastructure that allows
high-performance-capable implementations to generate efficient code
through the use of compiler declarations of types and optimization
settings. Lisp implementations are free to compile lisp code to native
machine code, rather than interpreting the lisp code on the fly or
compiling the code to a byte-code representation that is then executed
by a virtual machine. This suggests that, subject to the limits placed
on the output by the Common Lisp language and to the implementation
details of the particular lisp system, a lisp environment should be able
to produce code that approaches the speed and efficiency of other
compiled languages. SBCL is a Common Lisp implementation that compiles
to native code and has a sophisticated compiler called, somewhat
confusingly, Python, although the Python compiler predates the Python
interpreted language by many years. SBCL's compiler uses type and
optimization declarations to generate efficient code, producing
optimized routines that are specific to the declared types and often
representing data in an ``unboxed'' format, ideally one that matches the
type representation handled directly by the CPU. One of CLEM's main
goals is to provide matrix operations that exploit these capabilities of
the lisp compiler to generate efficient code.
\baselineskip12pt
\subsubsection{Boxed and Unboxed Representation of Lisp Data Objects}
\baselineskip14pt
Boxed representations of lisp values are generally stored as tagged data
objects, where the tags serve to identify the type of the particular
object. Unboxed representations, on the other hand, are represented by
the data values themselves, without any identifying metadata being
stored with the value. Obviously, the language environment needs to
know, or to be able to determine, the type of a particular data
value. For boxed values, the language environment can rely on the fact
that the type information is stored with the data value directly. For
unboxed values, the language system must keep track of the type of the
value stored in a particular location by other means. The main advantage
of using unboxed datatypes is that the language environment, or the
compiled code it produces, does not have to extract the type information
from the boxed value and then extract the data accordingly. However,
this has the disadvantage that the type of the data is then fixed to be
that represented by the unboxed type. Often, hybrid representations are
used in structures such as an array, where the array itself will be a
dynamically typed, boxed object while the values of the array will be
unboxed values placed in a particular region of memory. By using unboxed
values for the elements of the array, the code can directly access these
values without the overhead of ``unboxing'' the data. Access to boxed
data also often allocates memory, or ``conses'' in lisp parlance,
because a storage area is needed to hold a freshly boxed value. In
summary, using boxed datatypes introduces (at least) two important areas
where work has to be done by the language environment to access the
data: the allocation of temporary storage to hold the values unboxed
from the data, and the additional work on the part of the code to unbox
the data in the first place.
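
As a concrete illustration (a hypothetical sketch, not taken from the
CLEM sources), the element type requested when an array is created
determines whether an implementation may store its elements unboxed:

\baselineskip12pt
\begin{verbatim}
;; A general array: each element is a boxed lisp object.
(defparameter *boxed* (make-array 4 :initial-element 1.0d0))
(array-element-type *boxed*)    ; => T

;; A specialized array: elements may be stored as raw, unboxed
;; double-floats, depending on the implementation (SBCL does so).
(defparameter *unboxed*
  (make-array 4 :element-type 'double-float :initial-element 1.0d0))
(array-element-type *unboxed*)  ; => DOUBLE-FLOAT
\end{verbatim}
\baselineskip14pt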

For these reasons, it is highly desirable for CLEM to use unboxed data
where possible. In the inner loop of a matrix operation, for instance,
accessing a boxed data type can introduce substantial performance
penalties. Therefore, CLEM has gone to great lengths to provide matrix
representations that yield unboxed values and operations that can work
on these values while allocating a minimum amount of memory.
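
The following sketch (hypothetical code, not part of CLEM) shows the
kind of type and optimization declarations that allow SBCL to keep such
an inner loop free of boxing; only the final return value needs to be
boxed:

\baselineskip12pt
\begin{verbatim}
;; Hypothetical sketch: summing a specialized vector without consing
;; in the loop body.  The declarations let the compiler keep I and SUM
;; in unboxed machine representations.
(defun sum-vector (v)
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3)))
  (let ((sum 0.0d0))
    (declare (type double-float sum))
    (dotimes (i (length v) sum)
      (incf sum (aref v i)))))
\end{verbatim}
\baselineskip14pt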

\baselineskip12pt
\subsubsection{Avoiding Unnecessary Memory Allocation}
\baselineskip14pt

\clearpage
\section{CLEM Design Goals}

\baselineskip12pt
\subsection{CLEM Implementation Choices}
\baselineskip14pt

\clearpage
\section{Matrix Data Representation}

What options do we have for storing matrix data? The main choices are
lisp arrays or an external block of memory. Are there other options
here?

\baselineskip12pt
\subsection{Reification of Matrix Data Objects}
\baselineskip14pt
Deferring, for a moment, the question of what form the actual matrix
data will take, let us consider the form of the matrix object itself. It
could be the object that represents the data directly, or it could be an
object, such as an instance of a class or struct, that contains a
reference to the object that holds the data. In a sense, the simplest
approach to providing matrix arithmetic operations is just to use Common
Lisp arrays both to hold the data and to be the direct representation of
the matrix object. The CLEM design assumes that additional data, besides
the data values stored in the array, or what have you, will be needed,
and that just using a lisp array as the matrix itself is insufficient.
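
A reified matrix object along these lines might look like the following
sketch (hypothetical code; CLEM's actual classes are built with CLOS and
the MOP and carry more information than this):

\baselineskip12pt
\begin{verbatim}
;; Sketch only: a CLOS class wrapping the underlying array together
;; with whatever additional bookkeeping data is required.
(defclass sketch-matrix ()
  ((data :initarg :data :accessor matrix-data)
   (rows :initarg :rows :reader matrix-rows)
   (cols :initarg :cols :reader matrix-cols)))

(defun make-sketch-matrix (rows cols &key (element-type 'double-float))
  (make-instance 'sketch-matrix
                 :rows rows :cols cols
                 :data (make-array (list rows cols)
                                   :element-type element-type
                                   :initial-element
                                   (coerce 0 element-type))))
\end{verbatim}
\baselineskip14pt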

\baselineskip12pt
\subsection{Common Lisp Arrays}
\baselineskip14pt
Lisp arrays have the advantage that they can take advantage of the lisp
type system. Yes, an implementation may choose to ignore this
information, or to produce the same code as it would for untyped arrays,
but a sufficiently smart compiler, such as Python, should use the array
type information to produce efficient code. Of course, this also means
that in order to get this efficiency we have to provide the type
information to the compiler. As we try to modularize various pieces, it
is often the case that one would like generic code that can work on
matrices of any type. In these cases, additional measures may be needed
to coax the compiler into generating efficient code while doing so in a
generic manner.
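
One such measure is to generate the type-specialized code with a macro,
so that a single generic template yields a declared, specialized
function for each element type. The following is a sketch of that
technique, not CLEM's actual macrology:

\baselineskip12pt
\begin{verbatim}
;; Sketch: expand a type-specialized addition loop for a given
;; element type, so the compiler sees full declarations.
(defmacro def-add-loop (name element-type)
  `(defun ,name (a b c)
     (declare (type (simple-array ,element-type (* *)) a b c)
              (optimize (speed 3)))
     (destructuring-bind (rows cols) (array-dimensions a)
       (dotimes (i rows c)
         (dotimes (j cols)
           (setf (aref c i j) (+ (aref a i j) (aref b i j))))))))

(def-add-loop add/double-float double-float)
(def-add-loop add/single-float single-float)
\end{verbatim}
\baselineskip14pt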

\baselineskip12pt
\subsubsection{One-dimensional or Multi-dimensional Arrays?}
\baselineskip14pt
One issue in dealing with lisp arrays is whether to use the lisp
facility for multi-dimensional arrays. One argument in favor of native
multi-dimensional arrays is that the compiler can generate efficient
code to access data in certain multi-dimensional arrays, provided that
this information is known and passed to the compiler at compile
time. On the other hand, using one-dimensional arrays puts both the
burden of and the flexibility of computing array indices on the matrix
package.
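
With a one-dimensional backing store, the row-major index arithmetic
moves into the matrix package, roughly as in this hypothetical sketch:

\baselineskip12pt
\begin{verbatim}
;; Sketch: row-major indexing into a one-dimensional backing store.
(defun ref-1d (data cols i j)
  (declare (type (simple-array double-float (*)) data)
           (type fixnum cols i j)
           (optimize (speed 3)))
  (aref data (+ (* i cols) j)))
\end{verbatim}
\baselineskip14pt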

\baselineskip12pt
\subsubsection{Extensible arrays?}
\baselineskip14pt

\baselineskip12pt
\subsubsection{What About Lists?}
\baselineskip14pt
Lists are convenient for representing matrices in that the iteration
functions can be used to traverse the elements of the matrix, yielding
the famous trivial transpose operation using mapcar and list. However,
lists aren't designed for efficient random access and are a poor choice
for representing anything but trivially small matrices.
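
The trivial transpose referred to above looks like this for a matrix
represented as a list of rows:

\baselineskip12pt
\begin{verbatim}
;; Transposing a list-of-lists "matrix" with mapcar and list.
(apply #'mapcar #'list '((1 2 3)
                         (4 5 6)))
;; => ((1 4) (2 5) (3 6))
\end{verbatim}
\baselineskip14pt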

It is worth mentioning the possibility of other approaches, such as an
external block of memory, perhaps allocated either with non-standard
routines of the lisp system or via a foreign-function interface, where
the matrix package computes the offsets into this block of memory,
coerces the contents to a given lisp type, and obtains the results. This
approach is used by some libraries that use CLEM, such as ch-image, to
access matrix/array data stored in memory in a non-native-lisp form,
such as matlisp matrices (which are really BLAS/LAPACK matrices), fftw
matrices, and arrays of data from TIFF images. While this is nice to be
able to do, it is unlikely to be practical for storing matrix data,
given the alternative of using lisp arrays. Coercion of the contents of
the matrix to lisp types is done for us by optimized code in the
compiler, and it is unlikely that we would be able to do a better job of
this than the compiler. This approach is useful for converting matrix
data to other in-memory formats, but unlikely to be useful for the typed
lisp matrices for which CLEM is designed. If we were to go this route,
it would make sense to use other, optimized code libraries for operating
on the data, and this is what matlisp does, handing these blocks of
memory off to BLAS/LAPACK for processing in highly optimized routines
written in C, Fortran, or assembly language.
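
For completeness, a foreign block of memory might be handled along these
lines (a sketch using the CFFI library; this is not what CLEM itself
does):

\baselineskip12pt
\begin{verbatim}
;; Sketch only: allocate a foreign buffer of doubles, fill it,
;; and read one element back.  Requires CFFI.
(cffi:with-foreign-object (buf :double 16)
  (dotimes (i 16)
    (setf (cffi:mem-aref buf :double i) (float i 1.0d0)))
  (cffi:mem-aref buf :double 5))
;; => 5.0d0
\end{verbatim}
\baselineskip14pt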

\clearpage
\section{Matrix Data Access}

\baselineskip12pt
\subsection{Now that we have chosen a matrix representation, how do we
  access the data in it?}
\baselineskip14pt

\baselineskip12pt
\subsection{Slow and Fast Data Access Paths}
\baselineskip14pt

\baselineskip12pt
\subsection{Flexibility}
\baselineskip14pt

\baselineskip12pt
\subsubsection{Resizing Matrices}
\baselineskip14pt

\baselineskip12pt
\subsubsection{Matrix Index ``Recycling''}
\baselineskip14pt

\baselineskip12pt
\subsection{Macros}
\baselineskip14pt

\baselineskip12pt
\subsection{Compiler Macros}
\baselineskip14pt

\baselineskip12pt
\subsection{SBCL-specific Compiler Features}
\baselineskip14pt

\baselineskip12pt
\subsubsection{defknown/deftransform/defoptimizer}
\baselineskip14pt
\clearpage
\section{Benchmarks}

We'll start with some simple benchmarks of matrix operations and examine
the effect of the various implementation strategies for representing
matrices on these operations.

\baselineskip12pt
\subsection{2-Dimensional Lisp Arrays}
\baselineskip14pt

\baselineskip12pt
\begin{verbatim}
(defparameter b1
  (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
              :adjustable nil :fill-pointer nil))

(defparameter b2
  (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
              :adjustable nil :fill-pointer nil))

(defparameter b3
  (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
              :adjustable nil :fill-pointer nil))
\end{verbatim}
\baselineskip14pt
Now, our function to add two arrays, storing the result in a third:

\baselineskip12pt
\begin{verbatim}
(defun bench/add-matrix/aref (a1 a2 a3)
  (destructuring-bind (rows cols)
      (array-dimensions a1)
    (dotimes (i rows)
      (dotimes (j cols)
        (setf (aref a3 i j) (+ (aref a1 i j) (aref a2 i j)))))))
BENCH/ADD-MATRIX/AREF
\end{verbatim}
\baselineskip14pt
Now, we time how long it takes to run bench/add-matrix/aref:

\baselineskip12pt
\begin{verbatim}
(ch-util:time-to-string (bench/add-matrix/aref b1 b2 b3))
"Evaluation took:
  0.215 seconds of real time
  0.1958 seconds of user run time
  0.018307 seconds of system run time
  [Run times include 0.083 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  50,331,032 bytes consed.
"
\end{verbatim}
\baselineskip14pt

\baselineskip12pt
\subsection{1-Dimensional Lisp Arrays}
\baselineskip14pt

\baselineskip12pt
\subsection{A CLOS object holding a reference to a 2-Dimensional Lisp
  Array}
\baselineskip14pt

\baselineskip12pt
\subsection{A CLOS object holding a reference to a 1-Dimensional Lisp
  Array}
\baselineskip14pt

\end{document}