#.(smarkup::enable-quote-reader-macro)
(smarkup::setup-headings)
"Copyright 2006, Cyrus Harmon. All Rights Reserved.")
"CLEM Matrix Performance")
(:author "Cyrus L. Harmon")
"(\"asdf:/ch-bib/lisp\" \"asdf:/ch-bib/bio\" \"asdf:/ch-bib/stat\" \"asdf:/ch-bib/vision\")")
(:bibtex-style "Science"))
(:html-metadata (:htmlcss "simple.css")))

(:h1 "CLEM Performance")
(:p #q{Common Lisp is a high-level language that is a modern-day
member of the LISP family of languages. Common Lisp can be
interpreted or compiled, or both, depending on the
implementation, and modern compilers offer the promise of
performance on the order of that achieved by C and Fortran
compilers. CLEM is a matrix math package for Common Lisp that
strives to offer high-performance matrix math routines written
in Common Lisp. Common Lisp has a sophisticated type system, and
it is a goal of CLEM to offer matrix representations and
operations that exploit the features of this type
system. Furthermore, Common Lisp has a sophisticated object
model, the Common Lisp Object System (CLOS), and CLEM uses
features of CLOS and its companion metaobject system, the
Meta-object Protocol (MOP), to define its classes, objects and
methods.})
(:p #q{Common Lisp implementations vary greatly in their
performance, but the Common Lisp standard provides an
infrastructure that allows high-performance implementations
to generate efficient code through the use of compiler
declarations of types and optimization settings. Lisp
implementations are free to compile lisp code to native
machine code, rather than either interpreting the lisp code
on the fly or compiling it to a byte-code
representation that is then executed by a virtual
machine. This suggests that, subject to the limits placed on
the output by the Common Lisp language and to the
implementation details of the particular lisp system, a lisp
environment should be able to produce code that approaches
the speed and efficiency of other compiled languages. SBCL
is a Common Lisp implementation that compiles to native code
and has a sophisticated compiler called, somewhat
confusingly, Python, although the Python compiler predates
the Python interpreted language by many years. SBCL's compiler
uses type and optimization declarations to generate efficient
code by producing optimized routines that are specific to the
declared types, often representing data in an "unboxed" format,
ideally one that matches the type representation handled directly
by the CPU. One of CLEM's main goals is to provide efficient
matrix operations that utilize the capabilities of the lisp
compiler to generate efficient code.})
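
(:p #q{As a sketch of the kind of declarations involved (this
example is illustrative and is not taken from CLEM itself), a
function can declare the types of its arguments and request
optimization, allowing SBCL to compile unboxed double-float
arithmetic:})

(:lisp #q{(defun scaled-sum (x y)
  ;; Type and optimization declarations let the compiler specialize
  ;; this code to unboxed double-float arithmetic.
  (declare (type double-float x y)
           (optimize (speed 3) (safety 0)))
  (+ (* 2d0 x) y))})

(:p #q{Calling disassemble on such a function shows the
specialized machine code the compiler produced.})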
(:h3 #q{Boxed and Unboxed Representation of Lisp Data Objects})
(:p #q{Boxed representations of lisp values are generally stored
as tagged data objects where the tags serve to identify the type
of the particular object. Unboxed representations, on the other
hand, are merely represented by the data values themselves,
without any identifying metadata being stored with the
value. Obviously, the language environment needs to know, or to
be able to determine, the type of a particular data value. For
boxed values, the language environment can rely on the fact that
the type information is stored with the data value directly. For
unboxed values, the language system must itself keep track of the
type of the value stored in a particular location. The main
advantage of using unboxed datatypes is that the language
environment, or the compiled code it produces, does not have to
extract the type information from the boxed value and then
extract the data as appropriate. However, this has the
disadvantage that the type of the data is then fixed to be that
represented by the unboxed type. Often, hybrid representations
are used in structures such as arrays, where the array itself is
a dynamically typed, boxed object while the values of the array
are unboxed values placed in a particular region of memory. By
using unboxed values for the elements of the array, the code can
directly access these values without the overhead of "unboxing"
the data. Boxed memory access often allocates memory,
or "conses" in lisp parlance, as a storage area is needed to
hold the unboxed value. In summary, using boxed datatypes
introduces (at least) two important areas where work has to be
done by the language environment to access the data: the
allocation of temporary storage to hold the values that will be
unboxed from the data, and additional work on the part of the
code to unbox the data in the first place.})
(:p #q{For these reasons, it is highly desirable for CLEM to use
unboxed data where possible. In the inner loop of a matrix
operation, for instance, accessing a boxed data type can
introduce substantial performance penalties. Therefore, CLEM has gone
to great lengths to provide matrix representations that yield unboxed
values and operations that can operate on these values while
allocating a minimum amount of memory.})
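
(:p #q{For example (illustrative, not CLEM internals), a
specialized lisp array stores its elements unboxed even though the
array object itself remains a boxed lisp object:})

(:lisp #q{;; The :element-type argument lets the implementation store the
;; elements as raw double-floats in a contiguous region of memory.
(make-array 16 :element-type 'double-float :initial-element 0d0)})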
(:h3 #q{Avoiding Unnecessary Memory Allocation})
(:h1 #q{CLEM Design Goals})

(:h2 #q{CLEM Implementation Choices})

(:h1 #q{Matrix Data Representation})
(:p #q{What options do we have for storing matrix data? The main
choices are lisp arrays or an external block of memory. Are there
other options?})
(:h2 #q{Reification of Matrix Data Objects})
(:p #q{Deferring, for a moment, the question of what form the
actual matrix data will take, let us consider the form of
the matrix object itself. It could be the object that
represents the data directly, or it could be an object, such
as an instance of a class or struct, that contains a
reference to the object that holds the data. In a sense, the
simplest approach to providing matrix arithmetic operations
is just to use common lisp arrays both to hold the data and
to be the direct representation of the matrix object. The
CLEM design assumes that there is additional data, besides
the data values stored in the array, or what have you, that
will be needed and that just using a lisp array as the
matrix itself is insufficient.})
(:h2 #q{Common Lisp Arrays})
(:p #q{Lisp arrays have the advantage that they are likely to
take advantage of the lisp type system. Yes, an
implementation may choose to ignore this information, or
continue to produce the same code as it would for untyped
arrays, but a sufficiently smart compiler, such as Python,
should use the array type information to produce efficient
code. Of course, this also means that in order to get this
efficiency we have to provide this type information to the
compiler. As we try to modularize various pieces, it is
often the case that one would like to have generic code that
can work on matrices of any type. In these cases, additional
measures may be needed to coax the compiler into generating
efficient code, while doing so in a generic manner.})
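
(:p #q{As an illustrative sketch (not taken from CLEM's sources),
this is the sort of function that benefits from array type
information:})

(:lisp #q{(defun sum-elements (m)
  ;; Declaring the concrete array type lets the compiler open-code
  ;; aref as a direct, unboxed memory access.
  (declare (type (simple-array double-float (* *)) m)
           (optimize (speed 3) (safety 0)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (array-dimension m 0) acc)
      (dotimes (j (array-dimension m 1))
        (incf acc (aref m i j))))))})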
(:h3 #q{One-dimensional or Multi-dimensional Arrays?})
(:p #q{One issue in dealing with lisp arrays is whether to use
the lisp facility for multi-dimensional arrays. One argument
in favor of native multi-dimensional arrays is that the
compiler can generate efficient code to access data in
certain multi-dimensional arrays, provided that this
information is known and passed to the compiler at
compile-time. On the other hand, using one-dimensional
arrays puts both the burden of and the flexibility of
computing array indices on the matrix package.})
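
(:p #q{With a one-dimensional backing store, the matrix package
computes row-major indices itself. A minimal sketch (the names
here are illustrative, not CLEM's actual API):})

(:lisp #q{(defun mref (data cols i j)
  ;; Map the two-dimensional index (i, j) onto the flat store.
  (declare (type (simple-array double-float (*)) data)
           (type fixnum cols i j))
  (aref data (+ (* i cols) j)))})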
(:h3 #q{Extensible Arrays?})

(:p #q{Christophe Rhodes has recently introduced a protocol for
extensible sequences to SBCL. Might a protocol for a similar
extensible array be useful here?})
(:h3 #q{What About Lists?})
(:p #q{Lists are convenient for representing matrices in that
the list iteration functions can be used to traverse the elements
of the matrix, yielding the famous trivial transpose operation
using mapcar and list. However, lists aren't designed for
efficient random access and are a poor choice for representing
anything but trivially small matrices.})
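
(:p #q{The trick in question, for a matrix represented as a list
of rows:})

(:lisp #q{;; Transpose a list-of-lists matrix: mapcar walks the rows in
;; parallel, and list collects one new row per column.
(apply #'mapcar #'list '((1 2 3)
                         (4 5 6)))
;; => ((1 4) (2 5) (3 6))})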
(:h2 #q{Other Blocks of Memory})
(:p #q{It is worth mentioning the possibility of other approaches,
such as an external block of memory, perhaps allocated with either
non-standard routines of the lisp system or via a foreign-function
interface; one then computes the offsets into this block of memory,
coerces the contents to a given lisp type, and obtains the results. This
approach is used by some libraries that use clem, such as ch-image,
to access matrix/array data stored in memory in a non-native-lisp
form, such as matlisp matrices (which are really BLAS/LAPACK
matrices), fftw matrices, and arrays of data from TIFF images. While
this is nice to be able to do, it is unlikely to be practical for
storing matrix data, given the alternative of using lisp
arrays. With lisp arrays, coercion of the contents of the matrix to
lisp types is done for us by optimized code in the compiler, and it
is unlikely that we would be able to do a better job of this than
the compiler. This approach is useful for conversion of matrix data
to other in-memory formats, but unlikely to be useful for the typed
lisp matrices for which CLEM is designed. If we were to go this
route, it would make sense to use other, optimized code libraries
for operating on this data, and this is what matlisp does, handing
these blocks of memory off to BLAS/LAPACK for processing in
highly-optimized routines written in C, Fortran, or assembly
language.})
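
(:p #q{As a sketch of the foreign-memory approach (this example
assumes the CFFI library, which CLEM itself does not require):})

(:lisp #q{;; Allocate a foreign block of 16 doubles and access one element.
;; Index arithmetic and element typing are our responsibility here,
;; rather than the compiler's.
(cffi:with-foreign-object (buf :double 16)
  (setf (cffi:mem-aref buf :double 5) 3.14d0)
  (cffi:mem-aref buf :double 5))})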
(:h1 #q{Matrix Data Access})

(:h2 #q{Now that we have chosen a matrix representation, how do we
access the data in it?})

(:h2 #q{Slow and Fast Data Access Paths})

(:h2 #q{Flexibility})

(:h3 #q{Resizing Matrices})

(:h3 #q{Matrix Index "Recycling"})

(:h2 #q{Compiler Macros})

(:h2 #q{SBCL-specific Compiler Features})

(:h3 #q{defknown/deftransform/defoptimizer})

(:p #q{We'll start with some simple benchmarks of matrix operations
and examine the effect of the various implementation strategies for
representing matrices on these operations.})
(:h2 #q{2-Dimensional Lisp Arrays})
(defparameter b1 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))

(defparameter b2 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))

(defparameter b3 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))
(:p #q{Now, our function to add the two arrays:})
(defun bench/add-matrix/aref (a1 a2 a3)
  (destructuring-bind (rows cols)
      (array-dimensions a1)
    (dotimes (i rows)
      (dotimes (j cols)
        (setf (aref a3 i j) (+ (aref a1 i j) (aref a2 i j)))))))
(:p #q{Now, we time how long it takes to run bench/add-matrix/aref:})

#q{(ch-util:time-to-string (bench/add-matrix/aref b1 b2 b3))})
(:h2 #q{1-Dimensional Lisp Arrays})

(:h2 #q{A CLOS object holding a reference to a 2-Dimensional Lisp
Array})

(:h2 #q{A CLOS object holding a reference to a 1-Dimensional Lisp
Array})
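
(:p #q{A minimal sketch of the CLOS-wrapper idea (illustrative
only; CLEM's actual class definitions differ and are built with
the MOP):})

(:lisp #q{(defclass double-float-matrix ()
  ;; The matrix object is a boxed CLOS instance; the data it
  ;; references is a specialized array with unboxed elements.
  ((data :initarg :data :accessor matrix-data
         :type (simple-array double-float (* *)))
   (rows :initarg :rows :reader matrix-rows)
   (cols :initarg :cols :reader matrix-cols)))})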
#+nil(:BIBLIOGRAPHY))