cleared up and add questions about intent of work.
[CommonLispStat.git] / external / clem / doc / clem-performance.xhtml
blob4fd6dd0afe0882b25667be5741d33e1a599b96a6
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE html
3 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>CLEM Matrix Performance</title>
5 <link rel="stylesheet" type="text/css" href="simple.css"/></head>
6 <body><p></p>
7 <h1>CLEM Performance</h1>
8 <h2>Introduction</h2>
9 <p>Common Lisp is a high-level language that is a modern-day
10 member of the LISP family of languages. Common Lisp can be
11 either interpreted or compiled, or both, depending on the
12 implementation and modern compilers offer the promise of
13 performance on the order of that achieved by C and fortran
14 compilers. CLEM is a matrix math package for Common Lisp that
15 strives to offer high-performance matrix math routines, written
16 in Common Lisp. Common Lisp has a sophisticated type system and
17 it is a goal of CLEM to offer matrix representation and
18 opreations that exploit the features of this type
19 system. Furthermore, Common Lisp has a sophisticated object
20 model, the Common Lisp Object System (CLOS), and CLEM uses
21 features of CLOS and its companion metaobject system, the
22 Meta-object Protocol (MOP) to define its classes, objects and
23 methods.</p>
24 <p>Common Lisp implementations vary greatly in their
25 peformance, but the Common Lisp standard provides an
26 infrastructure for high-performance-capable implementations
27 to generate efficient code through the use of compiler
28 declarations of types and optimization settings. Lisp
29 implementations are free to compile lisp code to native
30 machine code, rather than either interpreting the lisp code
31 on the fly, or compiling the code to a byte-code
32 representation, that is then executed by a virtual
33 machine. This suggests that, subject to the limits placed on
34 the output by the common lisp language and to the
35 implementation details of the particular lisp system, a lisp
36 environment should be able to produce code that approaches
37 the speed and efficiency of other compiled languages. SBCL
38 is a Common Lisp implementation that compiles to native code
39 and has a sophisticated compiler called, somewhat
40 confusingly, Python, although the Python compiler predates
41 the Python interpreted language by many years. SBCL's compiler
42 uses type and optimization declarations to generate efficient
43 code by producing optimized routines that are specific to the
44 declared types, often representing data in an "unboxed" format,
45 ideally one that matches the type representation handled directly
46 by the CPU. One of CLEM's main goals is to provide efficient
47 matrix operations that utilize the capabilities of the lisp
48 compiler to generate efficient code.</p>
49 <h3>Boxed and Unboxed Representation of Lisp Data Objects</h3>
50 <p>Boxed representations of lisp values are generally stored
51 as tagged data objects where the tags serve to identify the type
52 of the particular object. Unboxed representations, on the other
53 hand, are merely represented by the data values themselves,
54 without any identifying metadata being stored with the
55 value. Obviously, the language environment needs to know, or to
56 be able to determine, the type of a particular data value. For
57 boxed values, the language environment can rely on the fact that
58 the type information is stored with the data value directly. For
59 unboxed values, the language system must keep track of the value
60 stored in a particular directly. The main advantage of using
61 unboxed datatypes is that the language environment, or compiled
62 code it produces, does not have to bother extracting the type
63 information from the boxed value and extracting the data as
64 appropriate. However, this has the disadvantage that the type of
65 data is then fixed to be that represented by the unboxed
66 type. Often, hybrid representations are used in structures such
67 as an array, where the array itself will be a dynamically typed,
68 boxed object while the values of the array will be unboxed
69 values placed in an a particular region of memory. By using
70 unboxed values for the elements of the array, the code can
71 directly access these values without the overhead of "unboxing"
72 the data. Boxed memory access often allocates memory,
73 or "conses" in lisp parlance, as a storage area is needed to
74 hold the unboxed value. In summary, sing boxed datatypes introduces
75 (at least) two important areas where work has to be done by the
76 language environment to access the data, the allocation of temporary
77 to store the resulting values that will be unboxed from the data, and
78 additional work on the part of the code to unbox the data in the
79 first place.</p>
80 <p>For these reasons, it is higly desirable for CLEM to use
81 unboxed data where possibly. In the inner loop of a matrix
82 operation, for instance, accessing a boxed data type can
83 introduce substantial performance penalties. Therefore, CLEM has gone
84 to great lengths to provide matrix representations that yield unboxed
85 values and operations that can operate on these values with
86 allocation of a minimum amount of memory.</p>
87 <h3>Avoiding Unneccessary Memory Allocation</h3>
88 <h1>CLEM Design Goals</h1>
89 <h2>CLEM Implementation Choices</h2>
90 <h1>Matrix Data Representation</h1>
91 <p>What options do we have for storing matrix data? Main choices
92 are lisp arrays or an external block of memory. Are there other
93 options here?</p>
94 <h2>Reification of Matrix Data Objects</h2>
95 <p>Defering, for a moment, the question of what form the
96 actual matrix data will take, let us consider the form of
97 the matrix object itself. It could be the object that
98 represents the data directly, or it could be an object, such
99 as an instance of a class or struct, that contains a
100 reference to the object that holds the data. In a sense, the
101 simplest approach to providing matrix arithmetic operations
102 is just to use common lisp arrays both to hold the data and
103 to be the direct representation of the matrix object. The
104 CLEM design assumes that there is additional data, besides
105 the data values stored in the array, or what have you, that
106 will be needed and that just using a lisp array as the
107 matrix itself is insufficient.</p>
108 <h2>Common Lisp Arrays</h2>
109 <p>Lisp arrays have the advantage that they are likely to
110 take advantage of the lisp type system. Yes, an
111 implementation may choose to ignroe this information, or
112 continue to produce the same code as it would for untyped
113 arrays, but a sufficently smart compiler, such as Python,
114 should use the array type information to produce efficient
115 code. Of course this also means that in order to get this
116 efficiency we have to provide this type information to the
117 compiler. As we try to modularize various pieces, it is
118 often the case that one would like to have generic code that
119 can work on matrices of any type. It these cases, additional
120 measures may be needed to coax the compiler into generating
121 efficient code, while doing so in a generic manner.</p>
122 <h3>One-dimensional or Multi-dimensional Arrays?</h3>
123 <p>One issue in dealing with lisp arrays is whether to use
124 the lisp facilitty for multi-dimensional array. One argument
125 in favor of native multi-dimensional arrays is that the
126 compiler can generate efficent code to access data in
127 certain multi-dimensional arrays, provided that this
128 information is known and passed to the compiler at
129 compile-time. On the other hand, using one-dimensional
130 arrays puts both the burden of and the flexibility of
131 computing array indices on the matrix package.</p>
132 <h3>Extensible arrays?</h3>
133 Christophe Rhodes has recently introduced a protocol for
134 extensible sequences to SBCL. Might a protocol for a similar
135 extensible array be useful here?<h3>What About Lists?</h3>
136 <p>Lists are convenient for representing matrices in that
137 the iteration functions can be used to traverse the elements of
138 the matrix, yielding the famous trivial transpose operation
139 using mapcar and list. However, lists aren't designed for
140 efficient random access and are a poor choice for representing
141 anything but trivially small matrices.</p>
142 <h>Other Blocks of Memory</h>
143 <p>It is worth mentioning the possibility of other approaches,
144 such as an external block of memory, perhaps allocated with either
145 non-standard routines of the lisp system, or via a foreign-function
146 interface, and to determine the offsets into this block of memory,
147 coerce the contents to a given lisp type and obtain the results. This
148 approach is used by some libraries that use clem, such as ch-image,
149 to access matrix/array data stored in memory in a non-native-lisp
150 form, such as matlisp matrices (which are really BLAS/LAPACK
151 matrices), fftw matrices, and arrays of data from TIFF images. While
152 this is nice to be able to do, it is unlikely to be practical for
153 storing matrix data, given the alternative of using lisp
154 arrays. Coercing of the contents of the matrix to lisp types is
155 done for us by optimized code in the compiler. It is unlikely that we
156 would be able to do a better job of this than the compiler. This
157 approach is useful for conversion of matrix data to other in-memory
158 formats, but unlikely to be useful for the typed lisp matrices for
159 which CLEM is designed. If we were to go this route, it would make
160 sense to use other, optimized, code libraries for operating on this
161 data, and this is what matlisp does, handing these blocks of memory
162 off to BLAS/LAPACK for processing in highly-optimized routines
163 written in C, Fortran or Assembly Language. </p>
164 <h1>Matrix Data Access</h1>
165 <h2>Now that we have chosen a matrix representation, how do we
166 access the data in it?</h2>
167 <h2>Slow and Fast Data Access Paths</h2>
168 <h2>Flexibility</h2>
169 <h3>Resizing Matrices</h3>
170 <h3>Matrix Index "Recycling"</h3>
171 <h2>Macros</h2>
172 <h2>Compiler Macros</h2>
173 <h2>SBCL-specific Compiler Features</h2>
174 <h3>defknown/deftransform/defoptimizer</h3>
175 <h1>Benchmarks</h1>
176 <p>We'll start with some simple benchmarks of matrix operations
177 and examine the effect of the various implementation strategies for
178 representing matrices on these operations.</p>
179 <h2>2-Dimensional Lisp Arrays</h2>
180 <div class="lisp"><pre><code>(defparameter b1
181 (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
182 :adjustable nil :fill-pointer nil))
183 </code>
184 <code class="results">B1
185 </code>
187 <code>(defparameter b2
188 (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
189 :adjustable nil :fill-pointer nil))
190 </code>
191 <code class="results">B2
192 </code>
194 <code>(defparameter b3
195 (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
196 :adjustable nil :fill-pointer nil))
197 </code>
198 <code class="results">B3
199 </code>
201 </pre>
202 </div>
203 <p>Now, our function to add the two arrays:</p>
204 <div class="lisp"><pre><code>(defun bench/add-matrix/aref (a1 a2 a3)
205 (destructuring-bind
206 (rows cols)
207 (array-dimensions a1)
208 (dotimes (i rows)
209 (dotimes (j cols) (setf (aref a3 i j) (+ (aref a1 i j) (aref a2 i j)))))))
210 </code>
211 <code class="results">BENCH/ADD-MATRIX/AREF
212 </code>
214 </pre>
215 </div>
216 <p>Now, we time how long it takes to run bench/addmatrix/aref:</p>
217 <div class="lisp"><pre><code>(ch-util:time-to-string (bench/add-matrix/aref b1 b2 b3))
218 </code>
219 <code class="results">"Evaluation took:
220 0.215 seconds of real time
221 0.1958 seconds of user run time
222 0.018307 seconds of system run time
223 [Run times include 0.083 seconds GC run time.]
224 0 calls to %EVAL
225 0 page faults and
226 50,331,032 bytes consed.
228 </code>
230 </pre>
231 </div>
232 <h2>1-Dimensional Lisp Arrays</h2>
233 <h2>A CLOS object holding a reference to a 2-Dimensional Lisp
234 Array</h2>
235 <h2>A CLOS object holding a reference to a 1-Dimensional Lisp
236 Array</h2>
237 </body>
238 </html>