external/clem/doc/clem-performance.xhtml

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <!DOCTYPE html
   3      PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   4      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>CLEM Matrix Performance</title>
   5 <link rel="stylesheet" type="text/css" href="simple.css"/></head>
   6 <body><p></p>
   7 <h1>CLEM Performance</h1>
   8 <h2>Introduction</h2>
   9 <p>Common Lisp is a high-level language that is a modern-day
  10      member of the LISP family of languages. Common Lisp can be
  11      either interpreted or compiled, or both, depending on the
  12      implementation and modern compilers offer the promise of
  13      performance on the order of that achieved by C and fortran
  14      compilers. CLEM is a matrix math package for Common Lisp that
  15      strives to offer high-performance matrix math routines, written
  16      in Common Lisp. Common Lisp has a sophisticated type system and
  17      it is a goal of CLEM to offer matrix representation and
  18      opreations that exploit the features of this type
  19      system. Furthermore, Common Lisp has a sophisticated object
  20      model, the Common Lisp Object System (CLOS), and CLEM uses
  21      features of CLOS and its companion metaobject system, the
  22      Meta-object Protocol (MOP) to define its classes, objects and
  23      methods.</p>
  24 <p>Common Lisp implementations vary greatly in their
  25      peformance, but the Common Lisp standard provides an
  26      infrastructure for high-performance-capable implementations
  27      to generate efficient code through the use of compiler
  28      declarations of types and optimization settings. Lisp
  29      implementations are free to compile lisp code to native
  30      machine code, rather than either interpreting the lisp code
  31      on the fly, or compiling the code to a byte-code
  32      representation, that is then executed by a virtual
  33      machine. This suggests that, subject to the limits placed on
  34      the output by the common lisp language and to the
  35      implementation details of the particular lisp system, a lisp
  36      environment should be able to produce code that approaches
  37      the speed and efficiency of other compiled languages. SBCL
  38      is a Common Lisp implementation that compiles to native code
  39      and has a sophisticated compiler called, somewhat
  40      confusingly, Python, although the Python compiler predates
  41      the Python interpreted language by many years. SBCL's compiler
  42      uses type and optimization declarations to generate efficient
  43      code by producing optimized routines that are specific to the
  44      declared types, often representing data in an "unboxed" format,
  45      ideally one that matches the type representation handled directly
  46      by the CPU. One of CLEM's main goals is to provide efficient
  47      matrix operations that utilize the capabilities of the lisp
  48      compiler to generate efficient code.</p>
  49 <h3>Boxed and Unboxed Representation of Lisp Data Objects</h3>
  50 <p>Boxed representations of lisp values are generally stored
  51  as tagged data objects where the tags serve to identify the type
  52  of the particular object. Unboxed representations, on the other
  53  hand, are merely represented by the data values themselves,
  54  without any identifying metadata being stored with the
  55  value. Obviously, the language environment needs to know, or to
  56  be able to determine, the type of a particular data value. For
  57  boxed values, the language environment can rely on the fact that
  58  the type information is stored with the data value directly. For
  59  unboxed values, the language system must keep track of the value
  60  stored in a particular directly. The main advantage of using
  61  unboxed datatypes is that the language environment, or compiled
  62  code it produces, does not have to bother extracting the type
  63  information from the boxed value and extracting the data as
  64  appropriate. However, this has the disadvantage that the type of
  65  data is then fixed to be that represented by the unboxed
  66  type. Often, hybrid representations are used in structures such
  67  as an array, where the array itself will be a dynamically typed,
  68  boxed object while the values of the array will be unboxed
  69  values placed in an a particular region of memory. By using
  70  unboxed values for the elements of the array, the code can
  71  directly access these values without the overhead of "unboxing"
  72  the data. Boxed memory access often allocates memory,
  73  or "conses" in lisp parlance, as a storage area is needed to
  74  hold the unboxed value. In summary, sing boxed datatypes introduces
  75  (at least) two important areas where work has to be done by the
  76  language environment to access the data, the allocation of temporary
  77  to store the resulting values that will be unboxed from the data, and
  78  additional work on the part of the code to unbox the data in the
  79  first place.</p>
  80 <p>For these reasons, it is higly desirable for CLEM to use
  81  unboxed data where possibly. In the inner loop of a matrix
  82  operation, for instance, accessing a boxed data type can
  83  introduce substantial performance penalties. Therefore, CLEM has gone
  84  to great lengths to provide matrix representations that yield unboxed
  85  values and operations that can operate on these values with
  86  allocation of a minimum amount of memory.</p>
  87 <h3>Avoiding Unneccessary Memory Allocation</h3>
  88 <h1>CLEM Design Goals</h1>
  89 <h2>CLEM Implementation Choices</h2>
  90 <h1>Matrix Data Representation</h1>
  91 <p>What options do we have for storing matrix data? Main choices
  92      are lisp arrays or an external block of memory. Are there other
  93      options here?</p>
  94 <h2>Reification of Matrix Data Objects</h2>
  95 <p>Defering, for a moment, the question of what form the
  96      actual matrix data will take, let us consider the form of
  97      the matrix object itself. It could be the object that
  98      represents the data directly, or it could be an object, such
  99      as an instance of a class or struct, that contains a
 100      reference to the object that holds the data. In a sense, the
 101      simplest approach to providing matrix arithmetic operations
 102      is just to use common lisp arrays both to hold the data and
 103      to be the direct representation of the matrix object. The
 104      CLEM design assumes that there is additional data, besides
 105      the data values stored in the array, or what have you, that
 106      will be needed and that just using a lisp array as the
 107      matrix itself is insufficient.</p>
 108 <h2>Common Lisp Arrays</h2>
 109 <p>Lisp arrays have the advantage that they are likely to
 110      take advantage of the lisp type system. Yes, an
 111      implementation may choose to ignroe this information, or
 112      continue to produce the same code as it would for untyped
 113      arrays, but a sufficently smart compiler, such as Python,
 114      should use the array type information to produce efficient
 115      code. Of course this also means that in order to get this
 116      efficiency we have to provide this type information to the
 117      compiler. As we try to modularize various pieces, it is
 118      often the case that one would like to have generic code that
 119      can work on matrices of any type. It these cases, additional
 120      measures may be needed to coax the compiler into generating
 121      efficient code, while doing so in a generic manner.</p>
 122 <h3>One-dimensional or Multi-dimensional Arrays?</h3>
 123 <p>One issue in dealing with lisp arrays is whether to use
 124      the lisp facilitty for multi-dimensional array. One argument
 125      in favor of native multi-dimensional arrays is that the
 126      compiler can generate efficent code to access data in
 127      certain multi-dimensional arrays, provided that this
 128      information is known and passed to the compiler at
 129      compile-time. On the other hand, using one-dimensional
 130      arrays puts both the burden of and the flexibility of
 131      computing array indices on the matrix package.</p>
 132 <h3>Extensible arrays?</h3>
 133 Christophe Rhodes has recently introduced a protocol for
 134  extensible sequences to SBCL. Might a protocol for a similar
 135  extensible array be useful here?<h3>What About Lists?</h3>
 136 <p>Lists are convenient for representing matrices in that
 137  the iteration functions can be used to traverse the elements of
 138  the matrix, yielding the famous trivial transpose operation
 139  using mapcar and list. However, lists aren't designed for
 140  efficient random access and are a poor choice for representing
 141  anything but trivially small matrices.</p>
 142 <h>Other Blocks of Memory</h>
 143 <p>It is worth mentioning the possibility of other approaches,
 144  such as an external block of memory, perhaps allocated with either
 145  non-standard routines of the lisp system, or via a foreign-function
 146  interface, and to determine the offsets into this block of memory,
 147  coerce the contents to a given lisp type and obtain the results. This
 148  approach is used by some libraries that use clem, such as ch-image,
 149  to access matrix/array data stored in memory in a non-native-lisp
 150  form, such as matlisp matrices (which are really BLAS/LAPACK
 151  matrices), fftw matrices, and arrays of data from TIFF images. While
 152  this is nice to be able to do, it is unlikely to be practical for
 153  storing matrix data, given the alternative of using lisp
 154  arrays. Coercing of the contents of the matrix to lisp types is
 155  done for us by optimized code in the compiler. It is unlikely that we
 156  would be able to do a better job of this than the compiler. This
 157  approach is useful for conversion of matrix data to other in-memory
 158  formats, but unlikely to be useful for the typed lisp matrices for
 159  which CLEM is designed. If we were to go this route, it would make
 160  sense to use other, optimized, code libraries for operating on this
 161  data, and this is what matlisp does, handing these blocks of memory
 162  off to BLAS/LAPACK for processing in highly-optimized routines
 163  written in C, Fortran or Assembly Language. </p>
 164 <h1>Matrix Data Access</h1>
 165 <h2>Now that we have chosen a matrix representation, how do we
 166       access the data in it?</h2>
 167 <h2>Slow and Fast Data Access Paths</h2>
 168 <h2>Flexibility</h2>
 169 <h3>Resizing Matrices</h3>
 170 <h3>Matrix Index "Recycling"</h3>
 171 <h2>Macros</h2>
 172 <h2>Compiler Macros</h2>
 173 <h2>SBCL-specific Compiler Features</h2>
 174 <h3>defknown/deftransform/defoptimizer</h3>
 175 <h1>Benchmarks</h1>
 176 <p>We'll start with some simple benchmarks of matrix operations
 177  and examine the effect of the various implementation strategies for
 178  representing matrices on these operations.</p>
 179 <h2>2-Dimensional Lisp Arrays</h2>
 180 <div class="lisp"><pre><code>(defparameter b1
 181   (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
 182               :adjustable nil :fill-pointer nil))
 183 </code>
 184 <code class="results">B1
 185 </code>
 186
 187 <code>(defparameter b2
 188   (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
 189               :adjustable nil :fill-pointer nil))
 190 </code>
 191 <code class="results">B2
 192 </code>
 193
 194 <code>(defparameter b3
 195   (make-array '(1024 1024) :element-type 'double-float :initial-element 1.0d0
 196               :adjustable nil :fill-pointer nil))
 197 </code>
 198 <code class="results">B3
 199 </code>
 200
 201 </pre>
 202 </div>
 203 <p>Now, our function to add the two arrays:</p>
 204 <div class="lisp"><pre><code>(defun bench/add-matrix/aref (a1 a2 a3)
 205   (destructuring-bind
 206       (rows cols)
 207       (array-dimensions a1)
 208     (dotimes (i rows)
 209       (dotimes (j cols) (setf (aref a3 i j) (+ (aref a1 i j) (aref a2 i j)))))))
 210 </code>
 211 <code class="results">BENCH/ADD-MATRIX/AREF
 212 </code>
 213
 214 </pre>
 215 </div>
 216 <p>Now, we time how long it takes to run bench/addmatrix/aref:</p>
 217 <div class="lisp"><pre><code>(ch-util:time-to-string (bench/add-matrix/aref b1 b2 b3))
 218 </code>
 219 <code class="results">"Evaluation took:
 220   0.215 seconds of real time
 221   0.1958 seconds of user run time
 222   0.018307 seconds of system run time
 223   [Run times include 0.083 seconds GC run time.]
 224   0 calls to %EVAL
 225   0 page faults and
 226   50,331,032 bytes consed.
 227 "
 228 </code>
 229
 230 </pre>
 231 </div>
 232 <h2>1-Dimensional Lisp Arrays</h2>
 233 <h2>A CLOS object holding a reference to a 2-Dimensional Lisp
 234  Array</h2>
 235 <h2>A CLOS object holding a reference to a 1-Dimensional Lisp
 236  Array</h2>
 237 </body>
 238 </html>