#.(smarkup::enable-quote-reader-macro)
(smarkup::setup-headings)
"Copyright 2006, Cyrus Harmon. All Rights Reserved.")
"CLEM Matrix Performance")
(:author "Cyrus L. Harmon")
"(\"asdf:/ch-bib/lisp\" \"asdf:/ch-bib/bio\" \"asdf:/ch-bib/stat\" \"asdf:/ch-bib/vision\")")
(:bibtex-style "Science"))
(:html-metadata (:htmlcss "simple.css")))

(:h1 "CLEM Performance")
(:p #q{Common Lisp is a high-level language that is a modern-day
member of the LISP family of languages. Common Lisp can be
interpreted or compiled, or both, depending on the
implementation, and modern compilers offer the promise of
performance on the order of that achieved by C and Fortran
compilers. CLEM is a matrix math package for Common Lisp that
strives to offer high-performance matrix math routines written
in Common Lisp. Common Lisp has a sophisticated type system, and
it is a goal of CLEM to offer matrix representations and
operations that exploit the features of this type
system. Furthermore, Common Lisp has a sophisticated object
model, the Common Lisp Object System (CLOS), and CLEM uses
features of CLOS and its companion metaobject system, the
Meta-object Protocol (MOP), to define its classes, objects and
methods.})
(:p #q{Common Lisp implementations vary greatly in their
performance, but the Common Lisp standard provides an
infrastructure that allows high-performance implementations
to generate efficient code through the use of compiler
declarations of types and optimization settings. Lisp
implementations are free to compile lisp code to native
machine code, rather than either interpreting the lisp code
on the fly or compiling it to a byte-code
representation that is then executed by a virtual
machine. This suggests that, subject to the limits placed on
the output by the Common Lisp language and to the
implementation details of the particular lisp system, a lisp
environment should be able to produce code that approaches
the speed and efficiency of other compiled languages. SBCL
is a Common Lisp implementation that compiles to native code
and has a sophisticated compiler called, somewhat
confusingly, Python, although the Python compiler predates
the Python interpreted language by many years. SBCL's compiler
uses type and optimization declarations to generate efficient
code by producing optimized routines that are specific to the
declared types, often representing data in an "unboxed" format,
ideally one that matches the type representation handled directly
by the CPU. One of CLEM's main goals is to provide efficient
matrix operations that utilize the capabilities of the lisp
compiler to generate efficient code.})
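
(:p #q{As a sketch of the kind of declarations involved (this
example is illustrative and is not taken from CLEM itself), a
function can declare the types of its arguments and request
optimization, allowing SBCL to compile unboxed double-float
arithmetic:})

(:lisp #q{(defun scaled-sum (x y)
  ;; Type and optimization declarations let the compiler specialize
  ;; this code to unboxed double-float arithmetic.
  (declare (type double-float x y)
           (optimize (speed 3) (safety 0)))
  (+ (* 2d0 x) y))})

(:p #q{Calling disassemble on such a function shows the
specialized machine code the compiler produced.})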
(:h3 #q{Boxed and Unboxed Representation of Lisp Data Objects})
(:p #q{Boxed representations of lisp values are generally stored
as tagged data objects where the tags serve to identify the type
of the particular object. Unboxed representations, on the other
hand, are merely represented by the data values themselves,
without any identifying metadata being stored with the
value. Obviously, the language environment needs to know, or to
be able to determine, the type of a particular data value. For
boxed values, the language environment can rely on the fact that
the type information is stored with the data value directly. For
unboxed values, the language system must itself keep track of the
type of the value stored in a particular location. The main
advantage of using unboxed datatypes is that the language
environment, or the compiled code it produces, does not have to
extract the type information from the boxed value and then
extract the data as appropriate. However, this has the
disadvantage that the type of the data is then fixed to be that
represented by the unboxed type. Often, hybrid representations
are used in structures such as arrays, where the array itself is
a dynamically typed, boxed object while the values of the array
are unboxed values placed in a particular region of memory. By
using unboxed values for the elements of the array, the code can
directly access these values without the overhead of "unboxing"
the data. Boxed memory access often allocates memory,
or "conses" in lisp parlance, as a storage area is needed to
hold the unboxed value. In summary, using boxed datatypes
introduces (at least) two important areas where work has to be
done by the language environment to access the data: the
allocation of temporary storage to hold the values that will be
unboxed from the data, and additional work on the part of the
code to unbox the data in the first place.})
(:p #q{For these reasons, it is highly desirable for CLEM to use
unboxed data where possible. In the inner loop of a matrix
operation, for instance, accessing a boxed data type can
introduce substantial performance penalties. Therefore, CLEM has gone
to great lengths to provide matrix representations that yield unboxed
values and operations that can operate on these values while
allocating a minimum amount of memory.})
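
(:p #q{For example (illustrative, not CLEM internals), a
specialized lisp array stores its elements unboxed even though the
array object itself remains a boxed lisp object:})

(:lisp #q{;; The :element-type argument lets the implementation store the
;; elements as raw double-floats in a contiguous region of memory.
(make-array 16 :element-type 'double-float :initial-element 0d0)})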
(:h3 #q{Avoiding Unnecessary Memory Allocation})
(:h1 #q{CLEM Design Goals})

(:h2 #q{CLEM Implementation Choices})

(:h1 #q{Matrix Data Representation})
(:p #q{What options do we have for storing matrix data? The main
choices are lisp arrays or an external block of memory. Are there
other options?})
(:h2 #q{Reification of Matrix Data Objects})
(:p #q{Deferring, for a moment, the question of what form the
actual matrix data will take, let us consider the form of
the matrix object itself. It could be the object that
represents the data directly, or it could be an object, such
as an instance of a class or struct, that contains a
reference to the object that holds the data. In a sense, the
simplest approach to providing matrix arithmetic operations
is just to use common lisp arrays both to hold the data and
to be the direct representation of the matrix object. The
CLEM design assumes that there is additional data, besides
the data values stored in the array, or what have you, that
will be needed and that just using a lisp array as the
matrix itself is insufficient.})
(:h2 #q{Common Lisp Arrays})
(:p #q{Lisp arrays have the advantage that they are likely to
take advantage of the lisp type system. Yes, an
implementation may choose to ignore this information, or
continue to produce the same code as it would for untyped
arrays, but a sufficiently smart compiler, such as Python,
should use the array type information to produce efficient
code. Of course, this also means that in order to get this
efficiency we have to provide this type information to the
compiler. As we try to modularize various pieces, it is
often the case that one would like to have generic code that
can work on matrices of any type. In these cases, additional
measures may be needed to coax the compiler into generating
efficient code, while doing so in a generic manner.})
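
(:p #q{As an illustrative sketch (not taken from CLEM's sources),
this is the sort of function that benefits from array type
information:})

(:lisp #q{(defun sum-elements (m)
  ;; Declaring the concrete array type lets the compiler open-code
  ;; aref as a direct, unboxed memory access.
  (declare (type (simple-array double-float (* *)) m)
           (optimize (speed 3) (safety 0)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (array-dimension m 0) acc)
      (dotimes (j (array-dimension m 1))
        (incf acc (aref m i j))))))})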
(:h3 #q{One-dimensional or Multi-dimensional Arrays?})
(:p #q{One issue in dealing with lisp arrays is whether to use
the lisp facility for multi-dimensional arrays. One argument
in favor of native multi-dimensional arrays is that the
compiler can generate efficient code to access data in
certain multi-dimensional arrays, provided that this
information is known and passed to the compiler at
compile-time. On the other hand, using one-dimensional
arrays puts both the burden of and the flexibility of
computing array indices on the matrix package.})
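
(:p #q{With a one-dimensional backing store, the matrix package
computes row-major indices itself. A minimal sketch (the names
here are illustrative, not CLEM's actual API):})

(:lisp #q{(defun mref (data cols i j)
  ;; Map the two-dimensional index (i, j) onto the flat store.
  (declare (type (simple-array double-float (*)) data)
           (type fixnum cols i j))
  (aref data (+ (* i cols) j)))})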
(:h3 #q{Extensible Arrays?})

(:p #q{Christophe Rhodes has recently introduced a protocol for
extensible sequences to SBCL. Might a protocol for a similar
extensible array be useful here?})
(:h3 #q{What About Lists?})
(:p #q{Lists are convenient for representing matrices in that
the list iteration functions can be used to traverse the elements
of the matrix, yielding the famous trivial transpose operation
using mapcar and list. However, lists aren't designed for
efficient random access and are a poor choice for representing
anything but trivially small matrices.})
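
(:p #q{The trick in question, for a matrix represented as a list
of rows:})

(:lisp #q{;; Transpose a list-of-lists matrix: mapcar walks the rows in
;; parallel, and list collects one new row per column.
(apply #'mapcar #'list '((1 2 3)
                         (4 5 6)))
;; => ((1 4) (2 5) (3 6))})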
(:h2 #q{Other Blocks of Memory})
(:p #q{It is worth mentioning the possibility of other approaches,
such as an external block of memory, perhaps allocated with either
non-standard routines of the lisp system or via a foreign-function
interface; one then computes the offsets into this block of memory,
coerces the contents to a given lisp type, and obtains the results. This
approach is used by some libraries that use clem, such as ch-image,
to access matrix/array data stored in memory in a non-native-lisp
form, such as matlisp matrices (which are really BLAS/LAPACK
matrices), fftw matrices, and arrays of data from TIFF images. While
this is nice to be able to do, it is unlikely to be practical for
storing matrix data, given the alternative of using lisp
arrays. With lisp arrays, coercion of the contents of the matrix to
lisp types is done for us by optimized code in the compiler, and it
is unlikely that we would be able to do a better job of this than
the compiler. This approach is useful for conversion of matrix data
to other in-memory formats, but unlikely to be useful for the typed
lisp matrices for which CLEM is designed. If we were to go this
route, it would make sense to use other, optimized code libraries
for operating on this data, and this is what matlisp does, handing
these blocks of memory off to BLAS/LAPACK for processing in
highly-optimized routines written in C, Fortran, or assembly
language.})
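
(:p #q{As a sketch of the foreign-memory approach (this example
assumes the CFFI library, which CLEM itself does not require):})

(:lisp #q{;; Allocate a foreign block of 16 doubles and access one element.
;; Index arithmetic and element typing are our responsibility here,
;; rather than the compiler's.
(cffi:with-foreign-object (buf :double 16)
  (setf (cffi:mem-aref buf :double 5) 3.14d0)
  (cffi:mem-aref buf :double 5))})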
(:h1 #q{Matrix Data Access})

(:h2 #q{Now that we have chosen a matrix representation, how do we
access the data in it?})

(:h2 #q{Slow and Fast Data Access Paths})

(:h2 #q{Flexibility})

(:h3 #q{Resizing Matrices})

(:h3 #q{Matrix Index "Recycling"})

(:h2 #q{Compiler Macros})

(:h2 #q{SBCL-specific Compiler Features})

(:h3 #q{defknown/deftransform/defoptimizer})

(:p #q{We'll start with some simple benchmarks of matrix operations
and examine the effect of the various implementation strategies for
representing matrices on these operations.})
(:h2 #q{2-Dimensional Lisp Arrays})
(defparameter b1 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))

(defparameter b2 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))

(defparameter b3 (make-array '(1024 1024)
                             :element-type 'double-float
                             :initial-element 1.0d0))
(:p #q{Now, our function to add the two arrays:})
(defun bench/add-matrix/aref (a1 a2 a3)
  (destructuring-bind (rows cols)
      (array-dimensions a1)
    (dotimes (i rows)
      (dotimes (j cols)
        (setf (aref a3 i j) (+ (aref a1 i j) (aref a2 i j)))))))
(:p #q{Now, we time how long it takes to run bench/add-matrix/aref:})

#q{(ch-util:time-to-string (bench/add-matrix/aref b1 b2 b3))})
(:h2 #q{1-Dimensional Lisp Arrays})

(:h2 #q{A CLOS object holding a reference to a 2-Dimensional Lisp
Array})

(:h2 #q{A CLOS object holding a reference to a 1-Dimensional Lisp
Array})
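
(:p #q{A minimal sketch of the CLOS-wrapper idea (illustrative
only; CLEM's actual class definitions differ and are built with
the MOP):})

(:lisp #q{(defclass double-float-matrix ()
  ;; The matrix object is a boxed CLOS instance; the data it
  ;; references is a specialized array with unboxed elements.
  ((data :initarg :data :accessor matrix-data
         :type (simple-array double-float (* *)))
   (rows :initarg :rows :reader matrix-rows)
   (cols :initarg :cols :reader matrix-cols)))})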
#+nil(:BIBLIOGRAPHY))