README

Requirements:

- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)

Compilation:

        git clone git://repo.or.cz/ppcg.git
        cd ppcg
        git submodule init
        git submodule update
        ./autogen.sh
        ./configure
        make
        make check

Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in the block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
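
On the command line, the union map has to be quoted so that the shell does
not interpret the braces, brackets, ">" and ";".  The following is a minimal
sketch, not a reference invocation: the file name example.c and the chosen
sizes are assumptions made for illustration, and the call is skipped when
PPCG or the input file is not available.

```shell
#!/bin/sh
# Single quotes pass the isl union map to PPCG unchanged.
# The sizes below are arbitrary illustration values for kernel 0.
SIZES='{ kernel[0] -> tile[32,32]; kernel[0] -> grid[16,16]; kernel[0] -> block[16,16] }'

# Hypothetical input file; replace with your own source.
SRC=example.c

# Only invoke PPCG if it is installed and the input exists.
if command -v ppcg >/dev/null 2>&1 && [ -f "$SRC" ]; then
    ppcg --sizes="$SIZES" "$SRC"
fi
```

Keeping the map in a variable makes it easier to edit the sizes between runs
without fighting shell quoting.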

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
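
The two-pass workflow described above might be scripted roughly as follows.
This is only a sketch: example.c, the sizes and the helper "run" are
hypothetical, and by default the script prints the commands instead of
executing them (set RUN=1 to execute).

```shell
#!/bin/sh
# Hypothetical input file and sizes; adjust after inspecting the kernels.
SRC=example.c
SIZES='{ kernel[0] -> tile[32,32] }'

# Helper for this sketch: print the command unless RUN is set to a
# non-empty value, in which case execute it.
run() {
    if [ -n "$RUN" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Pass 1: default sizes, to see which kernels PPCG generates.
run ppcg "$SRC"

# ... examine the generated kernels and note their sequence numbers ...

# Pass 2: rerun with sizes chosen for those kernel numbers.
run ppcg --sizes="$SIZES" "$SRC"
```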


Compiling the generated code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.
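
The mapping from GPU family to --arch value listed above can be captured in
a small helper so that the flag is never omitted by accident.  This is a
sketch only: the family keywords (fermi, gk10x, gk110), the function name
arch_for and the .cu file names are assumptions made for illustration, and
the nvcc command is printed rather than executed.

```shell
#!/bin/sh
# Map a GPU family keyword to the --arch value named in this README.
# The keywords are hypothetical labels chosen for this sketch.
arch_for() {
    case "$1" in
        fermi) echo sm_20 ;;
        gk10x) echo sm_30 ;;
        gk110) echo sm_35 ;;
        *)     echo "unknown GPU family: $1" >&2; return 1 ;;
    esac
}

ARCH=$(arch_for gk110)

# Print the nvcc command line; the file names are placeholders for the
# files generated by PPCG.
echo nvcc --arch "$ARCH" example_host.cu example_kernel.cu -o example
```

Failing on an unknown family, instead of falling back silently, avoids
ending up with the nvcc default architecture.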