README

Requirements:

- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
	For more details, see pet/README.

Compilation:

	git clone git://repo.or.cz/ppcg.git
	cd ppcg
	git submodule init
	git submodule update
	./autogen.sh
	./configure
	make
	make check

Using PPCG to generate CUDA code

To convert a fragment of a C program to CUDA, insert a line containing

	#pragma scop

before the fragment and add a line containing

	#pragma endscop

after the fragment.  Then run

	ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

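As a minimal sketch of what such a marked fragment might look like,
the following file wraps a matrix multiplication in the two pragmas.
The function name matmul, the size N and the assert are hypothetical
choices made for this illustration only.  A regular C compiler ignores
the unknown pragmas, so the file also builds unchanged as plain C.

```c
#include <assert.h>

#define N 64

/* Hypothetical example: C = A * B for N x N matrices. */
void matmul(float A[N][N], float B[N][N], float C[N][N])
{
	/* Sanity check placed outside the marked fragment. */
	assert(A && B && C);

#pragma scop
	for (int i = 0; i < N; i++)
		for (int j = 0; j < N; j++) {
			C[i][j] = 0;
			for (int k = 0; k < N; k++)
				C[i][j] += A[i][k] * B[k][j];
		}
#pragma endscop
}
```

Only the loop nest between the pragmas is a candidate for conversion;
the rest of the file is left as host code.
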
Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

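To illustrate what tiling two loops with a tile size of 64 in both
dimensions means, here is a hand-written sketch of a tiled loop nest.
This is not PPCG output, and the code PPCG generates will look
different.  The names scale_tiled, N and TILE are hypothetical, and the
sketch assumes that N is a multiple of TILE.

```c
#include <assert.h>

#define N 256
#define TILE 64

/* Hand-written illustration of 64x64 tiling; not generated by PPCG. */
void scale_tiled(float A[N][N], float s)
{
	/* Assumption of this sketch: the tiles cover the array exactly. */
	assert(N % TILE == 0);

	/* The two outer loops iterate over the tiles ... */
	for (int ii = 0; ii < N; ii += TILE)
		for (int jj = 0; jj < N; jj += TILE)
			/* ... and the two inner loops over the points in a tile. */
			for (int i = ii; i < ii + TILE; i++)
				for (int j = jj; j < jj + TILE; j++)
					A[i][j] *= s;
}
```

The tiled nest visits the same iterations as the untiled one, only
grouped into blocks of 64 by 64 points.
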
Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.


Compiling the generated code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.