README

Requirements:

- automake, autoconf, libtool
        (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
        (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
        (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
        Unless you have some other reasons for wanting to use the svn version,
        it is best to install the latest release (3.3).
        For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it, or get the source from
the git repository as follows.  Building from the git repository
requires autoconf, automake, libtool and pkg-config.

        git clone git://repo.or.cz/ppcg.git
        cd ppcg
        git submodule init
        git submodule update
        ./autogen.sh


Compilation:

        ./configure
        make
        make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA code

To convert a fragment of a C program to CUDA, insert a line containing

        #pragma scop

before the fragment and add a line containing

        #pragma endscop

after the fragment.  Then run

        ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

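As an illustration, the following is a minimal sketch of an input file.
The file name file.c matches the command above, but the function saxpy
and the size N are hypothetical choices made for this example and are
not part of PPCG.  An ordinary C compiler ignores the two pragmas, so
the file can still be compiled as plain C.

```c
/* file.c: hypothetical input for "ppcg --target=cuda file.c". */
#define N 1024

/* Compute b[i] = alpha * a[i] + b[i] for all i.
 * The loop between the two pragmas is the fragment to be converted. */
void saxpy(float alpha, float a[N], float b[N])
{
#pragma scop
    for (int i = 0; i < N; ++i)
        b[i] = alpha * a[i] + b[i];
#pragma endscop
}
```

Running the ppcg command above on such a file should then produce
file_host.cu and file_kernel.cu as described.
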
Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in the block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

        { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.

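To make the effect of a tile size more concrete, the sketch below shows
by hand what tiling two loops with a tile size of 64 in both dimensions
means, as requested by "kernel[0] -> tile[64,64]".  This is plain C
written for illustration only.  It is not the code that PPCG generates,
and the names N, TILE, add_transpose and add_transpose_tiled are
assumptions of this example.

```c
#define N 256    /* hypothetical problem size */
#define TILE 64  /* tile size in both dimensions, as in tile[64,64] */

/* Untiled reference: add the transpose of a to c. */
void add_transpose(float c[N][N], float a[N][N])
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            c[i][j] += a[j][i];
}

/* The same computation with both loops tiled.  The two outer loops
 * enumerate the tiles, the two inner loops the points inside a tile.
 * Every (i, j) pair is still visited exactly once. */
void add_transpose_tiled(float c[N][N], float a[N][N])
{
    for (int ti = 0; ti < N; ti += TILE)
        for (int tj = 0; tj < N; tj += TILE)
            for (int i = ti; i < ti + TILE && i < N; ++i)
                for (int j = tj; j < tj + TILE && j < N; ++j)
                    c[i][j] += a[j][i];
}
```

Assuming that --sizes takes its argument in the same --option=value
form as --target, such a map could be passed on the command line as
--sizes="{ kernel[0] -> tile[64,64] }".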

Compiling the generated code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the command line.  Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.

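The inconsistency can be pictured with the following sketch.  It is not
taken from PolyBench.  The functions scale_fixed and scale_c99 and the
size N are hypothetical, chosen only to contrast a fixed-size array
declaration with a C99 prototype whose array extents follow the loop
bounds.

```c
#define N 8  /* hypothetical compile-time array size */

/* Fixed-size array, but loops bounded by the run-time parameter n.
 * Nothing in the declaration relates n to the array extent N. */
void scale_fixed(int n, double A[N][N])
{
#pragma scop
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] *= 2.0;
#pragma endscop
}

/* C99 prototype: the array extents are the same parameter n that
 * bounds the loops, so the declaration and the loops agree. */
void scale_c99(int n, double A[n][n])
{
#pragma scop
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] *= 2.0;
#pragma endscop
}
```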

CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C.  Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments.  For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called.  Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
                G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
                Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}