README

Requirements:

- automake, autoconf, libtool
        (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
        (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
        (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
        Unless you have some other reasons for wanting to use the svn version,
        it is best to install the latest release (3.3).
        For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it, or get the source from
the git repository as follows.  Building from the git repository
requires autoconf, automake, libtool and pkg-config.

        git clone git://repo.or.cz/ppcg.git
        cd ppcg
        git submodule init
        git submodule update
        ./autogen.sh


Compilation:

        ./configure
        make
        make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA or OpenCL, insert a line
containing

        #pragma scop

before the fragment and add a line containing

        #pragma endscop

after the fragment.  To generate CUDA code, run

        ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code, run

        ppcg --target=opencl file.c

The generated code is stored in file_host.c and file_kernel.cl.
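
As an illustration of where the two pragmas go, a hypothetical file.c
could look as follows.  This is only a sketch: the function name
vector_add and the array names are assumptions made for this example
and are not part of PPCG.  The array sizes are written in terms of n
(C99 style) so that the array bounds and the loop bound agree.

```c
/* file.c: hypothetical input file; all names are illustrative only. */

/* Element-wise addition: c[i] = a[i] + b[i] for 0 <= i < n. */
void vector_add(int n, float a[n], float b[n], float c[n])
{
#pragma scop
        for (int i = 0; i < n; ++i)
                c[i] = a[i] + b[i];
#pragma endscop
}
```

A regular C compiler ignores the two pragmas, so the same file can still
be compiled and run unchanged to obtain reference results.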


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

        { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
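
To make the meaning of a tile size more concrete, the sketch below tiles
a two-dimensional loop nest by hand with a tile size of 64 in both
dimensions, which is the kind of transformation that tile[64,64] asks
for.  It was written for illustration only and is not code generated
by PPCG; N, TILE and the function names are assumptions of this example.

```c
#define N    200   /* hypothetical problem size, not a multiple of TILE */
#define TILE 64    /* tile size in both dimensions, as in tile[64,64] */

/* Smaller of two ints; clamps the last, partial tile to the array bound. */
static int min_int(int x, int y)
{
        return x < y ? x : y;
}

/* Original loop nest: visits every (i, j) exactly once. */
long sum_plain(int A[N][N])
{
        long sum = 0;

        for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                        sum += A[i][j];
        return sum;
}

/* Tiled loop nest: the two outer loops enumerate the tiles, the two
 * inner loops enumerate the points inside the current tile. */
long sum_tiled(int A[N][N])
{
        long sum = 0;

        for (int ii = 0; ii < N; ii += TILE)
                for (int jj = 0; jj < N; jj += TILE)
                        for (int i = ii; i < min_int(ii + TILE, N); ++i)
                                for (int j = jj; j < min_int(jj + TILE, N); ++j)
                                        sum += A[i][j];
        return sum;
}
```

Both functions visit the same set of array elements; only the order
of the iterations differs.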

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the default sizes that were
actually used.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.

If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c which contains utility functions used by the generated
OpenCL host code.  To compile the host code with gcc, run

        gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code.  These files may then contain
the definitions of the functions being called from the
program fragment.  If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify the "--opencl-compiler-options=-I." option
to make sure that the files can be found by the OpenCL compiler.


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line.  Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG-generated code using nvcc since CUDA does not support VLAs.
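
The difference between the two kinds of prototypes can be sketched as
follows.  This is a simplified illustration and not actual PolyBench
code; the macro N and the function names are assumptions of this example.

```c
#define N 4   /* hypothetical compile-time array size */

/* Fixed-size prototype: the array has size N in both dimensions, while
 * the loops are bounded by the run-time parameter n.  Nothing in the
 * code ties n to N. */
void scale_fixed(int n, double A[N][N])
{
        for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                        A[i][j] *= 2.0;
}

/* C99 prototype: the array size and the loop bounds are both expressed
 * in terms of n, so they are consistent with each other. */
void scale_c99(int n, double A[n][n])
{
        for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                        A[i][j] *= 2.0;
}
```

The second form relies on a variably modified array parameter, which is
a C99 feature.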


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C.  Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments.  For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called.  Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Additional variables in generated code

The generated code may contain additional variables with names
that match /^[bthsgc][0-9]+$/.  These variables may shadow or
conflict with variables in the input program.  Until this issue
has been resolved in PPCG, you should avoid such variable names
in your input program.
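
If you want to check identifiers against this pattern, a small helper
along the following lines could be used.  It is a minimal sketch written
for this README and is not part of PPCG; the function name
may_clash_with_ppcg is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if name matches /^[bthsgc][0-9]+$/ and 0 otherwise. */
int may_clash_with_ppcg(const char *name)
{
        if (name == NULL || name[0] == '\0')
                return 0;
        /* The first character must be one of b, t, h, s, g or c. */
        if (strchr("bthsgc", name[0]) == NULL)
                return 0;
        /* At least one digit must follow the first character. */
        if (name[1] == '\0')
                return 0;
        /* Every remaining character must be a decimal digit. */
        for (size_t i = 1; name[i] != '\0'; ++i)
                if (!isdigit((unsigned char) name[i]))
                        return 0;
        return 1;
}
```

For example, the names b0 and t12 match the pattern, while tmp, c and
x1 do not.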


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
                G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
                Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}