<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Polly - Examples</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<!--#include virtual="menu.html.incl"-->
<!--=====================================================================-->
<h1>Execute the individual Polly passes manually</h1>
<!--=====================================================================-->
This example presents the individual passes that are involved when optimizing
code with Polly. We show how to execute them individually and explain for each
which analysis is performed or what transformation is applied. In this example,
the polyhedral transformations are provided by the user, to show how much
performance improvement an optimal automatic optimizer could be expected to deliver.
</p>
The files used and created in this example are available in the Polly checkout
in the folder <em>www/experiments/matmul</em>. They can be created automatically
by running the <em>www/experiments/matmul/runall.sh</em> script.
<li><h4>Create LLVM-IR from the C code</h4>
Polly works on LLVM-IR. Hence it is necessary to translate the source files into
LLVM-IR. If more than one file should be optimized, the files can be combined into
a single file with llvm-link.
<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre>
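<p>The C source itself is not reproduced on this page. For orientation, the following is a minimal, hypothetical sketch of the kind of kernel this example optimizes: the function name 'matmul_ijk' and the flat, row-major layout are assumptions, not taken from <em>matmul.c</em>. It uses the textbook i-j-k loop order with a zeroing statement and an update statement, which is the same shape as the two statements Polly reports for 'main' further on (Stmt_4 writing C, Stmt_6 reading A, B and C).</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: C = A * B for n x n matrices stored row-major in
 * flat arrays. Loop order i-j-k: the innermost loop walks B down a
 * column, i.e. with a stride of n elements. */
void matmul_ijk(size_t n, const float *A, const float *B, float *C)
{
    /* C is zeroed before accumulation, so it must not alias A or B. */
    assert(C != A && C != B);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            C[i * n + j] = 0.0f;                          /* like Stmt_4 */
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j]; /* like Stmt_6 */
        }
}
```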
<li><h4>Load Polly automatically when calling the 'opt' tool</h4>
Polly is not built into opt or bugpoint; it is a shared library that needs
to be loaded into these tools explicitly. The Polly library is called
LLVMPolly.so and is available in the build/lib/ directory. For convenience, we create
an alias that automatically loads Polly whenever 'opt' is called.
export PATH_TO_POLLY_LIB="$HOME/polly/build/lib"
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre>
<li><h4>Prepare the LLVM-IR for Polly</h4>
Polly is only able to work with code that matches a canonical form. To translate
the LLVM-IR into this form, we use a set of canonicalization passes. They are
scheduled by using '-polly-canonicalize'.
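<p>As a rough illustration of what a canonical loop form can mean (an assumption about one typical normalization, not a list of what -polly-canonicalize actually runs), consider a loop with an arbitrary start and stride. It can be rewritten so that its induction variable starts at 0 and steps by 1, with the original index recomputed from it. The function names are hypothetical.</p>

```c
#include <assert.h>

/* Original loop: the index starts at lo and advances by step (step > 0). */
long sum_original(long lo, long hi, long step, const long *a)
{
    assert(step > 0);
    long s = 0;
    for (long i = lo; i < hi; i += step)
        s += a[i];
    return s;
}

/* Normalized loop: k starts at 0 and steps by 1; the original index is
 * recomputed as lo + k * step. The trip count is ceil((hi - lo) / step). */
long sum_normalized(long lo, long hi, long step, const long *a)
{
    assert(step > 0);
    long trip = (hi > lo) ? (hi - lo + step - 1) / step : 0;
    long s = 0;
    for (long k = 0; k < trip; k++)
        s += a[lo + k * step];
    return s;
}
```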
<pre class="code">opt -S -polly-canonicalize matmul.s > matmul.preopt.ll</pre></li>
<li><h4>Show the SCoPs detected by Polly (optional)</h4>
To understand whether Polly was able to detect SCoPs, we print the
structure of the detected SCoPs. In our example, two SCoPs are detected: one in
'init_array' and the other in 'main'.
<pre class="code">opt -basicaa -polly-ast -analyze -q matmul.preopt.ll</pre>
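<p>Roughly, a SCoP (static control part) is a region in which loop bounds, conditions and array subscripts are affine functions of the surrounding loop counters and of parameters that do not change inside the region. The two hypothetical functions below illustrate the difference: the first fits this model, while the data-dependent subscript in the second does not, so Polly would typically reject it or only model it approximately, depending on version and options.</p>

```c
#include <assert.h>

#define N 4 /* compile-time size, so all subscripts below are affine */

/* Affine: bounds are constants and the subscripts [i][j] and [j][i] are
 * linear in the loop counters, so this nest fits the polyhedral model. */
void transpose_add(int out[N][N], int in[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] += in[j][i];
}

/* Non-affine: idx[i] is data, not a linear function of i, so the access
 * data[idx[i]] cannot be described by an affine relation. */
int gather_sum(int n, const int *idx, const int *data)
{
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; i++)
        s += data[idx[i]];
    return s;
}
```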
for (c2=0;c2&lt;=1023;c2++) {
  for (c4=0;c4&lt;=1023;c4++) {
for (c2=0;c2&lt;=1023;c2++) {
  for (c4=0;c4&lt;=1023;c4++) {
    for (c6=0;c6&lt;=1023;c6++) {
<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4>
Polly can use graphviz to graphically show a CFG in which the detected SCoPs are
highlighted. It can also create '.dot' files that can be translated by
the 'dot' utility into various graphic formats.
<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll
opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre>
The output for the different functions:<br />
<a href="experiments/matmul/scops.main.dot.png">main</a>,
<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br />
<a href="experiments/matmul/scopsonly.main.dot.png">main</a>,
<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>,
<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a>
<li><h4>View the polyhedral representation of the SCoPs</h4>
<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre>
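<p>The output below describes each statement by an iteration domain, a schedule and its memory accesses. The schedule maps every statement instance to a vector, and instances execute in lexicographic order of these vectors. The following sketch (helper names are hypothetical) encodes the two schedules reported for 'main' and the access to MemRef_A, and checks, for example, that Stmt_4(i0, i1) runs before every Stmt_6(i0, i1, i2).</p>

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_DIMS 7

/* Lexicographic comparison of two schedule vectors:
 * < 0 if a executes before b, 0 if equal, > 0 if a executes after b. */
int lex_compare(const long *a, const long *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t d = 0; d < len; d++) {
        if (a[d] < b[d]) return -1;
        if (a[d] > b[d]) return 1;
    }
    return 0;
}

/* { Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] } */
void sched_stmt4(long i0, long i1, long out[SCHED_DIMS])
{
    out[0] = 0; out[1] = i0; out[2] = 0; out[3] = i1;
    out[4] = 0; out[5] = 0;  out[6] = 0;
}

/* { Stmt_6[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] } */
void sched_stmt6(long i0, long i1, long i2, long out[SCHED_DIMS])
{
    out[0] = 0; out[1] = i0; out[2] = 0; out[3] = i1;
    out[4] = 1; out[5] = i2; out[6] = 0;
}

/* { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] }: consecutive values of
 * i0 access elements that are 1037 positions apart in the flat array. */
long access_A(long i0, long i2)
{
    return 1037 * i0 + i2;
}
```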
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond => for.end19' in function 'init_array':
{ Stmt_5[i0, i1] : i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 };
{ Stmt_5[i0, i1] -> schedule[0, i0, 0, i1, 0] };
{ Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] };
{ Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] };
{ FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4] };
{ FinalRead[i0] -> MemRef_A[o0] };
{ FinalRead[i0] -> MemRef_B[o0] };
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
'for.cond => for.end30' in function 'main':
{ Stmt_4[i0, i1] : i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 };
{ Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] };
{ Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] };
{ Stmt_6[i0, i1, i2] : i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 and i2 >= 0 and i2 &lt;= 1023 };
{ Stmt_6[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] };
{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] };
{ Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] };
{ Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
{ FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4, o5, o6] };
{ FinalRead[i0] -> MemRef_C[o0] };
{ FinalRead[i0] -> MemRef_A[o0] };
{ FinalRead[i0] -> MemRef_B[o0] };
<li><h4>Show the dependences for the SCoPs</h4>
<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre>
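<p>Among the dependences reported below for 'main', Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] says that each update of the same C element must follow the previous one, i.e. it has the distance vector (0, 0, 1). A textbook way to check whether a loop permutation preserves such a dependence (a classic distance-vector test, not necessarily how Polly checks legality internally) is to permute the vector and require that it stays lexicographically positive. The helper names are hypothetical.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* A distance vector is lexicographically positive if its first non-zero
 * entry is positive, i.e. the source instance runs before the target. */
bool lex_positive(const int *dist, size_t len)
{
    for (size_t d = 0; d < len; d++) {
        if (dist[d] > 0) return true;
        if (dist[d] < 0) return false;
    }
    return false; /* all zero: not carried by any loop */
}

/* perm[p] names the original loop placed at new position p. The
 * permutation preserves the dependence if the permuted distance vector
 * is still lexicographically positive. */
bool permutation_preserves(const int *dist, const size_t *perm, size_t len)
{
    int permuted[MAX_DEPTH];
    assert(len <= MAX_DEPTH);
    for (size_t p = 0; p < len; p++)
        permuted[p] = dist[perm[p]];
    return lex_positive(permuted, len);
}
```

<p>With the loop order (i0, i2, i1), i.e. i-k-j, the vector (0, 0, 1) becomes (0, 1, 0) and stays positive, which is consistent with the interchange applied later.</p>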
<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond => for.end19' in function 'init_array':
Printing analysis 'Polly - Calculate dependences for SCoP' for region:
'for.cond => for.end30' in function 'main':
{ Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] :
    i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023;
  Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] :
    i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 and i2 >= 0 and i2 &lt;= 1022;
  Stmt_6[i0, i1, 1023] -> FinalRead[0] :
    i1 &lt;= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 &lt;= 1023;
  Stmt_6[1023, i1, 1023] -> FinalRead[0] :
    i1 >= 0 and i1 &lt;= 1023
{ Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] :
    i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 and i2 >= 0 and i2 &lt;= 1023;
  Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] :
    i0 >= 0 and i0 &lt;= 1023 and i1 >= 0 and i1 &lt;= 1023 and i2 >= 0 and i2 &lt;= 1023;
  FinalRead[0] -> MemRef_A[o0];
  FinalRead[0] -> MemRef_B[o0]
  FinalRead[0] -> MemRef_C[o0] :
    o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 &lt;= 1091540 and o0 >= 0
    and 1067e0 &lt;= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 &lt;= -1;
<li><h4>Export jscop files</h4>
Polly can export the polyhedral representation into so-called jscop files, which
store this representation in JSON format.
<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre>
<pre>Writing SCoP 'for.cond => for.end19' in function 'init_array' to './init_array___%for.cond---%for.end19.jscop'.
Writing SCoP 'for.cond => for.end30' in function 'main' to './main___%for.cond---%for.end30.jscop'.
<li><h4>Import the changed jscop files and print the updated SCoP structure</h4>
<p>Polly can reimport jscop files in which the schedules of the statements have been
changed. These changed schedules are used to describe transformations.
Different jscop files can be imported by passing the postfix of the file to
import with -polly-import-jscop-postfix.</p>
<p>We apply three different transformations to the SCoP in the main function.
The jscop files describing these transformations are handwritten (and available
in <em>www/experiments/matmul</em>).
<p>As a baseline, we do not call any Polly code generation, but only apply the
normal -O3 optimizations.</p>
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop \
for (c2=0;c2&lt;=1535;c2++) {
  for (c4=0;c4&lt;=1535;c4++) {
    for (c6=0;c6&lt;=1535;c6++) {
<h5>Interchange (and Fission to allow the interchange)</h5>
<p>We split (fission) the loop nest, which allows us to interchange the loop
dimensions that enumerate Stmt_6.</p>
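<p>A source-level sketch of this variant, assuming the interchange swaps the two inner loops of the update into the common i-k-j order (the handwritten jscop file is not shown here, so the exact permutation is an assumption; the function name is hypothetical). After fission, the zeroing statement gets its own loop nest, and the innermost loop of the update then walks B and C along rows with unit stride.</p>

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n x n row-major matrices, fissioned and interchanged. */
void matmul_interchanged(size_t n, const float *A, const float *B, float *C)
{
    assert(C != A && C != B);
    /* Loop nest 1 (after fission): only the zeroing statement. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0f;
    /* Loop nest 2: k and j interchanged, so the innermost loop accesses
     * B[k][j] and C[i][j] with unit stride. */
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```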
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged'.
for (c2=0;c2&lt;=1535;c2++) {
  for (c4=0;c4&lt;=1535;c4++) {
for (c2=0;c2&lt;=1535;c2++) {
  for (c4=0;c4&lt;=1535;c4++) {
    for (c6=0;c6&lt;=1535;c6++) {
<h5>Interchange + Tiling</h5>
<p>In addition to the interchange, we now tile the second loop nest.</p>
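<p>One possible source-level equivalent of tiling the second loop nest, using the tile size 64 that appears in the generated loops. The loop order inside the tiles and the function name are assumptions; the order Polly actually produces is defined by the jscop file. The 'min' bounds handle sizes that are not a multiple of 64, which the generated code does not need because its loops cover 1536 = 24 * 64 iterations.</p>

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64 /* tile size, matching the step of 64 in the generated loops */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C = A * B for n x n row-major matrices, i-k-j order, tiled in all
 * three dimensions. */
void matmul_tiled(size_t n, const float *A, const float *B, float *C)
{
    assert(C != A && C != B);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;
    /* Tile loops: step over TILE x TILE x TILE blocks. */
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                /* Point loops: iterate inside one block; min_sz clips
                 * the last, partial block when n % TILE != 0. */
                for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); k++)
                        for (size_t j = jj; j < min_sz(jj + TILE, n); j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```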
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
for (c2=0;c2&lt;=1535;c2++) {
  for (c4=0;c4&lt;=1535;c4++) {
for (c2=0;c2&lt;=1535;c2+=64) {
  for (c3=0;c3&lt;=1535;c3+=64) {
    for (c4=0;c4&lt;=1535;c4+=64) {
      for (c5=c2;c5&lt;=c2+63;c5++) {
        for (c6=c4;c6&lt;=c4+63;c6++) {
          for (c7=c3;c7&lt;=c3+63;c7++) {
<h5>Interchange + Tiling + Strip-mining to prepare vectorization</h5>
To allow vectorization later, we create a so-called trivially parallelizable
loop: it is innermost, parallel and has only four iterations, so it can be
replaced by 4-element SIMD instructions.
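<p>A sketch of the strip-mining step on a single row update, c[j] += a * b[j], which is the innermost statement of the i-k-j nest (the function name is hypothetical). The inner loop has exactly four iterations with no dependences between them, so a vectorizer can replace it with one 4-wide SIMD operation. A remainder loop handles lengths that are not a multiple of four, which the generated code does not need because each tile is 64 elements wide.</p>

```c
#include <assert.h>
#include <stddef.h>

#define VEC 4 /* strip width, matching the step of 4 in the generated loops */

/* c[j] += a * b[j] for j in [0, n), strip-mined by VEC. */
void row_update_stripmined(size_t n, float a, const float *b, float *c)
{
    assert(b != c);
    size_t j = 0;
    /* Outer loop steps by VEC; the inner loop always runs VEC times and
     * its iterations are independent, so it maps onto one SIMD op. */
    for (; j + VEC <= n; j += VEC)
        for (size_t v = j; v < j + VEC; v++)
            c[v] += a * b[v];
    /* Remainder for n not divisible by VEC. */
    for (; j < n; j++)
        c[j] += a * b[j];
}
```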
opt matmul.preopt.ll -basicaa \
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-ast -analyze</pre>
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
for (c2=0;c2&lt;=1535;c2++) {
  for (c4=0;c4&lt;=1535;c4++) {
for (c2=0;c2&lt;=1535;c2+=64) {
  for (c3=0;c3&lt;=1535;c3+=64) {
    for (c4=0;c4&lt;=1535;c4+=64) {
      for (c5=c2;c5&lt;=c2+63;c5++) {
        for (c6=c4;c6&lt;=c4+63;c6++) {
          for (c7=c3;c7&lt;=c3+63;c7+=4) {
            for (c8=c7;c8&lt;=c7+3;c8++) {
<li><h4>Generate code for the SCoPs</h4>
This generates new code for the SCoPs detected by Polly.
If -polly-import-jscop is present, the transformations specified in the imported
jscop files are applied.
</p>
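<p>The last command below adds -polly-parallel, and the resulting code is later linked against libgomp: Polly emits calls to the OpenMP runtime itself. A rough source-level analogue (the function name is hypothetical) is a '#pragma omp parallel for' on the outermost loop. This is legal here because the dependences between Stmt_4 and Stmt_6 reported earlier all keep i0 unchanged, so different rows of C are independent. Compiled without -fopenmp, the pragma is ignored and the code runs serially with the same result.</p>

```c
#include <assert.h>

/* C = A * B for n x n row-major matrices; rows of C are distributed over
 * OpenMP threads. A signed loop counter keeps older OpenMP versions happy. */
void matmul_rows_parallel(long n, const float *A, const float *B, float *C)
{
    assert(C != A && C != B);
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        /* Each iteration touches only row i of C. */
        for (long j = 0; j < n; j++)
            C[i * n + j] = 0.0f;
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    }
}
```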
<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre>
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
    -polly-codegen matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged.ll</pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged'.
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
    -polly-codegen matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled.ll</pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled'.
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-codegen -polly-vectorizer=polly matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled+vector.ll</pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
    -polly-codegen -polly-vectorizer=polly -polly-parallel matmul.preopt.ll \
    | opt -O3 > matmul.polly.interchanged+tiled+vector+openmp.ll</pre>
Reading JScop 'for.cond => for.end19' in function 'init_array' from
'./init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
File could not be read: No such file or directory
Reading JScop 'for.cond => for.end30' in function 'main' from
'./main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
<li><h4>Create the executables</h4>
Create one executable optimized with plain -O3, as well as a set of executables
optimized in different ways with Polly. The first changes only the loop structure,
the second adds tiling, the third adds vectorization, and the last one additionally
uses OpenMP to run in parallel.
llc matmul.normalopt.ll -o matmul.normalopt.s && \
gcc matmul.normalopt.s -o matmul.normalopt.exe
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \
gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \
gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \
gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \
gcc matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe -lgomp
</pre>
<li><h4>Compare the runtime of the executables</h4>
By comparing the runtimes of the different executables, we see that the simple
loop interchange gives the largest performance boost here (from 42.68 s to 4.33 s
wall-clock time). Adding vectorization and using OpenMP improves the performance further.
<pre class="code">time ./matmul.normalopt.exe</pre>
<pre>42.68 real, 42.55 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged.exe</pre>
<pre>04.33 real, 4.30 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre>
<pre>04.11 real, 4.10 user, 0.00 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre>
<pre>01.39 real, 1.36 user, 0.01 sys</pre>
<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
<pre>00.66 real, 2.58 user, 0.02 sys</pre>
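<p>The measurements above can be turned into speedups relative to the -O3 baseline. The small helpers below (hypothetical names) compute the ratio of wall-clock times and the average number of busy cores, (user + sys) / real. For the OpenMP run this gives (2.58 + 0.02) / 0.66, about 3.9, which suggests that roughly four cores were busy; the core count of the test machine is not stated.</p>

```c
#include <assert.h>

/* Speedup of a variant over the baseline, from wall-clock ("real") times. */
double speedup(double baseline_real, double variant_real)
{
    assert(variant_real > 0.0);
    return baseline_real / variant_real;
}

/* Average number of cores kept busy: total CPU time over wall-clock time. */
double avg_busy_cores(double real, double user, double sys)
{
    assert(real > 0.0);
    return (user + sys) / real;
}
```

<p>For example, speedup(42.68, 4.33) is about 9.9 for the interchange alone, speedup(42.68, 1.39) about 30.7 with vectorization, and speedup(42.68, 0.66) about 64.7 with OpenMP.</p>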