Change the determination of parameters of macro-kernel
commit d8bb18c4ed6f8559d71c6457ffad4c75c71dd8da
author Roman Gareev <gareevroman@gmail.com>
Wed, 21 Dec 2016 12:51:12 +0000
committer Roman Gareev <gareevroman@gmail.com>
Wed, 21 Dec 2016 12:51:12 +0000
tree 0a8fe28608287e4c90e9011130ab517230c3a18c
parent 79e99e2a01ec4f586e2d09ea1b70f1baf1d11f93

Typically, processor architectures do not include an L3 cache, which means that
Nc, the parameter of the macro-kernel, is, for all practical purposes,
redundant ([1]). However, small values of Nc can cause redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, large values of Nc can cause
segmentation faults if the available stack space is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of Nr,
the parameter of the micro-kernel.
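
The trade-off described above can be sketched in a few lines of C++. This is
only an illustration, not the code of the patch: the helper names computeNc,
numColumnBlocks and packedBBytes are hypothetical, and the sketch assumes a
BLIS-style loop nest ([1]) in which A is repacked once per Nc-wide column
block and the packed block of B holds Kc x Nc doubles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive Nc from a user-supplied factor and Nr.
// Since Nc = NcFactor * Nr, Nc always spans a whole number of Nr-wide panels.
inline std::int64_t computeNc(std::int64_t NcFactor, std::int64_t Nr) {
  assert(NcFactor > 0 && Nr > 0 && "both values must be positive");
  return NcFactor * Nr;
}

// Number of Nc-wide column blocks needed to cover N columns (ceiling
// division). Under the assumed loop nest, A is repacked once per block,
// so a small Nc means more redundant packing of A.
inline std::int64_t numColumnBlocks(std::int64_t N, std::int64_t Nc) {
  assert(N > 0 && Nc > 0 && "both values must be positive");
  return (N + Nc - 1) / Nc;
}

// Bytes needed for a packed Kc x Nc block of B holding doubles (assumed
// layout). A large Nc makes this buffer large, which is a problem when it
// has to fit into the available stack space.
inline std::int64_t packedBBytes(std::int64_t Kc, std::int64_t Nc) {
  assert(Kc > 0 && Nc > 0 && "both values must be positive");
  return Kc * Nc * static_cast<std::int64_t>(sizeof(double));
}
```

For example, with the illustrative values NcFactor = 256 and Nr = 4,
computeNc returns 1024; a matrix with N = 1025 columns then needs two column
blocks, and with Kc = 256 the packed block of B takes 256 * 1024 *
sizeof(double) bytes.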

On an Intel Core i7-3820 (Sandy Bridge) with the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it improves the performance from 11.303 GFlops/sec (39.247% of
theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak).
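
The reported figures can be cross-checked with a small calculation. The
helpers below are hypothetical and only restate the arithmetic: dividing each
measured rate by its share of peak should yield the same theoretical peak,
and the ratio of the two rates gives the speedup. The peak itself is not
stated in this message; it is derived here from the numbers above.

```cpp
#include <cassert>

// Theoretical peak implied by a measured rate and its share of peak,
// where the share is given in percent.
inline double impliedPeak(double GFlops, double PercentOfPeak) {
  assert(PercentOfPeak > 0.0 && "share of peak must be positive");
  return GFlops / (PercentOfPeak / 100.0);
}

// Ratio of the rate after the change to the rate before it.
inline double speedup(double Before, double After) {
  assert(Before > 0.0 && "baseline rate must be positive");
  return After / Before;
}
```

Both impliedPeak(11.303, 39.247) and impliedPeak(17.896, 62.14) come out at
roughly 28.8 GFlops/sec, so the two percentages are consistent with each
other, and speedup(11.303, 17.896) is roughly 1.58.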

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28019

git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@290256 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Transform/ScheduleOptimizer.cpp
test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll
test/ScheduleOptimizer/mat_mul_pattern_data_layout_2.ll
test/ScheduleOptimizer/pattern-matching-based-opts_3.ll