Extend fold_vec_perm to handle VLA vector_cst.
commit a7dba4a1c05a76026d88dcccc0b519cf83bff9a2
author Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Wed, 16 Aug 2023 11:21:44 +0000 (16:51 +0530)
committer Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Wed, 16 Aug 2023 11:21:44 +0000 (16:51 +0530)
tree 16d14892f94dfc4edd4bbe0704429e781e2abbf9
parent 1b7418ba1baf0d43fff6c6a68b8134813a35c1d9

The patch extends fold_vec_perm to fold VLA vector_csts.

Eg 1:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x

res = VEC_PERM_EXPR<arg0, arg1, sel>
--> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1

Eg 2:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x

For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, the runtime vector length is 2 and sel[2] chooses arg1[0];
if x > 0, the runtime vector length is > 2 and sel[2] chooses arg0[2].
So we return NULL_TREE for this case.
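
A minimal sketch of the ambiguity, assuming a hypothetical helper
resolve_sel (not part of GCC) that maps a selector value to an input
vector and an index for one concrete runtime length:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {input number (0 = arg0, 1 = arg1), element index} for a selector
// value, given the runtime length shared by both inputs.
std::pair<unsigned, uint64_t> resolve_sel(uint64_t sel_value,
                                          uint64_t runtime_len) {
  assert(runtime_len > 0 && sel_value < 2 * runtime_len);
  return {static_cast<unsigned>(sel_value / runtime_len),
          sel_value % runtime_len};
}
```

For selector value 2, a runtime length of 2 resolves to arg1[0] while a
runtime length of 4 resolves to arg0[2], so no single folded constant is
valid for every x.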

This leads us to define a constraint: a stepped sequence in sel should
only select a particular pattern from a particular input vector.

Eg 3:
arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x

sel contains a single pattern with stepped sequence: {0, 2, ...}.
Let a1 = the first element of the stepped part of the sequence, which is 0.

Let esel = the total number of elements in the pattern that contains the
stepped sequence.
Thus,
esel = len / sel_npatterns
     = (4 + 4x) / 1
     = 4 + 4x

Let S = step of the sequence, which is 2 in this case.

Let ae = the last element of the stepped sequence. The stepped part starts
at the pattern's second element, so ae lies esel - 2 steps after a1.
Thus,
ae = a1 + (esel - 2) * S
   = 0 + (4 + 4x - 2) * 2
   = 4 + 8x

To ensure that we select elements from the same input vector, we require
a1 /trunc len = ae /trunc len.
Let q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.

However, if sel were:
sel = {len, 0, 1, ...}

The only change in this case is S = 1.
So,
ae = a1 + (esel - 2) * S
   = 0 + (4 + 4x - 2) * 1
   = 2 + 4x

In this case, a1 /trunc len == ae /trunc len == 0, and the stepped sequence
chooses all elements from arg0.
Thus,
res = {arg1[0], arg0[0], arg0[1], ...}
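
The arithmetic for both selectors in Eg 3 can be sketched as follows. This
is an illustration under stated assumptions, not the code in
valid_mask_for_fold_vec_perm_cst_p: Poly, last_stepped_elt and
stays_in_one_input are hypothetical names, a1 is taken to be a plain
integer, and the quotient comparison is done by sampling values of x
instead of a symbolic truncating division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a length-dependent value c0 + c1 * x, with x >= 0.
struct Poly {
  int64_t c0, c1;
  int64_t at(int64_t x) const { return c0 + c1 * x; }
};

// ae = a1 + (esel - 2) * S, where esel = len / sel_npatterns.
// Assumes sel_npatterns divides both coefficients of len.
Poly last_stepped_elt(int64_t a1, int64_t step, Poly len,
                      int64_t sel_npatterns) {
  assert(sel_npatterns > 0);
  assert(len.c0 % sel_npatterns == 0 && len.c1 % sel_npatterns == 0);
  Poly esel{len.c0 / sel_npatterns, len.c1 / sel_npatterns};
  return Poly{a1 + (esel.c0 - 2) * step, esel.c1 * step};
}

// True if a1 /trunc len == ae /trunc len for every sampled x in [0, max_x].
// Sampling is a simplification made for this sketch only.
bool stays_in_one_input(int64_t a1, int64_t step, Poly len,
                        int64_t sel_npatterns, int64_t max_x = 16) {
  Poly ae = last_stepped_elt(a1, step, len, sel_npatterns);
  for (int64_t x = 0; x <= max_x; ++x)
    if (a1 / len.at(x) != ae.at(x) / len.at(x))
      return false;
  return true;
}
```

With len = 4 + 4x and a1 = 0, step 2 gives ae = 4 + 8x and the check fails
already at x = 0, while step 1 gives ae = 2 + 4x and the check passes for
every sampled x.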

For VLA folding, sel has to conform to constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit-tests for VLA folding.

gcc/ChangeLog:
* fold-const.cc (INCLUDE_ALGORITHM): Add Include.
(valid_mask_for_fold_vec_perm_cst_p): New function.
(fold_vec_perm_cst): Likewise.
(fold_vec_perm): Adjust assert and call fold_vec_perm_cst.
(test_fold_vec_perm_cst): New namespace.
(test_fold_vec_perm_cst::build_vec_cst_rand): New function.
(test_fold_vec_perm_cst::validate_res): Likewise.
(test_fold_vec_perm_cst::validate_res_vls): Likewise.
(test_fold_vec_perm_cst::builder_push_elems): Likewise.
(test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise.
(test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise.
(test_fold_vec_perm_cst::test_all_nunits): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_2): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_4): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_8): Likewise.
(test_fold_vec_perm_cst::test_nunits_max_4): Likewise.
(test_fold_vec_perm_cst::is_simple_vla_size): Likewise.
(test_fold_vec_perm_cst::test): Likewise.
(fold_const_cc_tests): Call test_fold_vec_perm_cst::test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/fold-const.cc