aarch64: Spread out FPR usage between RA regions [PR113613]
commit ff442719cdb64c9df9d069af88e90d51bee6fb56
author Richard Sandiford <richard.sandiford@arm.com>
Fri, 23 Feb 2024 14:12:55 +0000
committer Richard Sandiford <richard.sandiford@arm.com>
Fri, 23 Feb 2024 14:12:55 +0000
tree 57d15bb3a2c3f0ddab5a2a2d3d5eeec166ac53d6
parent 9f105cfdc1bca6c9224384b3044c4ca5894e1e4c

early-ra already had code to do regrename-style "broadening"
of the allocation, to promote scheduling freedom.  However,
the pass divides the function into allocation regions
and this broadening only worked within a single region.
This meant that if a basic block contained one subblock
of FPR use, followed by a point at which no FPRs were live,
followed by another subblock of FPR use, the two subblocks
would tend to reuse the same registers.  This in turn meant
that it wasn't possible to form LDP/STP pairs between them.

The failure to form LDPs and STPs in the testcase was a
regression from GCC 13.

The patch adds a simple heuristic to prefer less recently
used registers in the event of a tie.
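
As a rough illustration of the tie-break described above, the following
stand-alone sketch picks a register from a list of candidates.  It is not
the code in aarch64-early-ra.cc: the candidate structure, the cost field
and the pick_fpr function are hypothetical, and the only thing taken from
the patch is the idea of recording, per FPR, how recently it was used.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in for allocator state; not the GCC implementation.
constexpr unsigned NUM_FPRS = 32;

struct candidate {
  unsigned fpr;  // candidate register number, 0..NUM_FPRS-1
  int cost;      // assumed primary criterion: lower is better
};

// recency[r] is the number of the last region that used register r,
// with 0 meaning "not used so far".  Returns NUM_FPRS if there are no
// candidates.
unsigned pick_fpr(const std::vector<candidate> &cands,
                  const std::array<unsigned, NUM_FPRS> &recency)
{
  const candidate *best = nullptr;
  for (const candidate &c : cands) {
    assert(c.fpr < NUM_FPRS);
    if (!best
        || c.cost < best->cost
        // Tie on the primary criterion: prefer the register whose last
        // use is further in the past.
        || (c.cost == best->cost
            && recency[c.fpr] < recency[best->fpr]))
      best = &c;
  }
  return best ? best->fpr : NUM_FPRS;
}
```

In this sketch the recency only matters when the costs are equal, so it
never overrides the primary choice; it just stops consecutive regions from
drifting back to the same registers.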

gcc/
PR target/113613
* config/aarch64/aarch64-early-ra.cc
(early_ra::m_current_region): New member variable.
(early_ra::m_fpr_recency): Likewise.
(early_ra::start_new_region): Bump m_current_region.
(early_ra::allocate_colors): Prefer less recently used registers
in the event of a tie.  Add a comment to explain why we prefer(ed)
higher-numbered registers.
(early_ra::find_oldest_color): Prefer less recently used registers
here too.
(early_ra::finalize_allocation): Update recency information for
allocated registers.
(early_ra::process_blocks): Initialize m_current_region and
m_fpr_recency.
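
The bookkeeping listed above could be modelled roughly as follows.  This is
a minimal sketch under assumptions, not the patch itself: the member and
function names follow the ChangeLog entries, but the types, the use of 0 as
a "never used" value and the shape of the argument to finalize_allocation
are guesses made for illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr unsigned NUM_FPRS = 32;

// Hypothetical model of the recency tracking.  Only the names are taken
// from the ChangeLog; the rest is assumed.
struct recency_tracker {
  unsigned m_current_region = 0;
  std::array<unsigned, NUM_FPRS> m_fpr_recency{};  // 0 = never used

  // Each new allocation region gets a fresh, larger number.
  void start_new_region() { ++m_current_region; }

  // Once a region's allocation is final, stamp every register it used
  // with the current region number.
  void finalize_allocation(const std::vector<unsigned> &allocated)
  {
    for (unsigned fpr : allocated) {
      assert(fpr < NUM_FPRS);
      m_fpr_recency[fpr] = m_current_region;
    }
  }
};
```

With this model, a register stamped by an earlier region compares as less
recently used than one stamped by a later region, which is the ordering the
tie-break needs.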

gcc/testsuite/
PR target/113613
* gcc.target/aarch64/pr113613.c: New test.
gcc/config/aarch64/aarch64-early-ra.cc
gcc/testsuite/gcc.target/aarch64/pr113613.c [new file with mode: 0644]