From 8fa7f434855ab943b2e88821e8aabf5a2c2aa754 Mon Sep 17 00:00:00 2001 From: rsandifo Date: Sat, 13 Jan 2018 17:50:35 +0000 Subject: [PATCH] [AArch64] Add SVE support This patch adds support for ARM's Scalable Vector Extension. The patch just contains the core features that work with the current vectoriser framework; later patches will add extra capabilities to both the target-independent code and AArch64 code. The patch doesn't include: - support for unwinding frames whose size depends on the vector length - modelling the effect of __tls_get_addr on the SVE registers These are handled by later patches instead. Some notes: - The copyright years for aarch64-sve.md start at 2009 because some of the code is based on aarch64.md, which also starts from then. - The patch inserts spaces between items in the AArch64 section of sourcebuild.texi. This matches at least the surrounding architectures and looks a little nicer in the info output. - aarch64-sve.md includes a pattern: while_ult A later patch adds a matching "while_ult" optab, but the pattern is also needed by the predicate vec_duplicate expander. 2018-01-13 Richard Sandiford Alan Hayward David Sherwood gcc/ * doc/invoke.texi (-msve-vector-bits=): Document new option. (sve): Document new AArch64 extension. * doc/md.texi (w): Extend the description of the AArch64 constraint to include SVE vectors. (Upl, Upa): Document new AArch64 predicate constraints. * config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New enum. * config/aarch64/aarch64.opt (sve_vector_bits): New enum. (msve-vector-bits=): New option. * config/aarch64/aarch64-option-extensions.def (fp, simd): Disable SVE when these are disabled. (sve): New extension. * config/aarch64/aarch64-modes.def: Define SVE vector and predicate modes. Adjust their number of units based on aarch64_sve_vg. (MAX_BITSIZE_MODE_ANY_MODE): Define. * config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New aarch64_addr_query_type. (aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode) (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p) (aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries) (aarch64_split_add_offset, aarch64_output_sve_cnt_immediate) (aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate) (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare. (aarch64_simd_imm_zero_p): Delete. (aarch64_check_zero_based_sve_index_immediate): Declare. (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p) (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p) (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p) (aarch64_sve_float_mul_immediate_p): Likewise. (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT rather than an rtx. (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare. (aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback. (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare. (aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float) (aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare. (aarch64_regmode_natural_size): Likewise. * config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro. (AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift left one place. (AARCH64_ISA_SVE, TARGET_SVE): New macros. (FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries for VG and the SVE predicate registers. (V_ALIASES): Add a "z"-prefixed alias. (FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1. 
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros. (PR_REGNUM_P, PR_LO_REGNUM_P): Likewise. (PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes. (REG_CLASS_NAMES): Add entries for them. (REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG and the predicate registers. (aarch64_sve_vg): Declare. (BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED) (SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros. (REGMODE_NATURAL_SIZE): Define. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle SVE macros. * config/aarch64/aarch64.c: Include cfgrtl.h. (simd_immediate_info): Add a constructor for series vectors, and an associated step field. (aarch64_sve_vg): New variable. (aarch64_dbx_register_number): Handle VG and the predicate registers. (aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete. (VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE) (VEC_ANY_DATA, VEC_STRUCT): New constants. (aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p) (aarch64_classify_vector_mode, aarch64_vector_data_mode_p) (aarch64_sve_data_mode_p, aarch64_sve_pred_mode) (aarch64_get_mask_mode): New functions. (aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS. (aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE predicate modes and predicate registers. Explicitly restrict GPRs to modes of 16 bytes or smaller. Only allow FP registers to store a vector mode if it is recognized by aarch64_classify_vector_mode. (aarch64_regmode_natural_size): New function. (aarch64_hard_regno_caller_save_mode): Return the original mode for predicates. (aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate) (aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl) (aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate) (aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New functions. (aarch64_add_offset): Add a temp2 parameter. Assert that temp1 does not overlap dest if the function is frame-related. Handle SVE constants. (aarch64_split_add_offset): New function. (aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass them aarch64_add_offset. (aarch64_allocate_and_probe_stack_space): Add a temp2 parameter and update call to aarch64_sub_sp. (aarch64_add_cfa_expression): New function. (aarch64_expand_prologue): Pass extra temporary registers to the functions above. Handle the case in which we need to emit new DW_CFA_expressions for registers that were originally saved relative to the stack pointer, but now have to be expressed relative to the frame pointer. (aarch64_output_mi_thunk): Pass extra temporary registers to the functions above. (aarch64_expand_epilogue): Likewise. Prevent inheritance of IP0 and IP1 values for SVE frames. (aarch64_expand_vec_series): New function. (aarch64_expand_sve_widened_duplicate): Likewise. (aarch64_expand_sve_const_vector): Likewise. (aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter. Handle SVE constants. Use emit_move_insn to move a force_const_mem into the register, rather than emitting a SET directly. (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move) (aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p) (offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p) (offset_9bit_signed_scaled_p): New functions. (aarch64_replicate_bitmask_imm): New function. (aarch64_bitmask_imm): Use it. (aarch64_cannot_force_const_mem): Reject expressions involving a CONST_POLY_INT. 
Update call to aarch64_classify_symbol. (aarch64_classify_index): Handle SVE indices, by requiring a plain register index with a scale that matches the element size. (aarch64_classify_address): Handle SVE addresses. Assert that the mode of the address is VOIDmode or an integer mode. Update call to aarch64_classify_symbol. (aarch64_classify_symbolic_expression): Update call to aarch64_classify_symbol. (aarch64_const_vec_all_in_range_p): New function. (aarch64_print_vector_float_operand): Likewise. (aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than "vN" for FP registers with SVE modes. Handle (const ...) vectors and the FP immediates 1.0 and 0.5. (aarch64_print_address_internal): Handle SVE addresses. (aarch64_print_operand_address): Use ADDR_QUERY_ANY. (aarch64_regno_regclass): Handle predicate registers. (aarch64_secondary_reload): Handle big-endian reloads of SVE data modes. (aarch64_class_max_nregs): Handle SVE modes and predicate registers. (aarch64_rtx_costs): Check for ADDVL and ADDPL instructions. (aarch64_convert_sve_vector_bits): New function. (aarch64_override_options): Use it to handle -msve-vector-bits=. (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT rather than an rtx. (aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode. Handle SVE vector and predicate modes. Accept VL-based constants that need only one temporary register, and VL offsets that require no temporary registers. (aarch64_conditional_register_usage): Mark the predicate registers as fixed if SVE isn't available. (aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode. Return true for SVE vector and predicate modes. (aarch64_simd_container_mode): Take the number of bits as a poly_int64 rather than an unsigned int. Handle SVE modes. (aarch64_preferred_simd_mode): Update call accordingly. Handle SVE modes. (aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR if SVE is enabled. (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p) (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p) (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p) (aarch64_sve_float_mul_immediate_p): New functions. (aarch64_sve_valid_immediate): New function. (aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors. Explicitly reject structure modes. Check for INDEX constants. Handle PTRUE and PFALSE constants. (aarch64_check_zero_based_sve_index_immediate): New function. (aarch64_simd_imm_zero_p): Delete. (aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for vector modes. Accept constants in the range of CNT[BHWD]. (aarch64_simd_scalar_immediate_valid_for_move): Explicitly ask for an Advanced SIMD mode. (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions. (aarch64_simd_vector_alignment): Handle SVE predicates. (aarch64_vectorize_preferred_vector_alignment): New function. (aarch64_simd_vector_alignment_reachable): Use it instead of the vector size. (aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p. (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New functions. (MAX_VECT_LEN): Delete. (expand_vec_perm_d): Add a vec_flags field. (emit_unspec2, aarch64_expand_sve_vec_perm): New functions. (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip) (aarch64_evpc_ext): Don't apply a big-endian lane correction for SVE modes. (aarch64_evpc_rev): Rename to... (aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE. (aarch64_evpc_rev_global): New function. 
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP. (aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of MAX_VECT_LEN. (aarch64_evpc_sve_tbl): New function. (aarch64_expand_vec_perm_const_1): Update after rename of aarch64_evpc_rev. Handle SVE permutes too, trying aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather than aarch64_evpc_tbl. (aarch64_vectorize_vec_perm_const): Initialize vec_flags. (aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code) (aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int) (aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or) (aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float) (aarch64_expand_sve_vcond): New functions. (aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead of aarch64_vector_mode_p. (aarch64_dwarf_poly_indeterminate_value): New function. (aarch64_compute_pressure_classes): Likewise. (aarch64_can_change_mode_class): Likewise. (TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine. (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise. (TARGET_VECTORIZE_GET_MASK_MODE): Likewise. (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise. (TARGET_COMPUTE_PRESSURE_CLASSES): Likewise. (TARGET_CAN_CHANGE_MODE_CLASS): Likewise. * config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr) (Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New constraints. (Dn, Dl, Dr): Accept const as well as const_vector. (Dz): Likewise. Compare against CONST0_RTX. * config/aarch64/iterators.md: Refer to "Advanced SIMD" instead of "vector" where appropriate. (SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD) (SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators. (UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT) (UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE) (UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS) (UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs. (Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV) (v_int_equiv): Extend to SVE modes. (Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New mode attributes. (LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators. (optab): Handle popcount, smin, smax, umin, umax, abs and sqrt. (logical_nn, lr, sve_int_op, sve_fp_op): New code attributs. (LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP) (SVE_COND_FP_CMP): New int iterators. (perm_hilo): Handle the new unpack unspecs. (optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int attributes. * config/aarch64/predicates.md (aarch64_sve_cnt_immediate) (aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate) (aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand) (aarch64_equality_operator, aarch64_constant_vector_operand) (aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates. (aarch64_sve_nonimmediate_operand): Likewise. (aarch64_sve_general_operand): Likewise. (aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise. (aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate) (aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise. (aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise. (aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise. (aarch64_sve_float_arith_immediate): Likewise. (aarch64_sve_float_arith_with_sub_immediate): Likewise. (aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise. (aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise. 
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise. (aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise. (aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise. (aarch64_sve_float_arith_operand): Likewise. (aarch64_sve_float_arith_with_sub_operand): Likewise. (aarch64_sve_float_mul_operand): Likewise. (aarch64_sve_vec_perm_operand): Likewise. (aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate. (aarch64_mov_operand): Accept const_poly_int and const_vector. (aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const as well as const_vector. (aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier in file. Use CONST0_RTX and CONSTM1_RTX. (aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes. (aarch64_simd_reg_or_zero): Accept const as well as const_vector. Use aarch64_simd_imm_zero. * config/aarch64/aarch64-sve.md: New file. * config/aarch64/aarch64.md: Include it. (VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers. (UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE) (UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI) (UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK) (UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants. (sve): New attribute. (enabled): Disable instructions with the sve attribute unless TARGET_SVE. (movqi, movhi): Pass CONST_POLY_INT operaneds through aarch64_expand_mov_immediate. (*mov_aarch64, *movsi_aarch64, *movdi_aarch64): Handle CNT[BHSD] immediates. (movti): Split CONST_POLY_INT moves into two halves. (add3): Accept aarch64_pluslong_or_poly_operand. Split additions that need a temporary here if the destination is the stack pointer. (*add3_aarch64): Handle ADDVL and ADDPL immediates. (*add3_poly_1): New instruction. (set_clobber_cc): New expander. Reviewed-by: James Greenhalgh git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@256612 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 292 +++ gcc/config/aarch64/aarch64-c.c | 9 + gcc/config/aarch64/aarch64-modes.def | 50 + gcc/config/aarch64/aarch64-option-extensions.def | 20 +- gcc/config/aarch64/aarch64-opts.h | 10 + gcc/config/aarch64/aarch64-protos.h | 48 +- gcc/config/aarch64/aarch64-sve.md | 1922 ++++++++++++++++++ gcc/config/aarch64/aarch64.c | 2318 ++++++++++++++++++++-- gcc/config/aarch64/aarch64.h | 96 +- gcc/config/aarch64/aarch64.md | 183 +- gcc/config/aarch64/aarch64.opt | 26 + gcc/config/aarch64/constraints.md | 120 +- gcc/config/aarch64/iterators.md | 400 +++- gcc/config/aarch64/predicates.md | 198 +- gcc/doc/invoke.texi | 20 + gcc/doc/md.texi | 8 +- 16 files changed, 5367 insertions(+), 353 deletions(-) create mode 100644 gcc/config/aarch64/aarch64-sve.md diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3f1919e774f..40da1eb477a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,4 +1,296 @@ 2018-01-13 Richard Sandiford + Alan Hayward + David Sherwood + + * doc/invoke.texi (-msve-vector-bits=): Document new option. + (sve): Document new AArch64 extension. + * doc/md.texi (w): Extend the description of the AArch64 + constraint to include SVE vectors. + (Upl, Upa): Document new AArch64 predicate constraints. + * config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New + enum. + * config/aarch64/aarch64.opt (sve_vector_bits): New enum. + (msve-vector-bits=): New option. + * config/aarch64/aarch64-option-extensions.def (fp, simd): Disable + SVE when these are disabled. + (sve): New extension. + * config/aarch64/aarch64-modes.def: Define SVE vector and predicate + modes. 
Adjust their number of units based on aarch64_sve_vg. + (MAX_BITSIZE_MODE_ANY_MODE): Define. + * config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New + aarch64_addr_query_type. + (aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode) + (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p) + (aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries) + (aarch64_split_add_offset, aarch64_output_sve_cnt_immediate) + (aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate) + (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare. + (aarch64_simd_imm_zero_p): Delete. + (aarch64_check_zero_based_sve_index_immediate): Declare. + (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p) + (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p) + (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p) + (aarch64_sve_float_mul_immediate_p): Likewise. + (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT + rather than an rtx. + (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare. + (aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback. + (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare. + (aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float) + (aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare. + (aarch64_regmode_natural_size): Likewise. + * config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro. + (AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift + left one place. + (AARCH64_ISA_SVE, TARGET_SVE): New macros. + (FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries + for VG and the SVE predicate registers. + (V_ALIASES): Add a "z"-prefixed alias. + (FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1. + (AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros. + (PR_REGNUM_P, PR_LO_REGNUM_P): Likewise. + (PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes. + (REG_CLASS_NAMES): Add entries for them. + (REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG + and the predicate registers. + (aarch64_sve_vg): Declare. + (BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED) + (SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros. + (REGMODE_NATURAL_SIZE): Define. + * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle + SVE macros. + * config/aarch64/aarch64.c: Include cfgrtl.h. + (simd_immediate_info): Add a constructor for series vectors, + and an associated step field. + (aarch64_sve_vg): New variable. + (aarch64_dbx_register_number): Handle VG and the predicate registers. + (aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete. + (VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE) + (VEC_ANY_DATA, VEC_STRUCT): New constants. + (aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p) + (aarch64_classify_vector_mode, aarch64_vector_data_mode_p) + (aarch64_sve_data_mode_p, aarch64_sve_pred_mode) + (aarch64_get_mask_mode): New functions. + (aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS + and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS. + (aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE + predicate modes and predicate registers. Explicitly restrict + GPRs to modes of 16 bytes or smaller. Only allow FP registers + to store a vector mode if it is recognized by + aarch64_classify_vector_mode. + (aarch64_regmode_natural_size): New function. + (aarch64_hard_regno_caller_save_mode): Return the original mode + for predicates. 
+ (aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate) + (aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl) + (aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate) + (aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New + functions. + (aarch64_add_offset): Add a temp2 parameter. Assert that temp1 + does not overlap dest if the function is frame-related. Handle + SVE constants. + (aarch64_split_add_offset): New function. + (aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass + them aarch64_add_offset. + (aarch64_allocate_and_probe_stack_space): Add a temp2 parameter + and update call to aarch64_sub_sp. + (aarch64_add_cfa_expression): New function. + (aarch64_expand_prologue): Pass extra temporary registers to the + functions above. Handle the case in which we need to emit new + DW_CFA_expressions for registers that were originally saved + relative to the stack pointer, but now have to be expressed + relative to the frame pointer. + (aarch64_output_mi_thunk): Pass extra temporary registers to the + functions above. + (aarch64_expand_epilogue): Likewise. Prevent inheritance of + IP0 and IP1 values for SVE frames. + (aarch64_expand_vec_series): New function. + (aarch64_expand_sve_widened_duplicate): Likewise. + (aarch64_expand_sve_const_vector): Likewise. + (aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter. + Handle SVE constants. Use emit_move_insn to move a force_const_mem + into the register, rather than emitting a SET directly. + (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move) + (aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p) + (offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p) + (offset_9bit_signed_scaled_p): New functions. + (aarch64_replicate_bitmask_imm): New function. + (aarch64_bitmask_imm): Use it. + (aarch64_cannot_force_const_mem): Reject expressions involving + a CONST_POLY_INT. Update call to aarch64_classify_symbol. + (aarch64_classify_index): Handle SVE indices, by requiring + a plain register index with a scale that matches the element size. + (aarch64_classify_address): Handle SVE addresses. Assert that + the mode of the address is VOIDmode or an integer mode. + Update call to aarch64_classify_symbol. + (aarch64_classify_symbolic_expression): Update call to + aarch64_classify_symbol. + (aarch64_const_vec_all_in_range_p): New function. + (aarch64_print_vector_float_operand): Likewise. + (aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than + "vN" for FP registers with SVE modes. Handle (const ...) vectors + and the FP immediates 1.0 and 0.5. + (aarch64_print_address_internal): Handle SVE addresses. + (aarch64_print_operand_address): Use ADDR_QUERY_ANY. + (aarch64_regno_regclass): Handle predicate registers. + (aarch64_secondary_reload): Handle big-endian reloads of SVE + data modes. + (aarch64_class_max_nregs): Handle SVE modes and predicate registers. + (aarch64_rtx_costs): Check for ADDVL and ADDPL instructions. + (aarch64_convert_sve_vector_bits): New function. + (aarch64_override_options): Use it to handle -msve-vector-bits=. + (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT + rather than an rtx. + (aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode. + Handle SVE vector and predicate modes. Accept VL-based constants + that need only one temporary register, and VL offsets that require + no temporary registers. 
+ (aarch64_conditional_register_usage): Mark the predicate registers + as fixed if SVE isn't available. + (aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode. + Return true for SVE vector and predicate modes. + (aarch64_simd_container_mode): Take the number of bits as a poly_int64 + rather than an unsigned int. Handle SVE modes. + (aarch64_preferred_simd_mode): Update call accordingly. Handle + SVE modes. + (aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR + if SVE is enabled. + (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p) + (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p) + (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p) + (aarch64_sve_float_mul_immediate_p): New functions. + (aarch64_sve_valid_immediate): New function. + (aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors. + Explicitly reject structure modes. Check for INDEX constants. + Handle PTRUE and PFALSE constants. + (aarch64_check_zero_based_sve_index_immediate): New function. + (aarch64_simd_imm_zero_p): Delete. + (aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for + vector modes. Accept constants in the range of CNT[BHWD]. + (aarch64_simd_scalar_immediate_valid_for_move): Explicitly + ask for an Advanced SIMD mode. + (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions. + (aarch64_simd_vector_alignment): Handle SVE predicates. + (aarch64_vectorize_preferred_vector_alignment): New function. + (aarch64_simd_vector_alignment_reachable): Use it instead of + the vector size. + (aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p. + (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New + functions. + (MAX_VECT_LEN): Delete. + (expand_vec_perm_d): Add a vec_flags field. + (emit_unspec2, aarch64_expand_sve_vec_perm): New functions. + (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip) + (aarch64_evpc_ext): Don't apply a big-endian lane correction + for SVE modes. + (aarch64_evpc_rev): Rename to... + (aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE. + (aarch64_evpc_rev_global): New function. + (aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP. + (aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of + MAX_VECT_LEN. + (aarch64_evpc_sve_tbl): New function. + (aarch64_expand_vec_perm_const_1): Update after rename of + aarch64_evpc_rev. Handle SVE permutes too, trying + aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather + than aarch64_evpc_tbl. + (aarch64_vectorize_vec_perm_const): Initialize vec_flags. + (aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code) + (aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int) + (aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or) + (aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float) + (aarch64_expand_sve_vcond): New functions. + (aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead + of aarch64_vector_mode_p. + (aarch64_dwarf_poly_indeterminate_value): New function. + (aarch64_compute_pressure_classes): Likewise. + (aarch64_can_change_mode_class): Likewise. + (TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine. + (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise. + (TARGET_VECTORIZE_GET_MASK_MODE): Likewise. + (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise. + (TARGET_COMPUTE_PRESSURE_CLASSES): Likewise. + (TARGET_CAN_CHANGE_MODE_CLASS): Likewise. 
+ * config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr) + (Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New + constraints. + (Dn, Dl, Dr): Accept const as well as const_vector. + (Dz): Likewise. Compare against CONST0_RTX. + * config/aarch64/iterators.md: Refer to "Advanced SIMD" instead + of "vector" where appropriate. + (SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD) + (SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators. + (UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT) + (UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE) + (UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS) + (UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs. + (Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV) + (v_int_equiv): Extend to SVE modes. + (Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New + mode attributes. + (LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators. + (optab): Handle popcount, smin, smax, umin, umax, abs and sqrt. + (logical_nn, lr, sve_int_op, sve_fp_op): New code attributs. + (LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP) + (SVE_COND_FP_CMP): New int iterators. + (perm_hilo): Handle the new unpack unspecs. + (optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int + attributes. + * config/aarch64/predicates.md (aarch64_sve_cnt_immediate) + (aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate) + (aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand) + (aarch64_equality_operator, aarch64_constant_vector_operand) + (aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates. + (aarch64_sve_nonimmediate_operand): Likewise. + (aarch64_sve_general_operand): Likewise. + (aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise. + (aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate) + (aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise. + (aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise. + (aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise. + (aarch64_sve_float_arith_immediate): Likewise. + (aarch64_sve_float_arith_with_sub_immediate): Likewise. + (aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise. + (aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise. + (aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise. + (aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise. + (aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise. + (aarch64_sve_float_arith_operand): Likewise. + (aarch64_sve_float_arith_with_sub_operand): Likewise. + (aarch64_sve_float_mul_operand): Likewise. + (aarch64_sve_vec_perm_operand): Likewise. + (aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate. + (aarch64_mov_operand): Accept const_poly_int and const_vector. + (aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const + as well as const_vector. + (aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier + in file. Use CONST0_RTX and CONSTM1_RTX. + (aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes. + (aarch64_simd_reg_or_zero): Accept const as well as const_vector. + Use aarch64_simd_imm_zero. + * config/aarch64/aarch64-sve.md: New file. + * config/aarch64/aarch64.md: Include it. + (VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers. 
+ (UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE) + (UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI) + (UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK) + (UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants. + (sve): New attribute. + (enabled): Disable instructions with the sve attribute unless + TARGET_SVE. + (movqi, movhi): Pass CONST_POLY_INT operaneds through + aarch64_expand_mov_immediate. + (*mov_aarch64, *movsi_aarch64, *movdi_aarch64): Handle + CNT[BHSD] immediates. + (movti): Split CONST_POLY_INT moves into two halves. + (add3): Accept aarch64_pluslong_or_poly_operand. + Split additions that need a temporary here if the destination + is the stack pointer. + (*add3_aarch64): Handle ADDVL and ADDPL immediates. + (*add3_poly_1): New instruction. + (set_clobber_cc): New expander. + +2018-01-13 Richard Sandiford * simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes parameter and use it instead of GET_MODE_SIZE (innermode). Use diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index 172c30fb520..40c738c7c3b 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -136,6 +136,15 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile); aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile); + aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile); + cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS"); + if (TARGET_SVE) + { + int bits; + if (!BITS_PER_SVE_VECTOR.is_constant (&bits)) + bits = 0; + builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits); + } aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile); aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile); diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index de40f72d666..4e9da29d321 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -30,6 +30,22 @@ FLOAT_MODE (HF, 2, 0); ADJUST_FLOAT_FORMAT (HF, &ieee_half_format); /* Vector modes. */ + +VECTOR_BOOL_MODE (VNx16BI, 16, 2); +VECTOR_BOOL_MODE (VNx8BI, 8, 2); +VECTOR_BOOL_MODE (VNx4BI, 4, 2); +VECTOR_BOOL_MODE (VNx2BI, 2, 2); + +ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8); +ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4); +ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2); +ADJUST_NUNITS (VNx2BI, aarch64_sve_vg); + +ADJUST_ALIGNMENT (VNx16BI, 2); +ADJUST_ALIGNMENT (VNx8BI, 2); +ADJUST_ALIGNMENT (VNx4BI, 2); +ADJUST_ALIGNMENT (VNx2BI, 2); + VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI. */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */ VECTOR_MODES (FLOAT, 8); /* V2SF. */ @@ -45,9 +61,43 @@ INT_MODE (OI, 32); INT_MODE (CI, 48); INT_MODE (XI, 64); +/* Define SVE modes for NVECS vectors. VB, VH, VS and VD are the prefixes + for 8-bit, 16-bit, 32-bit and 64-bit elements respectively. It isn't + strictly necessary to set the alignment here, since the default would + be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer. 
*/ +#define SVE_MODES(NVECS, VB, VH, VS, VD) \ + VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS); \ + VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS); \ + \ + ADJUST_NUNITS (VB##QI, aarch64_sve_vg * NVECS * 8); \ + ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \ + ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \ + ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \ + ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \ + ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \ + ADJUST_NUNITS (VD##DF, aarch64_sve_vg * NVECS); \ + \ + ADJUST_ALIGNMENT (VB##QI, 16); \ + ADJUST_ALIGNMENT (VH##HI, 16); \ + ADJUST_ALIGNMENT (VS##SI, 16); \ + ADJUST_ALIGNMENT (VD##DI, 16); \ + ADJUST_ALIGNMENT (VH##HF, 16); \ + ADJUST_ALIGNMENT (VS##SF, 16); \ + ADJUST_ALIGNMENT (VD##DF, 16); + +/* Give SVE vectors the names normally used for 256-bit vectors. + The actual number depends on command-line flags. */ +SVE_MODES (1, VNx16, VNx8, VNx4, VNx2) + /* Quad float: 128-bit floating mode for long doubles. */ FLOAT_MODE (TF, 16, ieee_quad_format); +/* A 4-tuple of SVE vectors with the maximum -msve-vector-bits= setting. + Note that this is a limit only on the compile-time sizes of modes; + it is not a limit on the runtime sizes, since VL-agnostic code + must work with arbitary vector lengths. */ +#define MAX_BITSIZE_MODE_ANY_MODE (2048 * 4) + /* Coefficient 1 is multiplied by the number of 128-bit chunks in an SVE vector (referred to as "VQ") minus one. */ #define NUM_POLY_INT_COEFFS 2 diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 593dad9381c..5fe5e3f7ddd 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -39,16 +39,19 @@ that are required. Their order is not important. */ /* Enabling "fp" just enables "fp". - Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", "sha3", and sm3/sm4. */ + Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", + "sha3", sm3/sm4 and "sve". */ AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO |\ AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2 |\ - AARCH64_FL_SHA3 | AARCH64_FL_SM4, "fp") + AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE, "fp") /* Enabling "simd" also enables "fp". - Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3" and "sm3/sm4". */ + Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3", + "sm3/sm4" and "sve". */ AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO |\ AARCH64_FL_DOTPROD | AARCH64_FL_AES | AARCH64_FL_SHA2 |\ - AARCH64_FL_SHA3 | AARCH64_FL_SM4, "asimd") + AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE, + "asimd") /* Enabling "crypto" also enables "fp" and "simd". Disabling "crypto" disables "crypto", "aes", "sha2", "sha3" and "sm3/sm4". */ @@ -63,8 +66,9 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32") AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics") /* Enabling "fp16" also enables "fp". - Disabling "fp16" disables "fp16" and "fp16fml". */ -AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, AARCH64_FL_F16FML, "fphp asimdhp") + Disabling "fp16" disables "fp16", "fp16fml" and "sve". */ +AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, + AARCH64_FL_F16FML | AARCH64_FL_SVE, "fphp asimdhp") /* Enabling or disabling "rcpc" only changes "rcpc". 
*/ AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc") @@ -97,4 +101,8 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, 0, "sm3 sm4") Disabling "fp16fml" just disables "fp16fml". */ AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, AARCH64_FL_FP | AARCH64_FL_F16, 0, "asimdfml") +/* Enabling "sve" also enables "fp16", "fp" and "simd". + Disabling "sve" just disables "sve". */ +AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | AARCH64_FL_F16, 0, "sve") + #undef AARCH64_OPT_EXTENSION diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h index 19929728a31..7a5c6d7664f 100644 --- a/gcc/config/aarch64/aarch64-opts.h +++ b/gcc/config/aarch64/aarch64-opts.h @@ -81,4 +81,14 @@ enum aarch64_function_type { AARCH64_FUNCTION_ALL }; +/* SVE vector register sizes. */ +enum aarch64_sve_vector_bits_enum { + SVE_SCALABLE, + SVE_128 = 128, + SVE_256 = 256, + SVE_512 = 512, + SVE_1024 = 1024, + SVE_2048 = 2048 +}; + #endif diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8c3471bdbb8..4f1fc15d39d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -118,10 +118,17 @@ enum aarch64_symbol_type (the rules are the same for both). ADDR_QUERY_LDP_STP - Query what is valid for a load/store pair. */ + Query what is valid for a load/store pair. + + ADDR_QUERY_ANY + Query what is valid for at least one memory constraint, which may + allow things that "m" doesn't. For example, the SVE LDR and STR + addressing modes allow a wider range of immediate offsets than "m" + does. */ enum aarch64_addr_query_type { ADDR_QUERY_M, - ADDR_QUERY_LDP_STP + ADDR_QUERY_LDP_STP, + ADDR_QUERY_ANY }; /* A set of tuning parameters contains references to size and time @@ -344,6 +351,8 @@ int aarch64_branch_cost (bool, bool); enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx); bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode); bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); +bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, + HOST_WIDE_INT); bool aarch64_constant_address_p (rtx); bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); @@ -364,23 +373,41 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx); bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx); bool aarch64_move_imm (HOST_WIDE_INT, machine_mode); +opt_machine_mode aarch64_sve_pred_mode (unsigned int); +bool aarch64_sve_cnt_immediate_p (rtx); +bool aarch64_sve_addvl_addpl_immediate_p (rtx); +bool aarch64_sve_inc_dec_immediate_p (rtx); +int aarch64_add_offset_temporaries (rtx); +void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx); bool aarch64_mov_operand_p (rtx, machine_mode); rtx aarch64_reverse_mask (machine_mode, unsigned int); bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64); +char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx); +char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx); +char *aarch64_output_sve_inc_dec_immediate (const char *, rtx); char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode); char *aarch64_output_simd_mov_immediate (rtx, unsigned, enum simd_immediate_check w = AARCH64_CHECK_MOV); +char *aarch64_output_sve_mov_immediate (rtx); +char *aarch64_output_ptrue (machine_mode, char); bool aarch64_pad_reg_upward (machine_mode, const_tree, bool); 
bool aarch64_regno_ok_for_base_p (int, bool); bool aarch64_regno_ok_for_index_p (int, bool); bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail); bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode, bool high); -bool aarch64_simd_imm_zero_p (rtx, machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode); bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool); bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *, enum simd_immediate_check w = AARCH64_CHECK_MOV); +rtx aarch64_check_zero_based_sve_index_immediate (rtx); +bool aarch64_sve_index_immediate_p (rtx); +bool aarch64_sve_arith_immediate_p (rtx, bool); +bool aarch64_sve_bitmask_immediate_p (rtx); +bool aarch64_sve_dup_immediate_p (rtx); +bool aarch64_sve_cmp_immediate_p (rtx, bool); +bool aarch64_sve_float_arith_immediate_p (rtx, bool); +bool aarch64_sve_float_mul_immediate_p (rtx); bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); @@ -388,7 +415,7 @@ bool aarch64_use_return_insn_p (void); const char *aarch64_mangle_builtin_type (const_tree); const char *aarch64_output_casesi (rtx *); -enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx); +enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT); enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx); enum reg_class aarch64_regno_regclass (unsigned); int aarch64_asm_preferred_eh_data_format (int, int); @@ -403,6 +430,8 @@ const char *aarch64_output_move_struct (rtx *operands); rtx aarch64_return_addr (int, rtx); rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT); bool aarch64_simd_mem_operand_p (rtx); +bool aarch64_sve_ld1r_operand_p (rtx); +bool aarch64_sve_ldr_operand_p (rtx); rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool); rtx aarch64_tls_get_addr (void); tree aarch64_fold_builtin (tree, int, tree *, bool); @@ -414,7 +443,9 @@ const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *); const char * aarch64_output_probe_stack_range (rtx, rtx); void aarch64_err_no_fpadvsimd (machine_mode, const char *); void aarch64_expand_epilogue (bool); -void aarch64_expand_mov_immediate (rtx, rtx); +void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0); +void aarch64_emit_sve_pred_move (rtx, rtx, rtx); +void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode); void aarch64_expand_prologue (void); void aarch64_expand_vector_init (rtx, rtx); void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, @@ -467,6 +498,10 @@ void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE); + +void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx); +bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool); +void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *); #endif /* RTX_CODE */ void aarch64_init_builtins (void); @@ -485,6 +520,7 @@ tree aarch64_builtin_vectorized_function (unsigned int, tree, tree); extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int); +extern void aarch64_expand_sve_vec_perm (rtx, rtx, rtx, rtx); extern bool aarch64_madd_needs_nop (rtx_insn *); extern void aarch64_final_prescan_insn (rtx_insn *); void aarch64_atomic_assign_expand_fenv (tree *, tree *, 
tree *);
@@ -508,4 +544,6 @@ std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
 
 rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
 
+poly_uint64 aarch64_regmode_natural_size (machine_mode);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
new file mode 100644
index 00000000000..352c3065094
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -0,0 +1,1922 @@
+;; Machine description for AArch64 SVE.
+;; Copyright (C) 2009-2016 Free Software Foundation, Inc.
+;; Contributed by ARM Ltd.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Note on the handling of big-endian SVE
+;; --------------------------------------
+;;
+;; On big-endian systems, Advanced SIMD mov<mode> patterns act in the
+;; same way as movdi or movti would: the first byte of memory goes
+;; into the most significant byte of the register and the last byte
+;; of memory goes into the least significant byte of the register.
+;; This is the most natural ordering for Advanced SIMD and matches
+;; the ABI layout for 64-bit and 128-bit vector types.
+;;
+;; As a result, the order of bytes within the register is what GCC
+;; expects for a big-endian target, and subreg offsets therefore work
+;; as expected, with the first element in memory having subreg offset 0
+;; and the last element in memory having the subreg offset associated
+;; with a big-endian lowpart.  However, this ordering also means that
+;; GCC's lane numbering does not match the architecture's numbering:
+;; GCC always treats the element at the lowest address in memory
+;; (subreg offset 0) as element 0, while the architecture treats
+;; the least significant end of the register as element 0.
+;;
+;; The situation for SVE is different.  We want the layout of the
+;; SVE register to be the same for mov<mode> as it is for maskload<mode>:
+;; logically, a mov<mode> load must be indistinguishable from a
+;; maskload<mode> whose mask is all true.  We therefore need the
+;; register layout to match LD1 rather than LDR.  The ABI layout of
+;; SVE types also matches LD1 byte ordering rather than LDR byte ordering.
+;;
+;; As a result, the architecture lane numbering matches GCC's lane
+;; numbering, with element 0 always being the first in memory.
+;; However:
+;;
+;; - Applying a subreg offset to a register does not give the element
+;;   that GCC expects: the first element in memory has the subreg offset
+;;   associated with a big-endian lowpart while the last element in memory
+;;   has subreg offset 0.  We handle this via TARGET_CAN_CHANGE_MODE_CLASS.
+;;
+;; - We cannot use LDR and STR for spill slots that might be accessed
+;;   via subregs, since although the elements have the order GCC expects,
+;;   the order of the bytes within the elements is different.  We instead
+;;   access spill slots via LD1 and ST1, using secondary reloads to
+;;   reserve a predicate register.
+
+
+;; SVE data moves.
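+;;
+;; For example, on big-endian targets a reload of a VNx8HI value from
+;; a spill slot goes through the aarch64_sve_reload_be expander below:
+;; it sets up an all-true VNx16BI predicate and then accesses the slot
+;; with LD1H/ST1H rather than LDR/STR.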
+(define_expand "mov" + [(set (match_operand:SVE_ALL 0 "nonimmediate_operand") + (match_operand:SVE_ALL 1 "general_operand"))] + "TARGET_SVE" + { + /* Use the predicated load and store patterns where possible. + This is required for big-endian targets (see the comment at the + head of the file) and increases the addressing choices for + little-endian. */ + if ((MEM_P (operands[0]) || MEM_P (operands[1])) + && can_create_pseudo_p ()) + { + aarch64_expand_sve_mem_move (operands[0], operands[1], mode); + DONE; + } + + if (CONSTANT_P (operands[1])) + { + aarch64_expand_mov_immediate (operands[0], operands[1], + gen_vec_duplicate); + DONE; + } + } +) + +;; Unpredicated moves (little-endian). Only allow memory operations +;; during and after RA; before RA we want the predicated load and +;; store patterns to be used instead. +(define_insn "*aarch64_sve_mov_le" + [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w") + (match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))] + "TARGET_SVE + && !BYTES_BIG_ENDIAN + && ((lra_in_progress || reload_completed) + || (register_operand (operands[0], mode) + && nonmemory_operand (operands[1], mode)))" + "@ + ldr\t%0, %1 + str\t%1, %0 + mov\t%0.d, %1.d + * return aarch64_output_sve_mov_immediate (operands[1]);" +) + +;; Unpredicated moves (big-endian). Memory accesses require secondary +;; reloads. +(define_insn "*aarch64_sve_mov_be" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w") + (match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))] + "TARGET_SVE && BYTES_BIG_ENDIAN" + "@ + mov\t%0.d, %1.d + * return aarch64_output_sve_mov_immediate (operands[1]);" +) + +;; Handle big-endian memory reloads. We use byte PTRUE for all modes +;; to try to encourage reuse. +(define_expand "aarch64_sve_reload_be" + [(parallel + [(set (match_operand 0) + (match_operand 1)) + (clobber (match_operand:VNx16BI 2 "register_operand" "=Upl"))])] + "TARGET_SVE && BYTES_BIG_ENDIAN" + { + /* Create a PTRUE. */ + emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode)); + + /* Refer to the PTRUE in the appropriate mode for this move. */ + machine_mode mode = GET_MODE (operands[0]); + machine_mode pred_mode + = aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (mode)).require (); + rtx pred = gen_lowpart (pred_mode, operands[2]); + + /* Emit a predicated load or store. */ + aarch64_emit_sve_pred_move (operands[0], pred, operands[1]); + DONE; + } +) + +;; A predicated load or store for which the predicate is known to be +;; all-true. Note that this pattern is generated directly by +;; aarch64_emit_sve_pred_move, so changes to this pattern will +;; need changes there as well. +(define_insn "*pred_mov" + [(set (match_operand:SVE_ALL 0 "nonimmediate_operand" "=w, m") + (unspec:SVE_ALL + [(match_operand: 1 "register_operand" "Upl, Upl") + (match_operand:SVE_ALL 2 "nonimmediate_operand" "m, w")] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE + && (register_operand (operands[0], mode) + || register_operand (operands[2], mode))" + "@ + ld1\t%0., %1/z, %2 + st1\t%2., %1, %0" +) + +(define_expand "movmisalign" + [(set (match_operand:SVE_ALL 0 "nonimmediate_operand") + (match_operand:SVE_ALL 1 "general_operand"))] + "TARGET_SVE" + { + /* Equivalent to a normal move for our purpooses. 
*/ + emit_move_insn (operands[0], operands[1]); + DONE; + } +) + +(define_insn "maskload" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL + [(match_operand: 2 "register_operand" "Upl") + (match_operand:SVE_ALL 1 "memory_operand" "m")] + UNSPEC_LD1_SVE))] + "TARGET_SVE" + "ld1\t%0., %2/z, %1" +) + +(define_insn "maskstore" + [(set (match_operand:SVE_ALL 0 "memory_operand" "+m") + (unspec:SVE_ALL [(match_operand: 2 "register_operand" "Upl") + (match_operand:SVE_ALL 1 "register_operand" "w") + (match_dup 0)] + UNSPEC_ST1_SVE))] + "TARGET_SVE" + "st1\t%1., %2, %0" +) + +(define_expand "mov" + [(set (match_operand:PRED_ALL 0 "nonimmediate_operand") + (match_operand:PRED_ALL 1 "general_operand"))] + "TARGET_SVE" + { + if (GET_CODE (operands[0]) == MEM) + operands[1] = force_reg (mode, operands[1]); + } +) + +(define_insn "*aarch64_sve_mov" + [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa, Upa") + (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dz, Dm"))] + "TARGET_SVE + && (register_operand (operands[0], mode) + || register_operand (operands[1], mode))" + "@ + mov\t%0.b, %1.b + str\t%1, %0 + ldr\t%0, %1 + pfalse\t%0.b + * return aarch64_output_ptrue (mode, '');" +) + +;; Handle extractions from a predicate by converting to an integer vector +;; and extracting from there. +(define_expand "vec_extract" + [(match_operand: 0 "register_operand") + (match_operand: 1 "register_operand") + (match_operand:SI 2 "nonmemory_operand") + ;; Dummy operand to which we can attach the iterator. + (reg:SVE_I V0_REGNUM)] + "TARGET_SVE" + { + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_aarch64_sve_dup_const (tmp, operands[1], + CONST1_RTX (mode), + CONST0_RTX (mode))); + emit_insn (gen_vec_extract (operands[0], tmp, operands[2])); + DONE; + } +) + +(define_expand "vec_extract" + [(set (match_operand: 0 "register_operand") + (vec_select: + (match_operand:SVE_ALL 1 "register_operand") + (parallel [(match_operand:SI 2 "nonmemory_operand")])))] + "TARGET_SVE" + { + poly_int64 val; + if (poly_int_rtx_p (operands[2], &val) + && known_eq (val, GET_MODE_NUNITS (mode) - 1)) + { + /* The last element can be extracted with a LASTB and a false + predicate. */ + rtx sel = force_reg (mode, CONST0_RTX (mode)); + emit_insn (gen_aarch64_sve_lastb (operands[0], sel, + operands[1])); + DONE; + } + if (!CONST_INT_P (operands[2])) + { + /* Create an index with operand[2] as the base and -1 as the step. + It will then be zero for the element we care about. */ + rtx index = gen_lowpart (mode, operands[2]); + index = force_reg (mode, index); + rtx series = gen_reg_rtx (mode); + emit_insn (gen_vec_series (series, index, constm1_rtx)); + + /* Get a predicate that is true for only that element. */ + rtx zero = CONST0_RTX (mode); + rtx cmp = gen_rtx_EQ (mode, series, zero); + rtx sel = gen_reg_rtx (mode); + emit_insn (gen_vec_cmp (sel, cmp, series, zero)); + + /* Select the element using LASTB. */ + emit_insn (gen_aarch64_sve_lastb (operands[0], sel, + operands[1])); + DONE; + } + } +) + +;; Extract an element from the Advanced SIMD portion of the register. +;; We don't just reuse the aarch64-simd.md pattern because we don't +;; want any chnage in lane number on big-endian targets. 
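+;; For example, with 32-bit elements, extracting element 2 reads byte
+;; offset 8, which is within the low 128 bits of the Z register (the
+;; overlapping Advanced SIMD V register), so a single UMOV, DUP or ST1
+;; lane operation on that V register is enough.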
+(define_insn "*vec_extract_v128" + [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") + (vec_select: + (match_operand:SVE_ALL 1 "register_operand" "w, w, w") + (parallel [(match_operand:SI 2 "const_int_operand")])))] + "TARGET_SVE + && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 0, 15)" + { + operands[1] = gen_lowpart (mode, operands[1]); + switch (which_alternative) + { + case 0: + return "umov\\t%0, %1.[%2]"; + case 1: + return "dup\\t%0, %1.[%2]"; + case 2: + return "st1\\t{%1.}[%2], %0"; + default: + gcc_unreachable (); + } + } + [(set_attr "type" "neon_to_gp_q, neon_dup_q, neon_store1_one_lane_q")] +) + +;; Extract an element in the range of DUP. This pattern allows the +;; source and destination to be different. +(define_insn "*vec_extract_dup" + [(set (match_operand: 0 "register_operand" "=w") + (vec_select: + (match_operand:SVE_ALL 1 "register_operand" "w") + (parallel [(match_operand:SI 2 "const_int_operand")])))] + "TARGET_SVE + && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 16, 63)" + { + operands[0] = gen_rtx_REG (mode, REGNO (operands[0])); + return "dup\t%0., %1.[%2]"; + } +) + +;; Extract an element outside the range of DUP. This pattern requires the +;; source and destination to be the same. +(define_insn "*vec_extract_ext" + [(set (match_operand: 0 "register_operand" "=w") + (vec_select: + (match_operand:SVE_ALL 1 "register_operand" "0") + (parallel [(match_operand:SI 2 "const_int_operand")])))] + "TARGET_SVE && INTVAL (operands[2]) * GET_MODE_SIZE (mode) >= 64" + { + operands[0] = gen_rtx_REG (mode, REGNO (operands[0])); + operands[2] = GEN_INT (INTVAL (operands[2]) * GET_MODE_SIZE (mode)); + return "ext\t%0.b, %0.b, %0.b, #%2"; + } +) + +;; Extract the last active element of operand 1 into operand 0. +;; If no elements are active, extract the last inactive element instead. +(define_insn "aarch64_sve_lastb" + [(set (match_operand: 0 "register_operand" "=r, w") + (unspec: + [(match_operand: 1 "register_operand" "Upl, Upl") + (match_operand:SVE_ALL 2 "register_operand" "w, w")] + UNSPEC_LASTB))] + "TARGET_SVE" + "@ + lastb\t%0, %1, %2. + lastb\t%0, %1, %2." +) + +(define_expand "vec_duplicate" + [(parallel + [(set (match_operand:SVE_ALL 0 "register_operand") + (vec_duplicate:SVE_ALL + (match_operand: 1 "aarch64_sve_dup_operand"))) + (clobber (scratch:))])] + "TARGET_SVE" + { + if (MEM_P (operands[1])) + { + rtx ptrue = force_reg (mode, CONSTM1_RTX (mode)); + emit_insn (gen_sve_ld1r (operands[0], ptrue, operands[1], + CONST0_RTX (mode))); + DONE; + } + } +) + +;; Accept memory operands for the benefit of combine, and also in case +;; the scalar input gets spilled to memory during RA. We want to split +;; the load at the first opportunity in order to allow the PTRUE to be +;; optimized with surrounding code. 
+(define_insn_and_split "*vec_duplicate_reg" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w") + (vec_duplicate:SVE_ALL + (match_operand: 1 "aarch64_sve_dup_operand" "r, w, Uty"))) + (clobber (match_scratch: 2 "=X, X, Upl"))] + "TARGET_SVE" + "@ + mov\t%0., %1 + mov\t%0., %1 + #" + "&& MEM_P (operands[1])" + [(const_int 0)] + { + if (GET_CODE (operands[2]) == SCRATCH) + operands[2] = gen_reg_rtx (mode); + emit_move_insn (operands[2], CONSTM1_RTX (mode)); + emit_insn (gen_sve_ld1r (operands[0], operands[2], operands[1], + CONST0_RTX (mode))); + DONE; + } + [(set_attr "length" "4,4,8")] +) + +;; This is used for vec_duplicates from memory, but can also +;; be used by combine to optimize selects of a a vec_duplicate +;; with zero. +(define_insn "sve_ld1r" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL + [(match_operand: 1 "register_operand" "Upl") + (vec_duplicate:SVE_ALL + (match_operand: 2 "aarch64_sve_ld1r_operand" "Uty")) + (match_operand:SVE_ALL 3 "aarch64_simd_imm_zero")] + UNSPEC_SEL))] + "TARGET_SVE" + "ld1r\t%0., %1/z, %2" +) + +;; Load 128 bits from memory and duplicate to fill a vector. Since there +;; are so few operations on 128-bit "elements", we don't define a VNx1TI +;; and simply use vectors of bytes instead. +(define_insn "sve_ld1rq" + [(set (match_operand:VNx16QI 0 "register_operand" "=w") + (unspec:VNx16QI + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")] + UNSPEC_LD1RQ))] + "TARGET_SVE" + "ld1rqb\t%0.b, %1/z, %2" +) + +;; Implement a predicate broadcast by shifting the low bit of the scalar +;; input into the top bit and using a WHILELO. An alternative would be to +;; duplicate the input and do a compare with zero. +(define_expand "vec_duplicate" + [(set (match_operand:PRED_ALL 0 "register_operand") + (vec_duplicate:PRED_ALL (match_operand 1 "register_operand")))] + "TARGET_SVE" + { + rtx tmp = gen_reg_rtx (DImode); + rtx op1 = gen_lowpart (DImode, operands[1]); + emit_insn (gen_ashldi3 (tmp, op1, gen_int_mode (63, DImode))); + emit_insn (gen_while_ultdi (operands[0], const0_rtx, tmp)); + DONE; + } +) + +(define_insn "vec_series" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w") + (vec_series:SVE_I + (match_operand: 1 "aarch64_sve_index_operand" "Usi, r, r") + (match_operand: 2 "aarch64_sve_index_operand" "r, Usi, r")))] + "TARGET_SVE" + "@ + index\t%0., #%1, %2 + index\t%0., %1, #%2 + index\t%0., %1, %2" +) + +;; Optimize {x, x, x, x, ...} + {0, n, 2*n, 3*n, ...} if n is in range +;; of an INDEX instruction. 
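The vec_series patterns above correspond to linear induction values; a minimal sketch of a loop whose vectorized induction is expected to be set up with an INDEX instruction:

/* Storing the linear sequence BASE, BASE+STEP, BASE+2*STEP, ...  */
void
iota (int *restrict dst, int base, int step, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = base + i * step;
}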
+(define_insn "*vec_series_plus" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (plus:SVE_I + (vec_duplicate:SVE_I + (match_operand: 1 "register_operand" "r")) + (match_operand:SVE_I 2 "immediate_operand")))] + "TARGET_SVE && aarch64_check_zero_based_sve_index_immediate (operands[2])" + { + operands[2] = aarch64_check_zero_based_sve_index_immediate (operands[2]); + return "index\t%0., %1, #%2"; + } +) + +(define_expand "vec_perm" + [(match_operand:SVE_ALL 0 "register_operand") + (match_operand:SVE_ALL 1 "register_operand") + (match_operand:SVE_ALL 2 "register_operand") + (match_operand: 3 "aarch64_sve_vec_perm_operand")] + "TARGET_SVE && GET_MODE_NUNITS (mode).is_constant ()" + { + aarch64_expand_sve_vec_perm (operands[0], operands[1], + operands[2], operands[3]); + DONE; + } +) + +(define_insn "*aarch64_sve_tbl" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL + [(match_operand:SVE_ALL 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w")] + UNSPEC_TBL))] + "TARGET_SVE" + "tbl\t%0., %1., %2." +) + +(define_insn "*aarch64_sve_" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (unspec:PRED_ALL [(match_operand:PRED_ALL 1 "register_operand" "Upa") + (match_operand:PRED_ALL 2 "register_operand" "Upa")] + PERMUTE))] + "TARGET_SVE" + "\t%0., %1., %2." +) + +(define_insn "*aarch64_sve_" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w") + (match_operand:SVE_ALL 2 "register_operand" "w")] + PERMUTE))] + "TARGET_SVE" + "\t%0., %1., %2." +) + +(define_insn "*aarch64_sve_rev64" + [(set (match_operand:SVE_BHS 0 "register_operand" "=w") + (unspec:SVE_BHS + [(match_operand:VNx2BI 1 "register_operand" "Upl") + (unspec:SVE_BHS [(match_operand:SVE_BHS 2 "register_operand" "w")] + UNSPEC_REV64)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "rev\t%0.d, %1/m, %2.d" +) + +(define_insn "*aarch64_sve_rev32" + [(set (match_operand:SVE_BH 0 "register_operand" "=w") + (unspec:SVE_BH + [(match_operand:VNx4BI 1 "register_operand" "Upl") + (unspec:SVE_BH [(match_operand:SVE_BH 2 "register_operand" "w")] + UNSPEC_REV32)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "rev\t%0.s, %1/m, %2.s" +) + +(define_insn "*aarch64_sve_rev16vnx16qi" + [(set (match_operand:VNx16QI 0 "register_operand" "=w") + (unspec:VNx16QI + [(match_operand:VNx8BI 1 "register_operand" "Upl") + (unspec:VNx16QI [(match_operand:VNx16QI 2 "register_operand" "w")] + UNSPEC_REV16)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "revb\t%0.h, %1/m, %2.h" +) + +(define_insn "*aarch64_sve_rev" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")] + UNSPEC_REV))] + "TARGET_SVE" + "rev\t%0., %1.") + +(define_insn "*aarch64_sve_dup_lane" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (vec_duplicate:SVE_ALL + (vec_select: + (match_operand:SVE_ALL 1 "register_operand" "w") + (parallel [(match_operand:SI 2 "const_int_operand")]))))] + "TARGET_SVE + && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 0, 63)" + "dup\t%0., %1.[%2]" +) + +;; Note that the immediate (third) operand is the lane index not +;; the byte index. 
+(define_insn "*aarch64_sve_ext" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0") + (match_operand:SVE_ALL 2 "register_operand" "w") + (match_operand:SI 3 "const_int_operand")] + UNSPEC_EXT))] + "TARGET_SVE + && IN_RANGE (INTVAL (operands[3]) * GET_MODE_SIZE (mode), 0, 255)" + { + operands[3] = GEN_INT (INTVAL (operands[3]) * GET_MODE_SIZE (mode)); + return "ext\\t%0.b, %0.b, %2.b, #%3"; + } +) + +(define_insn "add3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, w") + (plus:SVE_I + (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w") + (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, w")))] + "TARGET_SVE" + "@ + add\t%0., %0., #%D2 + sub\t%0., %0., #%N2 + * return aarch64_output_sve_inc_dec_immediate (\"%0.\", operands[2]); + add\t%0., %1., %2." +) + +(define_insn "sub3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (minus:SVE_I + (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa") + (match_operand:SVE_I 2 "register_operand" "w, 0")))] + "TARGET_SVE" + "@ + sub\t%0., %1., %2. + subr\t%0., %0., #%D1" +) + +;; Unpredicated multiplication. +(define_expand "mul3" + [(set (match_operand:SVE_I 0 "register_operand") + (unspec:SVE_I + [(match_dup 3) + (mult:SVE_I + (match_operand:SVE_I 1 "register_operand") + (match_operand:SVE_I 2 "aarch64_sve_mul_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Multiplication predicated with a PTRUE. We don't actually need the +;; predicate for the first alternative, but using Upa or X isn't likely +;; to gain much and would make the instruction seem less uniform to the +;; register allocator. +(define_insn "*mul3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (mult:SVE_I + (match_operand:SVE_I 2 "register_operand" "%0, 0") + (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + mul\t%0., %0., #%3 + mul\t%0., %1/m, %0., %3." +) + +(define_insn "*madd" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (plus:SVE_I + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w") + (match_operand:SVE_I 3 "register_operand" "w, w"))] + UNSPEC_MERGE_PTRUE) + (match_operand:SVE_I 4 "register_operand" "w, 0")))] + "TARGET_SVE" + "@ + mad\t%0., %1/m, %3., %4. + mla\t%0., %1/m, %2., %3." +) + +(define_insn "*msub3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (minus:SVE_I + (match_operand:SVE_I 4 "register_operand" "w, 0") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w") + (match_operand:SVE_I 3 "register_operand" "w, w"))] + UNSPEC_MERGE_PTRUE)))] + "TARGET_SVE" + "@ + msb\t%0., %1/m, %3., %4. + mls\t%0., %1/m, %2., %3." +) + +;; Unpredicated NEG, NOT and POPCOUNT. +(define_expand "2" + [(set (match_operand:SVE_I 0 "register_operand") + (unspec:SVE_I + [(match_dup 2) + (SVE_INT_UNARY:SVE_I (match_operand:SVE_I 1 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; NEG, NOT and POPCOUNT predicated with a PTRUE. 
+(define_insn "*2" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl") + (SVE_INT_UNARY:SVE_I + (match_operand:SVE_I 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "\t%0., %1/m, %2." +) + +;; Vector AND, ORR and XOR. +(define_insn "3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (LOGICAL:SVE_I + (match_operand:SVE_I 1 "register_operand" "%0, w") + (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, w")))] + "TARGET_SVE" + "@ + \t%0., %0., #%C2 + \t%0.d, %1.d, %2.d" +) + +;; Vector AND, ORR and XOR on floating-point modes. We avoid subregs +;; by providing this, but we need to use UNSPECs since rtx logical ops +;; aren't defined for floating-point modes. +(define_insn "*3" + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand" "w") + (match_operand:SVE_F 2 "register_operand" "w")] + LOGICALF))] + "TARGET_SVE" + "\t%0.d, %1.d, %2.d" +) + +;; REG_EQUAL notes on "not3" should ensure that we can generate +;; this pattern even though the NOT instruction itself is predicated. +(define_insn "bic3" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (and:SVE_I + (not:SVE_I (match_operand:SVE_I 1 "register_operand" "w")) + (match_operand:SVE_I 2 "register_operand" "w")))] + "TARGET_SVE" + "bic\t%0.d, %2.d, %1.d" +) + +;; Predicate AND. We can reuse one of the inputs as the GP. +(define_insn "and3" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa") + (match_operand:PRED_ALL 2 "register_operand" "Upa")))] + "TARGET_SVE" + "and\t%0.b, %1/z, %1.b, %2.b" +) + +;; Unpredicated predicate ORR and XOR. +(define_expand "3" + [(set (match_operand:PRED_ALL 0 "register_operand") + (and:PRED_ALL + (LOGICAL_OR:PRED_ALL + (match_operand:PRED_ALL 1 "register_operand") + (match_operand:PRED_ALL 2 "register_operand")) + (match_dup 3)))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated predicate ORR and XOR. +(define_insn "pred_3" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL + (LOGICAL:PRED_ALL + (match_operand:PRED_ALL 2 "register_operand" "Upa") + (match_operand:PRED_ALL 3 "register_operand" "Upa")) + (match_operand:PRED_ALL 1 "register_operand" "Upa")))] + "TARGET_SVE" + "\t%0.b, %1/z, %2.b, %3.b" +) + +;; Perform a logical operation on operands 2 and 3, using operand 1 as +;; the GP (which is known to be a PTRUE). Store the result in operand 0 +;; and set the flags in the same way as for PTEST. The (and ...) in the +;; UNSPEC_PTEST_PTRUE is logically redundant, but means that the tested +;; value is structurally equivalent to rhs of the second set. +(define_insn "*3_cc" + [(set (reg:CC CC_REGNUM) + (compare:CC + (unspec:SI [(match_operand:PRED_ALL 1 "register_operand" "Upa") + (and:PRED_ALL + (LOGICAL:PRED_ALL + (match_operand:PRED_ALL 2 "register_operand" "Upa") + (match_operand:PRED_ALL 3 "register_operand" "Upa")) + (match_dup 1))] + UNSPEC_PTEST_PTRUE) + (const_int 0))) + (set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3)) + (match_dup 1)))] + "TARGET_SVE" + "s\t%0.b, %1/z, %2.b, %3.b" +) + +;; Unpredicated predicate inverse. 
+(define_expand "one_cmpl2" + [(set (match_operand:PRED_ALL 0 "register_operand") + (and:PRED_ALL + (not:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")) + (match_dup 2)))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated predicate inverse. +(define_insn "*one_cmpl3" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL + (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) + (match_operand:PRED_ALL 1 "register_operand" "Upa")))] + "TARGET_SVE" + "not\t%0.b, %1/z, %2.b" +) + +;; Predicated predicate BIC and ORN. +(define_insn "*3" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL + (NLOGICAL:PRED_ALL + (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) + (match_operand:PRED_ALL 3 "register_operand" "Upa")) + (match_operand:PRED_ALL 1 "register_operand" "Upa")))] + "TARGET_SVE" + "\t%0.b, %1/z, %3.b, %2.b" +) + +;; Predicated predicate NAND and NOR. +(define_insn "*3" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (and:PRED_ALL + (NLOGICAL:PRED_ALL + (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) + (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa"))) + (match_operand:PRED_ALL 1 "register_operand" "Upa")))] + "TARGET_SVE" + "\t%0.b, %1/z, %2.b, %3.b" +) + +;; Unpredicated LSL, LSR and ASR by a vector. +(define_expand "v3" + [(set (match_operand:SVE_I 0 "register_operand") + (unspec:SVE_I + [(match_dup 3) + (ASHIFT:SVE_I + (match_operand:SVE_I 1 "register_operand") + (match_operand:SVE_I 2 "aarch64_sve_shift_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; LSL, LSR and ASR by a vector, predicated with a PTRUE. We don't +;; actually need the predicate for the first alternative, but using Upa +;; or X isn't likely to gain much and would make the instruction seem +;; less uniform to the register allocator. +(define_insn "*v3" + [(set (match_operand:SVE_I 0 "register_operand" "=w, w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (ASHIFT:SVE_I + (match_operand:SVE_I 2 "register_operand" "w, 0") + (match_operand:SVE_I 3 "aarch64_sve_shift_operand" "D, w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + \t%0., %2., #%3 + \t%0., %1/m, %0., %3." +) + +;; LSL, LSR and ASR by a scalar, which expands into one of the vector +;; shifts above. +(define_expand "3" + [(set (match_operand:SVE_I 0 "register_operand") + (ASHIFT:SVE_I (match_operand:SVE_I 1 "register_operand") + (match_operand: 2 "general_operand")))] + "TARGET_SVE" + { + rtx amount; + if (CONST_INT_P (operands[2])) + { + amount = gen_const_vec_duplicate (mode, operands[2]); + if (!aarch64_sve_shift_operand (operands[2], mode)) + amount = force_reg (mode, amount); + } + else + { + amount = gen_reg_rtx (mode); + emit_insn (gen_vec_duplicate (amount, + convert_to_mode (mode, + operands[2], 0))); + } + emit_insn (gen_v3 (operands[0], operands[1], amount)); + DONE; + } +) + +;; Test all bits of operand 1. Operand 0 is a GP that is known to hold PTRUE. +;; +;; Using UNSPEC_PTEST_PTRUE allows combine patterns to assume that the GP +;; is a PTRUE even if the optimizers haven't yet been able to propagate +;; the constant. We would use a separate unspec code for PTESTs involving +;; GPs that might not be PTRUEs. 
+(define_insn "ptest_ptrue" + [(set (reg:CC CC_REGNUM) + (compare:CC + (unspec:SI [(match_operand:PRED_ALL 0 "register_operand" "Upa") + (match_operand:PRED_ALL 1 "register_operand" "Upa")] + UNSPEC_PTEST_PTRUE) + (const_int 0)))] + "TARGET_SVE" + "ptest\t%0, %1.b" +) + +;; Set element I of the result if operand1 + J < operand2 for all J in [0, I]. +;; with the comparison being unsigned. +(define_insn "while_ult" + [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (unspec:PRED_ALL [(match_operand:GPI 1 "aarch64_reg_or_zero" "rZ") + (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")] + UNSPEC_WHILE_LO)) + (clobber (reg:CC CC_REGNUM))] + "TARGET_SVE" + "whilelo\t%0., %1, %2" +) + +;; WHILELO sets the flags in the same way as a PTEST with a PTRUE GP. +;; Handle the case in which both results are useful. The GP operand +;; to the PTEST isn't needed, so we allow it to be anything. +(define_insn_and_split "while_ult_cc" + [(set (reg:CC CC_REGNUM) + (compare:CC + (unspec:SI [(match_operand:PRED_ALL 1) + (unspec:PRED_ALL + [(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ") + (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")] + UNSPEC_WHILE_LO)] + UNSPEC_PTEST_PTRUE) + (const_int 0))) + (set (match_operand:PRED_ALL 0 "register_operand" "=Upa") + (unspec:PRED_ALL [(match_dup 2) + (match_dup 3)] + UNSPEC_WHILE_LO))] + "TARGET_SVE" + "whilelo\t%0., %2, %3" + ;; Force the compiler to drop the unused predicate operand, so that we + ;; don't have an unnecessary PTRUE. + "&& !CONSTANT_P (operands[1])" + [(const_int 0)] + { + emit_insn (gen_while_ult_cc + (operands[0], CONSTM1_RTX (mode), + operands[2], operands[3])); + DONE; + } +) + +;; Predicated integer comparison. +(define_insn "*vec_cmp_" + [(set (match_operand: 0 "register_operand" "=Upa, Upa") + (unspec: + [(match_operand: 1 "register_operand" "Upl, Upl") + (match_operand:SVE_I 2 "register_operand" "w, w") + (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w")] + SVE_COND_INT_CMP)) + (clobber (reg:CC CC_REGNUM))] + "TARGET_SVE" + "@ + cmp\t%0., %1/z, %2., #%3 + cmp\t%0., %1/z, %2., %3." +) + +;; Predicated integer comparison in which only the flags result is interesting. +(define_insn "*vec_cmp__ptest" + [(set (reg:CC CC_REGNUM) + (compare:CC + (unspec:SI + [(match_operand: 1 "register_operand" "Upl, Upl") + (unspec: + [(match_dup 1) + (match_operand:SVE_I 2 "register_operand" "w, w") + (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w")] + SVE_COND_INT_CMP)] + UNSPEC_PTEST_PTRUE) + (const_int 0))) + (clobber (match_scratch: 0 "=Upa, Upa"))] + "TARGET_SVE" + "@ + cmp\t%0., %1/z, %2., #%3 + cmp\t%0., %1/z, %2., %3." +) + +;; Predicated comparison in which both the flag and predicate results +;; are interesting. +(define_insn "*vec_cmp__cc" + [(set (reg:CC CC_REGNUM) + (compare:CC + (unspec:SI + [(match_operand: 1 "register_operand" "Upl, Upl") + (unspec: + [(match_dup 1) + (match_operand:SVE_I 2 "register_operand" "w, w") + (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w")] + SVE_COND_INT_CMP)] + UNSPEC_PTEST_PTRUE) + (const_int 0))) + (set (match_operand: 0 "register_operand" "=Upa, Upa") + (unspec: + [(match_dup 1) + (match_dup 2) + (match_dup 3)] + SVE_COND_INT_CMP))] + "TARGET_SVE" + "@ + cmp\t%0., %1/z, %2., #%3 + cmp\t%0., %1/z, %2., %3." +) + +;; Predicated floating-point comparison (excluding FCMUO, which doesn't +;; allow #0.0 as an operand). 
+(define_insn "*vec_fcm" + [(set (match_operand: 0 "register_operand" "=Upa, Upa") + (unspec: + [(match_operand: 1 "register_operand" "Upl, Upl") + (match_operand:SVE_F 2 "register_operand" "w, w") + (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w")] + SVE_COND_FP_CMP))] + "TARGET_SVE" + "@ + fcm\t%0., %1/z, %2., #0.0 + fcm\t%0., %1/z, %2., %3." +) + +;; Predicated FCMUO. +(define_insn "*vec_fcmuo" + [(set (match_operand: 0 "register_operand" "=Upa") + (unspec: + [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_F 2 "register_operand" "w") + (match_operand:SVE_F 3 "register_operand" "w")] + UNSPEC_COND_UO))] + "TARGET_SVE" + "fcmuo\t%0., %1/z, %2., %3." +) + +;; vcond_mask operand order: true, false, mask +;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR) +;; SEL operand order: mask, true, false +(define_insn "vcond_mask_" + [(set (match_operand:SVE_ALL 0 "register_operand" "=w") + (unspec:SVE_ALL + [(match_operand: 3 "register_operand" "Upa") + (match_operand:SVE_ALL 1 "register_operand" "w") + (match_operand:SVE_ALL 2 "register_operand" "w")] + UNSPEC_SEL))] + "TARGET_SVE" + "sel\t%0., %3, %1., %2." +) + +;; Selects between a duplicated immediate and zero. +(define_insn "aarch64_sve_dup_const" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_I 2 "aarch64_sve_dup_immediate") + (match_operand:SVE_I 3 "aarch64_simd_imm_zero")] + UNSPEC_SEL))] + "TARGET_SVE" + "mov\t%0., %1/z, #%2" +) + +;; Integer (signed) vcond. Don't enforce an immediate range here, since it +;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead. +(define_expand "vcond" + [(set (match_operand:SVE_ALL 0 "register_operand") + (if_then_else:SVE_ALL + (match_operator 3 "comparison_operator" + [(match_operand: 4 "register_operand") + (match_operand: 5 "nonmemory_operand")]) + (match_operand:SVE_ALL 1 "register_operand") + (match_operand:SVE_ALL 2 "register_operand")))] + "TARGET_SVE" + { + aarch64_expand_sve_vcond (mode, mode, operands); + DONE; + } +) + +;; Integer vcondu. Don't enforce an immediate range here, since it +;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead. +(define_expand "vcondu" + [(set (match_operand:SVE_ALL 0 "register_operand") + (if_then_else:SVE_ALL + (match_operator 3 "comparison_operator" + [(match_operand: 4 "register_operand") + (match_operand: 5 "nonmemory_operand")]) + (match_operand:SVE_ALL 1 "register_operand") + (match_operand:SVE_ALL 2 "register_operand")))] + "TARGET_SVE" + { + aarch64_expand_sve_vcond (mode, mode, operands); + DONE; + } +) + +;; Floating-point vcond. All comparisons except FCMUO allow a zero +;; operand; aarch64_expand_sve_vcond handles the case of an FCMUO +;; with zero. +(define_expand "vcond" + [(set (match_operand:SVE_SD 0 "register_operand") + (if_then_else:SVE_SD + (match_operator 3 "comparison_operator" + [(match_operand: 4 "register_operand") + (match_operand: 5 "aarch64_simd_reg_or_zero")]) + (match_operand:SVE_SD 1 "register_operand") + (match_operand:SVE_SD 2 "register_operand")))] + "TARGET_SVE" + { + aarch64_expand_sve_vcond (mode, mode, operands); + DONE; + } +) + +;; Signed integer comparisons. Don't enforce an immediate range here, since +;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int +;; instead. 
+(define_expand "vec_cmp" + [(parallel + [(set (match_operand: 0 "register_operand") + (match_operator: 1 "comparison_operator" + [(match_operand:SVE_I 2 "register_operand") + (match_operand:SVE_I 3 "nonmemory_operand")])) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_SVE" + { + aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]), + operands[2], operands[3]); + DONE; + } +) + +;; Unsigned integer comparisons. Don't enforce an immediate range here, since +;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int +;; instead. +(define_expand "vec_cmpu" + [(parallel + [(set (match_operand: 0 "register_operand") + (match_operator: 1 "comparison_operator" + [(match_operand:SVE_I 2 "register_operand") + (match_operand:SVE_I 3 "nonmemory_operand")])) + (clobber (reg:CC CC_REGNUM))])] + "TARGET_SVE" + { + aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]), + operands[2], operands[3]); + DONE; + } +) + +;; Floating-point comparisons. All comparisons except FCMUO allow a zero +;; operand; aarch64_expand_sve_vec_cmp_float handles the case of an FCMUO +;; with zero. +(define_expand "vec_cmp" + [(set (match_operand: 0 "register_operand") + (match_operator: 1 "comparison_operator" + [(match_operand:SVE_F 2 "register_operand") + (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero")]))] + "TARGET_SVE" + { + aarch64_expand_sve_vec_cmp_float (operands[0], GET_CODE (operands[1]), + operands[2], operands[3], false); + DONE; + } +) + +;; Branch based on predicate equality or inequality. +(define_expand "cbranch4" + [(set (pc) + (if_then_else + (match_operator 0 "aarch64_equality_operator" + [(match_operand:PRED_ALL 1 "register_operand") + (match_operand:PRED_ALL 2 "aarch64_simd_reg_or_zero")]) + (label_ref (match_operand 3 "")) + (pc)))] + "" + { + rtx ptrue = force_reg (mode, CONSTM1_RTX (mode)); + rtx pred; + if (operands[2] == CONST0_RTX (mode)) + pred = operands[1]; + else + { + pred = gen_reg_rtx (mode); + emit_insn (gen_pred_xor3 (pred, ptrue, operands[1], + operands[2])); + } + emit_insn (gen_ptest_ptrue (ptrue, pred)); + operands[1] = gen_rtx_REG (CCmode, CC_REGNUM); + operands[2] = const0_rtx; + } +) + +;; Unpredicated integer MIN/MAX. +(define_expand "3" + [(set (match_operand:SVE_I 0 "register_operand") + (unspec:SVE_I + [(match_dup 3) + (MAXMIN:SVE_I (match_operand:SVE_I 1 "register_operand") + (match_operand:SVE_I 2 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Integer MIN/MAX predicated with a PTRUE. +(define_insn "*3" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl") + (MAXMIN:SVE_I (match_operand:SVE_I 2 "register_operand" "%0") + (match_operand:SVE_I 3 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "\t%0., %1/m, %0., %3." +) + +;; Unpredicated floating-point MIN/MAX. +(define_expand "3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (FMAXMIN:SVE_F (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Floating-point MIN/MAX predicated with a PTRUE. 
+(define_insn "*3" + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl") + (FMAXMIN:SVE_F (match_operand:SVE_F 2 "register_operand" "%0") + (match_operand:SVE_F 3 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fnm\t%0., %1/m, %0., %3." +) + +;; Unpredicated fmin/fmax. +(define_expand "3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "register_operand")] + FMAXMIN_UNS)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; fmin/fmax predicated with a PTRUE. +(define_insn "*3" + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl") + (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "%0") + (match_operand:SVE_F 3 "register_operand" "w")] + FMAXMIN_UNS)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "\t%0., %1/m, %0., %3." +) + +;; Unpredicated integer add reduction. +(define_expand "reduc_plus_scal_" + [(set (match_operand: 0 "register_operand") + (unspec: [(match_dup 2) + (match_operand:SVE_I 1 "register_operand")] + UNSPEC_ADDV))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated integer add reduction. The result is always 64-bits. +(define_insn "*reduc_plus_scal_" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_I 2 "register_operand" "w")] + UNSPEC_ADDV))] + "TARGET_SVE" + "uaddv\t%d0, %1, %2." +) + +;; Unpredicated floating-point add reduction. +(define_expand "reduc_plus_scal_" + [(set (match_operand: 0 "register_operand") + (unspec: [(match_dup 2) + (match_operand:SVE_F 1 "register_operand")] + UNSPEC_FADDV))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated floating-point add reduction. +(define_insn "*reduc_plus_scal_" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_F 2 "register_operand" "w")] + UNSPEC_FADDV))] + "TARGET_SVE" + "faddv\t%0, %1, %2." +) + +;; Unpredicated integer MIN/MAX reduction. +(define_expand "reduc__scal_" + [(set (match_operand: 0 "register_operand") + (unspec: [(match_dup 2) + (match_operand:SVE_I 1 "register_operand")] + MAXMINV))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated integer MIN/MAX reduction. +(define_insn "*reduc__scal_" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_I 2 "register_operand" "w")] + MAXMINV))] + "TARGET_SVE" + "v\t%0, %1, %2." +) + +;; Unpredicated floating-point MIN/MAX reduction. +(define_expand "reduc__scal_" + [(set (match_operand: 0 "register_operand") + (unspec: [(match_dup 2) + (match_operand:SVE_F 1 "register_operand")] + FMAXMINV))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Predicated floating-point MIN/MAX reduction. +(define_insn "*reduc__scal_" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: [(match_operand: 1 "register_operand" "Upl") + (match_operand:SVE_F 2 "register_operand" "w")] + FMAXMINV))] + "TARGET_SVE" + "v\t%0, %1, %2." +) + +;; Unpredicated floating-point addition. 
+(define_expand "add3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (plus:SVE_F + (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Floating-point addition predicated with a PTRUE. +(define_insn "*add3" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (plus:SVE_F + (match_operand:SVE_F 2 "register_operand" "%0, 0, w") + (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fadd\t%0., %1/m, %0., #%3 + fsub\t%0., %1/m, %0., #%N3 + fadd\t%0., %2., %3." +) + +;; Unpredicated floating-point subtraction. +(define_expand "sub3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (minus:SVE_F + (match_operand:SVE_F 1 "aarch64_sve_float_arith_operand") + (match_operand:SVE_F 2 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Floating-point subtraction predicated with a PTRUE. +(define_insn "*sub3" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl") + (minus:SVE_F + (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, w") + (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, 0, w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE + && (register_operand (operands[2], mode) + || register_operand (operands[3], mode))" + "@ + fsub\t%0., %1/m, %0., #%3 + fadd\t%0., %1/m, %0., #%N3 + fsubr\t%0., %1/m, %0., #%2 + fsub\t%0., %2., %3." +) + +;; Unpredicated floating-point multiplication. +(define_expand "mul3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (mult:SVE_F + (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "aarch64_sve_float_mul_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Floating-point multiplication predicated with a PTRUE. +(define_insn "*mul3" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (mult:SVE_F + (match_operand:SVE_F 2 "register_operand" "%0, w") + (match_operand:SVE_F 3 "aarch64_sve_float_mul_operand" "vsM, w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fmul\t%0., %1/m, %0., #%3 + fmul\t%0., %2., %3." +) + +;; Unpredicated fma (%0 = (%1 * %2) + %3). +(define_expand "fma4" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 4) + (fma:SVE_F (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "register_operand") + (match_operand:SVE_F 3 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[4] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; fma predicated with a PTRUE. +(define_insn "*fma4" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w") + (match_operand:SVE_F 4 "register_operand" "w, w") + (match_operand:SVE_F 2 "register_operand" "w, 0"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fmad\t%0., %1/m, %4., %2. + fmla\t%0., %1/m, %3., %4." 
+) + +;; Unpredicated fnma (%0 = (-%1 * %2) + %3). +(define_expand "fnma4" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 4) + (fma:SVE_F (neg:SVE_F + (match_operand:SVE_F 1 "register_operand")) + (match_operand:SVE_F 2 "register_operand") + (match_operand:SVE_F 3 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[4] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; fnma predicated with a PTRUE. +(define_insn "*fnma4" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (fma:SVE_F (neg:SVE_F + (match_operand:SVE_F 3 "register_operand" "%0, w")) + (match_operand:SVE_F 4 "register_operand" "w, w") + (match_operand:SVE_F 2 "register_operand" "w, 0"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fmsb\t%0., %1/m, %4., %2. + fmls\t%0., %1/m, %3., %4." +) + +;; Unpredicated fms (%0 = (%1 * %2) - %3). +(define_expand "fms4" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 4) + (fma:SVE_F (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "register_operand") + (neg:SVE_F + (match_operand:SVE_F 3 "register_operand")))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[4] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; fms predicated with a PTRUE. +(define_insn "*fms4" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w") + (match_operand:SVE_F 4 "register_operand" "w, w") + (neg:SVE_F + (match_operand:SVE_F 2 "register_operand" "w, 0")))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fnmsb\t%0., %1/m, %4., %2. + fnmls\t%0., %1/m, %3., %4." +) + +;; Unpredicated fnms (%0 = (-%1 * %2) - %3). +(define_expand "fnms4" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 4) + (fma:SVE_F (neg:SVE_F + (match_operand:SVE_F 1 "register_operand")) + (match_operand:SVE_F 2 "register_operand") + (neg:SVE_F + (match_operand:SVE_F 3 "register_operand")))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[4] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; fnms predicated with a PTRUE. +(define_insn "*fnms4" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (fma:SVE_F (neg:SVE_F + (match_operand:SVE_F 3 "register_operand" "%0, w")) + (match_operand:SVE_F 4 "register_operand" "w, w") + (neg:SVE_F + (match_operand:SVE_F 2 "register_operand" "w, 0")))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fnmad\t%0., %1/m, %4., %2. + fnmla\t%0., %1/m, %3., %4." +) + +;; Unpredicated floating-point division. +(define_expand "div3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (div:SVE_F (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Floating-point division predicated with a PTRUE. +(define_insn "*div3" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (div:SVE_F (match_operand:SVE_F 2 "register_operand" "0, w") + (match_operand:SVE_F 3 "register_operand" "w, 0"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "@ + fdiv\t%0., %1/m, %0., %3. + fdivr\t%0., %1/m, %0., %2." +) + +;; Unpredicated FNEG, FABS and FSQRT. 
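Before the unary patterns, a fused multiply-add sketch for the fma/fnma/fms/fnms family above, assuming FP contraction is enabled (e.g. the default -ffp-contract=fast in GNU C mode); whether FMLA or FMAD is emitted depends on which operand the register allocator ties to the destination:

/* d[i] = a[i] * b[i] + c[i], contracted to a fused multiply-add.  */
void
fmadd (double *restrict d, const double *restrict a,
       const double *restrict b, const double *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    d[i] = a[i] * b[i] + c[i];
}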
+(define_expand "2" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 2) + (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 1 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; FNEG, FABS and FSQRT predicated with a PTRUE. +(define_insn "*2" + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl") + (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "\t%0., %1/m, %2." +) + +;; Unpredicated FRINTy. +(define_expand "2" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 2) + (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")] + FRINT)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; FRINTy predicated with a PTRUE. +(define_insn "*2" + [(set (match_operand:SVE_F 0 "register_operand" "=w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl") + (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "w")] + FRINT)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "frint\t%0., %1/m, %2." +) + +;; Unpredicated conversion of floats to integers of the same size (HF to HI, +;; SF to SI or DF to DI). +(define_expand "2" + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_dup 2) + (FIXUORS: + (match_operand:SVE_F 1 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Conversion of SF to DI, SI or HI, predicated with a PTRUE. +(define_insn "*v16hsf2" + [(set (match_operand:SVE_HSDI 0 "register_operand" "=w") + (unspec:SVE_HSDI + [(match_operand: 1 "register_operand" "Upl") + (FIXUORS:SVE_HSDI + (match_operand:VNx8HF 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fcvtz\t%0., %1/m, %2.h" +) + +;; Conversion of SF to DI or SI, predicated with a PTRUE. +(define_insn "*vnx4sf2" + [(set (match_operand:SVE_SDI 0 "register_operand" "=w") + (unspec:SVE_SDI + [(match_operand: 1 "register_operand" "Upl") + (FIXUORS:SVE_SDI + (match_operand:VNx4SF 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fcvtz\t%0., %1/m, %2.s" +) + +;; Conversion of DF to DI or SI, predicated with a PTRUE. +(define_insn "*vnx2df2" + [(set (match_operand:SVE_SDI 0 "register_operand" "=w") + (unspec:SVE_SDI + [(match_operand:VNx2BI 1 "register_operand" "Upl") + (FIXUORS:SVE_SDI + (match_operand:VNx2DF 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fcvtz\t%0., %1/m, %2.d" +) + +;; Unpredicated conversion of integers to floats of the same size +;; (HI to HF, SI to SF or DI to DF). +(define_expand "2" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 2) + (FLOATUORS:SVE_F + (match_operand: 1 "register_operand"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Conversion of DI, SI or HI to the same number of HFs, predicated +;; with a PTRUE. +(define_insn "*vnx8hf2" + [(set (match_operand:VNx8HF 0 "register_operand" "=w") + (unspec:VNx8HF + [(match_operand: 1 "register_operand" "Upl") + (FLOATUORS:VNx8HF + (match_operand:SVE_HSDI 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "cvtf\t%0.h, %1/m, %2." +) + +;; Conversion of DI or SI to the same number of SFs, predicated with a PTRUE. 
+(define_insn "*vnx4sf2" + [(set (match_operand:VNx4SF 0 "register_operand" "=w") + (unspec:VNx4SF + [(match_operand: 1 "register_operand" "Upl") + (FLOATUORS:VNx4SF + (match_operand:SVE_SDI 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "cvtf\t%0.s, %1/m, %2." +) + +;; Conversion of DI or SI to DF, predicated with a PTRUE. +(define_insn "*vnx2df2" + [(set (match_operand:VNx2DF 0 "register_operand" "=w") + (unspec:VNx2DF + [(match_operand:VNx2BI 1 "register_operand" "Upl") + (FLOATUORS:VNx2DF + (match_operand:SVE_SDI 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "cvtf\t%0.d, %1/m, %2." +) + +;; Conversion of DFs to the same number of SFs, or SFs to the same number +;; of HFs. +(define_insn "*trunc2" + [(set (match_operand:SVE_HSF 0 "register_operand" "=w") + (unspec:SVE_HSF + [(match_operand: 1 "register_operand" "Upl") + (unspec:SVE_HSF + [(match_operand: 2 "register_operand" "w")] + UNSPEC_FLOAT_CONVERT)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fcvt\t%0., %1/m, %2." +) + +;; Conversion of SFs to the same number of DFs, or HFs to the same number +;; of SFs. +(define_insn "*extend2" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: + [(match_operand: 1 "register_operand" "Upl") + (unspec: + [(match_operand:SVE_HSF 2 "register_operand" "w")] + UNSPEC_FLOAT_CONVERT)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + "fcvt\t%0., %1/m, %2." +) + +;; PUNPKHI and PUNPKLO. +(define_insn "vec_unpack__" + [(set (match_operand: 0 "register_operand" "=Upa") + (unspec: [(match_operand:PRED_BHS 1 "register_operand" "Upa")] + UNPACK))] + "TARGET_SVE" + "punpk\t%0.h, %1.b" +) + +;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO. +(define_insn "vec_unpack__" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: [(match_operand:SVE_BHSI 1 "register_operand" "w")] + UNPACK))] + "TARGET_SVE" + "unpk\t%0., %1." +) + +;; Used by the vec_unpacks__ expander to unpack the bit +;; representation of a VNx4SF or VNx8HF without conversion. The choice +;; between signed and unsigned isn't significant. +(define_insn "*vec_unpacku___no_convert" + [(set (match_operand:SVE_HSF 0 "register_operand" "=w") + (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")] + UNPACK_UNSIGNED))] + "TARGET_SVE" + "uunpk\t%0., %1." +) + +;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF. +;; First unpack the source without conversion, then float-convert the +;; unpacked source. +(define_expand "vec_unpacks__" + [(set (match_dup 2) + (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")] + UNPACK_UNSIGNED)) + (set (match_operand: 0 "register_operand") + (unspec: [(match_dup 3) + (unspec: [(match_dup 2)] UNSPEC_FLOAT_CONVERT)] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = gen_reg_rtx (mode); + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + } +) + +;; Unpack one half of a VNx4SI to VNx2DF. First unpack from VNx4SI +;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the +;; unpacked VNx4SI to VNx2DF. +(define_expand "vec_unpack_float__vnx4si" + [(set (match_dup 2) + (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")] + UNPACK_UNSIGNED)) + (set (match_operand:VNx2DF 0 "register_operand") + (unspec:VNx2DF [(match_dup 3) + (FLOATUORS:VNx2DF (match_dup 4))] + UNSPEC_MERGE_PTRUE))] + "TARGET_SVE" + { + operands[2] = gen_reg_rtx (VNx2DImode); + operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode)); + operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0); + } +) + +;; Predicate pack. 
Use UZP1 on the narrower type, which discards +;; the high part of each wide element. +(define_insn "vec_pack_trunc_" + [(set (match_operand:PRED_BHS 0 "register_operand" "=Upa") + (unspec:PRED_BHS + [(match_operand: 1 "register_operand" "Upa") + (match_operand: 2 "register_operand" "Upa")] + UNSPEC_PACK))] + "TARGET_SVE" + "uzp1\t%0., %1., %2." +) + +;; Integer pack. Use UZP1 on the narrower type, which discards +;; the high part of each wide element. +(define_insn "vec_pack_trunc_" + [(set (match_operand:SVE_BHSI 0 "register_operand" "=w") + (unspec:SVE_BHSI + [(match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w")] + UNSPEC_PACK))] + "TARGET_SVE" + "uzp1\t%0., %1., %2." +) + +;; Convert two vectors of DF to SF, or two vectors of SF to HF, and pack +;; the results into a single vector. +(define_expand "vec_pack_trunc_" + [(set (match_dup 4) + (unspec:SVE_HSF + [(match_dup 3) + (unspec:SVE_HSF [(match_operand: 1 "register_operand")] + UNSPEC_FLOAT_CONVERT)] + UNSPEC_MERGE_PTRUE)) + (set (match_dup 5) + (unspec:SVE_HSF + [(match_dup 3) + (unspec:SVE_HSF [(match_operand: 2 "register_operand")] + UNSPEC_FLOAT_CONVERT)] + UNSPEC_MERGE_PTRUE)) + (set (match_operand:SVE_HSF 0 "register_operand") + (unspec:SVE_HSF [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))] + "TARGET_SVE" + { + operands[3] = force_reg (mode, CONSTM1_RTX (mode)); + operands[4] = gen_reg_rtx (mode); + operands[5] = gen_reg_rtx (mode); + } +) + +;; Convert two vectors of DF to SI and pack the results into a single vector. +(define_expand "vec_pack_fix_trunc_vnx2df" + [(set (match_dup 4) + (unspec:VNx4SI + [(match_dup 3) + (FIXUORS:VNx4SI (match_operand:VNx2DF 1 "register_operand"))] + UNSPEC_MERGE_PTRUE)) + (set (match_dup 5) + (unspec:VNx4SI + [(match_dup 3) + (FIXUORS:VNx4SI (match_operand:VNx2DF 2 "register_operand"))] + UNSPEC_MERGE_PTRUE)) + (set (match_operand:VNx4SI 0 "register_operand") + (unspec:VNx4SI [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))] + "TARGET_SVE" + { + operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode)); + operands[4] = gen_reg_rtx (VNx4SImode); + operands[5] = gen_reg_rtx (VNx4SImode); + } +) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ae44c2abe11..c5ed870ef57 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -67,8 +67,10 @@ #include "sched-int.h" #include "target-globals.h" #include "common/common-target.h" +#include "cfgrtl.h" #include "selftest.h" #include "selftest-rtl.h" +#include "rtx-vector-builder.h" /* This file should be included last. */ #include "target-def.h" @@ -129,13 +131,18 @@ struct simd_immediate_info simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT, insn_type = MOV, modifier_type = LSL, unsigned int = 0); + simd_immediate_info (scalar_mode, rtx, rtx); /* The mode of the elements. */ scalar_mode elt_mode; - /* The value of each element. */ + /* The value of each element if all elements are the same, or the + first value if the constant is a series. */ rtx value; + /* The value of the step if the constant is a series, null otherwise. */ + rtx step; + /* The instruction to use to move the immediate into a vector. */ insn_type insn; @@ -149,7 +156,7 @@ struct simd_immediate_info ELT_MODE_IN and value VALUE_IN. 
*/ inline simd_immediate_info ::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in) - : elt_mode (elt_mode_in), value (value_in), insn (MOV), + : elt_mode (elt_mode_in), value (value_in), step (NULL_RTX), insn (MOV), modifier (LSL), shift (0) {} @@ -162,12 +169,23 @@ inline simd_immediate_info insn_type insn_in, modifier_type modifier_in, unsigned int shift_in) : elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)), - insn (insn_in), modifier (modifier_in), shift (shift_in) + step (NULL_RTX), insn (insn_in), modifier (modifier_in), shift (shift_in) +{} + +/* Construct an integer immediate in which each element has mode ELT_MODE_IN + and where element I is equal to VALUE_IN + I * STEP_IN. */ +inline simd_immediate_info +::simd_immediate_info (scalar_mode elt_mode_in, rtx value_in, rtx step_in) + : elt_mode (elt_mode_in), value (value_in), step (step_in), insn (MOV), + modifier (LSL), shift (0) {} /* The current code model. */ enum aarch64_code_model aarch64_cmodel; +/* The number of 64-bit elements in an SVE vector. */ +poly_uint16 aarch64_sve_vg; + #ifdef HAVE_AS_TLS #undef TARGET_HAVE_TLS #define TARGET_HAVE_TLS 1 @@ -187,8 +205,7 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode, const_tree type, int misalignment, bool is_packed); -static machine_mode -aarch64_simd_container_mode (scalar_mode mode, unsigned width); +static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64); static bool aarch64_print_ldpstp_address (FILE *, machine_mode, rtx); /* Major revision number of the ARM Architecture implemented by the target. */ @@ -1100,25 +1117,95 @@ aarch64_dbx_register_number (unsigned regno) return AARCH64_DWARF_SP; else if (FP_REGNUM_P (regno)) return AARCH64_DWARF_V0 + regno - V0_REGNUM; + else if (PR_REGNUM_P (regno)) + return AARCH64_DWARF_P0 + regno - P0_REGNUM; + else if (regno == VG_REGNUM) + return AARCH64_DWARF_VG; /* Return values >= DWARF_FRAME_REGISTERS indicate that there is no equivalent DWARF register. */ return DWARF_FRAME_REGISTERS; } -/* Return TRUE if MODE is any of the large INT modes. */ +/* Return true if MODE is any of the Advanced SIMD structure modes. */ +static bool +aarch64_advsimd_struct_mode_p (machine_mode mode) +{ + return (TARGET_SIMD + && (mode == OImode || mode == CImode || mode == XImode)); +} + +/* Return true if MODE is an SVE predicate mode. */ +static bool +aarch64_sve_pred_mode_p (machine_mode mode) +{ + return (TARGET_SVE + && (mode == VNx16BImode + || mode == VNx8BImode + || mode == VNx4BImode + || mode == VNx2BImode)); +} + +/* Three mutually-exclusive flags describing a vector or predicate type. */ +const unsigned int VEC_ADVSIMD = 1; +const unsigned int VEC_SVE_DATA = 2; +const unsigned int VEC_SVE_PRED = 4; +/* Can be used in combination with VEC_ADVSIMD or VEC_SVE_DATA to indicate + a structure of 2, 3 or 4 vectors. */ +const unsigned int VEC_STRUCT = 8; +/* Useful combinations of the above. */ +const unsigned int VEC_ANY_SVE = VEC_SVE_DATA | VEC_SVE_PRED; +const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA; + +/* Return a set of flags describing the vector properties of mode MODE. + Ignore modes that are not supported by the current target. 
*/ +static unsigned int +aarch64_classify_vector_mode (machine_mode mode) +{ + if (aarch64_advsimd_struct_mode_p (mode)) + return VEC_ADVSIMD | VEC_STRUCT; + + if (aarch64_sve_pred_mode_p (mode)) + return VEC_SVE_PRED; + + scalar_mode inner = GET_MODE_INNER (mode); + if (VECTOR_MODE_P (mode) + && (inner == QImode + || inner == HImode + || inner == HFmode + || inner == SImode + || inner == SFmode + || inner == DImode + || inner == DFmode)) + { + if (TARGET_SVE + && known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR)) + return VEC_SVE_DATA; + + /* This includes V1DF but not V1DI (which doesn't exist). */ + if (TARGET_SIMD + && (known_eq (GET_MODE_BITSIZE (mode), 64) + || known_eq (GET_MODE_BITSIZE (mode), 128))) + return VEC_ADVSIMD; + } + + return 0; +} + +/* Return true if MODE is any of the data vector modes, including + structure modes. */ static bool -aarch64_vect_struct_mode_p (machine_mode mode) +aarch64_vector_data_mode_p (machine_mode mode) { - return mode == OImode || mode == CImode || mode == XImode; + return aarch64_classify_vector_mode (mode) & VEC_ANY_DATA; } -/* Return TRUE if MODE is any of the vector modes. */ +/* Return true if MODE is an SVE data vector mode; either a single vector + or a structure of vectors. */ static bool -aarch64_vector_mode_p (machine_mode mode) +aarch64_sve_data_mode_p (machine_mode mode) { - return aarch64_vector_mode_supported_p (mode) - || aarch64_vect_struct_mode_p (mode); + return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA; } /* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P. */ @@ -1135,6 +1222,42 @@ aarch64_array_mode_supported_p (machine_mode mode, return false; } +/* Return the SVE predicate mode to use for elements that have + ELEM_NBYTES bytes, if such a mode exists. */ + +opt_machine_mode +aarch64_sve_pred_mode (unsigned int elem_nbytes) +{ + if (TARGET_SVE) + { + if (elem_nbytes == 1) + return VNx16BImode; + if (elem_nbytes == 2) + return VNx8BImode; + if (elem_nbytes == 4) + return VNx4BImode; + if (elem_nbytes == 8) + return VNx2BImode; + } + return opt_machine_mode (); +} + +/* Implement TARGET_VECTORIZE_GET_MASK_MODE. */ + +static opt_machine_mode +aarch64_get_mask_mode (poly_uint64 nunits, poly_uint64 nbytes) +{ + if (TARGET_SVE && known_eq (nbytes, BYTES_PER_SVE_VECTOR)) + { + unsigned int elem_nbytes = vector_element_size (nbytes, nunits); + machine_mode pred_mode; + if (aarch64_sve_pred_mode (elem_nbytes).exists (&pred_mode)) + return pred_mode; + } + + return default_get_mask_mode (nunits, nbytes); +} + /* Implement TARGET_HARD_REGNO_NREGS. */ static unsigned int @@ -1149,7 +1272,14 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode) { case FP_REGS: case FP_LO_REGS: + if (aarch64_sve_data_mode_p (mode)) + return exact_div (GET_MODE_SIZE (mode), + BYTES_PER_SVE_VECTOR).to_constant (); return CEIL (lowest_size, UNITS_PER_VREG); + case PR_REGS: + case PR_LO_REGS: + case PR_HI_REGS: + return 1; default: return CEIL (lowest_size, UNITS_PER_WORD); } @@ -1164,6 +1294,17 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) if (GET_MODE_CLASS (mode) == MODE_CC) return regno == CC_REGNUM; + if (regno == VG_REGNUM) + /* This must have the same size as _Unwind_Word. 
*/ + return mode == DImode; + + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + if (vec_flags & VEC_SVE_PRED) + return PR_REGNUM_P (regno); + + if (PR_REGNUM_P (regno)) + return 0; + if (regno == SP_REGNUM) /* The purpose of comparing with ptr_mode is to support the global register variable associated with the stack pointer @@ -1173,15 +1314,15 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM) return mode == Pmode; - if (GP_REGNUM_P (regno) && ! aarch64_vect_struct_mode_p (mode)) + if (GP_REGNUM_P (regno) && known_le (GET_MODE_SIZE (mode), 16)) return true; if (FP_REGNUM_P (regno)) { - if (aarch64_vect_struct_mode_p (mode)) + if (vec_flags & VEC_STRUCT) return end_hard_regno (mode, regno) - 1 <= V31_REGNUM; else - return true; + return !VECTOR_MODE_P (mode) || vec_flags != 0; } return false; @@ -1197,10 +1338,39 @@ aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode) return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8); } +/* Implement REGMODE_NATURAL_SIZE. */ +poly_uint64 +aarch64_regmode_natural_size (machine_mode mode) +{ + /* The natural size for SVE data modes is one SVE data vector, + and similarly for predicates. We can't independently modify + anything smaller than that. */ + /* ??? For now, only do this for variable-width SVE registers. + Doing it for constant-sized registers breaks lower-subreg.c. */ + /* ??? And once that's fixed, we should probably have similar + code for Advanced SIMD. */ + if (!aarch64_sve_vg.is_constant ()) + { + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + if (vec_flags & VEC_SVE_PRED) + return BYTES_PER_SVE_PRED; + if (vec_flags & VEC_SVE_DATA) + return BYTES_PER_SVE_VECTOR; + } + return UNITS_PER_WORD; +} + /* Implement HARD_REGNO_CALLER_SAVE_MODE. */ machine_mode -aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode) -{ +aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned, + machine_mode mode) +{ + /* The predicate mode determines which bits are significant and + which are "don't care". Decreasing the number of lanes would + lose data while increasing the number of lanes would make bits + unnecessarily significant. */ + if (PR_REGNUM_P (regno)) + return mode; if (known_ge (GET_MODE_SIZE (mode), 4)) return mode; else @@ -1886,6 +2056,200 @@ aarch64_force_temporary (machine_mode mode, rtx x, rtx value) } } +/* Return true if we can move VALUE into a register using a single + CNT[BHWD] instruction. */ + +static bool +aarch64_sve_cnt_immediate_p (poly_int64 value) +{ + HOST_WIDE_INT factor = value.coeffs[0]; + /* The coefficient must be [1, 16] * {2, 4, 8, 16}. */ + return (value.coeffs[1] == factor + && IN_RANGE (factor, 2, 16 * 16) + && (factor & 1) == 0 + && factor <= 16 * (factor & -factor)); +} + +/* Likewise for rtx X. */ + +bool +aarch64_sve_cnt_immediate_p (rtx x) +{ + poly_int64 value; + return poly_int_rtx_p (x, &value) && aarch64_sve_cnt_immediate_p (value); +} + +/* Return the asm string for an instruction with a CNT-like vector size + operand (a vector pattern followed by a multiplier in the range [1, 16]). + PREFIX is the mnemonic without the size suffix and OPERANDS is the + first part of the operands template (the part that comes before the + vector size itself). FACTOR is the number of quadwords. + NELTS_PER_VQ, if nonzero, is the number of elements in each quadword. + If it is zero, we can use any element size. 
*/ + +static char * +aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands, + unsigned int factor, + unsigned int nelts_per_vq) +{ + static char buffer[sizeof ("sqincd\t%x0, %w0, all, mul #16")]; + + if (nelts_per_vq == 0) + /* There is some overlap in the ranges of the four CNT instructions. + Here we always use the smallest possible element size, so that the + multiplier is 1 whereever possible. */ + nelts_per_vq = factor & -factor; + int shift = std::min (exact_log2 (nelts_per_vq), 4); + gcc_assert (IN_RANGE (shift, 1, 4)); + char suffix = "dwhb"[shift - 1]; + + factor >>= shift; + unsigned int written; + if (factor == 1) + written = snprintf (buffer, sizeof (buffer), "%s%c\t%s", + prefix, suffix, operands); + else + written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, all, mul #%d", + prefix, suffix, operands, factor); + gcc_assert (written < sizeof (buffer)); + return buffer; +} + +/* Return the asm string for an instruction with a CNT-like vector size + operand (a vector pattern followed by a multiplier in the range [1, 16]). + PREFIX is the mnemonic without the size suffix and OPERANDS is the + first part of the operands template (the part that comes before the + vector size itself). X is the value of the vector size operand, + as a polynomial integer rtx. */ + +char * +aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands, + rtx x) +{ + poly_int64 value = rtx_to_poly_int64 (x); + gcc_assert (aarch64_sve_cnt_immediate_p (value)); + return aarch64_output_sve_cnt_immediate (prefix, operands, + value.coeffs[1], 0); +} + +/* Return true if we can add VALUE to a register using a single ADDVL + or ADDPL instruction. */ + +static bool +aarch64_sve_addvl_addpl_immediate_p (poly_int64 value) +{ + HOST_WIDE_INT factor = value.coeffs[0]; + if (factor == 0 || value.coeffs[1] != factor) + return false; + /* FACTOR counts VG / 2, so a value of 2 is one predicate width + and a value of 16 is one vector width. */ + return (((factor & 15) == 0 && IN_RANGE (factor, -32 * 16, 31 * 16)) + || ((factor & 1) == 0 && IN_RANGE (factor, -32 * 2, 31 * 2))); +} + +/* Likewise for rtx X. */ + +bool +aarch64_sve_addvl_addpl_immediate_p (rtx x) +{ + poly_int64 value; + return (poly_int_rtx_p (x, &value) + && aarch64_sve_addvl_addpl_immediate_p (value)); +} + +/* Return the asm string for adding ADDVL or ADDPL immediate X to operand 1 + and storing the result in operand 0. */ + +char * +aarch64_output_sve_addvl_addpl (rtx dest, rtx base, rtx offset) +{ + static char buffer[sizeof ("addpl\t%x0, %x1, #-") + 3 * sizeof (int)]; + poly_int64 offset_value = rtx_to_poly_int64 (offset); + gcc_assert (aarch64_sve_addvl_addpl_immediate_p (offset_value)); + + /* Use INC or DEC if possible. */ + if (rtx_equal_p (dest, base) && GP_REGNUM_P (REGNO (dest))) + { + if (aarch64_sve_cnt_immediate_p (offset_value)) + return aarch64_output_sve_cnt_immediate ("inc", "%x0", + offset_value.coeffs[1], 0); + if (aarch64_sve_cnt_immediate_p (-offset_value)) + return aarch64_output_sve_cnt_immediate ("dec", "%x0", + -offset_value.coeffs[1], 0); + } + + int factor = offset_value.coeffs[1]; + if ((factor & 15) == 0) + snprintf (buffer, sizeof (buffer), "addvl\t%%x0, %%x1, #%d", factor / 16); + else + snprintf (buffer, sizeof (buffer), "addpl\t%%x0, %%x1, #%d", factor / 2); + return buffer; +} + +/* Return true if X is a valid immediate for an SVE vector INC or DEC + instruction. 
If it is, store the number of elements in each vector + quadword in *NELTS_PER_VQ_OUT (if nonnull) and store the multiplication + factor in *FACTOR_OUT (if nonnull). */ + +bool +aarch64_sve_inc_dec_immediate_p (rtx x, int *factor_out, + unsigned int *nelts_per_vq_out) +{ + rtx elt; + poly_int64 value; + + if (!const_vec_duplicate_p (x, &elt) + || !poly_int_rtx_p (elt, &value)) + return false; + + unsigned int nelts_per_vq = 128 / GET_MODE_UNIT_BITSIZE (GET_MODE (x)); + if (nelts_per_vq != 8 && nelts_per_vq != 4 && nelts_per_vq != 2) + /* There's no vector INCB. */ + return false; + + HOST_WIDE_INT factor = value.coeffs[0]; + if (value.coeffs[1] != factor) + return false; + + /* The coefficient must be [1, 16] * NELTS_PER_VQ. */ + if ((factor % nelts_per_vq) != 0 + || !IN_RANGE (abs (factor), nelts_per_vq, 16 * nelts_per_vq)) + return false; + + if (factor_out) + *factor_out = factor; + if (nelts_per_vq_out) + *nelts_per_vq_out = nelts_per_vq; + return true; +} + +/* Return true if X is a valid immediate for an SVE vector INC or DEC + instruction. */ + +bool +aarch64_sve_inc_dec_immediate_p (rtx x) +{ + return aarch64_sve_inc_dec_immediate_p (x, NULL, NULL); +} + +/* Return the asm template for an SVE vector INC or DEC instruction. + OPERANDS gives the operands before the vector count and X is the + value of the vector count operand itself. */ + +char * +aarch64_output_sve_inc_dec_immediate (const char *operands, rtx x) +{ + int factor; + unsigned int nelts_per_vq; + if (!aarch64_sve_inc_dec_immediate_p (x, &factor, &nelts_per_vq)) + gcc_unreachable (); + if (factor < 0) + return aarch64_output_sve_cnt_immediate ("dec", operands, -factor, + nelts_per_vq); + else + return aarch64_output_sve_cnt_immediate ("inc", operands, factor, + nelts_per_vq); +} static int aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate, @@ -2011,6 +2375,15 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate, return num_insns; } +/* Return the number of temporary registers that aarch64_add_offset_1 + would need to add OFFSET to a register. */ + +static unsigned int +aarch64_add_offset_1_temporaries (HOST_WIDE_INT offset) +{ + return abs_hwi (offset) < 0x1000000 ? 0 : 1; +} + /* A subroutine of aarch64_add_offset. Set DEST to SRC + OFFSET for a non-polynomial OFFSET. MODE is the mode of the addition. FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should @@ -2092,15 +2465,64 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest, } } +/* Return the number of temporary registers that aarch64_add_offset + would need to move OFFSET into a register or add OFFSET to a register; + ADD_P is true if we want the latter rather than the former. */ + +static unsigned int +aarch64_offset_temporaries (bool add_p, poly_int64 offset) +{ + /* This follows the same structure as aarch64_add_offset. */ + if (add_p && aarch64_sve_addvl_addpl_immediate_p (offset)) + return 0; + + unsigned int count = 0; + HOST_WIDE_INT factor = offset.coeffs[1]; + HOST_WIDE_INT constant = offset.coeffs[0] - factor; + poly_int64 poly_offset (factor, factor); + if (add_p && aarch64_sve_addvl_addpl_immediate_p (poly_offset)) + /* Need one register for the ADDVL/ADDPL result. */ + count += 1; + else if (factor != 0) + { + factor = abs (factor); + if (factor > 16 * (factor & -factor)) + /* Need one register for the CNT result and one for the multiplication + factor. If necessary, the second temporary can be reused for the + constant part of the offset. 
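[Illustrative aside, not part of the patch: how a poly_int64 offset is
decomposed before the temporaries are counted.  VQ stands for the
number of 128-bit quadwords in an SVE vector; the helper name is
invented for the example.]

#include <stdint.h>

// On AArch64 a poly_int64 {c0, c1} evaluates at run time to
// c0 + c1 * (VQ - 1).  Rewriting it as CONSTANT + FACTOR * VQ, with
// FACTOR = c1 and CONSTANT = c0 - c1, gives the split used by
// aarch64_offset_temporaries and aarch64_add_offset: the FACTOR part
// is handled by ADDVL/ADDPL or a CNT sequence, the CONSTANT part by
// ordinary ADD/SUB immediates.
static int64_t
poly_offset_value (int64_t c0, int64_t c1, unsigned int vq)
{
  int64_t factor = c1;
  int64_t constant = c0 - factor;
  return constant + factor * (int64_t) vq;   // equals c0 + c1 * (vq - 1)
}

// Example: {16, 16} is one full SVE data vector; with 256-bit vectors
// (VQ == 2) it evaluates to 32 bytes.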
*/ + return 2; + /* Need one register for the CNT result (which might then + be shifted). */ + count += 1; + } + return count + aarch64_add_offset_1_temporaries (constant); +} + +/* If X can be represented as a poly_int64, return the number + of temporaries that are required to add it to a register. + Return -1 otherwise. */ + +int +aarch64_add_offset_temporaries (rtx x) +{ + poly_int64 offset; + if (!poly_int_rtx_p (x, &offset)) + return -1; + return aarch64_offset_temporaries (true, offset); +} + /* Set DEST to SRC + OFFSET. MODE is the mode of the addition. FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should be set and CFA adjustments added to the generated instructions. TEMP1, if nonnull, is a register of mode MODE that can be used as a temporary if register allocation is already complete. This temporary - register may overlap DEST but must not overlap SRC. If TEMP1 is known - to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting - the immediate again. + register may overlap DEST if !FRAME_RELATED_P but must not overlap SRC. + If TEMP1 is known to hold abs (OFFSET), EMIT_MOVE_IMM can be set to + false to avoid emitting the immediate again. + + TEMP2, if nonnull, is a second temporary register that doesn't + overlap either DEST or REG. Since this function may be used to adjust the stack pointer, we must ensure that it cannot cause transient stack deallocation (for example @@ -2109,27 +2531,177 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest, static void aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src, - poly_int64 offset, rtx temp1, bool frame_related_p, - bool emit_move_imm = true) + poly_int64 offset, rtx temp1, rtx temp2, + bool frame_related_p, bool emit_move_imm = true) { gcc_assert (emit_move_imm || temp1 != NULL_RTX); gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src)); + gcc_assert (temp1 == NULL_RTX + || !frame_related_p + || !reg_overlap_mentioned_p (temp1, dest)); + gcc_assert (temp2 == NULL_RTX || !reg_overlap_mentioned_p (dest, temp2)); + + /* Try using ADDVL or ADDPL to add the whole value. */ + if (src != const0_rtx && aarch64_sve_addvl_addpl_immediate_p (offset)) + { + rtx offset_rtx = gen_int_mode (offset, mode); + rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx)); + RTX_FRAME_RELATED_P (insn) = frame_related_p; + return; + } + + /* Coefficient 1 is multiplied by the number of 128-bit blocks in an + SVE vector register, over and above the minimum size of 128 bits. + This is equivalent to half the value returned by CNTD with a + vector shape of ALL. */ + HOST_WIDE_INT factor = offset.coeffs[1]; + HOST_WIDE_INT constant = offset.coeffs[0] - factor; + + /* Try using ADDVL or ADDPL to add the VG-based part. */ + poly_int64 poly_offset (factor, factor); + if (src != const0_rtx + && aarch64_sve_addvl_addpl_immediate_p (poly_offset)) + { + rtx offset_rtx = gen_int_mode (poly_offset, mode); + if (frame_related_p) + { + rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx)); + RTX_FRAME_RELATED_P (insn) = true; + src = dest; + } + else + { + rtx addr = gen_rtx_PLUS (mode, src, offset_rtx); + src = aarch64_force_temporary (mode, temp1, addr); + temp1 = temp2; + temp2 = NULL_RTX; + } + } + /* Otherwise use a CNT-based sequence. */ + else if (factor != 0) + { + /* Use a subtraction if we have a negative factor. */ + rtx_code code = PLUS; + if (factor < 0) + { + factor = -factor; + code = MINUS; + } + + /* Calculate CNTD * FACTOR / 2. 
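[Illustrative aside, not part of the patch: the quantity being formed
here is FACTOR * VQ, and CNTD returns 2 * VQ, hence the
"CNTD * FACTOR / 2" wording.  A sketch of the fold with an invented
helper name; the cases that need an extra shift or a register multiply
are handled by the code that follows.]

#include <stdint.h>

static int64_t
vl_scaled_part (int64_t factor, unsigned int vq)
{
  int64_t cntd = 2 * (int64_t) vq;        // what CNTD returns
  if ((factor & 1) == 0)
    return cntd * (factor / 2);           // CNTD ..., ALL, MUL #(FACTOR/2)
  return (cntd * factor) >> 1;            // (CNTD * FACTOR) >> 1
}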
First try to fold the division + into the multiplication. */ + rtx val; + int shift = 0; + if (factor & 1) + /* Use a right shift by 1. */ + shift = -1; + else + factor /= 2; + HOST_WIDE_INT low_bit = factor & -factor; + if (factor <= 16 * low_bit) + { + if (factor > 16 * 8) + { + /* "CNTB Xn, ALL, MUL #FACTOR" is out of range, so calculate + the value with the minimum multiplier and shift it into + position. */ + int extra_shift = exact_log2 (low_bit); + shift += extra_shift; + factor >>= extra_shift; + } + val = gen_int_mode (poly_int64 (factor * 2, factor * 2), mode); + } + else + { + /* Use CNTD, then multiply it by FACTOR. */ + val = gen_int_mode (poly_int64 (2, 2), mode); + val = aarch64_force_temporary (mode, temp1, val); + + /* Go back to using a negative multiplication factor if we have + no register from which to subtract. */ + if (code == MINUS && src == const0_rtx) + { + factor = -factor; + code = PLUS; + } + rtx coeff1 = gen_int_mode (factor, mode); + coeff1 = aarch64_force_temporary (mode, temp2, coeff1); + val = gen_rtx_MULT (mode, val, coeff1); + } + + if (shift > 0) + { + /* Multiply by 1 << SHIFT. */ + val = aarch64_force_temporary (mode, temp1, val); + val = gen_rtx_ASHIFT (mode, val, GEN_INT (shift)); + } + else if (shift == -1) + { + /* Divide by 2. */ + val = aarch64_force_temporary (mode, temp1, val); + val = gen_rtx_ASHIFTRT (mode, val, const1_rtx); + } + + /* Calculate SRC +/- CNTD * FACTOR / 2. */ + if (src != const0_rtx) + { + val = aarch64_force_temporary (mode, temp1, val); + val = gen_rtx_fmt_ee (code, mode, src, val); + } + else if (code == MINUS) + { + val = aarch64_force_temporary (mode, temp1, val); + val = gen_rtx_NEG (mode, val); + } + + if (constant == 0 || frame_related_p) + { + rtx_insn *insn = emit_insn (gen_rtx_SET (dest, val)); + if (frame_related_p) + { + RTX_FRAME_RELATED_P (insn) = true; + add_reg_note (insn, REG_CFA_ADJUST_CFA, + gen_rtx_SET (dest, plus_constant (Pmode, src, + poly_offset))); + } + src = dest; + if (constant == 0) + return; + } + else + { + src = aarch64_force_temporary (mode, temp1, val); + temp1 = temp2; + temp2 = NULL_RTX; + } + + emit_move_imm = true; + } - /* SVE support will go here. */ - HOST_WIDE_INT constant = offset.to_constant (); aarch64_add_offset_1 (mode, dest, src, constant, temp1, frame_related_p, emit_move_imm); } +/* Like aarch64_add_offset, but the offset is given as an rtx rather + than a poly_int64. */ + +void +aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src, + rtx offset_rtx, rtx temp1, rtx temp2) +{ + aarch64_add_offset (mode, dest, src, rtx_to_poly_int64 (offset_rtx), + temp1, temp2, false); +} + /* Add DELTA to the stack pointer, marking the instructions frame-related. TEMP1 is available as a temporary if nonnull. EMIT_MOVE_IMM is false if TEMP1 already contains abs (DELTA). */ static inline void -aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm) +aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm) { aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta, - temp1, true, emit_move_imm); + temp1, temp2, true, emit_move_imm); } /* Subtract DELTA from the stack pointer, marking the instructions @@ -2137,44 +2709,195 @@ aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm) if nonnull. 
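[Illustrative aside, not part of the patch: when the stack adjustment
has an SVE component, aarch64_add_sp and aarch64_sub_sp reach the
ADDVL/ADDPL path of aarch64_add_offset.  A standalone sketch of that
immediate classification; the enum and function name are invented for
the example.]

#include <stdint.h>

// ADDVL adds a multiple of the vector length (16 * VQ bytes) and ADDPL
// a multiple of the predicate length (2 * VQ bytes); both multipliers
// are limited to [-32, 31].  FACTOR is the per-quadword byte count.
enum vl_add_kind { VL_ADD_NONE, VL_ADDVL, VL_ADDPL };

static enum vl_add_kind
classify_addvl_addpl (int64_t factor)
{
  if (factor == 0)
    return VL_ADD_NONE;
  if ((factor & 15) == 0 && factor >= -32 * 16 && factor <= 31 * 16)
    return VL_ADDVL;                      // immediate is FACTOR / 16
  if ((factor & 1) == 0 && factor >= -32 * 2 && factor <= 31 * 2)
    return VL_ADDPL;                      // immediate is FACTOR / 2
  return VL_ADD_NONE;
}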
*/ static inline void -aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p) +aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p) { aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta, - temp1, frame_related_p); + temp1, temp2, frame_related_p); } -void -aarch64_expand_mov_immediate (rtx dest, rtx imm) +/* Set DEST to (vec_series BASE STEP). */ + +static void +aarch64_expand_vec_series (rtx dest, rtx base, rtx step) { machine_mode mode = GET_MODE (dest); + scalar_mode inner = GET_MODE_INNER (mode); + + /* Each operand can be a register or an immediate in the range [-16, 15]. */ + if (!aarch64_sve_index_immediate_p (base)) + base = force_reg (inner, base); + if (!aarch64_sve_index_immediate_p (step)) + step = force_reg (inner, step); + + emit_set_insn (dest, gen_rtx_VEC_SERIES (mode, base, step)); +} - gcc_assert (mode == SImode || mode == DImode); +/* Try to duplicate SRC into SVE register DEST, given that SRC is an + integer of mode INT_MODE. Return true on success. */ + +static bool +aarch64_expand_sve_widened_duplicate (rtx dest, scalar_int_mode src_mode, + rtx src) +{ + /* If the constant is smaller than 128 bits, we can do the move + using a vector of SRC_MODEs. */ + if (src_mode != TImode) + { + poly_uint64 count = exact_div (GET_MODE_SIZE (GET_MODE (dest)), + GET_MODE_SIZE (src_mode)); + machine_mode dup_mode = mode_for_vector (src_mode, count).require (); + emit_move_insn (gen_lowpart (dup_mode, dest), + gen_const_vec_duplicate (dup_mode, src)); + return true; + } + + /* The bytes are loaded in little-endian order, so do a byteswap on + big-endian targets. */ + if (BYTES_BIG_ENDIAN) + { + src = simplify_unary_operation (BSWAP, src_mode, src, src_mode); + if (!src) + return NULL_RTX; + } + + /* Use LD1RQ to load the 128 bits from memory. */ + src = force_const_mem (src_mode, src); + if (!src) + return false; + + /* Make sure that the address is legitimate. */ + if (!aarch64_sve_ld1r_operand_p (src)) + { + rtx addr = force_reg (Pmode, XEXP (src, 0)); + src = replace_equiv_address (src, addr); + } + + rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode)); + emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src)); + return true; +} + +/* Expand a move of general CONST_VECTOR SRC into DEST, given that it + isn't a simple duplicate or series. */ + +static void +aarch64_expand_sve_const_vector (rtx dest, rtx src) +{ + machine_mode mode = GET_MODE (src); + unsigned int npatterns = CONST_VECTOR_NPATTERNS (src); + unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src); + gcc_assert (npatterns > 1); + + if (nelts_per_pattern == 1) + { + /* The constant is a repeating seqeuence of at least two elements, + where the repeating elements occupy no more than 128 bits. + Get an integer representation of the replicated value. */ + unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns; + gcc_assert (int_bits <= 128); + + scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); + rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0); + if (int_value + && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value)) + return; + } + + /* Expand each pattern individually. 
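[Illustrative aside, not part of the patch: what the
nelts_per_pattern == 1 path above does with the constant.  A repeating
sequence whose repeat unit is at most 128 bits is packed into a single
integer and then broadcast; the helper below shows the packing for
16-bit elements and repeats of up to 64 bits, and its name is invented
for the example.]

#include <stdint.h>

// Pack NPATTERNS 16-bit elements into one integer, lowest element in
// the least significant bits: the little-endian layout that the
// widened duplicate (and LD1RQ for 128-bit repeats) expects.
static uint64_t
pack_hi_patterns (const uint16_t *elts, unsigned int npatterns)
{
  uint64_t packed = 0;
  for (unsigned int i = 0; i < npatterns && i < 4; ++i)
    packed |= (uint64_t) elts[i] << (16 * i);
  return packed;
}

// Example: a VNx8HI constant repeating {1, 2, 3, 4} packs to
// 0x0004000300020001, which is then duplicated as a VNx2DI vector.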
*/ + rtx_vector_builder builder; + auto_vec vectors (npatterns); + for (unsigned int i = 0; i < npatterns; ++i) + { + builder.new_vector (mode, 1, nelts_per_pattern); + for (unsigned int j = 0; j < nelts_per_pattern; ++j) + builder.quick_push (CONST_VECTOR_ELT (src, i + j * npatterns)); + vectors.quick_push (force_reg (mode, builder.build ())); + } + + /* Use permutes to interleave the separate vectors. */ + while (npatterns > 1) + { + npatterns /= 2; + for (unsigned int i = 0; i < npatterns; ++i) + { + rtx tmp = (npatterns == 1 ? dest : gen_reg_rtx (mode)); + rtvec v = gen_rtvec (2, vectors[i], vectors[i + npatterns]); + emit_set_insn (tmp, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1)); + vectors[i] = tmp; + } + } + gcc_assert (vectors[0] == dest); +} + +/* Set DEST to immediate IMM. For SVE vector modes, GEN_VEC_DUPLICATE + is a pattern that can be used to set DEST to a replicated scalar + element. */ + +void +aarch64_expand_mov_immediate (rtx dest, rtx imm, + rtx (*gen_vec_duplicate) (rtx, rtx)) +{ + machine_mode mode = GET_MODE (dest); /* Check on what type of symbol it is. */ scalar_int_mode int_mode; if ((GET_CODE (imm) == SYMBOL_REF || GET_CODE (imm) == LABEL_REF - || GET_CODE (imm) == CONST) + || GET_CODE (imm) == CONST + || GET_CODE (imm) == CONST_POLY_INT) && is_a (mode, &int_mode)) { - rtx mem, base, offset; + rtx mem; + poly_int64 offset; + HOST_WIDE_INT const_offset; enum aarch64_symbol_type sty; /* If we have (const (plus symbol offset)), separate out the offset before we start classifying the symbol. */ - split_const (imm, &base, &offset); + rtx base = strip_offset (imm, &offset); - sty = aarch64_classify_symbol (base, offset); + /* We must always add an offset involving VL separately, rather than + folding it into the relocation. */ + if (!offset.is_constant (&const_offset)) + { + if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset)) + emit_insn (gen_rtx_SET (dest, imm)); + else + { + /* Do arithmetic on 32-bit values if the result is smaller + than that. */ + if (partial_subreg_p (int_mode, SImode)) + { + /* It is invalid to do symbol calculations in modes + narrower than SImode. 
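[Illustrative aside, not part of the patch: a scalar model of the
permute loop in aarch64_expand_sve_const_vector above.  The function
name is invented and 32-bit elements are assumed for the example.]

#include <stddef.h>
#include <stdint.h>

// Model of ZIP1: interleave the low halves of A and B, giving
// { a0, b0, a1, b1, ... }.
static void
zip1_model (const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
  for (size_t k = 0; k < n / 2; ++k)
    {
      out[2 * k] = a[k];
      out[2 * k + 1] = b[k];
    }
}

// Starting from NPATTERNS vectors, where vector I holds the elements
// whose index is congruent to I modulo NPATTERNS, log2 (NPATTERNS)
// rounds of this interleave restore the original element order: with
// four patterns, zip (v0, v2) and zip (v1, v3) produce the even- and
// odd-indexed elements, and a final zip merges the two.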
*/ + gcc_assert (base == const0_rtx); + dest = gen_lowpart (SImode, dest); + int_mode = SImode; + } + if (base != const0_rtx) + { + base = aarch64_force_temporary (int_mode, dest, base); + aarch64_add_offset (int_mode, dest, base, offset, + NULL_RTX, NULL_RTX, false); + } + else + aarch64_add_offset (int_mode, dest, base, offset, + dest, NULL_RTX, false); + } + return; + } + + sty = aarch64_classify_symbol (base, const_offset); switch (sty) { case SYMBOL_FORCE_TO_MEM: - if (offset != const0_rtx + if (const_offset != 0 && targetm.cannot_force_const_mem (int_mode, imm)) { gcc_assert (can_create_pseudo_p ()); base = aarch64_force_temporary (int_mode, dest, base); - aarch64_add_offset (int_mode, dest, base, INTVAL (offset), - NULL_RTX, false); + aarch64_add_offset (int_mode, dest, base, const_offset, + NULL_RTX, NULL_RTX, false); return; } @@ -2209,12 +2932,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) case SYMBOL_SMALL_GOT_4G: case SYMBOL_TINY_GOT: case SYMBOL_TINY_TLSIE: - if (offset != const0_rtx) + if (const_offset != 0) { gcc_assert(can_create_pseudo_p ()); base = aarch64_force_temporary (int_mode, dest, base); - aarch64_add_offset (int_mode, dest, base, INTVAL (offset), - NULL_RTX, false); + aarch64_add_offset (int_mode, dest, base, const_offset, + NULL_RTX, NULL_RTX, false); return; } /* FALLTHRU */ @@ -2235,13 +2958,36 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) if (!CONST_INT_P (imm)) { - if (GET_CODE (imm) == HIGH) + rtx base, step, value; + if (GET_CODE (imm) == HIGH + || aarch64_simd_valid_immediate (imm, NULL)) emit_insn (gen_rtx_SET (dest, imm)); + else if (const_vec_series_p (imm, &base, &step)) + aarch64_expand_vec_series (dest, base, step); + else if (const_vec_duplicate_p (imm, &value)) + { + /* If the constant is out of range of an SVE vector move, + load it from memory if we can, otherwise move it into + a register and use a DUP. */ + scalar_mode inner_mode = GET_MODE_INNER (mode); + rtx op = force_const_mem (inner_mode, value); + if (!op) + op = force_reg (inner_mode, value); + else if (!aarch64_sve_ld1r_operand_p (op)) + { + rtx addr = force_reg (Pmode, XEXP (op, 0)); + op = replace_equiv_address (op, addr); + } + emit_insn (gen_vec_duplicate (dest, op)); + } + else if (GET_CODE (imm) == CONST_VECTOR + && !GET_MODE_NUNITS (GET_MODE (imm)).is_constant ()) + aarch64_expand_sve_const_vector (dest, imm); else - { + { rtx mem = force_const_mem (mode, imm); gcc_assert (mem); - emit_insn (gen_rtx_SET (dest, mem)); + emit_move_insn (dest, mem); } return; @@ -2251,6 +2997,44 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) as_a (mode)); } +/* Emit an SVE predicated move from SRC to DEST. PRED is a predicate + that is known to contain PTRUE. */ + +void +aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src) +{ + emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest), + gen_rtvec (2, pred, src), + UNSPEC_MERGE_PTRUE))); +} + +/* Expand a pre-RA SVE data move from SRC to DEST in which at least one + operand is in memory. In this case we need to use the predicated LD1 + and ST1 instead of LDR and STR, both for correctness on big-endian + targets and because LD1 and ST1 support a wider range of addressing modes. + PRED_MODE is the mode of the predicate. + + See the comment at the head of aarch64-sve.md for details about the + big-endian handling. 
*/ + +void +aarch64_expand_sve_mem_move (rtx dest, rtx src, machine_mode pred_mode) +{ + machine_mode mode = GET_MODE (dest); + rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode)); + if (!register_operand (src, mode) + && !register_operand (dest, mode)) + { + rtx tmp = gen_reg_rtx (mode); + if (MEM_P (src)) + aarch64_emit_sve_pred_move (tmp, ptrue, src); + else + emit_move_insn (tmp, src); + src = tmp; + } + aarch64_emit_sve_pred_move (dest, ptrue, src); +} + static bool aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, tree exp ATTRIBUTE_UNUSED) @@ -2715,6 +3499,21 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type) return MIN (MAX (alignment, PARM_BOUNDARY), STACK_BOUNDARY); } +/* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE. */ + +static fixed_size_mode +aarch64_get_reg_raw_mode (int regno) +{ + if (TARGET_SVE && FP_REGNUM_P (regno)) + /* Don't use the SVE part of the register for __builtin_apply and + __builtin_return. The SVE registers aren't used by the normal PCS, + so using them there would be a waste of time. The PCS extensions + for SVE types are fundamentally incompatible with the + __builtin_return/__builtin_apply interface. */ + return as_a (V16QImode); + return default_get_reg_raw_mode (regno); +} + /* Implement TARGET_FUNCTION_ARG_PADDING. Small aggregate types are placed in the lowest memory address. @@ -3472,6 +4271,41 @@ aarch64_restore_callee_saves (machine_mode mode, } } +/* Return true if OFFSET is a signed 4-bit value multiplied by the size + of MODE. */ + +static inline bool +offset_4bit_signed_scaled_p (machine_mode mode, poly_int64 offset) +{ + HOST_WIDE_INT multiple; + return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple) + && IN_RANGE (multiple, -8, 7)); +} + +/* Return true if OFFSET is a unsigned 6-bit value multiplied by the size + of MODE. */ + +static inline bool +offset_6bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset) +{ + HOST_WIDE_INT multiple; + return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple) + && IN_RANGE (multiple, 0, 63)); +} + +/* Return true if OFFSET is a signed 7-bit value multiplied by the size + of MODE. */ + +bool +aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset) +{ + HOST_WIDE_INT multiple; + return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple) + && IN_RANGE (multiple, -64, 63)); +} + +/* Return true if OFFSET is a signed 9-bit value. */ + static inline bool offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED, poly_int64 offset) @@ -3481,20 +4315,26 @@ offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED, && IN_RANGE (const_offset, -256, 255)); } +/* Return true if OFFSET is a signed 9-bit value multiplied by the size + of MODE. */ + static inline bool -offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset) +offset_9bit_signed_scaled_p (machine_mode mode, poly_int64 offset) { HOST_WIDE_INT multiple; return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple) - && IN_RANGE (multiple, 0, 4095)); + && IN_RANGE (multiple, -256, 255)); } -bool -aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset) +/* Return true if OFFSET is an unsigned 12-bit value multiplied by the size + of MODE. 
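[Illustrative aside, not part of the patch: all of the
offset_*_scaled_p predicates share one shape.  The real code works on
poly_int64 offsets via constant_multiple_p; this sketch uses plain
integers and an invented name.]

#include <stdbool.h>
#include <stdint.h>

// The byte offset must be an exact multiple of the access size and the
// resulting multiple must fit the instruction's immediate field.
static bool
offset_scaled_in_range (int64_t offset, int64_t size,
                        int64_t min_mult, int64_t max_mult)
{
  if (offset % size != 0)
    return false;
  int64_t mult = offset / size;
  return mult >= min_mult && mult <= max_mult;
}

// The ranges used above and below:
//   4-bit signed    (SVE LD1/ST1, "mul vl")   [-8, 7]
//   6-bit unsigned  (SVE LD1R)                [0, 63]
//   7-bit signed    (LDP/STP)                 [-64, 63]
//   9-bit signed    (SVE LDR/STR, "mul vl")   [-256, 255]
//   12-bit unsigned (LDR/STR)                 [0, 4095]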
*/ + +static inline bool +offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset) { HOST_WIDE_INT multiple; return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple) - && IN_RANGE (multiple, -64, 63)); + && IN_RANGE (multiple, 0, 4095)); } /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ @@ -3713,6 +4553,18 @@ aarch64_set_handled_components (sbitmap components) cfun->machine->reg_is_wrapped_separately[regno] = true; } +/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG + is saved at BASE + OFFSET. */ + +static void +aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg, + rtx base, poly_int64 offset) +{ + rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset)); + add_reg_note (insn, REG_CFA_EXPRESSION, + gen_rtx_SET (mem, regno_reg_rtx[reg])); +} + /* AArch64 stack frames generated by this compiler look like: +-------------------------------+ @@ -3798,19 +4650,55 @@ aarch64_expand_prologue (void) rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM); rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM); - aarch64_sub_sp (ip0_rtx, initial_adjust, true); + aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true); if (callee_adjust != 0) aarch64_push_regs (reg1, reg2, callee_adjust); if (emit_frame_chain) { + poly_int64 reg_offset = callee_adjust; if (callee_adjust == 0) - aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM, - R30_REGNUM, false); + { + reg1 = R29_REGNUM; + reg2 = R30_REGNUM; + reg_offset = callee_offset; + aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false); + } aarch64_add_offset (Pmode, hard_frame_pointer_rtx, - stack_pointer_rtx, callee_offset, ip1_rtx, - frame_pointer_needed); + stack_pointer_rtx, callee_offset, + ip1_rtx, ip0_rtx, frame_pointer_needed); + if (frame_pointer_needed && !frame_size.is_constant ()) + { + /* Variable-sized frames need to describe the save slot + address using DW_CFA_expression rather than DW_CFA_offset. + This means that, without taking further action, the + locations of the registers that we've already saved would + remain based on the stack pointer even after we redefine + the CFA based on the frame pointer. We therefore need new + DW_CFA_expressions to re-express the save slots with addresses + based on the frame pointer. */ + rtx_insn *insn = get_last_insn (); + gcc_assert (RTX_FRAME_RELATED_P (insn)); + + /* Add an explicit CFA definition if this was previously + implicit. */ + if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX)) + { + rtx src = plus_constant (Pmode, stack_pointer_rtx, + callee_offset); + add_reg_note (insn, REG_CFA_ADJUST_CFA, + gen_rtx_SET (hard_frame_pointer_rtx, src)); + } + + /* Change the save slot expressions for the registers that + we've already saved. */ + reg_offset -= callee_offset; + aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx, + reg_offset + UNITS_PER_WORD); + aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx, + reg_offset); + } emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx)); } @@ -3818,7 +4706,7 @@ aarch64_expand_prologue (void) callee_adjust != 0 || emit_frame_chain); aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM, callee_adjust != 0 || emit_frame_chain); - aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed); + aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed); } /* Return TRUE if we can use a simple_return insn. 
@@ -3859,6 +4747,13 @@ aarch64_expand_epilogue (bool for_sibcall) unsigned reg2 = cfun->machine->frame.wb_candidate2; rtx cfi_ops = NULL; rtx_insn *insn; + /* A stack clash protection prologue may not have left IP0_REGNUM or + IP1_REGNUM in a usable state. The same is true for allocations + with an SVE component, since we then need both temporary registers + for each allocation. */ + bool can_inherit_p = (initial_adjust.is_constant () + && final_adjust.is_constant () + && !flag_stack_clash_protection); /* We need to add memory barrier to prevent read from deallocated stack. */ bool need_barrier_p @@ -3884,9 +4779,10 @@ aarch64_expand_epilogue (bool for_sibcall) is restored on the instruction doing the writeback. */ aarch64_add_offset (Pmode, stack_pointer_rtx, hard_frame_pointer_rtx, -callee_offset, - ip1_rtx, callee_adjust == 0); + ip1_rtx, ip0_rtx, callee_adjust == 0); else - aarch64_add_sp (ip1_rtx, final_adjust, df_regs_ever_live_p (IP1_REGNUM)); + aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust, + !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM)); aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM, callee_adjust != 0, &cfi_ops); @@ -3909,7 +4805,8 @@ aarch64_expand_epilogue (bool for_sibcall) cfi_ops = NULL; } - aarch64_add_sp (ip0_rtx, initial_adjust, df_regs_ever_live_p (IP0_REGNUM)); + aarch64_add_sp (ip0_rtx, ip1_rtx, initial_adjust, + !can_inherit_p || df_regs_ever_live_p (IP0_REGNUM)); if (cfi_ops) { @@ -4019,7 +4916,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, temp1 = gen_rtx_REG (Pmode, IP1_REGNUM); if (vcall_offset == 0) - aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false); + aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, false); else { gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0); @@ -4031,8 +4928,8 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx, plus_constant (Pmode, this_rtx, delta)); else - aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, - false); + aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, + temp1, temp0, false); } if (Pmode == ptr_mode) @@ -4126,11 +5023,27 @@ aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode) } else { - /* Ignore sign extension. */ - val &= (HOST_WIDE_INT) 0xffffffff; + /* Ignore sign extension. */ + val &= (HOST_WIDE_INT) 0xffffffff; + } + return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val + || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val); +} + +/* VAL is a value with the inner mode of MODE. Replicate it to fill a + 64-bit (DImode) integer. */ + +static unsigned HOST_WIDE_INT +aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode) +{ + unsigned int size = GET_MODE_UNIT_PRECISION (mode); + while (size < 64) + { + val &= (HOST_WIDE_INT_1U << size) - 1; + val |= val << size; + size *= 2; } - return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val - || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val); + return val; } /* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */ @@ -4155,7 +5068,7 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode) /* Check for a single sequence of one bits and return quickly if so. The special cases of all ones and all zeroes returns false. 
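[Illustrative aside, not part of the patch: what
aarch64_replicate_bitmask_imm feeds into this test, written as a
standalone helper with an invented name, so that the scalar bitmask
check can also validate replicated SVE (DUPM) immediates.]

#include <stdint.h>

static uint64_t
replicate_to_64 (uint64_t val, unsigned int elt_bits)
{
  unsigned int size = elt_bits;
  while (size < 64)
    {
      val &= (UINT64_C (1) << size) - 1;   // keep the low SIZE bits
      val |= val << size;                  // and duplicate them upwards
      size *= 2;
    }
  return val;
}

// Example: replicate_to_64 (0x00ff, 16) == 0x00ff00ff00ff00ff, which
// passes the contiguous-ones test below.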
*/ - val = (unsigned HOST_WIDE_INT) val_in; + val = aarch64_replicate_bitmask_imm (val_in, mode); tmp = val + (val & -val); if (tmp == (tmp & -tmp)) @@ -4257,10 +5170,16 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) if (GET_CODE (x) == HIGH) return true; + /* There's no way to calculate VL-based values using relocations. */ + subrtx_iterator::array_type array; + FOR_EACH_SUBRTX (iter, array, x, ALL) + if (GET_CODE (*iter) == CONST_POLY_INT) + return true; + split_const (x, &base, &offset); if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF) { - if (aarch64_classify_symbol (base, offset) + if (aarch64_classify_symbol (base, INTVAL (offset)) != SYMBOL_FORCE_TO_MEM) return true; else @@ -4496,10 +5415,21 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x, && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))]) index = SUBREG_REG (index); - if ((shift == 0 - || (shift > 0 && shift <= 3 - && known_eq (1 << shift, GET_MODE_SIZE (mode)))) - && REG_P (index) + if (aarch64_sve_data_mode_p (mode)) + { + if (type != ADDRESS_REG_REG + || (1 << shift) != GET_MODE_UNIT_SIZE (mode)) + return false; + } + else + { + if (shift != 0 + && !(IN_RANGE (shift, 1, 3) + && known_eq (1 << shift, GET_MODE_SIZE (mode)))) + return false; + } + + if (REG_P (index) && aarch64_regno_ok_for_index_p (REGNO (index), strict_p)) { info->type = type; @@ -4552,23 +5482,34 @@ aarch64_classify_address (struct aarch64_address_info *info, /* On BE, we use load/store pair for all large int mode load/stores. TI/TFmode may also use a load/store pair. */ + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + bool advsimd_struct_p = (vec_flags == (VEC_ADVSIMD | VEC_STRUCT)); bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP || mode == TImode || mode == TFmode - || (BYTES_BIG_ENDIAN - && aarch64_vect_struct_mode_p (mode))); + || (BYTES_BIG_ENDIAN && advsimd_struct_p)); bool allow_reg_index_p = (!load_store_pair_p - && (maybe_ne (GET_MODE_SIZE (mode), 16) - || aarch64_vector_mode_supported_p (mode)) - && !aarch64_vect_struct_mode_p (mode)); + && (known_lt (GET_MODE_SIZE (mode), 16) + || vec_flags == VEC_ADVSIMD + || vec_flags == VEC_SVE_DATA)); + + /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and + [Rn, #offset, MUL VL]. */ + if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0 + && (code != REG && code != PLUS)) + return false; /* On LE, for AdvSIMD, don't support anything other than POST_INC or REG addressing. */ - if (aarch64_vect_struct_mode_p (mode) && !BYTES_BIG_ENDIAN + if (advsimd_struct_p + && !BYTES_BIG_ENDIAN && (code != POST_INC && code != REG)) return false; + gcc_checking_assert (GET_MODE (x) == VOIDmode + || SCALAR_INT_MODE_P (GET_MODE (x))); + switch (code) { case REG: @@ -4641,6 +5582,17 @@ aarch64_classify_address (struct aarch64_address_info *info, && aarch64_offset_7bit_signed_scaled_p (TImode, offset + 32)); + /* Make "m" use the LD1 offset range for SVE data modes, so + that pre-RTL optimizers like ivopts will work to that + instead of the wider LDR/STR range. */ + if (vec_flags == VEC_SVE_DATA) + return (type == ADDR_QUERY_M + ? 
offset_4bit_signed_scaled_p (mode, offset) + : offset_9bit_signed_scaled_p (mode, offset)); + + if (vec_flags == VEC_SVE_PRED) + return offset_9bit_signed_scaled_p (mode, offset); + if (load_store_pair_p) return ((known_eq (GET_MODE_SIZE (mode), 4) || known_eq (GET_MODE_SIZE (mode), 8)) @@ -4741,7 +5693,8 @@ aarch64_classify_address (struct aarch64_address_info *info, rtx sym, offs; split_const (info->offset, &sym, &offs); if (GET_CODE (sym) == SYMBOL_REF - && (aarch64_classify_symbol (sym, offs) == SYMBOL_SMALL_ABSOLUTE)) + && (aarch64_classify_symbol (sym, INTVAL (offs)) + == SYMBOL_SMALL_ABSOLUTE)) { /* The symbol and offset must be aligned to the access size. */ unsigned int align; @@ -4812,7 +5765,7 @@ aarch64_classify_symbolic_expression (rtx x) rtx offset; split_const (x, &x, &offset); - return aarch64_classify_symbol (x, offset); + return aarch64_classify_symbol (x, INTVAL (offset)); } @@ -5265,6 +6218,33 @@ aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT val) return aarch64_const_vec_all_same_in_range_p (x, val, val); } +/* Return true if VEC is a constant in which every element is in the range + [MINVAL, MAXVAL]. The elements do not need to have the same value. */ + +static bool +aarch64_const_vec_all_in_range_p (rtx vec, + HOST_WIDE_INT minval, + HOST_WIDE_INT maxval) +{ + if (GET_CODE (vec) != CONST_VECTOR + || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT) + return false; + + int nunits; + if (!CONST_VECTOR_STEPPED_P (vec)) + nunits = const_vector_encoded_nelts (vec); + else if (!CONST_VECTOR_NUNITS (vec).is_constant (&nunits)) + return false; + + for (int i = 0; i < nunits; i++) + { + rtx vec_elem = CONST_VECTOR_ELT (vec, i); + if (!CONST_INT_P (vec_elem) + || !IN_RANGE (INTVAL (vec_elem), minval, maxval)) + return false; + } + return true; +} /* N Z C V. */ #define AARCH64_CC_V 1 @@ -5293,10 +6273,43 @@ static const int aarch64_nzcv_codes[] = 0 /* NV, Any. */ }; +/* Print floating-point vector immediate operand X to F, negating it + first if NEGATE is true. Return true on success, false if it isn't + a constant we can handle. */ + +static bool +aarch64_print_vector_float_operand (FILE *f, rtx x, bool negate) +{ + rtx elt; + + if (!const_vec_duplicate_p (x, &elt)) + return false; + + REAL_VALUE_TYPE r = *CONST_DOUBLE_REAL_VALUE (elt); + if (negate) + r = real_value_negate (&r); + + /* We only handle the SVE single-bit immediates here. */ + if (real_equal (&r, &dconst0)) + asm_fprintf (f, "0.0"); + else if (real_equal (&r, &dconst1)) + asm_fprintf (f, "1.0"); + else if (real_equal (&r, &dconsthalf)) + asm_fprintf (f, "0.5"); + else + return false; + + return true; +} + /* Print operand X to file F in a target specific manner according to CODE. The acceptable formatting commands given by CODE are: 'c': An integer or symbol address without a preceding # sign. + 'C': Take the duplicated element in a vector constant + and print it in hex. + 'D': Take the duplicated element in a vector constant + and print it as an unsigned integer, in decimal. 'e': Print the sign/zero-extend size as a character 8->b, 16->h, 32->w. 'p': Prints N such that 2^N == X (X must be power of 2 and @@ -5306,6 +6319,8 @@ static const int aarch64_nzcv_codes[] = of regs. 'm': Print a condition (eq, ne, etc). 'M': Same as 'm', but invert condition. + 'N': Take the duplicated element in a vector constant + and print the negative of it in decimal. 'b/h/s/d/q': Print a scalar FP/SIMD register name. 'S/T/U/V': Print a FP/SIMD register name for a register list. 
The register printed is the FP/SIMD register name @@ -5332,6 +6347,7 @@ static const int aarch64_nzcv_codes[] = static void aarch64_print_operand (FILE *f, rtx x, int code) { + rtx elt; switch (code) { case 'c': @@ -5448,6 +6464,25 @@ aarch64_print_operand (FILE *f, rtx x, int code) } break; + case 'N': + if (!const_vec_duplicate_p (x, &elt)) + { + output_operand_lossage ("invalid vector constant"); + return; + } + + if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT) + asm_fprintf (f, "%wd", -INTVAL (elt)); + else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT + && aarch64_print_vector_float_operand (f, x, true)) + ; + else + { + output_operand_lossage ("invalid vector constant"); + return; + } + break; + case 'b': case 'h': case 's': @@ -5470,7 +6505,9 @@ aarch64_print_operand (FILE *f, rtx x, int code) output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code); return; } - asm_fprintf (f, "v%d", REGNO (x) - V0_REGNUM + (code - 'S')); + asm_fprintf (f, "%c%d", + aarch64_sve_data_mode_p (GET_MODE (x)) ? 'z' : 'v', + REGNO (x) - V0_REGNUM + (code - 'S')); break; case 'R': @@ -5491,6 +6528,33 @@ aarch64_print_operand (FILE *f, rtx x, int code) asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff); break; + case 'C': + { + /* Print a replicated constant in hex. */ + if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt)) + { + output_operand_lossage ("invalid operand for '%%%c'", code); + return; + } + scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x)); + asm_fprintf (f, "0x%wx", UINTVAL (elt) & GET_MODE_MASK (inner_mode)); + } + break; + + case 'D': + { + /* Print a replicated constant in decimal, treating it as + unsigned. */ + if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt)) + { + output_operand_lossage ("invalid operand for '%%%c'", code); + return; + } + scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x)); + asm_fprintf (f, "%wd", UINTVAL (elt) & GET_MODE_MASK (inner_mode)); + } + break; + case 'w': case 'x': if (x == const0_rtx @@ -5524,14 +6588,16 @@ aarch64_print_operand (FILE *f, rtx x, int code) switch (GET_CODE (x)) { case REG: - asm_fprintf (f, "%s", reg_names [REGNO (x)]); + if (aarch64_sve_data_mode_p (GET_MODE (x))) + asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM); + else + asm_fprintf (f, "%s", reg_names [REGNO (x)]); break; case MEM: output_address (GET_MODE (x), XEXP (x, 0)); break; - case CONST: case LABEL_REF: case SYMBOL_REF: output_addr_const (asm_out_file, x); @@ -5541,21 +6607,31 @@ aarch64_print_operand (FILE *f, rtx x, int code) asm_fprintf (f, "%wd", INTVAL (x)); break; - case CONST_VECTOR: - if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT) + case CONST: + if (!VECTOR_MODE_P (GET_MODE (x))) { - gcc_assert ( - aarch64_const_vec_all_same_in_range_p (x, - HOST_WIDE_INT_MIN, - HOST_WIDE_INT_MAX)); - asm_fprintf (f, "%wd", INTVAL (CONST_VECTOR_ELT (x, 0))); + output_addr_const (asm_out_file, x); + break; } - else if (aarch64_simd_imm_zero_p (x, GET_MODE (x))) + /* fall through */ + + case CONST_VECTOR: + if (!const_vec_duplicate_p (x, &elt)) { - fputc ('0', f); + output_operand_lossage ("invalid vector constant"); + return; } + + if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT) + asm_fprintf (f, "%wd", INTVAL (elt)); + else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT + && aarch64_print_vector_float_operand (f, x, false)) + ; else - gcc_unreachable (); + { + output_operand_lossage ("invalid vector constant"); + return; + } break; case CONST_DOUBLE: @@ -5740,6 +6816,22 @@ 
aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x, case ADDRESS_REG_IMM: if (known_eq (addr.const_offset, 0)) asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]); + else if (aarch64_sve_data_mode_p (mode)) + { + HOST_WIDE_INT vnum + = exact_div (addr.const_offset, + BYTES_PER_SVE_VECTOR).to_constant (); + asm_fprintf (f, "[%s, #%wd, mul vl]", + reg_names[REGNO (addr.base)], vnum); + } + else if (aarch64_sve_pred_mode_p (mode)) + { + HOST_WIDE_INT vnum + = exact_div (addr.const_offset, + BYTES_PER_SVE_PRED).to_constant (); + asm_fprintf (f, "[%s, #%wd, mul vl]", + reg_names[REGNO (addr.base)], vnum); + } else asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)], INTVAL (addr.offset)); @@ -5827,7 +6919,7 @@ aarch64_print_ldpstp_address (FILE *f, machine_mode mode, rtx x) static void aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x) { - if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_M)) + if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_ANY)) output_addr_const (f, x); } @@ -5882,6 +6974,9 @@ aarch64_regno_regclass (unsigned regno) if (FP_REGNUM_P (regno)) return FP_LO_REGNUM_P (regno) ? FP_LO_REGS : FP_REGS; + if (PR_REGNUM_P (regno)) + return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS; + return NO_REGS; } @@ -6035,6 +7130,14 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, machine_mode mode, secondary_reload_info *sri) { + if (BYTES_BIG_ENDIAN + && reg_class_subset_p (rclass, FP_REGS) + && (MEM_P (x) || (REG_P (x) && !HARD_REGISTER_P (x))) + && aarch64_sve_data_mode_p (mode)) + { + sri->icode = CODE_FOR_aarch64_sve_reload_be; + return NO_REGS; + } /* If we have to disable direct literal pool loads and stores because the function is too big, then we need a scratch register. */ @@ -6176,6 +7279,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) can hold MODE, but at the moment we need to handle all modes. Just ignore any runtime parts for registers that can't store them. */ HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode)); + unsigned int nregs; switch (regclass) { case CALLER_SAVE_REGS: @@ -6185,10 +7289,17 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) case POINTER_AND_FP_REGS: case FP_REGS: case FP_LO_REGS: - return (aarch64_vector_mode_p (mode) + if (aarch64_sve_data_mode_p (mode) + && constant_multiple_p (GET_MODE_SIZE (mode), + BYTES_PER_SVE_VECTOR, &nregs)) + return nregs; + return (aarch64_vector_data_mode_p (mode) ? CEIL (lowest_size, UNITS_PER_VREG) : CEIL (lowest_size, UNITS_PER_WORD)); case STACK_REG: + case PR_REGS: + case PR_LO_REGS: + case PR_HI_REGS: return 1; case NO_REGS: @@ -7497,8 +8608,8 @@ cost_plus: } if (GET_MODE_CLASS (mode) == MODE_INT - && CONST_INT_P (op1) - && aarch64_uimm12_shift (INTVAL (op1))) + && ((CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1))) + || aarch64_sve_addvl_addpl_immediate (op1, mode))) { *cost += rtx_cost (op0, mode, PLUS, 0, speed); @@ -9415,6 +10526,21 @@ aarch64_get_arch (enum aarch64_arch arch) return &all_architectures[cpu->arch]; } +/* Return the VG value associated with -msve-vector-bits= value VALUE. */ + +static poly_uint16 +aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits_enum value) +{ + /* For now generate vector-length agnostic code for -msve-vector-bits=128. + This ensures we can clearly distinguish SVE and Advanced SIMD modes when + deciding which .md file patterns to use and when deciding whether + something is a legitimate address or constant. 
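[Illustrative aside, not part of the patch: the unit conversions behind
this function, with invented helper names.  VG counts 64-bit granules,
so a fixed -msve-vector-bits=N gives VG = N / 64, while SVE_SCALABLE
(and SVE_128, as the comment above explains) keep the variable
poly_uint16 (2, 2), i.e. two granules per 128-bit quadword.]

#include <stdint.h>

static unsigned int
fixed_vg_from_bits (unsigned int bits)   // 256 -> 4, 512 -> 8, 2048 -> 32
{
  return bits / 64;
}

static unsigned int
bytes_per_sve_vector (unsigned int vg)   // data registers: 8 bytes per granule
{
  return 8 * vg;
}

static unsigned int
bytes_per_sve_pred (unsigned int vg)     // predicates: one bit per vector byte
{
  return vg;
}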
*/ + if (value == SVE_SCALABLE || value == SVE_128) + return poly_uint16 (2, 2); + else + return (int) value / 64; +} + /* Implement TARGET_OPTION_OVERRIDE. This is called once in the beginning and is used to parse the -m{cpu,tune,arch} strings and setup the initial tuning structs. In particular it must set selected_tune and @@ -9516,6 +10642,9 @@ aarch64_override_options (void) error ("assembler does not support -mabi=ilp32"); #endif + /* Convert -msve-vector-bits to a VG count. */ + aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits); + if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32) sorry ("return address signing is only supported for -mabi=lp64"); @@ -10392,11 +11521,11 @@ aarch64_classify_tls_symbol (rtx x) } } -/* Return the method that should be used to access SYMBOL_REF or - LABEL_REF X. */ +/* Return the correct method for accessing X + OFFSET, where X is either + a SYMBOL_REF or LABEL_REF. */ enum aarch64_symbol_type -aarch64_classify_symbol (rtx x, rtx offset) +aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset) { if (GET_CODE (x) == LABEL_REF) { @@ -10439,7 +11568,7 @@ aarch64_classify_symbol (rtx x, rtx offset) resolve to a symbol in this module, then force to memory. */ if ((SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x)) - || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575) + || !IN_RANGE (offset, -1048575, 1048575)) return SYMBOL_FORCE_TO_MEM; return SYMBOL_TINY_ABSOLUTE; @@ -10448,7 +11577,7 @@ aarch64_classify_symbol (rtx x, rtx offset) 4G. */ if ((SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x)) - || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263), + || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263), HOST_WIDE_INT_C (4294967264))) return SYMBOL_FORCE_TO_MEM; return SYMBOL_SMALL_ABSOLUTE; @@ -10511,28 +11640,46 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x) if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR) return true; - /* Do not allow vector struct mode constants. We could support - 0 and -1 easily, but they need support in aarch64-simd.md. */ - if (aarch64_vect_struct_mode_p (mode)) + /* Do not allow vector struct mode constants for Advanced SIMD. + We could support 0 and -1 easily, but they need support in + aarch64-simd.md. */ + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT)) return false; /* Do not allow wide int constants - this requires support in movti. */ if (CONST_WIDE_INT_P (x)) return false; + /* Only accept variable-length vector constants if they can be + handled directly. + + ??? It would be possible to handle rematerialization of other + constants via secondary reloads. */ + if (vec_flags & VEC_ANY_SVE) + return aarch64_simd_valid_immediate (x, NULL); + if (GET_CODE (x) == HIGH) x = XEXP (x, 0); - /* Do not allow const (plus (anchor_symbol, const_int)). */ - if (GET_CODE (x) == CONST) - { - rtx offset; - - split_const (x, &x, &offset); + /* Accept polynomial constants that can be calculated by using the + destination of a move as the sole temporary. Constants that + require a second temporary cannot be rematerialized (they can't be + forced to memory and also aren't legitimate constants). 
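[Illustrative aside, not part of the patch: a rough standalone model of
the "at most one temporary" rule, using the CONSTANT + FACTOR * VQ
split described earlier.  The function name is invented and the
constant-range check is simplified to the 24-bit limit used by
aarch64_add_offset_1_temporaries.]

#include <stdbool.h>
#include <stdint.h>

static bool
poly_constant_rematerializable (int64_t constant, int64_t factor)
{
  unsigned int temporaries = 0;
  if (factor != 0)
    {
      int64_t f = factor < 0 ? -factor : factor;
      // One temporary for the CNT result, two if the CNT immediate
      // range is exceeded and a register multiply is needed.
      temporaries += (f > 16 * (f & -f)) ? 2 : 1;
    }
  if (constant < -0xffffff || constant > 0xffffff)
    temporaries += 1;
  return temporaries <= 1;
}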
*/ + poly_int64 offset; + if (poly_int_rtx_p (x, &offset)) + return aarch64_offset_temporaries (false, offset) <= 1; + + /* If an offset is being added to something else, we need to allow the + base to be moved into the destination register, meaning that there + are no free temporaries for the offset. */ + x = strip_offset (x, &offset); + if (!offset.is_constant () && aarch64_offset_temporaries (true, offset) > 0) + return false; - if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x)) - return false; - } + /* Do not allow const (plus (anchor_symbol, const_int)). */ + if (maybe_ne (offset, 0) && SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x)) + return false; /* Treat symbols as constants. Avoid TLS symbols as they are complex, so spilling them is better than rematerialization. */ @@ -11079,6 +12226,12 @@ aarch64_conditional_register_usage (void) call_used_regs[i] = 1; } } + if (!TARGET_SVE) + for (i = P0_REGNUM; i <= P15_REGNUM; i++) + { + fixed_regs[i] = 1; + call_used_regs[i] = 1; + } } /* Walk down the type tree of TYPE counting consecutive base elements. @@ -11372,28 +12525,40 @@ aarch64_struct_value_rtx (tree fndecl ATTRIBUTE_UNUSED, static bool aarch64_vector_mode_supported_p (machine_mode mode) { - if (TARGET_SIMD - && (mode == V4SImode || mode == V8HImode - || mode == V16QImode || mode == V2DImode - || mode == V2SImode || mode == V4HImode - || mode == V8QImode || mode == V2SFmode - || mode == V4SFmode || mode == V2DFmode - || mode == V4HFmode || mode == V8HFmode - || mode == V1DFmode)) - return true; - - return false; + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0; } /* Return appropriate SIMD container for MODE within a vector of WIDTH bits. */ static machine_mode -aarch64_simd_container_mode (scalar_mode mode, unsigned width) +aarch64_simd_container_mode (scalar_mode mode, poly_int64 width) { - gcc_assert (width == 64 || width == 128); + if (TARGET_SVE && known_eq (width, BITS_PER_SVE_VECTOR)) + switch (mode) + { + case E_DFmode: + return VNx2DFmode; + case E_SFmode: + return VNx4SFmode; + case E_HFmode: + return VNx8HFmode; + case E_DImode: + return VNx2DImode; + case E_SImode: + return VNx4SImode; + case E_HImode: + return VNx8HImode; + case E_QImode: + return VNx16QImode; + default: + return word_mode; + } + + gcc_assert (known_eq (width, 64) || known_eq (width, 128)); if (TARGET_SIMD) { - if (width == 128) + if (known_eq (width, 128)) switch (mode) { case E_DFmode: @@ -11437,7 +12602,8 @@ aarch64_simd_container_mode (scalar_mode mode, unsigned width) static machine_mode aarch64_preferred_simd_mode (scalar_mode mode) { - return aarch64_simd_container_mode (mode, 128); + poly_int64 bits = TARGET_SVE ? BITS_PER_SVE_VECTOR : 128; + return aarch64_simd_container_mode (mode, bits); } /* Return a list of possible vector sizes for the vectorizer @@ -11445,6 +12611,8 @@ aarch64_preferred_simd_mode (scalar_mode mode) static void aarch64_autovectorize_vector_sizes (vector_sizes *sizes) { + if (TARGET_SVE) + sizes->safe_push (BYTES_PER_SVE_VECTOR); sizes->safe_push (16); sizes->safe_push (8); } @@ -11606,6 +12774,125 @@ sizetochar (int size) } } +/* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX + instruction. */ + +bool +aarch64_sve_index_immediate_p (rtx base_or_step) +{ + return (CONST_INT_P (base_or_step) + && IN_RANGE (INTVAL (base_or_step), -16, 15)); +} + +/* Return true if X is a valid immediate for the SVE ADD and SUB + instructions. Negate X first if NEGATE_P is true. 
*/ + +bool +aarch64_sve_arith_immediate_p (rtx x, bool negate_p) +{ + rtx elt; + + if (!const_vec_duplicate_p (x, &elt) + || !CONST_INT_P (elt)) + return false; + + HOST_WIDE_INT val = INTVAL (elt); + if (negate_p) + val = -val; + val &= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x))); + + if (val & 0xff) + return IN_RANGE (val, 0, 0xff); + return IN_RANGE (val, 0, 0xff00); +} + +/* Return true if X is a valid immediate operand for an SVE logical + instruction such as AND. */ + +bool +aarch64_sve_bitmask_immediate_p (rtx x) +{ + rtx elt; + + return (const_vec_duplicate_p (x, &elt) + && CONST_INT_P (elt) + && aarch64_bitmask_imm (INTVAL (elt), + GET_MODE_INNER (GET_MODE (x)))); +} + +/* Return true if X is a valid immediate for the SVE DUP and CPY + instructions. */ + +bool +aarch64_sve_dup_immediate_p (rtx x) +{ + rtx elt; + + if (!const_vec_duplicate_p (x, &elt) + || !CONST_INT_P (elt)) + return false; + + HOST_WIDE_INT val = INTVAL (elt); + if (val & 0xff) + return IN_RANGE (val, -0x80, 0x7f); + return IN_RANGE (val, -0x8000, 0x7f00); +} + +/* Return true if X is a valid immediate operand for an SVE CMP instruction. + SIGNED_P says whether the operand is signed rather than unsigned. */ + +bool +aarch64_sve_cmp_immediate_p (rtx x, bool signed_p) +{ + rtx elt; + + return (const_vec_duplicate_p (x, &elt) + && CONST_INT_P (elt) + && (signed_p + ? IN_RANGE (INTVAL (elt), -16, 15) + : IN_RANGE (INTVAL (elt), 0, 127))); +} + +/* Return true if X is a valid immediate operand for an SVE FADD or FSUB + instruction. Negate X first if NEGATE_P is true. */ + +bool +aarch64_sve_float_arith_immediate_p (rtx x, bool negate_p) +{ + rtx elt; + REAL_VALUE_TYPE r; + + if (!const_vec_duplicate_p (x, &elt) + || GET_CODE (elt) != CONST_DOUBLE) + return false; + + r = *CONST_DOUBLE_REAL_VALUE (elt); + + if (negate_p) + r = real_value_negate (&r); + + if (real_equal (&r, &dconst1)) + return true; + if (real_equal (&r, &dconsthalf)) + return true; + return false; +} + +/* Return true if X is a valid immediate operand for an SVE FMUL + instruction. */ + +bool +aarch64_sve_float_mul_immediate_p (rtx x) +{ + rtx elt; + + /* GCC will never generate a multiply with an immediate of 2, so there is no + point testing for it (even though it is a valid constant). */ + return (const_vec_duplicate_p (x, &elt) + && GET_CODE (elt) == CONST_DOUBLE + && real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf)); +} + /* Return true if replicating VAL32 is a valid 2-byte or 4-byte immediate for the Advanced SIMD operation described by WHICH and INSN. If INFO is nonnull, use it to describe valid immediates. */ @@ -11710,6 +12997,52 @@ aarch64_advsimd_valid_immediate (unsigned HOST_WIDE_INT val64, return false; } +/* Return true if replicating VAL64 gives a valid immediate for an SVE MOV + instruction. If INFO is nonnull, use it to describe valid immediates. */ + +static bool +aarch64_sve_valid_immediate (unsigned HOST_WIDE_INT val64, + simd_immediate_info *info) +{ + scalar_int_mode mode = DImode; + unsigned int val32 = val64 & 0xffffffff; + if (val32 == (val64 >> 32)) + { + mode = SImode; + unsigned int val16 = val32 & 0xffff; + if (val16 == (val32 >> 16)) + { + mode = HImode; + unsigned int val8 = val16 & 0xff; + if (val8 == (val16 >> 8)) + mode = QImode; + } + } + HOST_WIDE_INT val = trunc_int_for_mode (val64, mode); + if (IN_RANGE (val, -0x80, 0x7f)) + { + /* DUP with no shift. 
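[Illustrative aside, not part of the patch: the immediate ranges behind
the DUP cases here and behind aarch64_sve_dup_immediate_p, written as a
standalone check with an invented name.]

#include <stdbool.h>
#include <stdint.h>

// DUP and CPY take a signed 8-bit immediate, optionally shifted left
// by 8: values in [-128, 127], or multiples of 256 in [-32768, 32512].
// The unpredicated ADD/SUB immediates are the unsigned counterpart:
// [0, 255], or multiples of 256 in [0, 65280].
static bool
sve_dup_immediate_ok (int64_t val)
{
  if (val >= -0x80 && val <= 0x7f)
    return true;
  return (val & 0xff) == 0 && val >= -0x8000 && val <= 0x7f00;
}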
*/ + if (info) + *info = simd_immediate_info (mode, val); + return true; + } + if ((val & 0xff) == 0 && IN_RANGE (val, -0x8000, 0x7f00)) + { + /* DUP with LSL #8. */ + if (info) + *info = simd_immediate_info (mode, val); + return true; + } + if (aarch64_bitmask_imm (val64, mode)) + { + /* DUPM. */ + if (info) + *info = simd_immediate_info (mode, val); + return true; + } + return false; +} + /* Return true if OP is a valid SIMD immediate for the operation described by WHICH. If INFO is nonnull, use it to describe valid immediates. */ @@ -11717,18 +13050,39 @@ bool aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info, enum simd_immediate_check which) { - rtx elt = NULL; + machine_mode mode = GET_MODE (op); + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + if (vec_flags == 0 || vec_flags == (VEC_ADVSIMD | VEC_STRUCT)) + return false; + + scalar_mode elt_mode = GET_MODE_INNER (mode); + rtx elt = NULL, base, step; unsigned int n_elts; if (const_vec_duplicate_p (op, &elt)) n_elts = 1; + else if ((vec_flags & VEC_SVE_DATA) + && const_vec_series_p (op, &base, &step)) + { + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT); + if (!aarch64_sve_index_immediate_p (base) + || !aarch64_sve_index_immediate_p (step)) + return false; + + if (info) + *info = simd_immediate_info (elt_mode, base, step); + return true; + } else if (GET_CODE (op) == CONST_VECTOR && CONST_VECTOR_NUNITS (op).is_constant (&n_elts)) /* N_ELTS set above. */; else return false; - machine_mode mode = GET_MODE (op); - scalar_mode elt_mode = GET_MODE_INNER (mode); + /* Handle PFALSE and PTRUE. */ + if (vec_flags & VEC_SVE_PRED) + return (op == CONST0_RTX (mode) + || op == CONSTM1_RTX (mode)); + scalar_float_mode elt_float_mode; if (elt && is_a (elt_mode, &elt_float_mode) @@ -11785,7 +13139,24 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info, val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes] << (i * BITS_PER_UNIT)); - return aarch64_advsimd_valid_immediate (val64, info, which); + if (vec_flags & VEC_SVE_DATA) + return aarch64_sve_valid_immediate (val64, info); + else + return aarch64_advsimd_valid_immediate (val64, info, which); +} + +/* Check whether X is a VEC_SERIES-like constant that starts at 0 and + has a step in the range of INDEX. Return the index expression if so, + otherwise return null. */ +rtx +aarch64_check_zero_based_sve_index_immediate (rtx x) +{ + rtx base, step; + if (const_vec_series_p (x, &base, &step) + && base == const0_rtx + && aarch64_sve_index_immediate_p (step)) + return step; + return NULL_RTX; } /* Check of immediate shift constants are within range. */ @@ -11799,16 +13170,6 @@ aarch64_simd_shift_imm_p (rtx x, machine_mode mode, bool left) return aarch64_const_vec_all_same_in_range_p (x, 1, bit_width); } -/* Return true if X is a uniform vector where all elements - are either the floating-point constant 0.0 or the - integer constant 0. */ -bool -aarch64_simd_imm_zero_p (rtx x, machine_mode mode) -{ - return x == CONST0_RTX (mode); -} - - /* Return the bitmask CONST_INT to select the bits required by a zero extract operation of width WIDTH at bit position POS. 
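[Illustrative aside, not part of the patch: aarch64_sve_valid_immediate
above first finds the narrowest element width whose replication
reproduces VAL64, so that DUP and DUPM can use the smallest possible
element size.  A standalone version of that search with an invented
name.]

#include <stdint.h>

static unsigned int
narrowest_replicated_width (uint64_t val64)
{
  unsigned int width = 64;
  while (width > 8)
    {
      unsigned int half = width / 2;
      uint64_t mask = (UINT64_C (1) << half) - 1;
      if ((val64 & mask) != ((val64 >> half) & mask))
        break;
      val64 &= mask;     // keep the replicated low part and narrow again
      width = half;
    }
  return width;
}

// Examples: 0x0101010101010101 -> 8, 0x0001000100010001 -> 16,
// 0x1234567812345678 -> 32.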
*/ @@ -11833,9 +13194,15 @@ aarch64_mov_operand_p (rtx x, machine_mode mode) if (CONST_INT_P (x)) return true; + if (VECTOR_MODE_P (GET_MODE (x))) + return aarch64_simd_valid_immediate (x, NULL); + if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x)) return true; + if (aarch64_sve_cnt_immediate_p (x)) + return true; + return aarch64_classify_symbolic_expression (x) == SYMBOL_TINY_ABSOLUTE; } @@ -11855,7 +13222,7 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, scalar_int_mode mode) { machine_mode vmode; - vmode = aarch64_preferred_simd_mode (mode); + vmode = aarch64_simd_container_mode (mode, 64); rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op)); return aarch64_simd_valid_immediate (op_v, NULL); } @@ -11965,6 +13332,7 @@ aarch64_endian_lane_rtx (machine_mode mode, unsigned int n) } /* Return TRUE if OP is a valid vector addressing mode. */ + bool aarch64_simd_mem_operand_p (rtx op) { @@ -11972,6 +13340,34 @@ aarch64_simd_mem_operand_p (rtx op) || REG_P (XEXP (op, 0))); } +/* Return true if OP is a valid MEM operand for an SVE LD1R instruction. */ + +bool +aarch64_sve_ld1r_operand_p (rtx op) +{ + struct aarch64_address_info addr; + scalar_mode mode; + + return (MEM_P (op) + && is_a (GET_MODE (op), &mode) + && aarch64_classify_address (&addr, XEXP (op, 0), mode, false) + && addr.type == ADDRESS_REG_IMM + && offset_6bit_unsigned_scaled_p (mode, addr.const_offset)); +} + +/* Return true if OP is a valid MEM operand for an SVE LDR instruction. + The conditions for STR are the same. */ +bool +aarch64_sve_ldr_operand_p (rtx op) +{ + struct aarch64_address_info addr; + + return (MEM_P (op) + && aarch64_classify_address (&addr, XEXP (op, 0), GET_MODE (op), + false, ADDR_QUERY_ANY) + && addr.type == ADDRESS_REG_IMM); +} + /* Emit a register copy from operand to operand, taking care not to early-clobber source registers in the process. @@ -12006,14 +13402,36 @@ aarch64_simd_attr_length_rglist (machine_mode mode) } /* Implement target hook TARGET_VECTOR_ALIGNMENT. The AAPCS64 sets the maximum - alignment of a vector to 128 bits. */ + alignment of a vector to 128 bits. SVE predicates have an alignment of + 16 bits. */ static HOST_WIDE_INT aarch64_simd_vector_alignment (const_tree type) { + if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST) + /* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can + be set for non-predicate vectors of booleans. Modes are the most + direct way we have of identifying real SVE predicate types. */ + return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128; HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type)); return MIN (align, 128); } +/* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. */ +static HOST_WIDE_INT +aarch64_vectorize_preferred_vector_alignment (const_tree type) +{ + if (aarch64_sve_data_mode_p (TYPE_MODE (type))) + { + /* If the length of the vector is fixed, try to align to that length, + otherwise don't try to align at all. */ + HOST_WIDE_INT result; + if (!BITS_PER_SVE_VECTOR.is_constant (&result)) + result = TYPE_ALIGN (TREE_TYPE (type)); + return result; + } + return TYPE_ALIGN (type); +} + /* Implement target hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE. */ static bool aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed) @@ -12021,9 +13439,12 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed) if (is_packed) return false; - /* We guarantee alignment for vectors up to 128-bits. 
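[Illustrative aside, not part of the patch: the addressing ranges that
aarch64_sve_ld1r_operand_p and aarch64_sve_ldr_operand_p above accept,
as standalone checks with invented names.  ELT_BYTES is the element
size and VL_BYTES the vector length in bytes.]

#include <stdbool.h>
#include <stdint.h>

// LD1R takes an unsigned 6-bit offset scaled by the element size, so
// LD1RD accepts byte offsets 0, 8, ..., 504.
static bool
ld1r_offset_ok (int64_t offset, unsigned int elt_bytes)
{
  return (offset >= 0
          && offset % elt_bytes == 0
          && offset / elt_bytes <= 63);
}

// LDR and STR of an SVE register take a signed 9-bit offset scaled by
// the vector length, printed as "[xN, #imm, mul vl]".
static bool
sve_ldr_offset_ok (int64_t offset, unsigned int vl_bytes)
{
  int64_t mult = offset / (int64_t) vl_bytes;
  return (offset % (int64_t) vl_bytes == 0
          && mult >= -256 && mult <= 255);
}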
*/ - if (tree_int_cst_compare (TYPE_SIZE (type), - bitsize_int (BIGGEST_ALIGNMENT)) > 0) + /* For fixed-length vectors, check that the vectorizer will aim for + full-vector alignment. This isn't true for generic GCC vectors + that are wider than the ABI maximum of 128 bits. */ + if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST + && (wi::to_widest (TYPE_SIZE (type)) + != aarch64_vectorize_preferred_vector_alignment (type))) return false; /* Vectors whose size is <= BIGGEST_ALIGNMENT are naturally aligned. */ @@ -12268,12 +13689,9 @@ aarch64_expand_vector_init (rtx target, rtx vals) static unsigned HOST_WIDE_INT aarch64_shift_truncation_mask (machine_mode mode) { - return - (!SHIFT_COUNT_TRUNCATED - || aarch64_vector_mode_supported_p (mode) - || aarch64_vect_struct_mode_p (mode)) - ? 0 - : (GET_MODE_UNIT_BITSIZE (mode) - 1); + if (!SHIFT_COUNT_TRUNCATED || aarch64_vector_data_mode_p (mode)) + return 0; + return GET_MODE_UNIT_BITSIZE (mode) - 1; } /* Select a format to encode pointers in exception handling data. */ @@ -13250,6 +14668,67 @@ aarch64_output_scalar_simd_mov_immediate (rtx immediate, scalar_int_mode mode) return aarch64_output_simd_mov_immediate (v_op, width); } +/* Return the output string to use for moving immediate CONST_VECTOR + into an SVE register. */ + +char * +aarch64_output_sve_mov_immediate (rtx const_vector) +{ + static char templ[40]; + struct simd_immediate_info info; + char element_char; + + bool is_valid = aarch64_simd_valid_immediate (const_vector, &info); + gcc_assert (is_valid); + + element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode)); + + if (info.step) + { + snprintf (templ, sizeof (templ), "index\t%%0.%c, #" + HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC, + element_char, INTVAL (info.value), INTVAL (info.step)); + return templ; + } + + if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT) + { + if (aarch64_float_const_zero_rtx_p (info.value)) + info.value = GEN_INT (0); + else + { + const int buf_size = 20; + char float_buf[buf_size] = {}; + real_to_decimal_for_mode (float_buf, + CONST_DOUBLE_REAL_VALUE (info.value), + buf_size, buf_size, 1, info.elt_mode); + + snprintf (templ, sizeof (templ), "fmov\t%%0.%c, #%s", + element_char, float_buf); + return templ; + } + } + + snprintf (templ, sizeof (templ), "mov\t%%0.%c, #" HOST_WIDE_INT_PRINT_DEC, + element_char, INTVAL (info.value)); + return templ; +} + +/* Return the asm format for a PTRUE instruction whose destination has + mode MODE. SUFFIX is the element size suffix. */ + +char * +aarch64_output_ptrue (machine_mode mode, char suffix) +{ + unsigned int nunits; + static char buf[sizeof ("ptrue\t%0.N, vlNNNNN")]; + if (GET_MODE_NUNITS (mode).is_constant (&nunits)) + snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", suffix, nunits); + else + snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", suffix); + return buf; +} + /* Split operands into moves from op[1] + op[2] into op[0]. */ void @@ -13304,13 +14783,12 @@ aarch64_split_combinev16qi (rtx operands[3]) /* vec_perm support. */ -#define MAX_VECT_LEN 16 - struct expand_vec_perm_d { rtx target, op0, op1; vec_perm_indices perm; machine_mode vmode; + unsigned int vec_flags; bool one_vector_p; bool testing_p; }; @@ -13392,6 +14870,74 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel, aarch64_expand_vec_perm_1 (target, op0, op1, sel); } +/* Generate (set TARGET (unspec [OP0 OP1] CODE)). 
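Referring back to aarch64_output_ptrue above: a constant all-true
predicate can name an exact element count only when the vector length
is fixed (for example by -msve-vector-bits); otherwise it must use the
length-agnostic "all" pattern. A minimal sketch of that choice, with a
hard-coded p0 in place of the %0 operand and an explicit flag standing
in for GET_MODE_NUNITS (...).is_constant ():

  #include <stdio.h>

  // Print the PTRUE used for an all-true predicate constant.
  static void
  print_ptrue (char suffix, int nunits_known, int nunits)
  {
    if (nunits_known)
      printf ("ptrue\tp0.%c, vl%d\n", suffix, nunits);  // e.g. ptrue p0.b, vl32
    else
      printf ("ptrue\tp0.%c, all\n", suffix);           // length-agnostic form
  }

The counts that arise here come from -msve-vector-bits, so they always
correspond to one of the named VL patterns.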
*/ + +static void +emit_unspec2 (rtx target, int code, rtx op0, rtx op1) +{ + emit_insn (gen_rtx_SET (target, + gen_rtx_UNSPEC (GET_MODE (target), + gen_rtvec (2, op0, op1), code))); +} + +/* Expand an SVE vec_perm with the given operands. */ + +void +aarch64_expand_sve_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) +{ + machine_mode data_mode = GET_MODE (target); + machine_mode sel_mode = GET_MODE (sel); + /* Enforced by the pattern condition. */ + int nunits = GET_MODE_NUNITS (sel_mode).to_constant (); + + /* Note: vec_perm indices are supposed to wrap when they go beyond the + size of the two value vectors, i.e. the upper bits of the indices + are effectively ignored. SVE TBL instead produces 0 for any + out-of-range indices, so we need to modulo all the vec_perm indices + to ensure they are all in range. */ + rtx sel_reg = force_reg (sel_mode, sel); + + /* Check if the sel only references the first values vector. */ + if (GET_CODE (sel) == CONST_VECTOR + && aarch64_const_vec_all_in_range_p (sel, 0, nunits - 1)) + { + emit_unspec2 (target, UNSPEC_TBL, op0, sel_reg); + return; + } + + /* Check if the two values vectors are the same. */ + if (rtx_equal_p (op0, op1)) + { + rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode, nunits - 1); + rtx sel_mod = expand_simple_binop (sel_mode, AND, sel_reg, max_sel, + NULL, 0, OPTAB_DIRECT); + emit_unspec2 (target, UNSPEC_TBL, op0, sel_mod); + return; + } + + /* Run TBL on for each value vector and combine the results. */ + + rtx res0 = gen_reg_rtx (data_mode); + rtx res1 = gen_reg_rtx (data_mode); + rtx neg_num_elems = aarch64_simd_gen_const_vector_dup (sel_mode, -nunits); + if (GET_CODE (sel) != CONST_VECTOR + || !aarch64_const_vec_all_in_range_p (sel, 0, 2 * nunits - 1)) + { + rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode, + 2 * nunits - 1); + sel_reg = expand_simple_binop (sel_mode, AND, sel_reg, max_sel, + NULL, 0, OPTAB_DIRECT); + } + emit_unspec2 (res0, UNSPEC_TBL, op0, sel_reg); + rtx sel_sub = expand_simple_binop (sel_mode, PLUS, sel_reg, neg_num_elems, + NULL, 0, OPTAB_DIRECT); + emit_unspec2 (res1, UNSPEC_TBL, op1, sel_sub); + if (GET_MODE_CLASS (data_mode) == MODE_VECTOR_INT) + emit_insn (gen_rtx_SET (target, gen_rtx_IOR (data_mode, res0, res1))); + else + emit_unspec2 (target, UNSPEC_IORF, res0, res1); +} + /* Recognize patterns suitable for the TRN instructions. */ static bool aarch64_evpc_trn (struct expand_vec_perm_d *d) @@ -13418,7 +14964,9 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + /* We don't need a big-endian lane correction for SVE; see the comment + at the head of aarch64-sve.md for details. */ + if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD) { x = in0, in0 = in1, in1 = x; odd = !odd; @@ -13454,7 +15002,9 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + /* We don't need a big-endian lane correction for SVE; see the comment + at the head of aarch64-sve.md for details. */ + if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD) { x = in0, in0 = in1, in1 = x; odd = !odd; @@ -13493,7 +15043,9 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + /* We don't need a big-endian lane correction for SVE; see the comment + at the head of aarch64-sve.md for details. 
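Referring back to aarch64_expand_sve_vec_perm above: the construction
relies on TBL returning zero for out-of-range indices, so once the
selector has been reduced modulo twice the element count, one TBL per
input plus an OR gives the wrapped vec_perm semantics. A scalar model
of the whole sequence (element type and function names are illustrative
only):

  // TBL-like lookup: out-of-range indices select zero.
  static unsigned char
  tbl1 (const unsigned char *op, int n, long long idx)
  {
    return (idx >= 0 && idx < n) ? op[idx] : 0;
  }

  // Two-input permute of N-element vectors, N a power of two.
  static void
  sve_vec_perm_model (unsigned char *res, const unsigned char *op0,
                      const unsigned char *op1, const long long *sel, int n)
  {
    for (int i = 0; i < n; i++)
      {
        long long s = sel[i] & (2 * n - 1);  // indices wrap modulo 2*N
        // At most one of the two lookups selects a real element;
        // the other contributes zero, so OR combines them.
        res[i] = tbl1 (op0, n, s) | tbl1 (op1, n, s - n);
      }
  }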
*/ + if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD) { x = in0, in0 = in1, in1 = x; high = !high; @@ -13515,7 +15067,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d) /* The first element always refers to the first vector. Check if the extracted indices are increasing by one. */ - if (!d->perm[0].is_constant (&location) + if (d->vec_flags == VEC_SVE_PRED + || !d->perm[0].is_constant (&location) || !d->perm.series_p (0, 1, location, 1)) return false; @@ -13524,9 +15077,11 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d) return true; /* The case where (location == 0) is a no-op for both big- and little-endian, - and is removed by the mid-end at optimization levels -O1 and higher. */ + and is removed by the mid-end at optimization levels -O1 and higher. - if (BYTES_BIG_ENDIAN && (location != 0)) + We don't need a big-endian lane correction for SVE; see the comment + at the head of aarch64-sve.md for details. */ + if (BYTES_BIG_ENDIAN && location != 0 && d->vec_flags == VEC_ADVSIMD) { /* After setup, we want the high elements of the first vector (stored at the LSB end of the register), and the low elements of the second @@ -13546,25 +15101,37 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d) return true; } -/* Recognize patterns for the REV insns. */ +/* Recognize patterns for the REV{64,32,16} insns, which reverse elements + within each 64-bit, 32-bit or 16-bit granule. */ static bool -aarch64_evpc_rev (struct expand_vec_perm_d *d) +aarch64_evpc_rev_local (struct expand_vec_perm_d *d) { HOST_WIDE_INT diff; unsigned int i, size, unspec; + machine_mode pred_mode; - if (!d->one_vector_p + if (d->vec_flags == VEC_SVE_PRED + || !d->one_vector_p || !d->perm[0].is_constant (&diff)) return false; size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode); if (size == 8) - unspec = UNSPEC_REV64; + { + unspec = UNSPEC_REV64; + pred_mode = VNx2BImode; + } else if (size == 4) - unspec = UNSPEC_REV32; + { + unspec = UNSPEC_REV32; + pred_mode = VNx4BImode; + } else if (size == 2) - unspec = UNSPEC_REV16; + { + unspec = UNSPEC_REV16; + pred_mode = VNx8BImode; + } else return false; @@ -13577,8 +15144,37 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d) if (d->testing_p) return true; - emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), - unspec)); + rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), unspec); + if (d->vec_flags == VEC_SVE_DATA) + { + rtx pred = force_reg (pred_mode, CONSTM1_RTX (pred_mode)); + src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src), + UNSPEC_MERGE_PTRUE); + } + emit_set_insn (d->target, src); + return true; +} + +/* Recognize patterns for the REV insn, which reverses elements within + a full vector. */ + +static bool +aarch64_evpc_rev_global (struct expand_vec_perm_d *d) +{ + poly_uint64 nelt = d->perm.length (); + + if (!d->one_vector_p || d->vec_flags != VEC_SVE_DATA) + return false; + + if (!d->perm.series_p (0, 1, nelt - 1, -1)) + return false; + + /* Success! */ + if (d->testing_p) + return true; + + rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), UNSPEC_REV); + emit_set_insn (d->target, src); return true; } @@ -13591,10 +15187,14 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d) machine_mode vmode = d->vmode; rtx lane; - if (d->perm.encoding ().encoded_nelts () != 1 + if (d->vec_flags == VEC_SVE_PRED + || d->perm.encoding ().encoded_nelts () != 1 || !d->perm[0].is_constant (&elt)) return false; + if (d->vec_flags == VEC_SVE_DATA && elt >= 64 * GET_MODE_UNIT_SIZE (vmode)) + return false; + /* Success! 
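Referring back to aarch64_evpc_rev_global above: the test
d->perm.series_p (0, 1, nelt - 1, -1) asks whether the selector is
nelt-1, nelt-2, ..., 0, i.e. a reversal of the whole vector. The same
check for a fixed-length selector, as a sketch with illustrative names:

  // Does sel describe a full reversal of an n-element vector?
  static int
  full_reverse_p (const long long *sel, int n)
  {
    for (int i = 0; i < n; i++)
      if (sel[i] != n - 1 - i)
        return 0;
    return 1;
  }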
*/ if (d->testing_p) return true; @@ -13616,7 +15216,7 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d) static bool aarch64_evpc_tbl (struct expand_vec_perm_d *d) { - rtx rperm[MAX_VECT_LEN], sel; + rtx rperm[MAX_COMPILE_TIME_VEC_BYTES], sel; machine_mode vmode = d->vmode; /* Make sure that the indices are constant. */ @@ -13652,6 +15252,27 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d) return true; } +/* Try to implement D using an SVE TBL instruction. */ + +static bool +aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d) +{ + unsigned HOST_WIDE_INT nelt; + + /* Permuting two variable-length vectors could overflow the + index range. */ + if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt)) + return false; + + if (d->testing_p) + return true; + + machine_mode sel_mode = mode_for_int_vector (d->vmode).require (); + rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm); + aarch64_expand_sve_vec_perm (d->target, d->op0, d->op1, sel); + return true; +} + static bool aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) { @@ -13665,9 +15286,14 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) std::swap (d->op0, d->op1); } - if (TARGET_SIMD && known_gt (nelt, 1)) + if ((d->vec_flags == VEC_ADVSIMD + || d->vec_flags == VEC_SVE_DATA + || d->vec_flags == VEC_SVE_PRED) + && known_gt (nelt, 1)) { - if (aarch64_evpc_rev (d)) + if (aarch64_evpc_rev_local (d)) + return true; + else if (aarch64_evpc_rev_global (d)) return true; else if (aarch64_evpc_ext (d)) return true; @@ -13679,7 +15305,10 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; else if (aarch64_evpc_trn (d)) return true; - return aarch64_evpc_tbl (d); + if (d->vec_flags == VEC_SVE_DATA) + return aarch64_evpc_sve_tbl (d); + else if (d->vec_flags == VEC_SVE_DATA) + return aarch64_evpc_tbl (d); } return false; } @@ -13711,6 +15340,7 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, sel.nelts_per_input ()); d.vmode = vmode; + d.vec_flags = aarch64_classify_vector_mode (d.vmode); d.target = target; d.op0 = op0; d.op1 = op1; @@ -13749,6 +15379,272 @@ aarch64_reverse_mask (machine_mode mode, unsigned int nunits) return force_reg (V16QImode, mask); } +/* Return true if X is a valid second operand for the SVE instruction + that implements integer comparison OP_CODE. */ + +static bool +aarch64_sve_cmp_operand_p (rtx_code op_code, rtx x) +{ + if (register_operand (x, VOIDmode)) + return true; + + switch (op_code) + { + case LTU: + case LEU: + case GEU: + case GTU: + return aarch64_sve_cmp_immediate_p (x, false); + case LT: + case LE: + case GE: + case GT: + case NE: + case EQ: + return aarch64_sve_cmp_immediate_p (x, true); + default: + gcc_unreachable (); + } +} + +/* Return the UNSPEC_COND_* code for comparison CODE. */ + +static unsigned int +aarch64_unspec_cond_code (rtx_code code) +{ + switch (code) + { + case NE: + return UNSPEC_COND_NE; + case EQ: + return UNSPEC_COND_EQ; + case LT: + return UNSPEC_COND_LT; + case GT: + return UNSPEC_COND_GT; + case LE: + return UNSPEC_COND_LE; + case GE: + return UNSPEC_COND_GE; + case LTU: + return UNSPEC_COND_LO; + case GTU: + return UNSPEC_COND_HI; + case LEU: + return UNSPEC_COND_LS; + case GEU: + return UNSPEC_COND_HS; + case UNORDERED: + return UNSPEC_COND_UO; + default: + gcc_unreachable (); + } +} + +/* Return an (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_) expression, + where is the operation associated with comparison CODE. 
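The intended semantics are that each lane of the result predicate is
the AND of the governing predicate bit with the elementwise comparison,
with inactive lanes producing false. A scalar model for one concrete
code, GE on 64-bit elements (names and types are illustrative):

  // Model of (unspec [pred op0 op1] UNSPEC_COND_GE) on integer lanes.
  static void
  sve_cmp_ge_model (unsigned char *res, const unsigned char *pred,
                    const long long *op0, const long long *op1, int n)
  {
    for (int i = 0; i < n; i++)
      res[i] = pred[i] && op0[i] >= op1[i];  // inactive lanes yield 0
  }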
*/ + +static rtx +aarch64_gen_unspec_cond (rtx_code code, machine_mode pred_mode, + rtx pred, rtx op0, rtx op1) +{ + rtvec vec = gen_rtvec (3, pred, op0, op1); + return gen_rtx_UNSPEC (pred_mode, vec, aarch64_unspec_cond_code (code)); +} + +/* Expand an SVE integer comparison: + + TARGET = CODE (OP0, OP1). */ + +void +aarch64_expand_sve_vec_cmp_int (rtx target, rtx_code code, rtx op0, rtx op1) +{ + machine_mode pred_mode = GET_MODE (target); + machine_mode data_mode = GET_MODE (op0); + + if (!aarch64_sve_cmp_operand_p (code, op1)) + op1 = force_reg (data_mode, op1); + + rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode)); + rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, ptrue, op0, op1); + emit_insn (gen_set_clobber_cc (target, unspec)); +} + +/* Emit an instruction: + + (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_)) + + where is the operation associated with comparison CODE. */ + +static void +aarch64_emit_unspec_cond (rtx target, rtx_code code, machine_mode pred_mode, + rtx pred, rtx op0, rtx op1) +{ + rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, pred, op0, op1); + emit_set_insn (target, unspec); +} + +/* Emit: + + (set TMP1 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_)) + (set TMP2 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_)) + (set TARGET (and:PRED_MODE (ior:PRED_MODE TMP1 TMP2) PTRUE)) + + where is the operation associated with comparison CODEi. */ + +static void +aarch64_emit_unspec_cond_or (rtx target, rtx_code code1, rtx_code code2, + machine_mode pred_mode, rtx ptrue, + rtx op0, rtx op1) +{ + rtx tmp1 = gen_reg_rtx (pred_mode); + aarch64_emit_unspec_cond (tmp1, code1, pred_mode, ptrue, op0, op1); + rtx tmp2 = gen_reg_rtx (pred_mode); + aarch64_emit_unspec_cond (tmp2, code2, pred_mode, ptrue, op0, op1); + emit_set_insn (target, gen_rtx_AND (pred_mode, + gen_rtx_IOR (pred_mode, tmp1, tmp2), + ptrue)); +} + +/* If CAN_INVERT_P, emit an instruction: + + (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_)) + + where is the operation associated with comparison CODE. Otherwise + emit: + + (set TMP (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_)) + (set TARGET (and:PRED_MODE (not:PRED_MODE TMP) PTRUE)) + + where the second instructions sets TARGET to the inverse of TMP. */ + +static void +aarch64_emit_inverted_unspec_cond (rtx target, rtx_code code, + machine_mode pred_mode, rtx ptrue, rtx pred, + rtx op0, rtx op1, bool can_invert_p) +{ + if (can_invert_p) + aarch64_emit_unspec_cond (target, code, pred_mode, pred, op0, op1); + else + { + rtx tmp = gen_reg_rtx (pred_mode); + aarch64_emit_unspec_cond (tmp, code, pred_mode, pred, op0, op1); + emit_set_insn (target, gen_rtx_AND (pred_mode, + gen_rtx_NOT (pred_mode, tmp), + ptrue)); + } +} + +/* Expand an SVE floating-point comparison: + + TARGET = CODE (OP0, OP1) + + If CAN_INVERT_P is true, the caller can also handle inverted results; + return true if the result is in fact inverted. */ + +bool +aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, + rtx op0, rtx op1, bool can_invert_p) +{ + machine_mode pred_mode = GET_MODE (target); + machine_mode data_mode = GET_MODE (op0); + + rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode)); + switch (code) + { + case UNORDERED: + /* UNORDERED has no immediate form. */ + op1 = force_reg (data_mode, op1); + aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1); + return false; + + case LT: + case LE: + case GT: + case GE: + case EQ: + case NE: + /* There is native support for the comparison. 
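As an aside on aarch64_emit_unspec_cond_or above: compound conditions
such as LTGT are built by OR-ing two native compares and AND-ing the
result with the governing predicate, mirroring the RTL even though the
final AND is redundant when both compares already used PTRUE. A scalar
model with the two component compares hard-coded as < and >:

  // Model of the ((cmp1 | cmp2) & ptrue) sequence used for LTGT.
  static void
  sve_ltgt_model (unsigned char *res, const unsigned char *ptrue,
                  const double *op0, const double *op1, int n)
  {
    for (int i = 0; i < n; i++)
      {
        unsigned char lt = ptrue[i] && op0[i] < op1[i];
        unsigned char gt = ptrue[i] && op0[i] > op1[i];
        res[i] = (lt | gt) & ptrue[i];   // NaN operands give 0 in both
      }
  }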
*/ + aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1); + return false; + + case ORDERED: + /* There is native support for the inverse comparison. */ + op1 = force_reg (data_mode, op1); + aarch64_emit_inverted_unspec_cond (target, UNORDERED, + pred_mode, ptrue, ptrue, op0, op1, + can_invert_p); + return can_invert_p; + + case LTGT: + /* This is a trapping operation (LT or GT). */ + aarch64_emit_unspec_cond_or (target, LT, GT, pred_mode, ptrue, op0, op1); + return false; + + case UNEQ: + if (!flag_trapping_math) + { + /* This would trap for signaling NaNs. */ + op1 = force_reg (data_mode, op1); + aarch64_emit_unspec_cond_or (target, UNORDERED, EQ, + pred_mode, ptrue, op0, op1); + return false; + } + /* fall through */ + + case UNLT: + case UNLE: + case UNGT: + case UNGE: + { + rtx ordered = ptrue; + if (flag_trapping_math) + { + /* Only compare the elements that are known to be ordered. */ + ordered = gen_reg_rtx (pred_mode); + op1 = force_reg (data_mode, op1); + aarch64_emit_inverted_unspec_cond (ordered, UNORDERED, pred_mode, + ptrue, ptrue, op0, op1, false); + } + if (code == UNEQ) + code = NE; + else + code = reverse_condition_maybe_unordered (code); + aarch64_emit_inverted_unspec_cond (target, code, pred_mode, ptrue, + ordered, op0, op1, can_invert_p); + return can_invert_p; + } + + default: + gcc_unreachable (); + } +} + +/* Expand an SVE vcond pattern with operands OPS. DATA_MODE is the mode + of the data being selected and CMP_MODE is the mode of the values being + compared. */ + +void +aarch64_expand_sve_vcond (machine_mode data_mode, machine_mode cmp_mode, + rtx *ops) +{ + machine_mode pred_mode + = aarch64_get_mask_mode (GET_MODE_NUNITS (cmp_mode), + GET_MODE_SIZE (cmp_mode)).require (); + rtx pred = gen_reg_rtx (pred_mode); + if (FLOAT_MODE_P (cmp_mode)) + { + if (aarch64_expand_sve_vec_cmp_float (pred, GET_CODE (ops[3]), + ops[4], ops[5], true)) + std::swap (ops[1], ops[2]); + } + else + aarch64_expand_sve_vec_cmp_int (pred, GET_CODE (ops[3]), ops[4], ops[5]); + + rtvec vec = gen_rtvec (3, pred, ops[1], ops[2]); + emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL)); +} + /* Implement TARGET_MODES_TIEABLE_P. In principle we should always return true. However due to issues with register allocation it is preferable to avoid tieing integer scalar and FP scalar modes. Executing integer @@ -13765,8 +15661,12 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2) /* We specifically want to allow elements of "structure" modes to be tieable to the structure. This more general condition allows - other rarer situations too. */ - if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2)) + other rarer situations too. The reason we don't extend this to + predicate modes is that there are no predicate structure modes + nor any specific instructions for extracting part of a predicate + register. */ + if (aarch64_vector_data_mode_p (mode1) + && aarch64_vector_data_mode_p (mode2)) return true; /* Also allow any scalar modes with vectors. */ @@ -15020,6 +16920,19 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode, } } +/* Implement the TARGET_DWARF_POLY_INDETERMINATE_VALUE hook. */ + +static unsigned int +aarch64_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor, + int *offset) +{ + /* Polynomial invariant 1 == (VG / 2) - 1. 
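Equivalently, the consumer recovers the indeterminate as
VG / factor - offset with factor 2 and offset 1. As a worked example,
one SVE vector of bytes is the poly_int 16 + 16 * X1: for a 512-bit
implementation VG is 8, so X1 = 8 / 2 - 1 = 3 and the size evaluates to
16 + 16 * 3 = 64 bytes. A one-line model of the evaluation (sketch
only):

  // Evaluate c0 + c1 * X1 for a given runtime VG.
  static long long
  eval_poly1 (long long c0, long long c1, long long vg)
  {
    return c0 + c1 * (vg / 2 - 1);
  }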
*/ + gcc_assert (i == 1); + *factor = 2; + *offset = 1; + return AARCH64_DWARF_VG; +} + /* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE if MODE is HFmode, and punt to the generic implementation otherwise. */ @@ -15112,6 +17025,38 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn) } } +/* Implement TARGET_COMPUTE_PRESSURE_CLASSES. */ + +static int +aarch64_compute_pressure_classes (reg_class *classes) +{ + int i = 0; + classes[i++] = GENERAL_REGS; + classes[i++] = FP_REGS; + /* PR_REGS isn't a useful pressure class because many predicate pseudo + registers need to go in PR_LO_REGS at some point during their + lifetime. Splitting it into two halves has the effect of making + all predicates count against PR_LO_REGS, so that we try whenever + possible to restrict the number of live predicates to 8. This + greatly reduces the amount of spilling in certain loops. */ + classes[i++] = PR_LO_REGS; + classes[i++] = PR_HI_REGS; + return i; +} + +/* Implement TARGET_CAN_CHANGE_MODE_CLASS. */ + +static bool +aarch64_can_change_mode_class (machine_mode from, + machine_mode to, reg_class_t) +{ + /* See the comment at the head of aarch64-sve.md for details. */ + if (BYTES_BIG_ENDIAN + && (aarch64_sve_data_mode_p (from) != aarch64_sve_data_mode_p (to))) + return false; + return true; +} + /* Target-specific selftests. */ #if CHECKING_P @@ -15260,6 +17205,11 @@ aarch64_run_selftests (void) #undef TARGET_FUNCTION_ARG_PADDING #define TARGET_FUNCTION_ARG_PADDING aarch64_function_arg_padding +#undef TARGET_GET_RAW_RESULT_MODE +#define TARGET_GET_RAW_RESULT_MODE aarch64_get_reg_raw_mode +#undef TARGET_GET_RAW_ARG_MODE +#define TARGET_GET_RAW_ARG_MODE aarch64_get_reg_raw_mode + #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL aarch64_function_ok_for_sibcall @@ -15468,6 +17418,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_VECTOR_ALIGNMENT #define TARGET_VECTOR_ALIGNMENT aarch64_simd_vector_alignment +#undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT +#define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \ + aarch64_vectorize_preferred_vector_alignment #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \ aarch64_simd_vector_alignment_reachable @@ -15478,6 +17431,9 @@ aarch64_libgcc_floating_mode_supported_p #define TARGET_VECTORIZE_VEC_PERM_CONST \ aarch64_vectorize_vec_perm_const +#undef TARGET_VECTORIZE_GET_MASK_MODE +#define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode + #undef TARGET_INIT_LIBFUNCS #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs @@ -15532,6 +17488,10 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_OMIT_STRUCT_RETURN_REG #define TARGET_OMIT_STRUCT_RETURN_REG true +#undef TARGET_DWARF_POLY_INDETERMINATE_VALUE +#define TARGET_DWARF_POLY_INDETERMINATE_VALUE \ + aarch64_dwarf_poly_indeterminate_value + /* The architecture reserves bits 0 and 1 so use bit 2 for descriptors. 
*/ #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4 @@ -15551,6 +17511,12 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CONSTANT_ALIGNMENT #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment +#undef TARGET_COMPUTE_PRESSURE_CLASSES +#define TARGET_COMPUTE_PRESSURE_CLASSES aarch64_compute_pressure_classes + +#undef TARGET_CAN_CHANGE_MODE_CLASS +#define TARGET_CAN_CHANGE_MODE_CLASS aarch64_can_change_mode_class + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 98e45171043..fc99fc4627e 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -144,18 +144,19 @@ extern unsigned aarch64_architecture_version; /* ARMv8.2-A architecture extensions. */ #define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */ #define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */ +#define AARCH64_FL_SVE (1 << 10) /* Has Scalable Vector Extensions. */ /* ARMv8.3-A architecture extensions. */ -#define AARCH64_FL_V8_3 (1 << 10) /* Has ARMv8.3-A features. */ -#define AARCH64_FL_RCPC (1 << 11) /* Has support for RCpc model. */ -#define AARCH64_FL_DOTPROD (1 << 12) /* Has ARMv8.2-A Dot Product ins. */ +#define AARCH64_FL_V8_3 (1 << 11) /* Has ARMv8.3-A features. */ +#define AARCH64_FL_RCPC (1 << 12) /* Has support for RCpc model. */ +#define AARCH64_FL_DOTPROD (1 << 13) /* Has ARMv8.2-A Dot Product ins. */ /* New flags to split crypto into aes and sha2. */ -#define AARCH64_FL_AES (1 << 13) /* Has Crypto AES. */ -#define AARCH64_FL_SHA2 (1 << 14) /* Has Crypto SHA2. */ +#define AARCH64_FL_AES (1 << 14) /* Has Crypto AES. */ +#define AARCH64_FL_SHA2 (1 << 15) /* Has Crypto SHA2. */ /* ARMv8.4-A architecture extensions. */ -#define AARCH64_FL_V8_4 (1 << 15) /* Has ARMv8.4-A features. */ -#define AARCH64_FL_SM4 (1 << 16) /* Has ARMv8.4-A SM3 and SM4. */ -#define AARCH64_FL_SHA3 (1 << 17) /* Has ARMv8.4-a SHA3 and SHA512. */ -#define AARCH64_FL_F16FML (1 << 18) /* Has ARMv8.4-a FP16 extensions. */ +#define AARCH64_FL_V8_4 (1 << 16) /* Has ARMv8.4-A features. */ +#define AARCH64_FL_SM4 (1 << 17) /* Has ARMv8.4-A SM3 and SM4. */ +#define AARCH64_FL_SHA3 (1 << 18) /* Has ARMv8.4-a SHA3 and SHA512. */ +#define AARCH64_FL_F16FML (1 << 19) /* Has ARMv8.4-a FP16 extensions. */ /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -186,6 +187,7 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_RDMA) #define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2) #define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16) +#define AARCH64_ISA_SVE (aarch64_isa_flags & AARCH64_FL_SVE) #define AARCH64_ISA_V8_3 (aarch64_isa_flags & AARCH64_FL_V8_3) #define AARCH64_ISA_DOTPROD (aarch64_isa_flags & AARCH64_FL_DOTPROD) #define AARCH64_ISA_AES (aarch64_isa_flags & AARCH64_FL_AES) @@ -226,6 +228,9 @@ extern unsigned aarch64_architecture_version; /* Dot Product is an optional extension to AdvSIMD enabled through +dotprod. */ #define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD) +/* SVE instructions, enabled through +sve. */ +#define TARGET_SVE (AARCH64_ISA_SVE) + /* ARMv8.3-A features. 
*/ #define TARGET_ARMV8_3 (AARCH64_ISA_V8_3) @@ -286,8 +291,17 @@ extern unsigned aarch64_architecture_version; V0-V7 Parameter/result registers The vector register V0 holds scalar B0, H0, S0 and D0 in its least - significant bits. Unlike AArch32 S1 is not packed into D0, - etc. */ + significant bits. Unlike AArch32 S1 is not packed into D0, etc. + + P0-P7 Predicate low registers: valid in all predicate contexts + P8-P15 Predicate high registers: used as scratch space + + VG Pseudo "vector granules" register + + VG is the number of 64-bit elements in an SVE vector. We define + it as a hard register so that we can easily map it to the DWARF VG + register. GCC internally uses the poly_int variable aarch64_sve_vg + instead. */ /* Note that we don't mark X30 as a call-clobbered register. The idea is that it's really the call instructions themselves which clobber X30. @@ -308,7 +322,9 @@ extern unsigned aarch64_architecture_version; 0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* V16 - V23 */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* V24 - V31 */ \ - 1, 1, 1, /* SFP, AP, CC */ \ + 1, 1, 1, 1, /* SFP, AP, CC, VG */ \ + 0, 0, 0, 0, 0, 0, 0, 0, /* P0 - P7 */ \ + 0, 0, 0, 0, 0, 0, 0, 0, /* P8 - P15 */ \ } #define CALL_USED_REGISTERS \ @@ -321,7 +337,9 @@ extern unsigned aarch64_architecture_version; 0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \ 1, 1, 1, 1, 1, 1, 1, 1, /* V16 - V23 */ \ 1, 1, 1, 1, 1, 1, 1, 1, /* V24 - V31 */ \ - 1, 1, 1, /* SFP, AP, CC */ \ + 1, 1, 1, 1, /* SFP, AP, CC, VG */ \ + 1, 1, 1, 1, 1, 1, 1, 1, /* P0 - P7 */ \ + 1, 1, 1, 1, 1, 1, 1, 1, /* P8 - P15 */ \ } #define REGISTER_NAMES \ @@ -334,7 +352,9 @@ extern unsigned aarch64_architecture_version; "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", \ "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \ "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", \ - "sfp", "ap", "cc", \ + "sfp", "ap", "cc", "vg", \ + "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", \ + "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", \ } /* Generate the register aliases for core register N */ @@ -345,7 +365,8 @@ extern unsigned aarch64_architecture_version; {"d" # N, V0_REGNUM + (N)}, \ {"s" # N, V0_REGNUM + (N)}, \ {"h" # N, V0_REGNUM + (N)}, \ - {"b" # N, V0_REGNUM + (N)} + {"b" # N, V0_REGNUM + (N)}, \ + {"z" # N, V0_REGNUM + (N)} /* Provide aliases for all of the ISA defined register name forms. These aliases are convenient for use in the clobber lists of inline @@ -387,7 +408,7 @@ extern unsigned aarch64_architecture_version; #define FRAME_POINTER_REGNUM SFP_REGNUM #define STACK_POINTER_REGNUM SP_REGNUM #define ARG_POINTER_REGNUM AP_REGNUM -#define FIRST_PSEUDO_REGISTER 67 +#define FIRST_PSEUDO_REGISTER (P15_REGNUM + 1) /* The number of (integer) argument register available. */ #define NUM_ARG_REGS 8 @@ -408,6 +429,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_DWARF_NUMBER_R 31 #define AARCH64_DWARF_SP 31 +#define AARCH64_DWARF_VG 46 +#define AARCH64_DWARF_P0 48 #define AARCH64_DWARF_V0 64 /* The number of V registers. */ @@ -472,6 +495,12 @@ extern unsigned aarch64_architecture_version; #define FP_LO_REGNUM_P(REGNO) \ (((unsigned) (REGNO - V0_REGNUM)) <= (V15_REGNUM - V0_REGNUM)) +#define PR_REGNUM_P(REGNO)\ + (((unsigned) (REGNO - P0_REGNUM)) <= (P15_REGNUM - P0_REGNUM)) + +#define PR_LO_REGNUM_P(REGNO)\ + (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM)) + /* Register and constant classes. 
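Referring back to the AARCH64_DWARF_* macros above: VG is DWARF
register 46, predicate register pN maps to 48 + N, and the SVE z
registers share the existing 64 + N numbering of vN. A sketch of the
mapping using the same base numbers (the enum and function names are
illustrative):

  enum { DWARF_VG = 46, DWARF_P0 = 48, DWARF_V0 = 64 };

  static int dwarf_p_regno (int n) { return DWARF_P0 + n; }  // p0-p15 -> 48-63
  static int dwarf_z_regno (int n) { return DWARF_V0 + n; }  // z0-z31 -> 64-95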
*/ @@ -485,6 +514,9 @@ enum reg_class FP_LO_REGS, FP_REGS, POINTER_AND_FP_REGS, + PR_LO_REGS, + PR_HI_REGS, + PR_REGS, ALL_REGS, LIM_REG_CLASSES /* Last */ }; @@ -501,6 +533,9 @@ enum reg_class "FP_LO_REGS", \ "FP_REGS", \ "POINTER_AND_FP_REGS", \ + "PR_LO_REGS", \ + "PR_HI_REGS", \ + "PR_REGS", \ "ALL_REGS" \ } @@ -514,7 +549,10 @@ enum reg_class { 0x00000000, 0x0000ffff, 0x00000000 }, /* FP_LO_REGS */ \ { 0x00000000, 0xffffffff, 0x00000000 }, /* FP_REGS */ \ { 0xffffffff, 0xffffffff, 0x00000003 }, /* POINTER_AND_FP_REGS */\ - { 0xffffffff, 0xffffffff, 0x00000007 } /* ALL_REGS */ \ + { 0x00000000, 0x00000000, 0x00000ff0 }, /* PR_LO_REGS */ \ + { 0x00000000, 0x00000000, 0x000ff000 }, /* PR_HI_REGS */ \ + { 0x00000000, 0x00000000, 0x000ffff0 }, /* PR_REGS */ \ + { 0xffffffff, 0xffffffff, 0x000fffff } /* ALL_REGS */ \ } #define REGNO_REG_CLASS(REGNO) aarch64_regno_regclass (REGNO) @@ -998,4 +1036,28 @@ extern tree aarch64_fp16_ptr_type_node; #define LIBGCC2_UNWIND_ATTRIBUTE \ __attribute__((optimize ("no-omit-frame-pointer"))) +#ifndef USED_FOR_TARGET +extern poly_uint16 aarch64_sve_vg; + +/* The number of bits and bytes in an SVE vector. */ +#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64)) +#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8)) + +/* The number of bytes in an SVE predicate. */ +#define BYTES_PER_SVE_PRED aarch64_sve_vg + +/* The SVE mode for a vector of bytes. */ +#define SVE_BYTE_MODE VNx16QImode + +/* The maximum number of bytes in a fixed-size vector. This is 256 bytes + (for -msve-vector-bits=2048) multiplied by the maximum number of + vectors in a structure mode (4). + + This limit must not be used for variable-size vectors, since + VL-agnostic code must work with arbitary vector lengths. */ +#define MAX_COMPILE_TIME_VEC_BYTES (256 * 4) +#endif + +#define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE) + #endif /* GCC_AARCH64_H */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 854c44830e6..728136a7fba 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -63,6 +63,11 @@ (SFP_REGNUM 64) (AP_REGNUM 65) (CC_REGNUM 66) + ;; Defined only to make the DWARF description simpler. + (VG_REGNUM 67) + (P0_REGNUM 68) + (P7_REGNUM 75) + (P15_REGNUM 83) ] ) @@ -114,6 +119,7 @@ UNSPEC_PACI1716 UNSPEC_PACISP UNSPEC_PRLG_STK + UNSPEC_REV UNSPEC_RBIT UNSPEC_SCVTF UNSPEC_SISD_NEG @@ -143,6 +149,18 @@ UNSPEC_RSQRTS UNSPEC_NZCV UNSPEC_XPACLRI + UNSPEC_LD1_SVE + UNSPEC_ST1_SVE + UNSPEC_LD1RQ + UNSPEC_MERGE_PTRUE + UNSPEC_PTEST_PTRUE + UNSPEC_UNPACKSHI + UNSPEC_UNPACKUHI + UNSPEC_UNPACKSLO + UNSPEC_UNPACKULO + UNSPEC_PACK + UNSPEC_FLOAT_CONVERT + UNSPEC_WHILE_LO ]) (define_c_enum "unspecv" [ @@ -194,6 +212,11 @@ ;; will be disabled when !TARGET_SIMD. (define_attr "simd" "no,yes" (const_string "no")) +;; Attribute that specifies whether or not the instruction uses SVE. +;; When this is set to yes for an alternative, that alternative +;; will be disabled when !TARGET_SVE. +(define_attr "sve" "no,yes" (const_string "no")) + (define_attr "length" "" (const_int 4)) @@ -202,13 +225,14 @@ ;; registers when -mgeneral-regs-only is specified. 
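Referring back to the REG_CLASS_CONTENTS masks above: the third 32-bit
word covers hard registers 64 upwards, so with SFP, AP, CC and VG at
64-67 and P0-P15 at 68-83, PR_LO_REGS is bits 4-11, PR_HI_REGS bits
12-19 and PR_REGS bits 4-19 of that word. A short program that
recomputes those constants (register numbers as defined above):

  #include <stdio.h>

  // Bit mask within the third REG_CLASS_CONTENTS word (registers 64+).
  static unsigned int
  mask_for_range (int first_regno, int last_regno)
  {
    unsigned int mask = 0;
    for (int r = first_regno; r <= last_regno; r++)
      mask |= 1u << (r - 64);
    return mask;
  }

  int
  main (void)
  {
    printf ("PR_LO_REGS %#x\n", mask_for_range (68, 75));  // 0xff0
    printf ("PR_HI_REGS %#x\n", mask_for_range (76, 83));  // 0xff000
    printf ("PR_REGS    %#x\n", mask_for_range (68, 83));  // 0xffff0
    return 0;
  }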
(define_attr "enabled" "no,yes" (cond [(ior - (ior - (and (eq_attr "fp" "yes") - (eq (symbol_ref "TARGET_FLOAT") (const_int 0))) - (and (eq_attr "simd" "yes") - (eq (symbol_ref "TARGET_SIMD") (const_int 0)))) + (and (eq_attr "fp" "yes") + (eq (symbol_ref "TARGET_FLOAT") (const_int 0))) + (and (eq_attr "simd" "yes") + (eq (symbol_ref "TARGET_SIMD") (const_int 0))) (and (eq_attr "fp16" "yes") - (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0)))) + (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0))) + (and (eq_attr "sve" "yes") + (eq (symbol_ref "TARGET_SVE") (const_int 0)))) (const_string "no") ] (const_string "yes"))) @@ -866,12 +890,18 @@ " if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx) operands[1] = force_reg (mode, operands[1]); + + if (GET_CODE (operands[1]) == CONST_POLY_INT) + { + aarch64_expand_mov_immediate (operands[0], operands[1]); + DONE; + } " ) (define_insn "*mov_aarch64" - [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w") - (match_operand:SHORT 1 "general_operand" " r,M,D,m, m,rZ,*w,*w, r,*w"))] + [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r ,r,*w, m, m, r,*w,*w") + (match_operand:SHORT 1 "aarch64_mov_operand" " r,M,D,Usv,m, m,rZ,*w,*w, r,*w"))] "(register_operand (operands[0], mode) || aarch64_reg_or_zero (operands[1], mode))" { @@ -885,26 +915,30 @@ return aarch64_output_scalar_simd_mov_immediate (operands[1], mode); case 3: - return "ldr\t%w0, %1"; + return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]); case 4: - return "ldr\t%0, %1"; + return "ldr\t%w0, %1"; case 5: - return "str\t%w1, %0"; + return "ldr\t%0, %1"; case 6: - return "str\t%1, %0"; + return "str\t%w1, %0"; case 7: - return "umov\t%w0, %1.[0]"; + return "str\t%1, %0"; case 8: - return "dup\t%0., %w1"; + return "umov\t%w0, %1.[0]"; case 9: + return "dup\t%0., %w1"; + case 10: return "dup\t%0, %1.[0]"; default: gcc_unreachable (); } } - [(set_attr "type" "mov_reg,mov_imm,neon_move,load_4,load_4,store_4,store_4,\ - neon_to_gp,neon_from_gp,neon_dup") - (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")] + ;; The "mov_imm" type for CNT is just a placeholder. + [(set_attr "type" "mov_reg,mov_imm,neon_move,mov_imm,load_4,load_4,store_4, + store_4,neon_to_gp,neon_from_gp,neon_dup") + (set_attr "simd" "*,*,yes,*,*,*,*,*,yes,yes,yes") + (set_attr "sve" "*,*,*,yes,*,*,*,*,*,*,*")] ) (define_expand "mov" @@ -932,8 +966,8 @@ ) (define_insn_and_split "*movsi_aarch64" - [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w, m, m, r, r, w,r,w, w") - (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))] + [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, w,r,w, w") + (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))] "(register_operand (operands[0], SImode) || aarch64_reg_or_zero (operands[1], SImode))" "@ @@ -942,6 +976,7 @@ mov\\t%w0, %w1 mov\\t%w0, %1 # + * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]); ldr\\t%w0, %1 ldr\\t%s0, %1 str\\t%w1, %0 @@ -959,15 +994,17 @@ aarch64_expand_mov_immediate (operands[0], operands[1]); DONE; }" - [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load_4,load_4,store_4,store_4,\ - adr,adr,f_mcr,f_mrc,fmov,neon_move") - (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*") - (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")] + ;; The "mov_imm" type for CNT is just a placeholder. 
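The Usv alternative added to these move patterns loads a VL-dependent
constant with a single CNT[BHWD]; the instruction returns the number of
elements of the given size in one vector, optionally scaled by a small
multiplier. A sketch of the value produced (the function name is
illustrative):

  // Runtime value of CNTB/CNTH/CNTW/CNTD with an explicit multiplier.
  static int
  sve_cnt_value (int vl_bits, int elt_bits, int mul)
  {
    return (vl_bits / elt_bits) * mul;
  }

  // With -msve-vector-bits=256: sve_cnt_value (256, 8, 1) == 32 (CNTB)
  // and sve_cnt_value (256, 64, 1) == 4 (CNTD).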
+ [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4, + load_4,store_4,store_4,adr,adr,f_mcr,f_mrc,fmov,neon_move") + (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*") + (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes") + (set_attr "sve" "*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")] ) (define_insn_and_split "*movdi_aarch64" - [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r,w, m,m, r, r, w,r,w, w") - (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))] + [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m, r, r, w,r,w, w") + (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))] "(register_operand (operands[0], DImode) || aarch64_reg_or_zero (operands[1], DImode))" "@ @@ -977,6 +1014,7 @@ mov\\t%x0, %1 mov\\t%w0, %1 # + * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]); ldr\\t%x0, %1 ldr\\t%d0, %1 str\\t%x1, %0 @@ -994,10 +1032,13 @@ aarch64_expand_mov_immediate (operands[0], operands[1]); DONE; }" - [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_8,\ - load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,neon_move") - (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*") - (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")] + ;; The "mov_imm" type for CNTD is just a placeholder. + [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm, + load_8,load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov, + neon_move") + (set_attr "fp" "*,*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*") + (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes") + (set_attr "sve" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")] ) (define_insn "insv_imm" @@ -1018,6 +1059,14 @@ " if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx) operands[1] = force_reg (TImode, operands[1]); + + if (GET_CODE (operands[1]) == CONST_POLY_INT) + { + emit_move_insn (gen_lowpart (DImode, operands[0]), + gen_lowpart (DImode, operands[1])); + emit_move_insn (gen_highpart (DImode, operands[0]), const0_rtx); + DONE; + } " ) @@ -1542,7 +1591,7 @@ [(set (match_operand:GPI 0 "register_operand" "") (plus:GPI (match_operand:GPI 1 "register_operand" "") - (match_operand:GPI 2 "aarch64_pluslong_operand" "")))] + (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "")))] "" { /* If operands[1] is a subreg extract the inner RTX. */ @@ -1555,23 +1604,34 @@ && (!REG_P (op1) || !REGNO_PTR_FRAME_P (REGNO (op1)))) operands[2] = force_reg (mode, operands[2]); + /* Expand polynomial additions now if the destination is the stack + pointer, since we don't want to use that as a temporary. 
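The polynomial offsets handled here map onto ADDVL and ADDPL: ADDVL
adds a small signed multiple of the vector length in bytes and ADDPL a
multiple of the predicate length, one eighth of that. A model of the
byte adjustment, with the 256-bit case as a worked example (assuming
vl_bytes is the runtime vector length in bytes):

  static long long
  addvl_bytes (int k, int vl_bytes) { return (long long) k * vl_bytes; }

  static long long
  addpl_bytes (int k, int vl_bytes) { return (long long) k * (vl_bytes / 8); }

  // For a 256-bit implementation vl_bytes == 32, so "addvl sp, sp, #-2"
  // drops the stack by 64 bytes and "addpl sp, sp, #-1" by a further 4.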
*/ + else if (operands[0] == stack_pointer_rtx + && aarch64_split_add_offset_immediate (operands[2], mode)) + { + aarch64_split_add_offset (mode, operands[0], operands[1], + operands[2], NULL_RTX, NULL_RTX); + DONE; + } }) (define_insn "*add3_aarch64" [(set - (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r") + (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,rk") (plus:GPI - (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk") - (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))] + (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,rk") + (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uav")))] "" "@ add\\t%0, %1, %2 add\\t%0, %1, %2 add\\t%0, %1, %2 sub\\t%0, %1, #%n2 - #" - [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple") - (set_attr "simd" "*,*,yes,*,*")] + # + * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);" + ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder. + [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm") + (set_attr "simd" "*,*,yes,*,*,*")] ) ;; zero_extend version of above @@ -1633,6 +1693,48 @@ } ) +;; Match addition of polynomial offsets that require one temporary, for which +;; we can use the early-clobbered destination register. This is a separate +;; pattern so that the early clobber doesn't affect register allocation +;; for other forms of addition. However, we still need to provide an +;; all-register alternative, in case the offset goes out of range after +;; elimination. For completeness we might as well provide all GPR-based +;; alternatives from the main pattern. +;; +;; We don't have a pattern for additions requiring two temporaries since at +;; present LRA doesn't allow new scratches to be added during elimination. +;; Such offsets should be rare anyway. +;; +;; ??? But if we added LRA support for new scratches, much of the ugliness +;; here would go away. We could just handle all polynomial constants in +;; this pattern. +(define_insn_and_split "*add3_poly_1" + [(set + (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,&r") + (plus:GPI + (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,rk") + (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uat")))] + "TARGET_SVE && operands[0] != stack_pointer_rtx" + "@ + add\\t%0, %1, %2 + add\\t%0, %1, %2 + sub\\t%0, %1, #%n2 + # + * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]); + #" + "&& epilogue_completed + && !reg_overlap_mentioned_p (operands[0], operands[1]) + && aarch64_split_add_offset_immediate (operands[2], mode)" + [(const_int 0)] + { + aarch64_split_add_offset (mode, operands[0], operands[1], + operands[2], operands[0], NULL_RTX); + DONE; + } + ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder. + [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,multiple")] +) + (define_split [(set (match_operand:DI 0 "register_operand") (zero_extend:DI @@ -5797,6 +5899,12 @@ DONE; }) +;; Helper for aarch64.c code. +(define_expand "set_clobber_cc" + [(parallel [(set (match_operand 0) + (match_operand 1)) + (clobber (reg:CC CC_REGNUM))])]) + ;; AdvSIMD Stuff (include "aarch64-simd.md") @@ -5805,3 +5913,6 @@ ;; ldp/stp peephole patterns (include "aarch64-ldpstp.md") + +;; SVE. 
+(include "aarch64-sve.md") diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 18bf0e30fd1..52eaf8c6f40 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -185,6 +185,32 @@ Enable the division approximation. Enabling this reduces precision of division results to about 16 bits for single precision and to 32 bits for double precision. +Enum +Name(sve_vector_bits) Type(enum aarch64_sve_vector_bits_enum) +The possible SVE vector lengths: + +EnumValue +Enum(sve_vector_bits) String(scalable) Value(SVE_SCALABLE) + +EnumValue +Enum(sve_vector_bits) String(128) Value(SVE_128) + +EnumValue +Enum(sve_vector_bits) String(256) Value(SVE_256) + +EnumValue +Enum(sve_vector_bits) String(512) Value(SVE_512) + +EnumValue +Enum(sve_vector_bits) String(1024) Value(SVE_1024) + +EnumValue +Enum(sve_vector_bits) String(2048) Value(SVE_2048) + +msve-vector-bits= +Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE) +-msve-vector-bits=N Set the number of bits in an SVE vector register to N. + mverbose-cost-dump Common Undocumented Var(flag_aarch64_verbose_cost) Enables verbose cost model dumping in the debug dump files. diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 18adbc691ec..b004f7888e1 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -27,6 +27,12 @@ (define_register_constraint "w" "FP_REGS" "Floating point and SIMD vector registers.") +(define_register_constraint "Upa" "PR_REGS" + "SVE predicate registers p0 - p15.") + +(define_register_constraint "Upl" "PR_LO_REGS" + "SVE predicate registers p0 - p7.") + (define_register_constraint "x" "FP_LO_REGS" "Floating point and SIMD vector registers V0 - V15.") @@ -40,6 +46,18 @@ (and (match_code "const_int") (match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)"))) +(define_constraint "Uav" + "@internal + A constraint that matches a VG-based constant that can be added by + a single ADDVL or ADDPL." + (match_operand 0 "aarch64_sve_addvl_addpl_immediate")) + +(define_constraint "Uat" + "@internal + A constraint that matches a VG-based constant that can be added by + using multiple instructions, with one temporary register." + (match_operand 0 "aarch64_split_add_offset_immediate")) + (define_constraint "J" "A constant that can be used with a SUB operation (once negated)." (and (match_code "const_int") @@ -134,6 +152,18 @@ A constraint that matches the immediate constant -1." (match_test "op == constm1_rtx")) +(define_constraint "Usv" + "@internal + A constraint that matches a VG-based constant that can be loaded by + a single CNT[BHWD]." + (match_operand 0 "aarch64_sve_cnt_immediate")) + +(define_constraint "Usi" + "@internal + A constraint that matches an immediate operand valid for + the SVE INDEX instruction." + (match_operand 0 "aarch64_sve_index_immediate")) + (define_constraint "Ui1" "@internal A constraint that matches the immediate constant +1." @@ -192,6 +222,13 @@ (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1, ADDR_QUERY_LDP_STP)"))) +(define_memory_constraint "Utr" + "@internal + An address valid for SVE LDR and STR instructions (as distinct from + LD[1234] and ST[1234] patterns)." 
+ (and (match_code "mem") + (match_test "aarch64_sve_ldr_operand_p (op)"))) + (define_memory_constraint "Utv" "@internal An address valid for loading/storing opaque structure @@ -206,6 +243,12 @@ (match_test "aarch64_legitimate_address_p (V2DImode, XEXP (op, 0), 1)"))) +(define_memory_constraint "Uty" + "@internal + An address valid for SVE LD1Rs." + (and (match_code "mem") + (match_test "aarch64_sve_ld1r_operand_p (op)"))) + (define_constraint "Ufc" "A floating point constant which can be used with an\ FMOV immediate operation." @@ -235,7 +278,7 @@ (define_constraint "Dn" "@internal A constraint that matches vector of immediates." - (and (match_code "const_vector") + (and (match_code "const,const_vector") (match_test "aarch64_simd_valid_immediate (op, NULL)"))) (define_constraint "Dh" @@ -257,21 +300,27 @@ (define_constraint "Dl" "@internal A constraint that matches vector of immediates for left shifts." - (and (match_code "const_vector") + (and (match_code "const,const_vector") (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op), true)"))) (define_constraint "Dr" "@internal A constraint that matches vector of immediates for right shifts." - (and (match_code "const_vector") + (and (match_code "const,const_vector") (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op), false)"))) (define_constraint "Dz" "@internal - A constraint that matches vector of immediate zero." - (and (match_code "const_vector") - (match_test "aarch64_simd_imm_zero_p (op, GET_MODE (op))"))) + A constraint that matches a vector of immediate zero." + (and (match_code "const,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) + +(define_constraint "Dm" + "@internal + A constraint that matches a vector of immediate minus one." + (and (match_code "const,const_vector") + (match_test "op == CONST1_RTX (GET_MODE (op))"))) (define_constraint "Dd" "@internal @@ -291,3 +340,62 @@ "@internal An address valid for a prefetch instruction." (match_test "aarch64_address_valid_for_prefetch_p (op, true)")) + +(define_constraint "vsa" + "@internal + A constraint that matches an immediate operand valid for SVE + arithmetic instructions." + (match_operand 0 "aarch64_sve_arith_immediate")) + +(define_constraint "vsc" + "@internal + A constraint that matches a signed immediate operand valid for SVE + CMP instructions." + (match_operand 0 "aarch64_sve_cmp_vsc_immediate")) + +(define_constraint "vsd" + "@internal + A constraint that matches an unsigned immediate operand valid for SVE + CMP instructions." + (match_operand 0 "aarch64_sve_cmp_vsd_immediate")) + +(define_constraint "vsi" + "@internal + A constraint that matches a vector count operand valid for SVE INC and + DEC instructions." + (match_operand 0 "aarch64_sve_inc_dec_immediate")) + +(define_constraint "vsn" + "@internal + A constraint that matches an immediate operand whose negative + is valid for SVE SUB instructions." + (match_operand 0 "aarch64_sve_sub_arith_immediate")) + +(define_constraint "vsl" + "@internal + A constraint that matches an immediate operand valid for SVE logical + operations." + (match_operand 0 "aarch64_sve_logical_immediate")) + +(define_constraint "vsm" + "@internal + A constraint that matches an immediate operand valid for SVE MUL + operations." + (match_operand 0 "aarch64_sve_mul_immediate")) + +(define_constraint "vsA" + "@internal + A constraint that matches an immediate operand valid for SVE FADD + and FSUB operations." 
+ (match_operand 0 "aarch64_sve_float_arith_immediate")) + +(define_constraint "vsM" + "@internal + A constraint that matches an imediate operand valid for SVE FMUL + operations." + (match_operand 0 "aarch64_sve_float_mul_immediate")) + +(define_constraint "vsN" + "@internal + A constraint that matches the negative of vsA" + (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate")) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index e199dfdb4ea..0fe42edbc61 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -56,20 +56,20 @@ ;; Iterator for all scalar floating point modes (SF, DF and TF) (define_mode_iterator GPF_TF [SF DF TF]) -;; Integer vector modes. +;; Integer Advanced SIMD modes. (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI]) -;; vector and scalar, 64 & 128-bit container, all integer modes +;; Advanced SIMD and scalar, 64 & 128-bit container, all integer modes. (define_mode_iterator VSDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI QI HI SI DI]) -;; vector and scalar, 64 & 128-bit container: all vector integer modes; -;; 64-bit scalar integer mode +;; Advanced SIMD and scalar, 64 & 128-bit container: all Advanced SIMD +;; integer modes; 64-bit scalar integer mode. (define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI]) ;; Double vector modes. (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF]) -;; vector, 64-bit container, all integer modes +;; Advanced SIMD, 64-bit container, all integer modes. (define_mode_iterator VD_BHSI [V8QI V4HI V2SI]) ;; 128 and 64-bit container; 8, 16, 32-bit vector integer modes @@ -94,16 +94,16 @@ ;; pointer-sized quantities. Exactly one of the two alternatives will match. (define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")]) -;; Vector Float modes suitable for moving, loading and storing. +;; Advanced SIMD Float modes suitable for moving, loading and storing. (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF]) -;; Vector Float modes. +;; Advanced SIMD Float modes. (define_mode_iterator VDQF [V2SF V4SF V2DF]) (define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") V2SF V4SF V2DF]) -;; Vector Float modes, and DF. +;; Advanced SIMD Float modes, and DF. (define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") V2SF V4SF V2DF DF]) @@ -113,7 +113,7 @@ (HF "TARGET_SIMD_F16INST") SF DF]) -;; Vector single Float modes. +;; Advanced SIMD single Float modes. (define_mode_iterator VDQSF [V2SF V4SF]) ;; Quad vector Float modes with half/single elements. @@ -122,16 +122,16 @@ ;; Modes suitable to use as the return type of a vcond expression. (define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI]) -;; All Float modes. +;; All scalar and Advanced SIMD Float modes. (define_mode_iterator VALLF [V2SF V4SF V2DF SF DF]) -;; Vector Float modes with 2 elements. +;; Advanced SIMD Float modes with 2 elements. (define_mode_iterator V2F [V2SF V2DF]) -;; All vector modes on which we support any arithmetic operations. +;; All Advanced SIMD modes on which we support any arithmetic operations. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) -;; All vector modes suitable for moving, loading, and storing. +;; All Advanced SIMD modes suitable for moving, loading, and storing. 
(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V2SF V4SF V2DF]) @@ -139,21 +139,21 @@ (define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI V4HF V8HF V2SF V4SF]) -;; All vector modes barring HF modes, plus DI. +;; All Advanced SIMD modes barring HF modes, plus DI. (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI]) -;; All vector modes and DI. +;; All Advanced SIMD modes and DI. (define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V2SF V4SF V2DF DI]) -;; All vector modes, plus DI and DF. +;; All Advanced SIMD modes, plus DI and DF. (define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V4HF V8HF V2SF V4SF V2DF DI DF]) -;; Vector modes for Integer reduction across lanes. +;; Advanced SIMD modes for Integer reduction across lanes. (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI]) -;; Vector modes(except V2DI) for Integer reduction across lanes. +;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes. (define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI]) ;; All double integer narrow-able modes. @@ -162,7 +162,8 @@ ;; All quad integer narrow-able modes. (define_mode_iterator VQN [V8HI V4SI V2DI]) -;; Vector and scalar 128-bit container: narrowable 16, 32, 64-bit integer modes +;; Advanced SIMD and scalar 128-bit container: narrowable 16, 32, 64-bit +;; integer modes (define_mode_iterator VSQN_HSDI [V8HI V4SI V2DI HI SI DI]) ;; All quad integer widen-able modes. @@ -171,54 +172,54 @@ ;; Double vector modes for combines. (define_mode_iterator VDC [V8QI V4HI V4HF V2SI V2SF DI DF]) -;; Vector modes except double int. +;; Advanced SIMD modes except double int. (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF]) (define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V4HF V8HF V2SF V4SF V2DF]) -;; Vector modes for S type. +;; Advanced SIMD modes for S type. (define_mode_iterator VDQ_SI [V2SI V4SI]) -;; Vector modes for S and D +;; Advanced SIMD modes for S and D. (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI]) -;; Vector modes for H, S and D +;; Advanced SIMD modes for H, S and D. (define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST") (V8HI "TARGET_SIMD_F16INST") V2SI V4SI V2DI]) -;; Scalar and Vector modes for S and D +;; Scalar and Advanced SIMD modes for S and D. (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI]) -;; Scalar and Vector modes for S and D, Vector modes for H. +;; Scalar and Advanced SIMD modes for S and D, Advanced SIMD modes for H. (define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST") (V8HI "TARGET_SIMD_F16INST") V2SI V4SI V2DI (HI "TARGET_SIMD_F16INST") SI DI]) -;; Vector modes for Q and H types. +;; Advanced SIMD modes for Q and H types. (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI]) -;; Vector modes for H and S types. +;; Advanced SIMD modes for H and S types. (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI]) -;; Vector modes for H, S and D types. +;; Advanced SIMD modes for H, S and D types. (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI]) -;; Vector and scalar integer modes for H and S +;; Advanced SIMD and scalar integer modes for H and S. (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI]) -;; Vector and scalar 64-bit container: 16, 32-bit integer modes +;; Advanced SIMD and scalar 64-bit container: 16, 32-bit integer modes. 
(define_mode_iterator VSD_HSI [V4HI V2SI HI SI]) -;; Vector 64-bit container: 16, 32-bit integer modes +;; Advanced SIMD 64-bit container: 16, 32-bit integer modes. (define_mode_iterator VD_HSI [V4HI V2SI]) ;; Scalar 64-bit container: 16, 32-bit integer modes (define_mode_iterator SD_HSI [HI SI]) -;; Vector 64-bit container: 16, 32-bit integer modes +;; Advanced SIMD 64-bit container: 16, 32-bit integer modes. (define_mode_iterator VQ_HSI [V8HI V4SI]) ;; All byte modes. @@ -229,21 +230,59 @@ (define_mode_iterator TX [TI TF]) -;; Opaque structure modes. +;; Advanced SIMD opaque structure modes. (define_mode_iterator VSTRUCT [OI CI XI]) ;; Double scalar modes (define_mode_iterator DX [DI DF]) -;; Modes available for mul lane operations. +;; Modes available for Advanced SIMD mul lane operations. (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST") V2SF V4SF V2DF]) -;; Modes available for mul lane operations changing lane count. +;; Modes available for Advanced SIMD mul lane operations changing lane +;; count. (define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF]) +;; All SVE vector modes. +(define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI + VNx8HF VNx4SF VNx2DF]) + +;; All SVE vector modes that have 8-bit or 16-bit elements. +(define_mode_iterator SVE_BH [VNx16QI VNx8HI VNx8HF]) + +;; All SVE vector modes that have 8-bit, 16-bit or 32-bit elements. +(define_mode_iterator SVE_BHS [VNx16QI VNx8HI VNx4SI VNx8HF VNx4SF]) + +;; All SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements. +(define_mode_iterator SVE_BHSI [VNx16QI VNx8HI VNx4SI]) + +;; All SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements. +(define_mode_iterator SVE_HSDI [VNx16QI VNx8HI VNx4SI]) + +;; All SVE floating-point vector modes that have 16-bit or 32-bit elements. +(define_mode_iterator SVE_HSF [VNx8HF VNx4SF]) + +;; All SVE vector modes that have 32-bit or 64-bit elements. +(define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF]) + +;; All SVE integer vector modes that have 32-bit or 64-bit elements. +(define_mode_iterator SVE_SDI [VNx4SI VNx2DI]) + +;; All SVE integer vector modes. +(define_mode_iterator SVE_I [VNx16QI VNx8HI VNx4SI VNx2DI]) + +;; All SVE floating-point vector modes. +(define_mode_iterator SVE_F [VNx8HF VNx4SF VNx2DF]) + +;; All SVE predicate modes. +(define_mode_iterator PRED_ALL [VNx16BI VNx8BI VNx4BI VNx2BI]) + +;; SVE predicate modes that control 8-bit, 16-bit or 32-bit elements. +(define_mode_iterator PRED_BHS [VNx16BI VNx8BI VNx4BI]) + ;; ------------------------------------------------------------------ ;; Unspec enumerations for Advance SIMD. These could well go into ;; aarch64.md but for their use in int_iterators here. @@ -378,6 +417,22 @@ UNSPEC_FMLSL ; Used in aarch64-simd.md. UNSPEC_FMLAL2 ; Used in aarch64-simd.md. UNSPEC_FMLSL2 ; Used in aarch64-simd.md. + UNSPEC_SEL ; Used in aarch64-sve.md. + UNSPEC_ANDF ; Used in aarch64-sve.md. + UNSPEC_IORF ; Used in aarch64-sve.md. + UNSPEC_XORF ; Used in aarch64-sve.md. + UNSPEC_COND_LT ; Used in aarch64-sve.md. + UNSPEC_COND_LE ; Used in aarch64-sve.md. + UNSPEC_COND_EQ ; Used in aarch64-sve.md. + UNSPEC_COND_NE ; Used in aarch64-sve.md. + UNSPEC_COND_GE ; Used in aarch64-sve.md. + UNSPEC_COND_GT ; Used in aarch64-sve.md. + UNSPEC_COND_LO ; Used in aarch64-sve.md. + UNSPEC_COND_LS ; Used in aarch64-sve.md. + UNSPEC_COND_HS ; Used in aarch64-sve.md. + UNSPEC_COND_HI ; Used in aarch64-sve.md. + UNSPEC_COND_UO ; Used in aarch64-sve.md. 
+ UNSPEC_LASTB ; Used in aarch64-sve.md. ]) ;; ------------------------------------------------------------------ @@ -535,17 +590,24 @@ (HI "")]) ;; Mode-to-individual element type mapping. -(define_mode_attr Vetype [(V8QI "b") (V16QI "b") - (V4HI "h") (V8HI "h") - (V2SI "s") (V4SI "s") - (V2DI "d") (V4HF "h") - (V8HF "h") (V2SF "s") - (V4SF "s") (V2DF "d") +(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b") + (V4HI "h") (V8HI "h") (VNx8HI "h") (VNx8BI "h") + (V2SI "s") (V4SI "s") (VNx4SI "s") (VNx4BI "s") + (V2DI "d") (VNx2DI "d") (VNx2BI "d") + (V4HF "h") (V8HF "h") (VNx8HF "h") + (V2SF "s") (V4SF "s") (VNx4SF "s") + (V2DF "d") (VNx2DF "d") (HF "h") (SF "s") (DF "d") (QI "b") (HI "h") (SI "s") (DI "d")]) +;; Equivalent of "size" for a vector element. +(define_mode_attr Vesize [(VNx16QI "b") + (VNx8HI "h") (VNx8HF "h") + (VNx4SI "w") (VNx4SF "w") + (VNx2DI "d") (VNx2DF "d")]) + ;; Vetype is used everywhere in scheduling type and assembly output, ;; sometimes they are not the same, for example HF modes on some ;; instructions. stype is defined to represent scheduling type @@ -567,27 +629,45 @@ (SI "8b")]) ;; Define element mode for each vector mode. -(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") - (V4HI "HI") (V8HI "HI") - (V2SI "SI") (V4SI "SI") - (DI "DI") (V2DI "DI") - (V4HF "HF") (V8HF "HF") - (V2SF "SF") (V4SF "SF") - (V2DF "DF") (DF "DF") - (SI "SI") (HI "HI") +(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (VNx16QI "QI") + (V4HI "HI") (V8HI "HI") (VNx8HI "HI") + (V2SI "SI") (V4SI "SI") (VNx4SI "SI") + (DI "DI") (V2DI "DI") (VNx2DI "DI") + (V4HF "HF") (V8HF "HF") (VNx8HF "HF") + (V2SF "SF") (V4SF "SF") (VNx4SF "SF") + (DF "DF") (V2DF "DF") (VNx2DF "DF") + (SI "SI") (HI "HI") (QI "QI")]) ;; Define element mode for each vector mode (lower case). -(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") - (V4HI "hi") (V8HI "hi") - (V2SI "si") (V4SI "si") - (DI "di") (V2DI "di") - (V4HF "hf") (V8HF "hf") - (V2SF "sf") (V4SF "sf") - (V2DF "df") (DF "df") +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi") + (V4HI "hi") (V8HI "hi") (VNx8HI "hi") + (V2SI "si") (V4SI "si") (VNx4SI "si") + (DI "di") (V2DI "di") (VNx2DI "di") + (V4HF "hf") (V8HF "hf") (VNx8HF "hf") + (V2SF "sf") (V4SF "sf") (VNx4SF "sf") + (V2DF "df") (DF "df") (VNx2DF "df") (SI "si") (HI "hi") (QI "qi")]) +;; Element mode with floating-point values replaced by like-sized integers. +(define_mode_attr VEL_INT [(VNx16QI "QI") + (VNx8HI "HI") (VNx8HF "HI") + (VNx4SI "SI") (VNx4SF "SI") + (VNx2DI "DI") (VNx2DF "DI")]) + +;; Gives the mode of the 128-bit lowpart of an SVE vector. +(define_mode_attr V128 [(VNx16QI "V16QI") + (VNx8HI "V8HI") (VNx8HF "V8HF") + (VNx4SI "V4SI") (VNx4SF "V4SF") + (VNx2DI "V2DI") (VNx2DF "V2DF")]) + +;; ...and again in lower case. +(define_mode_attr v128 [(VNx16QI "v16qi") + (VNx8HI "v8hi") (VNx8HF "v8hf") + (VNx4SI "v4si") (VNx4SF "v4sf") + (VNx2DI "v2di") (VNx2DF "v2df")]) + ;; 64-bit container modes the inner or scalar source mode. (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI") (V4HI "V4HI") (V8HI "V4HI") @@ -666,16 +746,28 @@ (V2DI "4s")]) ;; Widened modes of vector modes. 
-(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI") - (V2SI "V2DI") (V16QI "V8HI") - (V8HI "V4SI") (V4SI "V2DI") - (HI "SI") (SI "DI") - (V8HF "V4SF") (V4SF "V2DF") - (V4HF "V4SF") (V2SF "V2DF")] -) +(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI") + (V2SI "V2DI") (V16QI "V8HI") + (V8HI "V4SI") (V4SI "V2DI") + (HI "SI") (SI "DI") + (V8HF "V4SF") (V4SF "V2DF") + (V4HF "V4SF") (V2SF "V2DF") + (VNx8HF "VNx4SF") (VNx4SF "VNx2DF") + (VNx16QI "VNx8HI") (VNx8HI "VNx4SI") + (VNx4SI "VNx2DI") + (VNx16BI "VNx8BI") (VNx8BI "VNx4BI") + (VNx4BI "VNx2BI")]) + +;; Predicate mode associated with VWIDE. +(define_mode_attr VWIDE_PRED [(VNx8HF "VNx4BI") (VNx4SF "VNx2BI")]) ;; Widened modes of vector modes, lowercase -(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")]) +(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf") + (VNx16QI "vnx8hi") (VNx8HI "vnx4si") + (VNx4SI "vnx2di") + (VNx8HF "vnx4sf") (VNx4SF "vnx2df") + (VNx16BI "vnx8bi") (VNx8BI "vnx4bi") + (VNx4BI "vnx2bi")]) ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF. (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s") @@ -683,6 +775,11 @@ (V8HI "4s") (V4SI "2d") (V8HF "4s") (V4SF "2d")]) +;; SVE vector after widening +(define_mode_attr Vewtype [(VNx16QI "h") + (VNx8HI "s") (VNx8HF "s") + (VNx4SI "d") (VNx4SF "d")]) + ;; Widened mode register suffixes for VDW/VQW. (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s") (V2SI ".2d") (V16QI ".8h") @@ -696,22 +793,23 @@ (V4SF "2s")]) ;; Define corresponding core/FP element mode for each vector mode. -(define_mode_attr vw [(V8QI "w") (V16QI "w") - (V4HI "w") (V8HI "w") - (V2SI "w") (V4SI "w") - (DI "x") (V2DI "x") - (V2SF "s") (V4SF "s") - (V2DF "d")]) +(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w") + (V4HI "w") (V8HI "w") (VNx8HI "w") + (V2SI "w") (V4SI "w") (VNx4SI "w") + (DI "x") (V2DI "x") (VNx2DI "x") + (VNx8HF "h") + (V2SF "s") (V4SF "s") (VNx4SF "s") + (V2DF "d") (VNx2DF "d")]) ;; Corresponding core element mode for each vector mode. This is a ;; variation on mapping FP modes to GP regs. -(define_mode_attr vwcore [(V8QI "w") (V16QI "w") - (V4HI "w") (V8HI "w") - (V2SI "w") (V4SI "w") - (DI "x") (V2DI "x") - (V4HF "w") (V8HF "w") - (V2SF "w") (V4SF "w") - (V2DF "x")]) +(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w") + (V4HI "w") (V8HI "w") (VNx8HI "w") + (V2SI "w") (V4SI "w") (VNx4SI "w") + (DI "x") (V2DI "x") (VNx2DI "x") + (V4HF "w") (V8HF "w") (VNx8HF "w") + (V2SF "w") (V4SF "w") (VNx4SF "w") + (V2DF "x") (VNx2DF "x")]) ;; Double vector types for ALLX. (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")]) @@ -723,8 +821,13 @@ (DI "DI") (V2DI "V2DI") (V4HF "V4HI") (V8HF "V8HI") (V2SF "V2SI") (V4SF "V4SI") - (V2DF "V2DI") (DF "DI") - (SF "SI") (HF "HI")]) + (DF "DI") (V2DF "V2DI") + (SF "SI") (HF "HI") + (VNx16QI "VNx16QI") + (VNx8HI "VNx8HI") (VNx8HF "VNx8HI") + (VNx4SI "VNx4SI") (VNx4SF "VNx4SI") + (VNx2DI "VNx2DI") (VNx2DF "VNx2DI") +]) ;; Lower case mode with floating-point values replaced by like-sized integers. (define_mode_attr v_int_equiv [(V8QI "v8qi") (V16QI "v16qi") @@ -733,8 +836,19 @@ (DI "di") (V2DI "v2di") (V4HF "v4hi") (V8HF "v8hi") (V2SF "v2si") (V4SF "v4si") - (V2DF "v2di") (DF "di") - (SF "si")]) + (DF "di") (V2DF "v2di") + (SF "si") + (VNx16QI "vnx16qi") + (VNx8HI "vnx8hi") (VNx8HF "vnx8hi") + (VNx4SI "vnx4si") (VNx4SF "vnx4si") + (VNx2DI "vnx2di") (VNx2DF "vnx2di") +]) + +;; Floating-point equivalent of selected modes. 
+(define_mode_attr V_FP_EQUIV [(VNx4SI "VNx4SF") (VNx4SF "VNx4SF") + (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")]) +(define_mode_attr v_fp_equiv [(VNx4SI "vnx4sf") (VNx4SF "vnx4sf") + (VNx2DI "vnx2df") (VNx2DF "vnx2df")]) ;; Mode for vector conditional operations where the comparison has ;; different type from the lhs. @@ -869,6 +983,18 @@ (define_code_attr f16mac [(plus "a") (minus "s")]) +;; The predicate mode associated with an SVE data mode. +(define_mode_attr VPRED [(VNx16QI "VNx16BI") + (VNx8HI "VNx8BI") (VNx8HF "VNx8BI") + (VNx4SI "VNx4BI") (VNx4SF "VNx4BI") + (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")]) + +;; ...and again in lower case. +(define_mode_attr vpred [(VNx16QI "vnx16bi") + (VNx8HI "vnx8bi") (VNx8HF "vnx8bi") + (VNx4SI "vnx4bi") (VNx4SF "vnx4bi") + (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")]) + ;; ------------------------------------------------------------------- ;; Code Iterators ;; ------------------------------------------------------------------- @@ -882,6 +1008,9 @@ ;; Code iterator for logical operations (define_code_iterator LOGICAL [and ior xor]) +;; LOGICAL without AND. +(define_code_iterator LOGICAL_OR [ior xor]) + ;; Code iterator for logical operations whose :nlogical works on SIMD registers. (define_code_iterator NLOGICAL [and ior]) @@ -940,6 +1069,12 @@ ;; Unsigned comparison operators. (define_code_iterator FAC_COMPARISONS [lt le ge gt]) +;; SVE integer unary operations. +(define_code_iterator SVE_INT_UNARY [neg not popcount]) + +;; SVE floating-point unary operations. +(define_code_iterator SVE_FP_UNARY [neg abs sqrt]) + ;; ------------------------------------------------------------------- ;; Code Attributes ;; ------------------------------------------------------------------- @@ -956,6 +1091,7 @@ (unsigned_fix "fixuns") (float "float") (unsigned_float "floatuns") + (popcount "popcount") (and "and") (ior "ior") (xor "xor") @@ -969,6 +1105,10 @@ (us_minus "qsub") (ss_neg "qneg") (ss_abs "qabs") + (smin "smin") + (smax "smax") + (umin "umin") + (umax "umax") (eq "eq") (ne "ne") (lt "lt") @@ -978,7 +1118,9 @@ (ltu "ltu") (leu "leu") (geu "geu") - (gtu "gtu")]) + (gtu "gtu") + (abs "abs") + (sqrt "sqrt")]) ;; For comparison operators we use the FCM* and CM* instructions. ;; As there are no CMLE or CMLT instructions which act on 3 vector @@ -1021,9 +1163,12 @@ ;; Operation names for negate and bitwise complement. (define_code_attr neg_not_op [(neg "neg") (not "not")]) -;; Similar, but when not(op) +;; Similar, but when the second operand is inverted. (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")]) +;; Similar, but when both operands are inverted. +(define_code_attr logical_nn [(and "nor") (ior "nand")]) + ;; Sign- or zero-extending data-op (define_code_attr su [(sign_extend "s") (zero_extend "u") (sign_extract "s") (zero_extract "u") @@ -1032,6 +1177,9 @@ (smax "s") (umax "u") (smin "s") (umin "u")]) +;; Whether a shift is left or right. +(define_code_attr lr [(ashift "l") (ashiftrt "r") (lshiftrt "r")]) + ;; Emit conditional branch instructions. (define_code_attr bcond [(eq "beq") (ne "bne") (lt "bne") (ge "beq")]) @@ -1077,6 +1225,25 @@ ;; Attribute to describe constants acceptable in atomic logical operations (define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")]) +;; The integer SVE instruction that implements an rtx code. 
+(define_code_attr sve_int_op [(plus "add") + (neg "neg") + (smin "smin") + (smax "smax") + (umin "umin") + (umax "umax") + (and "and") + (ior "orr") + (xor "eor") + (not "not") + (popcount "cnt")]) + +;; The floating-point SVE instruction that implements an rtx code. +(define_code_attr sve_fp_op [(plus "fadd") + (neg "fneg") + (abs "fabs") + (sqrt "fsqrt")]) + ;; ------------------------------------------------------------------- ;; Int Iterators. ;; ------------------------------------------------------------------- @@ -1086,6 +1253,8 @@ (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV UNSPEC_FMAXNMV UNSPEC_FMINNMV]) +(define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF]) + (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD UNSPEC_SRHADD UNSPEC_URHADD UNSPEC_SHSUB UNSPEC_UHSUB @@ -1141,6 +1310,9 @@ UNSPEC_TRN1 UNSPEC_TRN2 UNSPEC_UZP1 UNSPEC_UZP2]) +(define_int_iterator OPTAB_PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2 + UNSPEC_UZP1 UNSPEC_UZP2]) + (define_int_iterator REVERSE [UNSPEC_REV64 UNSPEC_REV32 UNSPEC_REV16]) (define_int_iterator FRINT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM @@ -1179,6 +1351,21 @@ (define_int_iterator VFMLA16_HIGH [UNSPEC_FMLAL2 UNSPEC_FMLSL2]) +(define_int_iterator UNPACK [UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI + UNSPEC_UNPACKSLO UNSPEC_UNPACKULO]) + +(define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI]) + +(define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE + UNSPEC_COND_EQ UNSPEC_COND_NE + UNSPEC_COND_GE UNSPEC_COND_GT + UNSPEC_COND_LO UNSPEC_COND_LS + UNSPEC_COND_HS UNSPEC_COND_HI]) + +(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE + UNSPEC_COND_EQ UNSPEC_COND_NE + UNSPEC_COND_GE UNSPEC_COND_GT]) + ;; Iterators for atomic operations. (define_int_iterator ATOMIC_LDOP @@ -1192,6 +1379,14 @@ ;; ------------------------------------------------------------------- ;; Int Iterators Attributes. ;; ------------------------------------------------------------------- + +;; The optab associated with an operation. Note that for ANDF, IORF +;; and XORF, the optab pattern is not actually defined; we just use this +;; name for consistency with the integer patterns. +(define_int_attr optab [(UNSPEC_ANDF "and") + (UNSPEC_IORF "ior") + (UNSPEC_XORF "xor")]) + (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (UNSPEC_UMINV "umin") (UNSPEC_SMAXV "smax") @@ -1218,6 +1413,17 @@ (UNSPEC_FMAXNM "fmaxnm") (UNSPEC_FMINNM "fminnm")]) +;; The SVE logical instruction that implements an unspec. +(define_int_attr logicalf_op [(UNSPEC_ANDF "and") + (UNSPEC_IORF "orr") + (UNSPEC_XORF "eor")]) + +;; "s" for signed operations and "u" for unsigned ones. +(define_int_attr su [(UNSPEC_UNPACKSHI "s") + (UNSPEC_UNPACKUHI "u") + (UNSPEC_UNPACKSLO "s") + (UNSPEC_UNPACKULO "u")]) + (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u") (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur") (UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u") @@ -1328,7 +1534,9 @@ (define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2") (UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2") - (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")]) + (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2") + (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi") + (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")]) (define_int_attr frecp_suffix [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")]) @@ -1361,3 +1569,27 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s") (UNSPEC_FMLAL2 "a") (UNSPEC_FMLSL2 "s")]) + +;; The condition associated with an UNSPEC_COND_. 
+(define_int_attr cmp_op [(UNSPEC_COND_LT "lt") + (UNSPEC_COND_LE "le") + (UNSPEC_COND_EQ "eq") + (UNSPEC_COND_NE "ne") + (UNSPEC_COND_GE "ge") + (UNSPEC_COND_GT "gt") + (UNSPEC_COND_LO "lo") + (UNSPEC_COND_LS "ls") + (UNSPEC_COND_HS "hs") + (UNSPEC_COND_HI "hi")]) + +;; The constraint to use for an UNSPEC_COND_. +(define_int_attr imm_con [(UNSPEC_COND_EQ "vsc") + (UNSPEC_COND_NE "vsc") + (UNSPEC_COND_LT "vsc") + (UNSPEC_COND_GE "vsc") + (UNSPEC_COND_LE "vsc") + (UNSPEC_COND_GT "vsc") + (UNSPEC_COND_LO "vsd") + (UNSPEC_COND_LS "vsd") + (UNSPEC_COND_HS "vsd") + (UNSPEC_COND_HI "vsd")]) diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 65b2df6ed1a..7424f506a5c 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -93,6 +93,10 @@ (define_predicate "aarch64_fp_vec_pow2" (match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0")) +(define_predicate "aarch64_sve_cnt_immediate" + (and (match_code "const_poly_int") + (match_test "aarch64_sve_cnt_immediate_p (op)"))) + (define_predicate "aarch64_sub_immediate" (and (match_code "const_int") (match_test "aarch64_uimm12_shift (-INTVAL (op))"))) @@ -114,9 +118,22 @@ (and (match_operand 0 "aarch64_pluslong_immediate") (not (match_operand 0 "aarch64_plus_immediate")))) +(define_predicate "aarch64_sve_addvl_addpl_immediate" + (and (match_code "const_poly_int") + (match_test "aarch64_sve_addvl_addpl_immediate_p (op)"))) + +(define_predicate "aarch64_split_add_offset_immediate" + (and (match_code "const_poly_int") + (match_test "aarch64_add_offset_temporaries (op) == 1"))) + (define_predicate "aarch64_pluslong_operand" (ior (match_operand 0 "register_operand") - (match_operand 0 "aarch64_pluslong_immediate"))) + (match_operand 0 "aarch64_pluslong_immediate") + (match_operand 0 "aarch64_sve_addvl_addpl_immediate"))) + +(define_predicate "aarch64_pluslong_or_poly_operand" + (ior (match_operand 0 "aarch64_pluslong_operand") + (match_operand 0 "aarch64_split_add_offset_immediate"))) (define_predicate "aarch64_logical_immediate" (and (match_code "const_int") @@ -263,11 +280,18 @@ }) (define_predicate "aarch64_mov_operand" - (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high") + (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high, + const_poly_int,const_vector") (ior (match_operand 0 "register_operand") (ior (match_operand 0 "memory_operand") (match_test "aarch64_mov_operand_p (op, mode)"))))) +(define_predicate "aarch64_nonmemory_operand" + (and (match_code "reg,subreg,const,const_int,symbol_ref,label_ref,high, + const_poly_int,const_vector") + (ior (match_operand 0 "register_operand") + (match_test "aarch64_mov_operand_p (op, mode)")))) + (define_predicate "aarch64_movti_operand" (and (match_code "reg,subreg,mem,const_int") (ior (match_operand 0 "register_operand") @@ -303,6 +327,9 @@ return aarch64_get_condition_code (op) >= 0; }) +(define_special_predicate "aarch64_equality_operator" + (match_code "eq,ne")) + (define_special_predicate "aarch64_carry_operation" (match_code "ne,geu") { @@ -342,22 +369,34 @@ }) (define_special_predicate "aarch64_simd_lshift_imm" - (match_code "const_vector") + (match_code "const,const_vector") { return aarch64_simd_shift_imm_p (op, mode, true); }) (define_special_predicate "aarch64_simd_rshift_imm" - (match_code "const_vector") + (match_code "const,const_vector") { return aarch64_simd_shift_imm_p (op, mode, false); }) +(define_predicate "aarch64_simd_imm_zero" + (and (match_code "const,const_vector") + (match_test "op 
== CONST0_RTX (GET_MODE (op))"))) + +(define_predicate "aarch64_simd_or_scalar_imm_zero" + (and (match_code "const_int,const_double,const,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) + +(define_predicate "aarch64_simd_imm_minus_one" + (and (match_code "const,const_vector") + (match_test "op == CONSTM1_RTX (GET_MODE (op))"))) + (define_predicate "aarch64_simd_reg_or_zero" - (and (match_code "reg,subreg,const_int,const_double,const_vector") + (and (match_code "reg,subreg,const_int,const_double,const,const_vector") (ior (match_operand 0 "register_operand") - (ior (match_test "op == const0_rtx") - (match_test "aarch64_simd_imm_zero_p (op, mode)"))))) + (match_test "op == const0_rtx") + (match_operand 0 "aarch64_simd_imm_zero")))) (define_predicate "aarch64_simd_struct_operand" (and (match_code "mem") @@ -377,21 +416,6 @@ || GET_CODE (XEXP (op, 0)) == POST_INC || GET_CODE (XEXP (op, 0)) == REG"))) -(define_special_predicate "aarch64_simd_imm_zero" - (match_code "const_vector") -{ - return aarch64_simd_imm_zero_p (op, mode); -}) - -(define_special_predicate "aarch64_simd_or_scalar_imm_zero" - (match_test "aarch64_simd_imm_zero_p (op, mode)")) - -(define_special_predicate "aarch64_simd_imm_minus_one" - (match_code "const_vector") -{ - return aarch64_const_vec_all_same_int_p (op, -1); -}) - ;; Predicates used by the various SIMD shift operations. These ;; fall in to 3 categories. ;; Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm) @@ -448,3 +472,133 @@ (define_predicate "aarch64_constant_pool_symref" (and (match_code "symbol_ref") (match_test "CONSTANT_POOL_ADDRESS_P (op)"))) + +(define_predicate "aarch64_constant_vector_operand" + (match_code "const,const_vector")) + +(define_predicate "aarch64_sve_ld1r_operand" + (and (match_operand 0 "memory_operand") + (match_test "aarch64_sve_ld1r_operand_p (op)"))) + +;; Like memory_operand, but restricted to addresses that are valid for +;; SVE LDR and STR instructions. +(define_predicate "aarch64_sve_ldr_operand" + (and (match_code "mem") + (match_test "aarch64_sve_ldr_operand_p (op)"))) + +(define_predicate "aarch64_sve_nonimmediate_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_ldr_operand"))) + +(define_predicate "aarch64_sve_general_operand" + (and (match_code "reg,subreg,mem,const,const_vector") + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_ldr_operand") + (match_test "aarch64_mov_operand_p (op, mode)")))) + +;; Doesn't include immediates, since those are handled by the move +;; patterns instead. 
+(define_predicate "aarch64_sve_dup_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_ld1r_operand"))) + +(define_predicate "aarch64_sve_arith_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_arith_immediate_p (op, false)"))) + +(define_predicate "aarch64_sve_sub_arith_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_arith_immediate_p (op, true)"))) + +(define_predicate "aarch64_sve_inc_dec_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_inc_dec_immediate_p (op)"))) + +(define_predicate "aarch64_sve_logical_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_bitmask_immediate_p (op)"))) + +(define_predicate "aarch64_sve_mul_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_const_vec_all_same_in_range_p (op, -128, 127)"))) + +(define_predicate "aarch64_sve_dup_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_dup_immediate_p (op)"))) + +(define_predicate "aarch64_sve_cmp_vsc_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_cmp_immediate_p (op, true)"))) + +(define_predicate "aarch64_sve_cmp_vsd_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_cmp_immediate_p (op, false)"))) + +(define_predicate "aarch64_sve_index_immediate" + (and (match_code "const_int") + (match_test "aarch64_sve_index_immediate_p (op)"))) + +(define_predicate "aarch64_sve_float_arith_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_float_arith_immediate_p (op, false)"))) + +(define_predicate "aarch64_sve_float_arith_with_sub_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_float_arith_immediate_p (op, true)"))) + +(define_predicate "aarch64_sve_float_mul_immediate" + (and (match_code "const,const_vector") + (match_test "aarch64_sve_float_mul_immediate_p (op)"))) + +(define_predicate "aarch64_sve_arith_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_arith_immediate"))) + +(define_predicate "aarch64_sve_add_operand" + (ior (match_operand 0 "aarch64_sve_arith_operand") + (match_operand 0 "aarch64_sve_sub_arith_immediate") + (match_operand 0 "aarch64_sve_inc_dec_immediate"))) + +(define_predicate "aarch64_sve_logical_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_logical_immediate"))) + +(define_predicate "aarch64_sve_lshift_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_simd_lshift_imm"))) + +(define_predicate "aarch64_sve_rshift_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_simd_rshift_imm"))) + +(define_predicate "aarch64_sve_mul_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_mul_immediate"))) + +(define_predicate "aarch64_sve_cmp_vsc_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_cmp_vsc_immediate"))) + +(define_predicate "aarch64_sve_cmp_vsd_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_cmp_vsd_immediate"))) + +(define_predicate "aarch64_sve_index_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_index_immediate"))) + +(define_predicate "aarch64_sve_float_arith_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_float_arith_immediate"))) + +(define_predicate 
"aarch64_sve_float_arith_with_sub_operand" + (ior (match_operand 0 "aarch64_sve_float_arith_operand") + (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate"))) + +(define_predicate "aarch64_sve_float_mul_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_float_mul_immediate"))) + +(define_predicate "aarch64_sve_vec_perm_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_constant_vector_operand"))) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89a4727ecdf..28c61a078d2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14594,6 +14594,23 @@ Permissible values are @samp{none}, which disables return address signing, functions, and @samp{all}, which enables pointer signing for all functions. The default value is @samp{none}. +@item -msve-vector-bits=@var{bits} +@opindex msve-vector-bits +Specify the number of bits in an SVE vector register. This option only has +an effect when SVE is enabled. + +GCC supports two forms of SVE code generation: ``vector-length +agnostic'' output that works with any size of vector register and +``vector-length specific'' output that only works when the vector +registers are a particular size. Replacing @var{bits} with +@samp{scalable} selects vector-length agnostic output while +replacing it with a number selects vector-length specific output. +The possible lengths in the latter case are: 128, 256, 512, 1024 +and 2048. @samp{scalable} is the default. + +At present, @samp{-msve-vector-bits=128} produces the same output +as @samp{-msve-vector-bits=scalable}. + @end table @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers @@ -14617,6 +14634,9 @@ values for options @option{-march} and @option{-mcpu}. Enable Advanced SIMD instructions. This also enables floating-point instructions. This is on by default for all possible values for options @option{-march} and @option{-mcpu}. +@item sve +Enable Scalable Vector Extension instructions. This also enables Advanced +SIMD and floating-point instructions. @item lse Enable Large System Extension instructions. This is on by default for @option{-march=armv8.1-a}. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 497df1bb501..e956c751b57 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1735,7 +1735,13 @@ the meanings of that architecture's constraints. The stack pointer register (@code{SP}) @item w -Floating point or SIMD vector register +Floating point register, Advanced SIMD vector register or SVE vector register + +@item Upl +One of the low eight SVE predicate registers (@code{P0} to @code{P7}) + +@item Upa +Any of the SVE predicate registers (@code{P0} to @code{P15}) @item I Integer constant that is valid as an immediate operand in an @code{ADD} -- 2.11.4.GIT