author    Christophe Lyon <christophe.lyon@linaro.org>  2016-08-25 15:38:54 +0200
committer Yvan Roux <yvan.roux@linaro.org>              2016-09-07 22:08:35 +0200
commit    e59b2ff1fdebf862212b8cefd8e58a7ee73fabe0 (patch)
tree      3fce536ab814a5871acd9655ca71137a60e98940
parent    3046e9ae43d584f70c7d979634243fee50f7cecb (diff)
gcc/
Backport from trunk r237956.
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>
            Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
(AARCH64_FL_F16): New.
(AARCH64_FL_FOR_ARCH8_2): New.
(AARCH64_ISA_8_2): New.
(AARCH64_ISA_F16): New.
(TARGET_FP_F16INST): New.
(TARGET_SIMD_F16INST): New.
* config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
("fp"): Disabling "fp" also disables "fp16".
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Conditionally
define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
* doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".

gcc/
Backport from trunk r238715.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd.md
(aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): Use VALL_F16.
(aarch64_ext<mode>): Likewise.
(aarch64_rev<REVERSE:rev_op><mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp,
aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev): Support V4HFmode
and V8HFmode.
* config/aarch64/arm_neon.h (__INTERLEAVE_LIST): Support float16x4_t,
float16x8_t.
(__aarch64_vdup_lane_f16, __aarch64_vdup_laneq_f16,
__aarch64_vdupq_lane_f16, __aarch64_vdupq_laneq_f16, vbsl_f16,
vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16, vdup_laneq_f16,
vdupq_lane_f16, vdupq_laneq_f16, vduph_lane_f16, vduph_laneq_f16,
vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16, vrev64_f16, vrev64q_f16,
vtrn1_f16, vtrn1q_f16, vtrn2_f16, vtrn2q_f16, vtrn_f16, vtrnq_f16,
vuzp1_f16, vuzp1q_f16, vuzp2_f16, vuzp2q_f16, vzip1_f16,
vzip2q_f16): New.
(vmov_n_f16): Reimplement using vdup_n_f16.
(vmovq_n_f16): Reimplement using vdupq_n_f16.

gcc/
Backport from trunk r238716.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF
modes.
(neg<mode>2): Likewise.
(abs<mode>2): Likewise.
(<frint_pattern><mode>2): Likewise.
(l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2): Likewise.
(<optab><VDQF:mode><fcvt_target>2): Likewise.
(<fix_trunc_optab><VDQF:mode><fcvt_target>2): Likewise.
(ftrunc<VDQF:mode>2): Likewise.
(<optab><fcvt_target><VDQF:mode>2): Likewise.
(sqrt<mode>2): Likewise.
(*sqrt<mode>2): Likewise.
(aarch64_frecpe<mode>): Likewise.
(aarch64_cm<optab><mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return false for
HF, V4HF and V8HF.
* config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
(VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to
HF modes.
(stype): New.
* config/aarch64/arm_neon.h (vdup_n_f16): New.
(vdupq_n_f16): Likewise.
(vld1_dup_f16): Use vdup_n_f16.
(vld1q_dup_f16): Use vdupq_n_f16.
(vabs_f16, vabsq_f16, vceqz_f16, vceqzq_f16, vcgez_f16, vcgezq_f16,
vcgtz_f16, vcgtzq_f16, vclez_f16, vclezq_f16, vcltz_f16, vcltzq_f16,
vcvt_f16_s16, vcvtq_f16_s16, vcvt_f16_u16, vcvtq_f16_u16, vcvt_s16_f16,
vcvtq_s16_f16, vcvt_u16_f16, vcvtq_u16_f16, vcvta_s16_f16,
vcvtaq_s16_f16, vcvta_u16_f16, vcvtaq_u16_f16, vcvtm_s16_f16,
vcvtmq_s16_f16, vcvtm_u16_f16, vcvtmq_u16_f16, vcvtn_s16_f16,
vcvtnq_s16_f16, vcvtn_u16_f16, vcvtnq_u16_f16, vcvtp_s16_f16,
vcvtpq_s16_f16, vcvtp_u16_f16, vcvtpq_u16_f16, vneg_f16, vnegq_f16,
vrecpe_f16, vrecpeq_f16, vrnd_f16, vrndq_f16, vrnda_f16, vrndaq_f16,
vrndi_f16, vrndiq_f16, vrndm_f16, vrndmq_f16, vrndn_f16, vrndnq_f16,
vrndp_f16, vrndpq_f16, vrndx_f16, vrndxq_f16, vrsqrte_f16,
vrsqrteq_f16, vsqrt_f16, vsqrtq_f16): New.

gcc/
Backport from trunk r238717.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Extend to HF
modes.
(fabd<mode>3): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_SDF:mode>3): Likewise.
(<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_SDI:mode>3): Likewise.
(aarch64_<maxmin_uns>p<mode>): Likewise.
(<su><maxmin><mode>3): Likewise.
(<maxmin_uns><mode>3): Likewise.
(<fmaxmin><mode>3): Likewise.
(aarch64_faddp<mode>): Likewise.
(aarch64_fmulx<mode>): Likewise.
(aarch64_frecps<mode>): Likewise.
(*aarch64_fac<optab><mode>): Rename to aarch64_fac<optab><mode>.
(add<mode>3): Extend to HF modes.
(sub<mode>3): Likewise.
(mul<mode>3): Likewise.
(div<mode>3): Likewise.
(*div<mode>3): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return false for
HF, V4HF and V8HF.
* config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode iterator.
* config/aarch64/arm_neon.h (vadd_f16, vaddq_f16, vabd_f16, vabdq_f16,
vcage_f16, vcageq_f16, vcagt_f16, vcagtq_f16, vcale_f16, vcaleq_f16,
vcalt_f16, vcaltq_f16, vceq_f16, vceqq_f16, vcge_f16, vcgeq_f16,
vcgt_f16, vcgtq_f16, vcle_f16, vcleq_f16, vclt_f16, vcltq_f16,
vcvt_n_f16_s16, vcvtq_n_f16_s16, vcvt_n_f16_u16, vcvtq_n_f16_u16,
vcvt_n_s16_f16, vcvtq_n_s16_f16, vcvt_n_u16_f16, vcvtq_n_u16_f16,
vdiv_f16, vdivq_f16, vdup_lane_f16, vdup_laneq_f16, vdupq_lane_f16,
vdupq_laneq_f16, vdups_lane_f16, vdups_laneq_f16, vmax_f16, vmaxq_f16,
vmaxnm_f16, vmaxnmq_f16, vmin_f16, vminq_f16, vminnm_f16, vminnmq_f16,
vmul_f16, vmulq_f16, vmulx_f16, vmulxq_f16, vpadd_f16, vpaddq_f16,
vpmax_f16, vpmaxq_f16, vpmaxnm_f16, vpmaxnmq_f16, vpmin_f16,
vpminq_f16, vpminnm_f16, vpminnmq_f16, vrecps_f16, vrecpsq_f16,
vrsqrts_f16, vrsqrtsq_f16, vsub_f16, vsubq_f16): New.

gcc/
Backport from trunk r238718.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (fma<mode>4, fnma<mode>4): Extend to
HF modes.
* config/aarch64/arm_neon.h (vfma_f16, vfmaq_f16, vfms_f16,
vfmsq_f16): New.

gcc/
Backport from trunk r238719.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_to_64v2df): Rename
to "*aarch64_mulx_elt_from_dup<mode>".
(*aarch64_mul3_elt<mode>): Update schedule type.
(*aarch64_mul3_elt_from_dup<mode>): Likewise.
(*aarch64_fma4_elt_from_dup<mode>): Likewise.
(*aarch64_fnma4_elt_from_dup<mode>): Likewise.
* config/aarch64/iterators.md (VMUL): Support half precision float
modes.
(f, fp): Support HF modes.
* config/aarch64/arm_neon.h (vfma_lane_f16, vfmaq_lane_f16,
vfma_laneq_f16, vfmaq_laneq_f16, vfma_n_f16, vfmaq_n_f16,
vfms_lane_f16, vfmsq_lane_f16, vfms_laneq_f16, vfmsq_laneq_f16,
vfms_n_f16, vfmsq_n_f16, vmul_lane_f16, vmulq_lane_f16, vmul_laneq_f16,
vmulq_laneq_f16, vmul_n_f16, vmulq_n_f16, vmulx_lane_f16,
vmulxq_lane_f16, vmulx_laneq_f16, vmulxq_laneq_f16): New.

gcc/
Backport from trunk r238721.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd-builtins.def (reduc_smax_scal_,
reduc_smin_scal_): Use VDQIF_F16.
(reduc_smax_nan_scal_, reduc_smin_nan_scal_): Use VHSDF.
* config/aarch64/aarch64-simd.md (reduc_<maxmin_uns>_scal_<mode>): Use
VHSDF.
(aarch64_reduc_<maxmin_uns>_internal<mode>): Likewise.
* config/aarch64/iterators.md (VDQIF_F16): New.
(vp): Support HF modes.
* config/aarch64/arm_neon.h (vmaxv_f16, vmaxvq_f16, vminv_f16,
vminvq_f16, vmaxnmv_f16, vmaxnmvq_f16, vminnmv_f16, vminnmvq_f16): New.

gcc/
Backport from trunk r238722.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config.gcc (aarch64*-*-*): Install arm_fp16.h.
* config/aarch64/aarch64-builtins.c (hf_UP): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_frsqrte<mode>): Extend to HF
mode.
(aarch64_frecp<FRECP:frecp_suffix><mode>): Likewise.
(aarch64_cm<optab><mode>): Likewise.
* config/aarch64/aarch64.md (<frint_pattern><mode>2): Likewise.
(l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2): Likewise.
(fix_trunc<GPF:mode><GPI:mode>2): Likewise.
(sqrt<mode>2): Likewise.
(*sqrt<mode>2): Likewise.
(abs<mode>2): Likewise.
(<optab><mode>hf2): New pattern for HF mode.
(<optab>hihf2): Likewise.
* config/aarch64/arm_neon.h: Include arm_fp16.h.
* config/aarch64/iterators.md (GPF_F16, GPI_F16, VHSDF_HSDF): New.
(w1, w2, v, s, q, Vmtype, V_cmp_result, fcvt_iesize, FCVT_IESIZE):
Support HF mode.
* config/aarch64/arm_fp16.h: New file.
(vabsh_f16, vceqzh_f16, vcgezh_f16, vcgtzh_f16, vclezh_f16, vcltzh_f16,
vcvth_f16_s16, vcvth_f16_s32, vcvth_f16_s64, vcvth_f16_u16,
vcvth_f16_u32, vcvth_f16_u64, vcvth_s16_f16, vcvth_s32_f16,
vcvth_s64_f16, vcvth_u16_f16, vcvth_u32_f16, vcvth_u64_f16,
vcvtah_s16_f16, vcvtah_s32_f16, vcvtah_s64_f16, vcvtah_u16_f16,
vcvtah_u32_f16, vcvtah_u64_f16, vcvtmh_s16_f16, vcvtmh_s32_f16,
vcvtmh_s64_f16, vcvtmh_u16_f16, vcvtmh_u32_f16, vcvtmh_u64_f16,
vcvtnh_s16_f16, vcvtnh_s32_f16, vcvtnh_s64_f16, vcvtnh_u16_f16,
vcvtnh_u32_f16, vcvtnh_u64_f16, vcvtph_s16_f16, vcvtph_s32_f16,
vcvtph_s64_f16, vcvtph_u16_f16, vcvtph_u32_f16, vcvtph_u64_f16,
vnegh_f16, vrecpeh_f16, vrecpxh_f16, vrndh_f16, vrndah_f16, vrndih_f16,
vrndmh_f16, vrndnh_f16, vrndph_f16, vrndxh_f16, vrsqrteh_f16,
vsqrth_f16): New.

gcc/
Backport from trunk r238723.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3):
New.
(<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3): Likewise.
(add<mode>3): Likewise.
(sub<mode>3): Likewise.
(mul<mode>3): Likewise.
(div<mode>3): Likewise.
(*div<mode>3): Likewise.
(<fmaxmin><mode>3): Extend to HF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Likewise.
(fabd<mode>3): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_HSDF:mode>3): Likewise.
(<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_HSDI:mode>3): Likewise.
(aarch64_fmulx<mode>): Likewise.
(aarch64_fac<optab><mode>): Likewise.
(aarch64_frecps<mode>): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn>hfhi3): New.
(<FCVT_FIXED2F:fcvt_fixed_insn>hihf3): Likewise.
* config/aarch64/iterators.md (VHSDF_SDF): Delete.
(VSDQ_HSDI): Support HI.
(fcvt_target, FCVT_TARGET): Likewise.
* config/aarch64/arm_fp16.h (vaddh_f16, vsubh_f16, vabdh_f16,
vcageh_f16, vcagth_f16, vcaleh_f16, vcalth_f16, vceqh_f16, vcgeh_f16,
vcgth_f16, vcleh_f16, vclth_f16, vcvth_n_f16_s16, vcvth_n_f16_s32,
vcvth_n_f16_s64, vcvth_n_f16_u16, vcvth_n_f16_u32, vcvth_n_f16_u64,
vcvth_n_s16_f16, vcvth_n_s32_f16, vcvth_n_s64_f16, vcvth_n_u16_f16,
vcvth_n_u32_f16, vcvth_n_u64_f16, vdivh_f16, vmaxh_f16, vmaxnmh_f16,
vminh_f16, vminnmh_f16, vmulh_f16, vmulxh_f16, vrecpsh_f16,
vrsqrtsh_f16): New.

gcc/
Backport from trunk r238724.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (fma, fnma): Support HF.
* config/aarch64/arm_fp16.h (vfmah_f16, vfmsh_f16): New.

gcc/
Backport from trunk r238725.
2016-07-25  Jiong Wang  <jiong.wang@arm.com>

* config/aarch64/arm_neon.h (vfmah_lane_f16, vfmah_laneq_f16,
vfmsh_lane_f16, vfmsh_laneq_f16, vmulh_lane_f16, vmulh_laneq_f16,
vmulxh_lane_f16, vmulxh_laneq_f16): New.

Change-Id: I8118d32ccb84626ad42afc4181334258c7fc8e5b
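For context, a minimal usage sketch of what this series enables. This is not part of the patch; the -march string and the intrinsic name are taken from the ChangeLog above, and it assumes a compiler built with this backport:

/* Build with: gcc -march=armv8.2-a+fp16 -O2 -c f16.c
   (assumed command line; "armv8.2-a" and "fp16" are the names this
   series adds).  */
#include <arm_neon.h>

#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
float16x8_t
add_f16 (float16x8_t a, float16x8_t b)
{
  /* vaddq_f16 is one of the new arm_neon.h intrinsics (r238717);
     it should expand to a single fadd v0.8h, v1.8h, v2.8h.  */
  return vaddq_f16 (a, b);
}
#endif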
-rw-r--r--  gcc/config.gcc                                    |    2
-rw-r--r--  gcc/config/aarch64/aarch64-arches.def             |    1
-rw-r--r--  gcc/config/aarch64/aarch64-builtins.c             |    5
-rw-r--r--  gcc/config/aarch64/aarch64-c.c                    |    5
-rw-r--r--  gcc/config/aarch64/aarch64-option-extensions.def  |    8
-rw-r--r--  gcc/config/aarch64/aarch64-simd-builtins.def      |  161
-rw-r--r--  gcc/config/aarch64/aarch64-simd.md                |  366
-rw-r--r--  gcc/config/aarch64/aarch64.c                      |   24
-rw-r--r--  gcc/config/aarch64/aarch64.h                      |   11
-rw-r--r--  gcc/config/aarch64/aarch64.md                     |  172
-rw-r--r--  gcc/config/aarch64/arm_fp16.h                     |  579
-rw-r--r--  gcc/config/aarch64/arm_neon.h                     | 1282
-rw-r--r--  gcc/config/aarch64/iterators.md                   |   88
-rw-r--r--  gcc/doc/invoke.texi                               |    7
14 files changed, 2414 insertions(+), 297 deletions(-)
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f3da546892..0cbf84be1e2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -307,7 +307,7 @@ m32c*-*-*)
;;
aarch64*-*-*)
cpu_type=aarch64
- extra_headers="arm_neon.h arm_acle.h"
+ extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 1e9d90b1b66..7dcf140411f 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -32,4 +32,5 @@
AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8)
AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1)
+AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2)
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 0d51bfa3ee8..1de325a0fc3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -62,6 +62,7 @@
#define si_UP SImode
#define sf_UP SFmode
#define hi_UP HImode
+#define hf_UP HFmode
#define qi_UP QImode
#define UP(X) X##_UP
@@ -139,6 +140,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_none, qualifier_none, qualifier_unsigned };
#define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_poly, qualifier_poly, qualifier_poly };
#define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index e64dc7676cc..3380ed6f2cd 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
else
cpp_undef (pfile, "__ARM_FP");
+ aarch64_def_or_undef (TARGET_FP_F16INST,
+ "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile);
+ aarch64_def_or_undef (TARGET_SIMD_F16INST,
+ "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile);
+
aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
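The two new macros let source code test the scalar and vector halves of the FP16 extension independently. A hedged sketch of the intended guard pattern (intrinsic names come from the ChangeLog; not part of the patch):

#include <arm_neon.h>   /* includes arm_fp16.h as of r238722 */

#ifdef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
float16_t abs_h (float16_t x) { return vabsh_f16 (x); }    /* arm_fp16.h */
#endif

#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
float16x8_t abs_v (float16x8_t x) { return vabsq_f16 (x); } /* arm_neon.h */
#endif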
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index e8706d1c2e7..a10ccf2254c 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,8 +39,8 @@
that are required. Their order is not important. */
/* Enabling "fp" just enables "fp".
- Disabling "fp" also disables "simd", "crypto". */
-AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "fp")
+ Disabling "fp" also disables "simd", "crypto" and "fp16". */
+AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp")
/* Enabling "simd" also enables "fp".
Disabling "simd" also disables "crypto". */
@@ -55,3 +55,7 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
/* Enabling or disabling "lse" only changes "lse". */
AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
+
+/* Enabling "fp16" also enables "fp".
+ Disabling "fp16" just disables "fp16". */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index df0a7d8ae6e..bc5eda6d6dc 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,8 +41,8 @@
BUILTIN_VDC (COMBINE, combine, 0)
BUILTIN_VB (BINOP, pmul, 0)
- BUILTIN_VALLF (BINOP, fmulx, 0)
- BUILTIN_VDQF_DF (UNOP, sqrt, 2)
+ BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
+ BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
BUILTIN_VD_BHSI (BINOP, addp, 0)
VAR1 (UNOP, addp, 0, di)
BUILTIN_VDQ_BHSI (UNOP, clrsb, 2)
@@ -234,12 +234,12 @@
BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
/* Implemented by reduc_<maxmin_uns>_scal_<mode> (producing scalar). */
- BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
- BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+ BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10)
+ BUILTIN_VDQIF_F16 (UNOP, reduc_smin_scal_, 10)
BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
- BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
- BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
+ BUILTIN_VHSDF (UNOP, reduc_smax_nan_scal_, 10)
+ BUILTIN_VHSDF (UNOP, reduc_smin_nan_scal_, 10)
/* Implemented by <maxmin><mode>3.
smax variants map to fmaxnm,
@@ -248,91 +248,131 @@
BUILTIN_VDQ_BHSI (BINOP, smin, 3)
BUILTIN_VDQ_BHSI (BINOP, umax, 3)
BUILTIN_VDQ_BHSI (BINOP, umin, 3)
- BUILTIN_VDQF (BINOP, smax_nan, 3)
- BUILTIN_VDQF (BINOP, smin_nan, 3)
+ BUILTIN_VHSDF (BINOP, smax_nan, 3)
+ BUILTIN_VHSDF (BINOP, smin_nan, 3)
/* Implemented by <fmaxmin><mode>3. */
- BUILTIN_VDQF (BINOP, fmax, 3)
- BUILTIN_VDQF (BINOP, fmin, 3)
+ BUILTIN_VHSDF (BINOP, fmax, 3)
+ BUILTIN_VHSDF (BINOP, fmin, 3)
/* Implemented by aarch64_<maxmin_uns>p<mode>. */
BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
- BUILTIN_VDQF (BINOP, smaxp, 0)
- BUILTIN_VDQF (BINOP, sminp, 0)
- BUILTIN_VDQF (BINOP, smax_nanp, 0)
- BUILTIN_VDQF (BINOP, smin_nanp, 0)
+ BUILTIN_VHSDF (BINOP, smaxp, 0)
+ BUILTIN_VHSDF (BINOP, sminp, 0)
+ BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+ BUILTIN_VHSDF (BINOP, smin_nanp, 0)
/* Implemented by <frint_pattern><mode>2. */
- BUILTIN_VDQF (UNOP, btrunc, 2)
- BUILTIN_VDQF (UNOP, ceil, 2)
- BUILTIN_VDQF (UNOP, floor, 2)
- BUILTIN_VDQF (UNOP, nearbyint, 2)
- BUILTIN_VDQF (UNOP, rint, 2)
- BUILTIN_VDQF (UNOP, round, 2)
- BUILTIN_VDQF_DF (UNOP, frintn, 2)
+ BUILTIN_VHSDF (UNOP, btrunc, 2)
+ BUILTIN_VHSDF (UNOP, ceil, 2)
+ BUILTIN_VHSDF (UNOP, floor, 2)
+ BUILTIN_VHSDF (UNOP, nearbyint, 2)
+ BUILTIN_VHSDF (UNOP, rint, 2)
+ BUILTIN_VHSDF (UNOP, round, 2)
+ BUILTIN_VHSDF_DF (UNOP, frintn, 2)
+
+ VAR1 (UNOP, btrunc, 2, hf)
+ VAR1 (UNOP, ceil, 2, hf)
+ VAR1 (UNOP, floor, 2, hf)
+ VAR1 (UNOP, frintn, 2, hf)
+ VAR1 (UNOP, nearbyint, 2, hf)
+ VAR1 (UNOP, rint, 2, hf)
+ VAR1 (UNOP, round, 2, hf)
/* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2. */
+ VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
+ VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
VAR1 (UNOP, lbtruncv2sf, 2, v2si)
VAR1 (UNOP, lbtruncv4sf, 2, v4si)
VAR1 (UNOP, lbtruncv2df, 2, v2di)
+ VAR1 (UNOPUS, lbtruncuv4hf, 2, v4hi)
+ VAR1 (UNOPUS, lbtruncuv8hf, 2, v8hi)
VAR1 (UNOPUS, lbtruncuv2sf, 2, v2si)
VAR1 (UNOPUS, lbtruncuv4sf, 2, v4si)
VAR1 (UNOPUS, lbtruncuv2df, 2, v2di)
+ VAR1 (UNOP, lroundv4hf, 2, v4hi)
+ VAR1 (UNOP, lroundv8hf, 2, v8hi)
VAR1 (UNOP, lroundv2sf, 2, v2si)
VAR1 (UNOP, lroundv4sf, 2, v4si)
VAR1 (UNOP, lroundv2df, 2, v2di)
- /* Implemented by l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2. */
+ /* Implemented by l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2. */
+ BUILTIN_GPI_I16 (UNOP, lroundhf, 2)
VAR1 (UNOP, lroundsf, 2, si)
VAR1 (UNOP, lrounddf, 2, di)
+ VAR1 (UNOPUS, lrounduv4hf, 2, v4hi)
+ VAR1 (UNOPUS, lrounduv8hf, 2, v8hi)
VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
VAR1 (UNOPUS, lrounduv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOPUS, lrounduhf, 2)
VAR1 (UNOPUS, lroundusf, 2, si)
VAR1 (UNOPUS, lroundudf, 2, di)
+ VAR1 (UNOP, lceilv4hf, 2, v4hi)
+ VAR1 (UNOP, lceilv8hf, 2, v8hi)
VAR1 (UNOP, lceilv2sf, 2, v2si)
VAR1 (UNOP, lceilv4sf, 2, v4si)
VAR1 (UNOP, lceilv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOP, lceilhf, 2)
+ VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
+ VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
VAR1 (UNOPUS, lceiluv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOPUS, lceiluhf, 2)
VAR1 (UNOPUS, lceilusf, 2, si)
VAR1 (UNOPUS, lceiludf, 2, di)
+ VAR1 (UNOP, lfloorv4hf, 2, v4hi)
+ VAR1 (UNOP, lfloorv8hf, 2, v8hi)
VAR1 (UNOP, lfloorv2sf, 2, v2si)
VAR1 (UNOP, lfloorv4sf, 2, v4si)
VAR1 (UNOP, lfloorv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOP, lfloorhf, 2)
+ VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
+ VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
VAR1 (UNOPUS, lflooruv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOPUS, lflooruhf, 2)
VAR1 (UNOPUS, lfloorusf, 2, si)
VAR1 (UNOPUS, lfloorudf, 2, di)
+ VAR1 (UNOP, lfrintnv4hf, 2, v4hi)
+ VAR1 (UNOP, lfrintnv8hf, 2, v8hi)
VAR1 (UNOP, lfrintnv2sf, 2, v2si)
VAR1 (UNOP, lfrintnv4sf, 2, v4si)
VAR1 (UNOP, lfrintnv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOP, lfrintnhf, 2)
VAR1 (UNOP, lfrintnsf, 2, si)
VAR1 (UNOP, lfrintndf, 2, di)
+ VAR1 (UNOPUS, lfrintnuv4hf, 2, v4hi)
+ VAR1 (UNOPUS, lfrintnuv8hf, 2, v8hi)
VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
+ BUILTIN_GPI_I16 (UNOPUS, lfrintnuhf, 2)
VAR1 (UNOPUS, lfrintnusf, 2, si)
VAR1 (UNOPUS, lfrintnudf, 2, di)
/* Implemented by <optab><fcvt_target><VDQF:mode>2. */
+ VAR1 (UNOP, floatv4hi, 2, v4hf)
+ VAR1 (UNOP, floatv8hi, 2, v8hf)
VAR1 (UNOP, floatv2si, 2, v2sf)
VAR1 (UNOP, floatv4si, 2, v4sf)
VAR1 (UNOP, floatv2di, 2, v2df)
+ VAR1 (UNOP, floatunsv4hi, 2, v4hf)
+ VAR1 (UNOP, floatunsv8hi, 2, v8hf)
VAR1 (UNOP, floatunsv2si, 2, v2sf)
VAR1 (UNOP, floatunsv4si, 2, v4sf)
VAR1 (UNOP, floatunsv2di, 2, v2df)
@@ -352,19 +392,19 @@
/* Implemented by
aarch64_frecp<FRECP:frecp_suffix><mode>. */
- BUILTIN_GPF (UNOP, frecpe, 0)
- BUILTIN_GPF (BINOP, frecps, 0)
- BUILTIN_GPF (UNOP, frecpx, 0)
+ BUILTIN_GPF_F16 (UNOP, frecpe, 0)
+ BUILTIN_GPF_F16 (UNOP, frecpx, 0)
BUILTIN_VDQ_SI (UNOP, urecpe, 0)
- BUILTIN_VDQF (UNOP, frecpe, 0)
- BUILTIN_VDQF (BINOP, frecps, 0)
+ BUILTIN_VHSDF (UNOP, frecpe, 0)
+ BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
/* Implemented by a mixture of abs2 patterns. Note the DImode builtin is
only ever used for the int64x1_t intrinsic, there is no scalar version. */
BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
- BUILTIN_VDQF (UNOP, abs, 2)
+ BUILTIN_VHSDF (UNOP, abs, 2)
+ VAR1 (UNOP, abs, 2, hf)
BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
@@ -381,7 +421,11 @@
BUILTIN_VALL_F16 (STORE1, st1, 0)
/* Implemented by fma<mode>4. */
- BUILTIN_VDQF (TERNOP, fma, 4)
+ BUILTIN_VHSDF (TERNOP, fma, 4)
+ VAR1 (TERNOP, fma, 4, hf)
+ /* Implemented by fnma<mode>4. */
+ BUILTIN_VHSDF (TERNOP, fnma, 4)
+ VAR1 (TERNOP, fnma, 4, hf)
/* Implemented by aarch64_simd_bsl<mode>. */
BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
@@ -451,19 +495,62 @@
BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
/* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3. */
- BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3)
- BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3)
- BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3)
- BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
+ BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
+ BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
+ BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
+ BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
+ VAR1 (SHIFTIMM, scvtfsi, 3, hf)
+ VAR1 (SHIFTIMM, scvtfdi, 3, hf)
+ VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
+ VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
+ BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
+ BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
/* Implemented by aarch64_rsqrte<mode>. */
- BUILTIN_VALLF (UNOP, rsqrte, 0)
+ BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
/* Implemented by aarch64_rsqrts<mode>. */
- BUILTIN_VALLF (BINOP, rsqrts, 0)
+ BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
/* Implemented by fabd<mode>3. */
- BUILTIN_VALLF (BINOP, fabd, 3)
+ BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
/* Implemented by aarch64_faddp<mode>. */
- BUILTIN_VDQF (BINOP, faddp, 0)
+ BUILTIN_VHSDF (BINOP, faddp, 0)
+
+ /* Implemented by aarch64_cm<optab><mode>. */
+ BUILTIN_VHSDF_HSDF (BINOP_USS, cmeq, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, cmge, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, cmgt, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, cmle, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, cmlt, 0)
+
+ /* Implemented by neg<mode>2. */
+ BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
+
+ /* Implemented by aarch64_fac<optab><mode>. */
+ BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
+ BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
+
+ /* Implemented by sqrt<mode>2. */
+ VAR1 (UNOP, sqrt, 2, hf)
+
+ /* Implemented by <optab><mode>hf2. */
+ VAR1 (UNOP, floatdi, 2, hf)
+ VAR1 (UNOP, floatsi, 2, hf)
+ VAR1 (UNOP, floathi, 2, hf)
+ VAR1 (UNOPUS, floatunsdi, 2, hf)
+ VAR1 (UNOPUS, floatunssi, 2, hf)
+ VAR1 (UNOPUS, floatunshi, 2, hf)
+ BUILTIN_GPI_I16 (UNOP, fix_trunchf, 2)
+ BUILTIN_GPI (UNOP, fix_truncsf, 2)
+ BUILTIN_GPI (UNOP, fix_truncdf, 2)
+ BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
+ BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
+ BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+
+ /* Implemented by <fmaxmin><mode>3. */
+ VAR1 (BINOP, fmax, 3, hf)
+ VAR1 (BINOP, fmin, 3, hf)
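The hf-mode fmax/fmin builtins registered above appear to back the scalar NaN-propagation-safe max/min intrinsics from arm_fp16.h (an assumed mapping, inferred from the ChangeLog; the intrinsic names are listed there as new). A sketch:

#include <arm_fp16.h>

float16_t
clamp01 (float16_t x)
{
  /* fmaxnm/fminnm semantics: IEEE maxNum/minNum, NaN-safe.  */
  return vminnmh_f16 (vmaxnmh_f16 (x, (float16_t) 0.0), (float16_t) 1.0);
}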
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3f8289cf7dc..c6af9f36d76 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -351,7 +351,7 @@
operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
}
- [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
+ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
)
(define_insn "*aarch64_mul3_elt_<vswap_width_name><mode>"
@@ -379,25 +379,25 @@
(match_operand:VMUL 2 "register_operand" "w")))]
"TARGET_SIMD"
"<f>mul\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]";
- [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
+ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
)
(define_insn "aarch64_rsqrte<mode>"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+ (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")]
UNSPEC_RSQRTE))]
"TARGET_SIMD"
"frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
- [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")])
+ [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
(define_insn "aarch64_rsqrts<mode>"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
- (match_operand:VALLF 2 "register_operand" "w")]
- UNSPEC_RSQRTS))]
+ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+ (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
+ UNSPEC_RSQRTS))]
"TARGET_SIMD"
"frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_rsqrts_<Vetype><q>")])
+ [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
(define_expand "rsqrt<mode>2"
[(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -475,14 +475,14 @@
)
(define_insn "fabd<mode>3"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (abs:VALLF
- (minus:VALLF
- (match_operand:VALLF 1 "register_operand" "w")
- (match_operand:VALLF 2 "register_operand" "w"))))]
+ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+ (abs:VHSDF_HSDF
+ (minus:VHSDF_HSDF
+ (match_operand:VHSDF_HSDF 1 "register_operand" "w")
+ (match_operand:VHSDF_HSDF 2 "register_operand" "w"))))]
"TARGET_SIMD"
"fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_abd_<stype><q>")]
)
(define_insn "and<mode>3"
@@ -1062,10 +1062,10 @@
;; Pairwise FP Max/Min operations.
(define_insn "aarch64_<maxmin_uns>p<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")]
- FMAXMINV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")]
+ FMAXMINV))]
"TARGET_SIMD"
"<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
[(set_attr "type" "neon_minmax<q>")]
@@ -1474,36 +1474,36 @@
;; FP arithmetic operations.
(define_insn "add<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (plus:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
"fadd\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_addsub_<stype><q>")]
)
(define_insn "sub<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (minus:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
"fsub\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_addsub_<stype><q>")]
)
(define_insn "mul<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (mult:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
"fmul\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_mul_<stype><q>")]
)
(define_expand "div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand")
- (div:VDQF (match_operand:VDQF 1 "general_operand")
- (match_operand:VDQF 2 "register_operand")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
{
if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -1513,38 +1513,38 @@
})
(define_insn "*div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
"fdiv\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_div_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_div_<stype><q>")]
)
(define_insn "neg<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (neg:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
"TARGET_SIMD"
"fneg\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_neg_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_neg_<stype><q>")]
)
(define_insn "abs<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (abs:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
"TARGET_SIMD"
"fabs\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_abs_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_abs_<stype><q>")]
)
(define_insn "fma<mode>4"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (fma:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")
- (match_operand:VDQF 3 "register_operand" "0")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")
+ (match_operand:VHSDF 3 "register_operand" "0")))]
"TARGET_SIMD"
"fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_mla_<stype><q>")]
)
(define_insn "*aarch64_fma4_elt<mode>"
@@ -1591,7 +1591,7 @@
(match_operand:VMUL 3 "register_operand" "0")))]
"TARGET_SIMD"
"fmla\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
- [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
+ [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
)
(define_insn "*aarch64_fma4_elt_to_64v2df"
@@ -1611,15 +1611,15 @@
)
(define_insn "fnma<mode>4"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (fma:VDQF
- (match_operand:VDQF 1 "register_operand" "w")
- (neg:VDQF
- (match_operand:VDQF 2 "register_operand" "w"))
- (match_operand:VDQF 3 "register_operand" "0")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (fma:VHSDF
+ (match_operand:VHSDF 1 "register_operand" "w")
+ (neg:VHSDF
+ (match_operand:VHSDF 2 "register_operand" "w"))
+ (match_operand:VHSDF 3 "register_operand" "0")))]
"TARGET_SIMD"
- "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
+ "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+ [(set_attr "type" "neon_fp_mla_<stype><q>")]
)
(define_insn "*aarch64_fnma4_elt<mode>"
@@ -1669,7 +1669,7 @@
(match_operand:VMUL 3 "register_operand" "0")))]
"TARGET_SIMD"
"fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
- [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
+ [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
)
(define_insn "*aarch64_fnma4_elt_to_64v2df"
@@ -1692,24 +1692,50 @@
;; Vector versions of the floating-point frint patterns.
;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
(define_insn "<frint_pattern><mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
- FRINT))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+ FRINT))]
"TARGET_SIMD"
"frint<frint_suffix>\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_round_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_round_<stype><q>")]
)
;; Vector versions of the fcvt standard patterns.
;; Expands to lbtrunc, lround, lceil, lfloor
-(define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
+(define_insn "l<fcvt_pattern><su_optab><VHSDF:mode><fcvt_target>2"
[(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
- [(match_operand:VDQF 1 "register_operand" "w")]
+ [(match_operand:VHSDF 1 "register_operand" "w")]
FCVT)))]
"TARGET_SIMD"
"fcvt<frint_suffix><su>\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_to_int_<stype><q>")]
+)
+
+;; HF Scalar variants of related SIMD instructions.
+(define_insn "l<fcvt_pattern><su_optab>hfhi2"
+ [(set (match_operand:HI 0 "register_operand" "=w")
+ (FIXUORS:HI (unspec:HF [(match_operand:HF 1 "register_operand" "w")]
+ FCVT)))]
+ "TARGET_SIMD_F16INST"
+ "fcvt<frint_suffix><su>\t%h0, %h1"
+ [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>_trunchfhi2"
+ [(set (match_operand:HI 0 "register_operand" "=w")
+ (FIXUORS:HI (match_operand:HF 1 "register_operand" "w")))]
+ "TARGET_SIMD_F16INST"
+ "fcvtz<su>\t%h0, %h1"
+ [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<optab>hihf2"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+ (FLOATUORS:HF (match_operand:HI 1 "register_operand" "w")))]
+ "TARGET_SIMD_F16INST"
+ "<su_optab>cvtf\t%h0, %h1"
+ [(set_attr "type" "neon_int_to_fp_s")]
)
(define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
@@ -1732,36 +1758,36 @@
[(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
)
-(define_expand "<optab><VDQF:mode><fcvt_target>2"
+(define_expand "<optab><VHSDF:mode><fcvt_target>2"
[(set (match_operand:<FCVT_TARGET> 0 "register_operand")
(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
- [(match_operand:VDQF 1 "register_operand")]
- UNSPEC_FRINTZ)))]
+ [(match_operand:VHSDF 1 "register_operand")]
+ UNSPEC_FRINTZ)))]
"TARGET_SIMD"
{})
-(define_expand "<fix_trunc_optab><VDQF:mode><fcvt_target>2"
+(define_expand "<fix_trunc_optab><VHSDF:mode><fcvt_target>2"
[(set (match_operand:<FCVT_TARGET> 0 "register_operand")
(FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
- [(match_operand:VDQF 1 "register_operand")]
- UNSPEC_FRINTZ)))]
+ [(match_operand:VHSDF 1 "register_operand")]
+ UNSPEC_FRINTZ)))]
"TARGET_SIMD"
{})
-(define_expand "ftrunc<VDQF:mode>2"
- [(set (match_operand:VDQF 0 "register_operand")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
- UNSPEC_FRINTZ))]
+(define_expand "ftrunc<VHSDF:mode>2"
+ [(set (match_operand:VHSDF 0 "register_operand")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
+ UNSPEC_FRINTZ))]
"TARGET_SIMD"
{})
-(define_insn "<optab><fcvt_target><VDQF:mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (FLOATUORS:VDQF
+(define_insn "<optab><fcvt_target><VHSDF:mode>2"
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (FLOATUORS:VHSDF
(match_operand:<FCVT_TARGET> 1 "register_operand" "w")))]
"TARGET_SIMD"
"<su_optab>cvtf\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_int_to_fp_<Vetype><q>")]
+ [(set_attr "type" "neon_int_to_fp_<stype><q>")]
)
;; Conversions between vectors of floats and doubles.
@@ -1783,24 +1809,26 @@
;; Convert between fixed-point and floating-point (vector modes)
-(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VDQF:mode>3"
- [(set (match_operand:<VDQF:FCVT_TARGET> 0 "register_operand" "=w")
- (unspec:<VDQF:FCVT_TARGET> [(match_operand:VDQF 1 "register_operand" "w")
- (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3"
+ [(set (match_operand:<VHSDF:FCVT_TARGET> 0 "register_operand" "=w")
+ (unspec:<VHSDF:FCVT_TARGET>
+ [(match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
FCVT_F2FIXED))]
"TARGET_SIMD"
"<FCVT_F2FIXED:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
- [(set_attr "type" "neon_fp_to_int_<VDQF:Vetype><q>")]
+ [(set_attr "type" "neon_fp_to_int_<VHSDF:stype><q>")]
)
-(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_SDI:mode>3"
- [(set (match_operand:<VDQ_SDI:FCVT_TARGET> 0 "register_operand" "=w")
- (unspec:<VDQ_SDI:FCVT_TARGET> [(match_operand:VDQ_SDI 1 "register_operand" "w")
- (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
+ [(set (match_operand:<VDQ_HSDI:FCVT_TARGET> 0 "register_operand" "=w")
+ (unspec:<VDQ_HSDI:FCVT_TARGET>
+ [(match_operand:VDQ_HSDI 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
FCVT_FIXED2F))]
"TARGET_SIMD"
"<FCVT_FIXED2F:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
- [(set_attr "type" "neon_int_to_fp_<VDQ_SDI:Vetype><q>")]
+ [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
)
;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
@@ -1959,33 +1987,33 @@
;; NaNs.
(define_insn "<su><maxmin><mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")))]
"TARGET_SIMD"
"f<maxmin>nm\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_minmax_<stype><q>")]
)
(define_insn "<maxmin_uns><mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")]
- FMAXMIN_UNS))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")]
+ FMAXMIN_UNS))]
"TARGET_SIMD"
"<maxmin_uns_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_minmax_<stype><q>")]
)
;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
(define_insn "<fmaxmin><mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")]
- FMAXMIN))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")]
+ FMAXMIN))]
"TARGET_SIMD"
"<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_minmax_<stype><q>")]
)
;; 'across lanes' add.
@@ -2005,13 +2033,13 @@
)
(define_insn "aarch64_faddp<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")]
- UNSPEC_FADDV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+ (match_operand:VHSDF 2 "register_operand" "w")]
+ UNSPEC_FADDV))]
"TARGET_SIMD"
"faddp\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_reduc_add_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_reduc_add_<stype><q>")]
)
(define_insn "aarch64_reduc_plus_internal<mode>"
@@ -2085,8 +2113,8 @@
;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code. (This is FP smax/smin).
(define_expand "reduc_<maxmin_uns>_scal_<mode>"
[(match_operand:<VEL> 0 "register_operand")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
- FMAXMINV)]
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
+ FMAXMINV)]
"TARGET_SIMD"
{
rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
@@ -2133,12 +2161,12 @@
)
(define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
- FMAXMINV))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+ FMAXMINV))]
"TARGET_SIMD"
"<maxmin_uns_op><vp>\\t%<Vetype>0, %1.<Vtype>"
- [(set_attr "type" "neon_fp_reduc_minmax_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_reduc_minmax_<stype><q>")]
)
;; aarch64_simd_bsl may compile to any of bsl/bif/bit depending on register
@@ -3007,13 +3035,14 @@
;; fmulx.
(define_insn "aarch64_fmulx<mode>"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
- (match_operand:VALLF 2 "register_operand" "w")]
- UNSPEC_FMULX))]
+ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+ (unspec:VHSDF_HSDF
+ [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
+ UNSPEC_FMULX))]
"TARGET_SIMD"
"fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_mul_<Vetype>")]
+ [(set_attr "type" "neon_fp_mul_<stype>")]
)
;; vmulxq_lane_f32, and vmulx_laneq_f32
@@ -3055,20 +3084,18 @@
[(set_attr "type" "neon_fp_mul_<Vetype><q>")]
)
-;; vmulxq_lane_f64
+;; vmulxq_lane
-(define_insn "*aarch64_mulx_elt_to_64v2df"
- [(set (match_operand:V2DF 0 "register_operand" "=w")
- (unspec:V2DF
- [(match_operand:V2DF 1 "register_operand" "w")
- (vec_duplicate:V2DF
- (match_operand:DF 2 "register_operand" "w"))]
+(define_insn "*aarch64_mulx_elt_from_dup<mode>"
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF
+ [(match_operand:VHSDF 1 "register_operand" "w")
+ (vec_duplicate:VHSDF
+ (match_operand:<VEL> 2 "register_operand" "w"))]
UNSPEC_FMULX))]
"TARGET_SIMD"
- {
- return "fmulx\t%0.2d, %1.2d, %2.d[0]";
- }
- [(set_attr "type" "neon_fp_mul_d_scalar_q")]
+ "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
+ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
)
;; vmulxs_lane_f32, vmulxs_laneq_f32
@@ -4253,30 +4280,32 @@
[(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
(neg:<V_cmp_result>
(COMPARISONS:<V_cmp_result>
- (match_operand:VALLF 1 "register_operand" "w,w")
- (match_operand:VALLF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+ (match_operand:VHSDF_HSDF 1 "register_operand" "w,w")
+ (match_operand:VHSDF_HSDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
)))]
"TARGET_SIMD"
"@
fcm<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>
fcm<optab>\t%<v>0<Vmtype>, %<v>1<Vmtype>, 0"
- [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_compare_<stype><q>")]
)
;; fac(ge|gt)
;; Note we can also handle what would be fac(le|lt) by
;; generating fac(ge|gt).
-(define_insn "*aarch64_fac<optab><mode>"
+(define_insn "aarch64_fac<optab><mode>"
[(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
(neg:<V_cmp_result>
(FAC_COMPARISONS:<V_cmp_result>
- (abs:VALLF (match_operand:VALLF 1 "register_operand" "w"))
- (abs:VALLF (match_operand:VALLF 2 "register_operand" "w"))
+ (abs:VHSDF_HSDF
+ (match_operand:VHSDF_HSDF 1 "register_operand" "w"))
+ (abs:VHSDF_HSDF
+ (match_operand:VHSDF_HSDF 2 "register_operand" "w"))
)))]
"TARGET_SIMD"
"fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
- [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_compare_<stype><q>")]
)
;; addp
@@ -4305,8 +4334,8 @@
;; sqrt
(define_expand "sqrt<mode>2"
- [(set (match_operand:VDQF 0 "register_operand")
- (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
"TARGET_SIMD"
{
if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -4314,11 +4343,11 @@
})
(define_insn "*sqrt<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
"TARGET_SIMD"
"fsqrt\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_sqrt_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_sqrt_<stype><q>")]
)
;; Patterns for vector struct loads and stores.
@@ -5176,10 +5205,10 @@
)
(define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>"
- [(set (match_operand:VALL 0 "register_operand" "=w")
- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
- (match_operand:VALL 2 "register_operand" "w")]
- PERMUTE))]
+ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
+ (match_operand:VALL_F16 2 "register_operand" "w")]
+ PERMUTE))]
"TARGET_SIMD"
"<PERMUTE:perm_insn><PERMUTE:perm_hilo>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
[(set_attr "type" "neon_permute<q>")]
@@ -5187,11 +5216,11 @@
;; Note immediate (third) operand is lane index not byte index.
(define_insn "aarch64_ext<mode>"
- [(set (match_operand:VALL 0 "register_operand" "=w")
- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
- (match_operand:VALL 2 "register_operand" "w")
- (match_operand:SI 3 "immediate_operand" "i")]
- UNSPEC_EXT))]
+ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
+ (match_operand:VALL_F16 2 "register_operand" "w")
+ (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_EXT))]
"TARGET_SIMD"
{
operands[3] = GEN_INT (INTVAL (operands[3])
@@ -5202,8 +5231,8 @@
)
(define_insn "aarch64_rev<REVERSE:rev_op><mode>"
- [(set (match_operand:VALL 0 "register_operand" "=w")
- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")]
+ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")]
REVERSE))]
"TARGET_SIMD"
"rev<REVERSE:rev_op>\\t%0.<Vtype>, %1.<Vtype>"
@@ -5370,31 +5399,32 @@
)
(define_insn "aarch64_frecpe<mode>"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
- UNSPEC_FRECPE))]
+ [(set (match_operand:VHSDF 0 "register_operand" "=w")
+ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
+ UNSPEC_FRECPE))]
"TARGET_SIMD"
"frecpe\\t%0.<Vtype>, %1.<Vtype>"
- [(set_attr "type" "neon_fp_recpe_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_recpe_<stype><q>")]
)
(define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
- FRECP))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+ FRECP))]
"TARGET_SIMD"
"frecp<FRECP:frecp_suffix>\\t%<s>0, %<s>1"
- [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF:Vetype><GPF:q>")]
+ [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF_F16:stype>")]
)
(define_insn "aarch64_frecps<mode>"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
- (match_operand:VALLF 2 "register_operand" "w")]
- UNSPEC_FRECPS))]
+ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
+ (unspec:VHSDF_HSDF
+ [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
+ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
+ UNSPEC_FRECPS))]
"TARGET_SIMD"
"frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_recps_<Vetype><q>")]
+ [(set_attr "type" "neon_fp_recps_<stype><q>")]
)
(define_insn "aarch64_urecpe<mode>"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f2424e9e1d3..ba0d3767ff0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7567,6 +7567,10 @@ bool
aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
{
machine_mode mode = GET_MODE (dst);
+
+ if (GET_MODE_INNER (mode) == HFmode)
+ return false;
+
machine_mode mmsk = mode_for_vector
(int_mode_for_mode (GET_MODE_INNER (mode)),
GET_MODE_NUNITS (mode));
@@ -7682,6 +7686,10 @@ bool
aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
{
machine_mode mode = GET_MODE (quo);
+
+ if (GET_MODE_INNER (mode) == HFmode)
+ return false;
+
bool use_approx_division_p = (flag_mlow_precision_div
|| (aarch64_tune_params.approx_modes->division
& AARCH64_APPROX_MODE (mode)));
@@ -12365,6 +12373,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_trn2v4si; break;
case V2SImode: gen = gen_aarch64_trn2v2si; break;
case V2DImode: gen = gen_aarch64_trn2v2di; break;
+ case V4HFmode: gen = gen_aarch64_trn2v4hf; break;
+ case V8HFmode: gen = gen_aarch64_trn2v8hf; break;
case V4SFmode: gen = gen_aarch64_trn2v4sf; break;
case V2SFmode: gen = gen_aarch64_trn2v2sf; break;
case V2DFmode: gen = gen_aarch64_trn2v2df; break;
@@ -12383,6 +12393,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_trn1v4si; break;
case V2SImode: gen = gen_aarch64_trn1v2si; break;
case V2DImode: gen = gen_aarch64_trn1v2di; break;
+ case V4HFmode: gen = gen_aarch64_trn1v4hf; break;
+ case V8HFmode: gen = gen_aarch64_trn1v8hf; break;
case V4SFmode: gen = gen_aarch64_trn1v4sf; break;
case V2SFmode: gen = gen_aarch64_trn1v2sf; break;
case V2DFmode: gen = gen_aarch64_trn1v2df; break;
@@ -12448,6 +12460,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_uzp2v4si; break;
case V2SImode: gen = gen_aarch64_uzp2v2si; break;
case V2DImode: gen = gen_aarch64_uzp2v2di; break;
+ case V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
+ case V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
case V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
case V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
case V2DFmode: gen = gen_aarch64_uzp2v2df; break;
@@ -12466,6 +12480,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_uzp1v4si; break;
case V2SImode: gen = gen_aarch64_uzp1v2si; break;
case V2DImode: gen = gen_aarch64_uzp1v2di; break;
+ case V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
+ case V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
case V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
case V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
case V2DFmode: gen = gen_aarch64_uzp1v2df; break;
@@ -12536,6 +12552,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_zip2v4si; break;
case V2SImode: gen = gen_aarch64_zip2v2si; break;
case V2DImode: gen = gen_aarch64_zip2v2di; break;
+ case V4HFmode: gen = gen_aarch64_zip2v4hf; break;
+ case V8HFmode: gen = gen_aarch64_zip2v8hf; break;
case V4SFmode: gen = gen_aarch64_zip2v4sf; break;
case V2SFmode: gen = gen_aarch64_zip2v2sf; break;
case V2DFmode: gen = gen_aarch64_zip2v2df; break;
@@ -12554,6 +12572,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
case V4SImode: gen = gen_aarch64_zip1v4si; break;
case V2SImode: gen = gen_aarch64_zip1v2si; break;
case V2DImode: gen = gen_aarch64_zip1v2di; break;
+ case V4HFmode: gen = gen_aarch64_zip1v4hf; break;
+ case V8HFmode: gen = gen_aarch64_zip1v8hf; break;
case V4SFmode: gen = gen_aarch64_zip1v4sf; break;
case V2SFmode: gen = gen_aarch64_zip1v2sf; break;
case V2DFmode: gen = gen_aarch64_zip1v2df; break;
@@ -12598,6 +12618,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
case V8HImode: gen = gen_aarch64_extv8hi; break;
case V2SImode: gen = gen_aarch64_extv2si; break;
case V4SImode: gen = gen_aarch64_extv4si; break;
+ case V4HFmode: gen = gen_aarch64_extv4hf; break;
+ case V8HFmode: gen = gen_aarch64_extv8hf; break;
case V2SFmode: gen = gen_aarch64_extv2sf; break;
case V4SFmode: gen = gen_aarch64_extv4sf; break;
case V2DImode: gen = gen_aarch64_extv2di; break;
@@ -12673,6 +12695,8 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d)
case V2SImode: gen = gen_aarch64_rev64v2si; break;
case V4SFmode: gen = gen_aarch64_rev64v4sf; break;
case V2SFmode: gen = gen_aarch64_rev64v2sf; break;
+ case V8HFmode: gen = gen_aarch64_rev64v8hf; break;
+ case V4HFmode: gen = gen_aarch64_rev64v4hf; break;
default:
return false;
}
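The new V4HF/V8HF cases in these expand_vec_perm hooks let half-float permutes use single trn/uzp/zip/ext/rev instructions instead of a generic permute. A sketch using one of the r238715 intrinsics (assumed to map onto aarch64_evpc_trn):

#include <arm_neon.h>

#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
float16x4_t
trn_even (float16x4_t a, float16x4_t b)
{
  return vtrn1_f16 (a, b);   /* trn1 v0.4h, v1.4h, v2.4h */
}
#endif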
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index eb81a86e6d8..95b2c7da0e3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -135,6 +135,9 @@ extern unsigned aarch64_architecture_version;
/* ARMv8.1 architecture extensions. */
#define AARCH64_FL_LSE (1 << 4) /* Has Large System Extensions. */
#define AARCH64_FL_V8_1 (1 << 5) /* Has ARMv8.1 extensions. */
+/* ARMv8.2-A architecture extensions. */
+#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */
+#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */
/* Has FP and SIMD. */
#define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -146,6 +149,8 @@ extern unsigned aarch64_architecture_version;
#define AARCH64_FL_FOR_ARCH8 (AARCH64_FL_FPSIMD)
#define AARCH64_FL_FOR_ARCH8_1 \
(AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
+#define AARCH64_FL_FOR_ARCH8_2 \
+ (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2)
/* Macros to test ISA flags. */
@@ -155,6 +160,8 @@ extern unsigned aarch64_architecture_version;
#define AARCH64_ISA_SIMD (aarch64_isa_flags & AARCH64_FL_SIMD)
#define AARCH64_ISA_LSE (aarch64_isa_flags & AARCH64_FL_LSE)
#define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1)
+#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2)
+#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16)
/* Crypto is an optional extension to AdvSIMD. */
#define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -165,6 +172,10 @@ extern unsigned aarch64_architecture_version;
/* Atomic instructions that can be enabled through the +lse extension. */
#define TARGET_LSE (AARCH64_ISA_LSE)
+/* ARMv8.2-A FP16 support that can be enabled through the +fp16 extension. */
+#define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16)
+#define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16)
+
/* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs. */
#ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
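Because TARGET_FP_F16INST only requires TARGET_FLOAT while TARGET_SIMD_F16INST requires TARGET_SIMD, the scalar FP16 instructions should remain usable when SIMD is disabled. A sketch under an assumed "+fp16+nosimd" configuration:

/* gcc -march=armv8.2-a+fp16+nosimd: TARGET_FP_F16INST holds but
   TARGET_SIMD_F16INST does not, so only the scalar macro is defined.  */
#ifdef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
#include <arm_fp16.h>
float16_t halve (float16_t x) { return vdivh_f16 (x, (float16_t) 2.0); }
#endif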
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 02d71b99a5a..353278890dc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4445,22 +4445,23 @@
;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
(define_insn "<frint_pattern><mode>2"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
FRINT))]
"TARGET_FLOAT"
"frint<frint_suffix>\\t%<s>0, %<s>1"
- [(set_attr "type" "f_rint<s>")]
+ [(set_attr "type" "f_rint<stype>")]
)
;; frcvt floating-point round to integer and convert standard patterns.
;; Expands to lbtrunc, lceil, lfloor, lround.
-(define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
+(define_insn "l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2"
[(set (match_operand:GPI 0 "register_operand" "=r")
- (FIXUORS:GPI (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
- FCVT)))]
+ (FIXUORS:GPI
+ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+ FCVT)))]
"TARGET_FLOAT"
- "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF:s>1"
+ "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF_F16:s>1"
[(set_attr "type" "f_cvtf2i")]
)
@@ -4486,23 +4487,24 @@
;; fma - no throw
(define_insn "fma<mode>4"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (fma:GPF (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")
- (match_operand:GPF 3 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")
+ (match_operand:GPF_F16 3 "register_operand" "w")))]
"TARGET_FLOAT"
"fmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
- [(set_attr "type" "fmac<s>")]
+ [(set_attr "type" "fmac<stype>")]
)
(define_insn "fnma<mode>4"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
- (match_operand:GPF 2 "register_operand" "w")
- (match_operand:GPF 3 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (fma:GPF_F16
+ (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
+ (match_operand:GPF_F16 2 "register_operand" "w")
+ (match_operand:GPF_F16 3 "register_operand" "w")))]
"TARGET_FLOAT"
"fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
- [(set_attr "type" "fmac<s>")]
+ [(set_attr "type" "fmac<stype>")]
)
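Semantics note (a sketch, not from the patch): fma<mode>4 computes a*b + c and fnma<mode>4 computes c - a*b, each with a single rounding; extending them to GPF_F16 lets half-precision multiply-adds map onto fmadd/fmsub directly.

    /* Hypothetical: with contraction enabled GCC may now emit a single
       half-precision fmadd for this.  */
    __fp16 mac (__fp16 a, __fp16 b, __fp16 c)
    {
      return a * b + c;
    }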
(define_insn "fms<mode>4"
@@ -4588,19 +4590,11 @@
[(set_attr "type" "f_cvt")]
)
-(define_insn "fix_trunc<GPF:mode><GPI:mode>2"
+(define_insn "<optab>_trunc<GPF_F16:mode><GPI:mode>2"
[(set (match_operand:GPI 0 "register_operand" "=r")
- (fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
+ (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
- "fcvtzs\\t%<GPI:w>0, %<GPF:s>1"
- [(set_attr "type" "f_cvtf2i")]
-)
-
-(define_insn "fixuns_trunc<GPF:mode><GPI:mode>2"
- [(set (match_operand:GPI 0 "register_operand" "=r")
- (unsigned_fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
- "TARGET_FLOAT"
- "fcvtzu\\t%<GPI:w>0, %<GPF:s>1"
+ "fcvtz<su>\t%<GPI:w>0, %<GPF_F16:s>1"
[(set_attr "type" "f_cvtf2i")]
)
@@ -4624,6 +4618,14 @@
[(set_attr "type" "f_cvti2f")]
)
+(define_insn "<optab><mode>hf2"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+ (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
+ "TARGET_FP_F16INST"
+ "<su_optab>cvtf\t%h0, %<w>1"
+ [(set_attr "type" "f_cvti2f")]
+)
+
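Sketch of the C-level view of these conversions (assumes -march=armv8.2-a+fp16): casts from __fp16 to integer truncate toward zero, matching fcvtzs/fcvtzu, and the pattern above gives integer-to-__fp16 casts a direct scvtf/ucvtf.

    #include <stdint.h>
    int32_t  f16_to_s32 (__fp16 x)  { return (int32_t) x; }   /* fcvtzs */
    uint32_t f16_to_u32 (__fp16 x)  { return (uint32_t) x; }  /* fcvtzu */
    __fp16   s32_to_f16 (int32_t i) { return (__fp16) i; }    /* scvtf */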
;; Convert between fixed-point and floating-point (scalar modes)
(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3"
@@ -4654,38 +4656,78 @@
(set_attr "simd" "*, yes")]
)
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (unspec:GPI [(match_operand:HF 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_F2FIXED))]
+ "TARGET_FP_F16INST"
+ "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<GPI:w>0, %h1, #%2"
+ [(set_attr "type" "f_cvtf2i")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+ (unspec:HF [(match_operand:GPI 1 "register_operand" "r")
+ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_FIXED2F))]
+ "TARGET_FP_F16INST"
+ "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %<GPI:w>1, #%2"
+ [(set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf3"
+ [(set (match_operand:HI 0 "register_operand" "=w")
+ (unspec:HI [(match_operand:HF 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_F2FIXED))]
+ "TARGET_SIMD"
+ "<FCVT_F2FIXED:fcvt_fixed_insn>\t%h0, %h1, #%2"
+ [(set_attr "type" "neon_fp_to_int_s")]
+)
+
+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn>hi3"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+ (unspec:HF [(match_operand:HI 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_FIXED2F))]
+ "TARGET_SIMD"
+ "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %h1, #%2"
+ [(set_attr "type" "neon_int_to_fp_s")]
+)
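Worked sketch of the fixed-point forms (intrinsic names from the arm_fp16.h hunk below): the #fbits immediate scales by 2^-fbits when converting to float and by 2^fbits (truncating) on the way back.

    #include <arm_fp16.h>
    /* Hypothetical Q8.8 helpers.  */
    float16_t q88_to_f16 (int16_t q)   { return vcvth_n_f16_s16 (q, 8); }  /* q / 256.0 */
    int16_t   f16_to_q88 (float16_t x) { return vcvth_n_s16_f16 (x, 8); }  /* x * 256, toward zero */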
+
;; -------------------------------------------------------------------
;; Floating-point arithmetic
;; -------------------------------------------------------------------
(define_insn "add<mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (plus:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (plus:GPF_F16
+ (match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")))]
"TARGET_FLOAT"
"fadd\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "fadd<s>")]
+ [(set_attr "type" "fadd<stype>")]
)
(define_insn "sub<mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (minus:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (minus:GPF_F16
+ (match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")))]
"TARGET_FLOAT"
"fsub\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "fadd<s>")]
+ [(set_attr "type" "fadd<stype>")]
)
(define_insn "mul<mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (mult:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (mult:GPF_F16
+ (match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")))]
"TARGET_FLOAT"
"fmul\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "fmul<s>")]
+ [(set_attr "type" "fmul<stype>")]
)
(define_insn "*fnmul<mode>3"
@@ -4709,9 +4751,9 @@
)
(define_expand "div<mode>3"
- [(set (match_operand:GPF 0 "register_operand")
- (div:GPF (match_operand:GPF 1 "general_operand")
- (match_operand:GPF 2 "register_operand")))]
+ [(set (match_operand:GPF_F16 0 "register_operand")
+ (div:GPF_F16 (match_operand:GPF_F16 1 "general_operand")
+ (match_operand:GPF_F16 2 "register_operand")))]
"TARGET_SIMD"
{
if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -4721,25 +4763,25 @@
})
(define_insn "*div<mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (div:GPF (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (div:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")))]
"TARGET_FLOAT"
"fdiv\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "fdiv<s>")]
+ [(set_attr "type" "fdiv<stype>")]
)
(define_insn "neg<mode>2"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (neg:GPF (match_operand:GPF 1 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
"fneg\\t%<s>0, %<s>1"
- [(set_attr "type" "ffarith<s>")]
+ [(set_attr "type" "ffarith<stype>")]
)
(define_expand "sqrt<mode>2"
- [(set (match_operand:GPF 0 "register_operand")
- (sqrt:GPF (match_operand:GPF 1 "register_operand")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
{
if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -4747,19 +4789,19 @@
})
(define_insn "*sqrt<mode>2"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
"fsqrt\\t%<s>0, %<s>1"
- [(set_attr "type" "fsqrt<s>")]
+ [(set_attr "type" "fsqrt<stype>")]
)
(define_insn "abs<mode>2"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (abs:GPF (match_operand:GPF 1 "register_operand" "w")))]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
"fabs\\t%<s>0, %<s>1"
- [(set_attr "type" "ffarith<s>")]
+ [(set_attr "type" "ffarith<stype>")]
)
;; Given that smax/smin do not specify the result when either input is NaN,
@@ -4786,13 +4828,13 @@
;; Scalar forms for the IEEE-754 fmax()/fmin() functions
(define_insn "<fmaxmin><mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w")]
+ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
+ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")
+ (match_operand:GPF_F16 2 "register_operand" "w")]
FMAXMIN))]
"TARGET_FLOAT"
"<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "f_minmax<s>")]
+ [(set_attr "type" "f_minmax<stype>")]
)
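Assumed semantics of the <fmaxmin> patterns: fmaxnm/fminnm implement IEEE-754 maxNum/minNum, so a single quiet-NaN operand is ignored rather than propagated, which is what C fmax/fmin require.

    /* Hypothetical: fmax (NaN, b) == b when lowered to fmaxnm.  */
    double pick (double b) { return __builtin_fmax (__builtin_nan (""), b); }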
;; For copysign (x, y), we want to generate:
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
new file mode 100644
index 00000000000..4b7c2dd3bcc
--- /dev/null
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -0,0 +1,579 @@
+/* ARM FP16 scalar intrinsics include file.
+
+ Copyright (C) 2016 Free Software Foundation, Inc.
+ Contributed by ARM Ltd.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published
+ by the Free Software Foundation; either version 3, or (at your
+ option) any later version.
+
+ GCC is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+ License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _AARCH64_FP16_H_
+#define _AARCH64_FP16_H_
+
+#include <stdint.h>
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+typedef __fp16 float16_t;
+
+/* ARMv8.2-A FP16 one operand scalar intrinsics. */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_abshf (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqzh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_cmeqhf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgezh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_cmgehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgtzh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_cmgthf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclezh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_cmlehf_uss (__a, 0.0f);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcltzh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_cmlthf_uss (__a, 0.0f);
+}
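Note on the results (a sketch): like the other AdvSIMD compares, these return an all-ones/all-zeros mask rather than a 0/1 boolean.

    #include <arm_fp16.h>
    void compare_demo (void)
    {
      uint16_t t = vcgtzh_f16 ((float16_t) 1.0);   /* 0xffff: holds */
      uint16_t f = vcgtzh_f16 ((float16_t) -1.0);  /* 0x0000: fails */
      (void) t; (void) f;
    }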
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s16 (int16_t __a)
+{
+ return __builtin_aarch64_floathihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+ return __builtin_aarch64_floatsihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s64 (int64_t __a)
+{
+ return __builtin_aarch64_floatdihf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u16 (uint16_t __a)
+{
+ return __builtin_aarch64_floatunshihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+ return __builtin_aarch64_floatunssihf_us (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u64 (uint64_t __a)
+{
+ return __builtin_aarch64_floatunsdihf_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_s16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fix_trunchfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_s32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fix_trunchfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_s64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fix_trunchfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_u16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fixuns_trunchfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_u32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fixuns_trunchfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_u64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_fixuns_trunchfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtah_s16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lroundhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lroundhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtah_s64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lroundhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtah_u16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lrounduhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lrounduhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtah_u64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lrounduhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtmh_s16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfloorhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtmh_s32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfloorhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtmh_s64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfloorhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtmh_u16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lflooruhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtmh_u32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lflooruhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtmh_u64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lflooruhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtnh_s16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtnh_s32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtnh_s64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtnh_u16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnuhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtnh_u32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnuhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtnh_u64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lfrintnuhfdi_us (__a);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvtph_s16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceilhfhi (__a);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtph_s32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceilhfsi (__a);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvtph_s64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceilhfdi (__a);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvtph_u16_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceiluhfhi_us (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtph_u32_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceiluhfsi_us (__a);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvtph_u64_f16 (float16_t __a)
+{
+ return __builtin_aarch64_lceiluhfdi_us (__a);
+}
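The five convert families differ only in rounding; a worked sketch from the builtin names above (fix_trunc, lround, lfloor, lfrintn, lceil):

    #include <arm_fp16.h>
    void rounding_demo (void)
    {
      float16_t x = (float16_t) -1.5f;
      int16_t t = vcvth_s16_f16 (x);   /* -1: toward zero */
      int16_t a = vcvtah_s16_f16 (x);  /* -2: nearest, ties away */
      int16_t m = vcvtmh_s16_f16 (x);  /* -2: toward -inf */
      int16_t n = vcvtnh_s16_f16 (x);  /* -2: nearest, ties to even */
      int16_t p = vcvtph_s16_f16 (x);  /* -1: toward +inf */
      (void) t; (void) a; (void) m; (void) n; (void) p;
    }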
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vnegh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_neghf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpeh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_frecpehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpxh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_frecpxhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_btrunchf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndah_f16 (float16_t __a)
+{
+ return __builtin_aarch64_roundhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndih_f16 (float16_t __a)
+{
+ return __builtin_aarch64_nearbyinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndmh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_floorhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndnh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_frintnhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndph_f16 (float16_t __a)
+{
+ return __builtin_aarch64_ceilhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrndxh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_rinthf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrteh_f16 (float16_t __a)
+{
+ return __builtin_aarch64_rsqrtehf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsqrth_f16 (float16_t __a)
+{
+ return __builtin_aarch64_sqrthf (__a);
+}
+
+/* ARMv8.2-A FP16 two operands scalar intrinsics. */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+ return __a + __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabdh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_fabdhf (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcageh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_facgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcagth_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_facgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcaleh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_faclehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcalth_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_faclthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vceqh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_cmeqhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgeh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_cmgehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcgth_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_cmgthf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcleh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_cmlehf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vclth_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_cmlthf_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s16 (int16_t __a, const int __b)
+{
+ return __builtin_aarch64_scvtfhi (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+ return __builtin_aarch64_scvtfsihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s64 (int64_t __a, const int __b)
+{
+ return __builtin_aarch64_scvtfdihf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u16 (uint16_t __a, const int __b)
+{
+ return __builtin_aarch64_ucvtfhi_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+ return __builtin_aarch64_ucvtfsihf_sus (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u64 (uint64_t __a, const int __b)
+{
+ return __builtin_aarch64_ucvtfdihf_sus (__a, __b);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vcvth_n_s16_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzshf (__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzshfsi (__a, __b);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vcvth_n_s64_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzshfdi (__a, __b);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vcvth_n_u16_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzuhf_uss (__a, __b);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvth_n_u32_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzuhfsi_uss (__a, __b);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcvth_n_u64_f16 (float16_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzuhfdi_uss (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vdivh_f16 (float16_t __a, float16_t __b)
+{
+ return __a / __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_smax_nanhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_fmaxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_smin_nanhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_fminhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_f16 (float16_t __a, float16_t __b)
+{
+ return __a * __b;
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_fmulxhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrecpsh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_frecpshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vrsqrtsh_f16 (float16_t __a, float16_t __b)
+{
+ return __builtin_aarch64_rsqrtshf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vsubh_f16 (float16_t __a, float16_t __b)
+{
+ return __a - __b;
+}
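A sketch of how the estimate/step pairs are meant to be combined (the usual AArch64 Newton-Raphson recipe, not something this patch emits by itself): FRECPS computes 2 - a*b, so each step refines the FRECPE seed.

    #include <arm_fp16.h>
    static float16_t
    recip_f16 (float16_t d)
    {
      float16_t x = vrecpeh_f16 (d);           /* coarse 1/d estimate */
      x = vmulh_f16 (x, vrecpsh_f16 (d, x));   /* x *= (2 - d*x) */
      x = vmulh_f16 (x, vrecpsh_f16 (d, x));
      return x;
    }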
+
+/* ARMv8.2-A FP16 three operands scalar intrinsics. */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+ return __builtin_aarch64_fmahf (__b, __c, __a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
+{
+ return __builtin_aarch64_fnmahf (__b, __c, __a);
+}
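Note the argument order: as the builtin calls above show, the accumulator comes first, so vfmah_f16 (a, b, c) computes a + b*c and vfmsh_f16 (a, b, c) computes a - b*c, each with a single rounding.

    #include <arm_fp16.h>
    float16_t mac_f16 (float16_t acc, float16_t x, float16_t y)
    {
      return vfmah_f16 (acc, x, y);  /* acc + x*y */
    }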
+
+#pragma GCC pop_options
+
+#endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 702cad69be1..d6e510c8bc4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -466,6 +466,8 @@ typedef struct poly16x8x4_t
#define __aarch64_vdup_lane_any(__size, __q, __a, __b) \
vdup##__q##_n_##__size (__aarch64_vget_lane_any (__a, __b))
+#define __aarch64_vdup_lane_f16(__a, __b) \
+ __aarch64_vdup_lane_any (f16, , __a, __b)
#define __aarch64_vdup_lane_f32(__a, __b) \
__aarch64_vdup_lane_any (f32, , __a, __b)
#define __aarch64_vdup_lane_f64(__a, __b) \
@@ -492,6 +494,8 @@ typedef struct poly16x8x4_t
__aarch64_vdup_lane_any (u64, , __a, __b)
/* __aarch64_vdup_laneq internal macros. */
+#define __aarch64_vdup_laneq_f16(__a, __b) \
+ __aarch64_vdup_lane_any (f16, , __a, __b)
#define __aarch64_vdup_laneq_f32(__a, __b) \
__aarch64_vdup_lane_any (f32, , __a, __b)
#define __aarch64_vdup_laneq_f64(__a, __b) \
@@ -518,6 +522,8 @@ typedef struct poly16x8x4_t
__aarch64_vdup_lane_any (u64, , __a, __b)
/* __aarch64_vdupq_lane internal macros. */
+#define __aarch64_vdupq_lane_f16(__a, __b) \
+ __aarch64_vdup_lane_any (f16, q, __a, __b)
#define __aarch64_vdupq_lane_f32(__a, __b) \
__aarch64_vdup_lane_any (f32, q, __a, __b)
#define __aarch64_vdupq_lane_f64(__a, __b) \
@@ -544,6 +550,8 @@ typedef struct poly16x8x4_t
__aarch64_vdup_lane_any (u64, q, __a, __b)
/* __aarch64_vdupq_laneq internal macros. */
+#define __aarch64_vdupq_laneq_f16(__a, __b) \
+ __aarch64_vdup_lane_any (f16, q, __a, __b)
#define __aarch64_vdupq_laneq_f32(__a, __b) \
__aarch64_vdup_lane_any (f32, q, __a, __b)
#define __aarch64_vdupq_laneq_f64(__a, __b) \
@@ -10369,6 +10377,12 @@ vaddvq_f64 (float64x2_t __a)
/* vbsl */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+ return __builtin_aarch64_simd_bslv4hf_suss (__a, __b, __c);
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c)
{
@@ -10444,6 +10458,12 @@ vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c)
{__builtin_aarch64_simd_bsldi_uuuu (__a[0], __b[0], __c[0])};
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+ return __builtin_aarch64_simd_bslv8hf_suss (__a, __b, __c);
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c)
{
@@ -13007,6 +13027,12 @@ vcvtpq_u64_f64 (float64x2_t __a)
/* vdup_n */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_n_f16 (float16_t __a)
+{
+ return (float16x4_t) {__a, __a, __a, __a};
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vdup_n_f32 (float32_t __a)
{
@@ -13081,6 +13107,12 @@ vdup_n_u64 (uint64_t __a)
/* vdupq_n */
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_n_f16 (float16_t __a)
+{
+ return (float16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vdupq_n_f32 (float32_t __a)
{
@@ -13158,6 +13190,12 @@ vdupq_n_u64 (uint64_t __a)
/* vdup_lane */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_lane_f16 (float16x4_t __a, const int __b)
+{
+ return __aarch64_vdup_lane_f16 (__a, __b);
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vdup_lane_f32 (float32x2_t __a, const int __b)
{
@@ -13232,6 +13270,12 @@ vdup_lane_u64 (uint64x1_t __a, const int __b)
/* vdup_laneq */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_laneq_f16 (float16x8_t __a, const int __b)
+{
+ return __aarch64_vdup_laneq_f16 (__a, __b);
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vdup_laneq_f32 (float32x4_t __a, const int __b)
{
@@ -13305,6 +13349,13 @@ vdup_laneq_u64 (uint64x2_t __a, const int __b)
}
/* vdupq_lane */
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_f16 (float16x4_t __a, const int __b)
+{
+ return __aarch64_vdupq_lane_f16 (__a, __b);
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vdupq_lane_f32 (float32x2_t __a, const int __b)
{
@@ -13378,6 +13429,13 @@ vdupq_lane_u64 (uint64x1_t __a, const int __b)
}
/* vdupq_laneq */
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_laneq_f16 (float16x8_t __a, const int __b)
+{
+ return __aarch64_vdupq_laneq_f16 (__a, __b);
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vdupq_laneq_f32 (float32x4_t __a, const int __b)
{
@@ -13470,6 +13528,13 @@ vdupb_lane_u8 (uint8x8_t __a, const int __b)
}
/* vduph_lane */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vduph_lane_f16 (float16x4_t __a, const int __b)
+{
+ return __aarch64_vget_lane_any (__a, __b);
+}
+
__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
vduph_lane_p16 (poly16x4_t __a, const int __b)
{
@@ -13489,6 +13554,7 @@ vduph_lane_u16 (uint16x4_t __a, const int __b)
}
/* vdups_lane */
+
__extension__ static __inline float32_t __attribute__ ((__always_inline__))
vdups_lane_f32 (float32x2_t __a, const int __b)
{
@@ -13549,6 +13615,13 @@ vdupb_laneq_u8 (uint8x16_t __a, const int __b)
}
/* vduph_laneq */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vduph_laneq_f16 (float16x8_t __a, const int __b)
+{
+ return __aarch64_vget_lane_any (__a, __b);
+}
+
__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
vduph_laneq_p16 (poly16x8_t __a, const int __b)
{
@@ -13568,6 +13641,7 @@ vduph_laneq_u16 (uint16x8_t __a, const int __b)
}
/* vdups_laneq */
+
__extension__ static __inline float32_t __attribute__ ((__always_inline__))
vdups_laneq_f32 (float32x4_t __a, const int __b)
{
@@ -13607,6 +13681,19 @@ vdupd_laneq_u64 (uint64x2_t __a, const int __b)
/* vext */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vext_f16 (float16x4_t __a, float16x4_t __b, __const int __c)
+{
+ __AARCH64_LANE_CHECK (__a, __c);
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__b, __a,
+ (uint16x4_t) {4 - __c, 5 - __c, 6 - __c, 7 - __c});
+#else
+ return __builtin_shuffle (__a, __b,
+ (uint16x4_t) {__c, __c + 1, __c + 2, __c + 3});
+#endif
+}
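Worked example of the shuffle above (little-endian lane numbering): vext_f16 (a, b, c) takes the lanes of a from index c upward, then fills from the low lanes of b; with both operands equal it is a lane rotate.

    #include <arm_neon.h>
    /* Hypothetical helper: returns {a1, a2, a3, a0}.  */
    float16x4_t rotate_one (float16x4_t a) { return vext_f16 (a, a, 1); }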
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vext_f32 (float32x2_t __a, float32x2_t __b, __const int __c)
{
@@ -13738,6 +13825,22 @@ vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
return __a;
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vextq_f16 (float16x8_t __a, float16x8_t __b, __const int __c)
+{
+ __AARCH64_LANE_CHECK (__a, __c);
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__b, __a,
+ (uint16x8_t) {8 - __c, 9 - __c, 10 - __c, 11 - __c,
+ 12 - __c, 13 - __c, 14 - __c,
+ 15 - __c});
+#else
+ return __builtin_shuffle (__a, __b,
+ (uint16x8_t) {__c, __c + 1, __c + 2, __c + 3,
+ __c + 4, __c + 5, __c + 6, __c + 7});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vextq_f32 (float32x4_t __a, float32x4_t __b, __const int __c)
{
@@ -14373,8 +14476,7 @@ vld1q_u64 (const uint64_t *a)
__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
vld1_dup_f16 (const float16_t* __a)
{
- float16_t __f = *__a;
- return (float16x4_t) { __f, __f, __f, __f };
+ return vdup_n_f16 (*__a);
}
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
@@ -14454,8 +14556,7 @@ vld1_dup_u64 (const uint64_t* __a)
__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
vld1q_dup_f16 (const float16_t* __a)
{
- float16_t __f = *__a;
- return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+ return vdupq_n_f16 (*__a);
}
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
@@ -18058,6 +18159,12 @@ vmlsq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
/* vmov_n_ */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmov_n_f16 (float16_t __a)
+{
+ return vdup_n_f16 (__a);
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vmov_n_f32 (float32_t __a)
{
@@ -18130,6 +18237,12 @@ vmov_n_u64 (uint64_t __a)
return (uint64x1_t) {__a};
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmovq_n_f16 (float16_t __a)
+{
+ return vdupq_n_f16 (__a);
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vmovq_n_f32 (float32_t __a)
{
@@ -20887,6 +21000,12 @@ vrev32q_u16 (uint16x8_t a)
return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
}
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrev64_f16 (float16x4_t __a)
+{
+ return __builtin_shuffle (__a, (uint16x4_t) { 3, 2, 1, 0 });
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vrev64_f32 (float32x2_t a)
{
@@ -20941,6 +21060,12 @@ vrev64_u32 (uint32x2_t a)
return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrev64q_f16 (float16x8_t __a)
+{
+ return __builtin_shuffle (__a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vrev64q_f32 (float32x4_t a)
{
@@ -23893,6 +24018,16 @@ vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx)
/* vtrn */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vtrn1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
+#endif
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vtrn1_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -23983,6 +24118,16 @@ vtrn1_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vtrn1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vtrn1q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -24109,6 +24254,16 @@ vtrn1q_u64 (uint64x2_t __a, uint64x2_t __b)
#endif
}
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vtrn2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
+#endif
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vtrn2_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -24199,6 +24354,16 @@ vtrn2_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vtrn2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vtrn2q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -24325,6 +24490,12 @@ vtrn2q_u64 (uint64x2_t __a, uint64x2_t __b)
#endif
}
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vtrn_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return (float16x4x2_t) {vtrn1_f16 (__a, __b), vtrn2_f16 (__a, __b)};
+}
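From the index masks above (little-endian case), vtrn1 interleaves the even lanes of its inputs and vtrn2 the odd lanes:

    #include <arm_neon.h>
    void trn_demo (float16x4_t a, float16x4_t b)
    {
      float16x4x2_t t = vtrn_f16 (a, b);
      /* t.val[0] == {a0, b0, a2, b2}; t.val[1] == {a1, b1, a3, b3}.  */
      (void) t;
    }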
+
__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
vtrn_f32 (float32x2_t a, float32x2_t b)
{
@@ -24379,6 +24550,12 @@ vtrn_u32 (uint32x2_t a, uint32x2_t b)
return (uint32x2x2_t) {vtrn1_u32 (a, b), vtrn2_u32 (a, b)};
}
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vtrnq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return (float16x8x2_t) {vtrn1q_f16 (__a, __b), vtrn2q_f16 (__a, __b)};
+}
+
__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
vtrnq_f32 (float32x4_t a, float32x4_t b)
{
@@ -24627,6 +24804,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
}
#define __INTERLEAVE_LIST(op) \
+ __DEFINTERLEAVE (op, float16x4x2_t, float16x4_t, f16,) \
__DEFINTERLEAVE (op, float32x2x2_t, float32x2_t, f32,) \
__DEFINTERLEAVE (op, poly8x8x2_t, poly8x8_t, p8,) \
__DEFINTERLEAVE (op, poly16x4x2_t, poly16x4_t, p16,) \
@@ -24636,6 +24814,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
__DEFINTERLEAVE (op, uint8x8x2_t, uint8x8_t, u8,) \
__DEFINTERLEAVE (op, uint16x4x2_t, uint16x4_t, u16,) \
__DEFINTERLEAVE (op, uint32x2x2_t, uint32x2_t, u32,) \
+ __DEFINTERLEAVE (op, float16x8x2_t, float16x8_t, f16, q) \
__DEFINTERLEAVE (op, float32x4x2_t, float32x4_t, f32, q) \
__DEFINTERLEAVE (op, poly8x16x2_t, poly8x16_t, p8, q) \
__DEFINTERLEAVE (op, poly16x8x2_t, poly16x8_t, p16, q) \
@@ -24648,6 +24827,16 @@ vuqaddd_s64 (int64_t __a, uint64_t __b)
/* vuzp */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vuzp1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
+#endif
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vuzp1_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -24738,6 +24927,16 @@ vuzp1_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vuzp1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vuzp1q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -24864,6 +25063,16 @@ vuzp1q_u64 (uint64x2_t __a, uint64x2_t __b)
#endif
}
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vuzp2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
+#endif
+}
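From the masks above, the uzp pair de-interleaves: vuzp1 gathers the even-indexed elements of the concatenation {a, b} and vuzp2 the odd-indexed ones.

    #include <arm_neon.h>
    void uzp_demo (float16x4_t a, float16x4_t b)
    {
      float16x4_t even = vuzp1_f16 (a, b);  /* {a0, a2, b0, b2} */
      float16x4_t odd  = vuzp2_f16 (a, b);  /* {a1, a3, b1, b3} */
      (void) even; (void) odd;
    }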
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vuzp2_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -24954,6 +25163,16 @@ vuzp2_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vuzp2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vuzp2q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -25084,6 +25303,16 @@ __INTERLEAVE_LIST (uzp)
/* vzip */
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vzip1_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
+#endif
+}
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vzip1_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -25174,6 +25403,18 @@ vzip1_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vzip1q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b,
+ (uint16x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
+#else
+ return __builtin_shuffle (__a, __b,
+ (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vzip1q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -25303,6 +25544,16 @@ vzip1q_u64 (uint64x2_t __a, uint64x2_t __b)
#endif
}
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vzip2_f16 (float16x4_t __a, float16x4_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+#else
+ return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+#endif
+}
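And the zip pair interleaves: vzip1 merges the low halves of the two inputs lane by lane, vzip2 the high halves.

    #include <arm_neon.h>
    void zip_demo (float16x4_t a, float16x4_t b)
    {
      float16x4_t lo = vzip1_f16 (a, b);  /* {a0, b0, a1, b1} */
      float16x4_t hi = vzip2_f16 (a, b);  /* {a2, b2, a3, b3} */
      (void) lo; (void) hi;
    }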
+
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vzip2_f32 (float32x2_t __a, float32x2_t __b)
{
@@ -25393,6 +25644,18 @@ vzip2_u32 (uint32x2_t __a, uint32x2_t __b)
#endif
}
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vzip2q_f16 (float16x8_t __a, float16x8_t __b)
+{
+#ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b,
+ (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+#else
+ return __builtin_shuffle (__a, __b,
+ (uint16x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+#endif
+}
+
__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
vzip2q_f32 (float32x4_t __a, float32x4_t __b)
{
@@ -25529,9 +25792,1015 @@ __INTERLEAVE_LIST (zip)
/* End of optimal implementations in approved order. */
+#pragma GCC pop_options
+
+/* ARMv8.2-A FP16 intrinsics. */
+
+#include "arm_fp16.h"
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+/* ARMv8.2-A FP16 one operand vector intrinsics. */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabs_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_absv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabsq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_absv8hf (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceqz_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_cmeqv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqzq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_cmeqv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgez_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_cmgev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgezq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_cmgev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgtz_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_cmgtv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtzq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_cmgtv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclez_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_cmlev4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclezq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_cmlev8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcltz_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_cmltv4hf_uss (__a, vdup_n_f16 (0.0f));
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltzq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_cmltv8hf_uss (__a, vdupq_n_f16 (0.0f));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_s16 (int16x4_t __a)
+{
+ return __builtin_aarch64_floatv4hiv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_s16 (int16x8_t __a)
+{
+ return __builtin_aarch64_floatv8hiv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_u16 (uint16x4_t __a)
+{
+ return __builtin_aarch64_floatunsv4hiv4hf ((int16x4_t) __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_f16_u16 (uint16x8_t __a)
+{
+ return __builtin_aarch64_floatunsv8hiv8hf ((int16x8_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_s16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lbtruncv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_s16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lbtruncv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_u16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lbtruncuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_u16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lbtruncuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvta_s16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lroundv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtaq_s16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lroundv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvta_u16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lrounduv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtaq_u16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lrounduv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtm_s16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lfloorv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtmq_s16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lfloorv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtm_u16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lflooruv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtmq_u16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lflooruv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtn_s16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lfrintnv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtnq_s16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lfrintnv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtn_u16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lfrintnuv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtnq_u16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lfrintnuv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvtp_s16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lceilv4hfv4hi (__a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtpq_s16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lceilv8hfv8hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvtp_u16_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_lceiluv4hfv4hi_us (__a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtpq_u16_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_lceiluv8hfv8hi_us (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vneg_f16 (float16x4_t __a)
+{
+ return -__a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vnegq_f16 (float16x8_t __a)
+{
+ return -__a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecpe_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_frecpev4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpeq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_frecpev8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnd_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_btruncv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_btruncv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrnda_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_roundv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndaq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_roundv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndi_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_nearbyintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndiq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_nearbyintv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndm_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_floorv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndmq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_floorv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndn_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_frintnv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndnq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_frintnv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndp_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_ceilv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndpq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_ceilv8hf (__a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrndx_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_rintv4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrndxq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_rintv8hf (__a);
+}
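Mapping implied by the builtin names above: vrnd truncates (btrunc), vrnda rounds ties away from zero (round), vrndm is floor, vrndp is ceil, vrndn rounds to nearest even (frintn), and vrndi/vrndx follow the current rounding mode (nearbyint/rint, the latter raising inexact).

    #include <arm_neon.h>
    float16x4_t floor4 (float16x4_t v) { return vrndm_f16 (v); }  /* frintm */
    float16x4_t ceil4  (float16x4_t v) { return vrndp_f16 (v); }  /* frintp */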
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrte_f16 (float16x4_t a)
+{
+ return __builtin_aarch64_rsqrtev4hf (a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrteq_f16 (float16x8_t a)
+{
+ return __builtin_aarch64_rsqrtev8hf (a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsqrt_f16 (float16x4_t a)
+{
+ return __builtin_aarch64_sqrtv4hf (a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsqrtq_f16 (float16x8_t a)
+{
+ return __builtin_aarch64_sqrtv8hf (a);
+}
+
+/* ARMv8.2-A FP16 two operands vector intrinsics. */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __a + __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __a + __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t a, float16x4_t b)
+{
+ return __builtin_aarch64_fabdv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t a, float16x8_t b)
+{
+ return __builtin_aarch64_fabdv8hf (a, b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_facgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_facgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_facgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_facgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_faclev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_faclev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_facltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_facltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_cmeqv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_cmeqv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_cmgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_cmgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_cmgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_cmgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_cmlev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_cmlev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_cmltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_cmltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+ return __builtin_aarch64_scvtfv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+ return __builtin_aarch64_scvtfv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+ return __builtin_aarch64_ucvtfv4hi_sus (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+ return __builtin_aarch64_ucvtfv8hi_sus (__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzsv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzsv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+ return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b);
+}
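+
+/* In the vcvt_n_* intrinsics above, the immediate __b gives the number
+   of fractional bits, 1 to 16 for the 16-bit element forms, and must
+   be a compile-time constant.  An illustrative sketch, converting a
+   Q8.8 fixed-point vector to float16 and back (q88 assumed to be an
+   int16x4_t):
+
+     float16x4_t __f = vcvt_n_f16_s16 (q88, 8);   scales by 2^-8
+     int16x4_t __q = vcvt_n_s16_f16 (__f, 8);     scales by 2^8, truncates
+   */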
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdiv_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __a / __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdivq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __a / __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_smax_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_smax_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_fmaxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_fmaxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_smin_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_smin_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_fminv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_fminv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __a * __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __a * __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_fmulxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_fmulxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_faddpv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_faddpv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_smax_nanpv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_smax_nanpv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_smaxpv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_smaxpv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_smin_nanpv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_smin_nanpv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_sminpv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_sminpv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_frecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_frecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __builtin_aarch64_rsqrtsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __builtin_aarch64_rsqrtsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+ return __a - __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+ return __a - __b;
+}
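+
+/* Illustrative sketch (not part of the ACLE): the two-operand
+   intrinsics above compose as expected.  A hypothetical helper
+   computing the element-wise absolute difference of two vectors,
+   normalised by their NaN-propagating maximum:
+
+     float16x4_t
+     __rel_diff_f16 (float16x4_t __x, float16x4_t __y)
+     {
+       return vdiv_f16 (vabd_f16 (__x, __y), vmax_f16 (__x, __y));
+     }
+
+   Each call maps to a single AdvSIMD instruction (FABD, FMAX, FDIV)
+   when the fp16 extension is available.  */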
+
+/* ARMv8.2-A FP16 three-operand vector intrinsics. */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+ return __builtin_aarch64_fmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+ return __builtin_aarch64_fmav8hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+ return __builtin_aarch64_fnmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+ return __builtin_aarch64_fnmav8hf (__b, __c, __a);
+}
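+
+/* Note on operand order: following the ACLE, vfma_f16 (__a, __b, __c)
+   computes __a + __b * __c, whereas the underlying builtin takes the
+   addend last, fma-style, as (__b, __c, __a); hence the rotated
+   arguments above.  Likewise vfms_f16 (__a, __b, __c) computes
+   __a - __b * __c via the fnma builtin.  A minimal accumulation step
+   (assuming __acc, __x, __y are float16x4_t values):
+
+     __acc = vfma_f16 (__acc, __x, __y);    one FMLA per statement
+   */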
+
+/* ARMv8.2-A FP16 lane vector intrinsics. */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_lane_f16 (float16_t __a, float16_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_laneq_f16 (float16_t __a, float16_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_lane_f16 (float16x4_t __a, float16x4_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfma_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_lane_f16 (float16x8_t __a, float16x8_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfmaq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_laneq_f16 (float16x4_t __a, float16x4_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfma_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfmaq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+ return vfma_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+ return vfmaq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_lane_f16 (float16_t __a, float16_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_laneq_f16 (float16_t __a, float16_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_lane_f16 (float16x4_t __a, float16x4_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfms_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_lane_f16 (float16x8_t __a, float16x8_t __b,
+ float16x4_t __c, const int __lane)
+{
+ return vfmsq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_laneq_f16 (float16x4_t __a, float16x4_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfms_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+ float16x8_t __c, const int __lane)
+{
+ return vfmsq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+ return vfms_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+ return vfmsq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
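+
+/* Illustrative sketch: the lane forms let one vector of coefficients
+   feed several fused multiply-accumulates without reloading, e.g. a
+   four-tap filter step (hypothetical names; __c holds the taps):
+
+     __acc = vfma_lane_f16 (__acc, __x0, __c, 0);
+     __acc = vfma_lane_f16 (__acc, __x1, __c, 1);
+     __acc = vfma_lane_f16 (__acc, __x2, __c, 2);
+     __acc = vfma_lane_f16 (__acc, __x3, __c, 3);
+
+   Each statement can be matched to a single FMLA (by element)
+   instruction.  */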
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+ return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+ return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+ return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+ return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+ return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+ return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_n_f16 (float16x4_t __a, float16_t __b)
+{
+ return vmul_lane_f16 (__a, vdup_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_n_f16 (float16x8_t __a, float16_t __b)
+{
+ return vmulq_laneq_f16 (__a, vdupq_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+ return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+ return vmulx_f16 (__a, __aarch64_vdup_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+ return vmulxq_f16 (__a, __aarch64_vdupq_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+ return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+ return vmulx_f16 (__a, __aarch64_vdup_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+ return vmulxq_f16 (__a, __aarch64_vdupq_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_n_f16 (float16x4_t __a, float16_t __b)
+{
+ return vmulx_f16 (__a, vdup_n_f16 (__b));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_n_f16 (float16x8_t __a, float16_t __b)
+{
+ return vmulxq_f16 (__a, vdupq_n_f16 (__b));
+}
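+
+/* vmulx differs from vmul only in the IEEE special case: FMULX
+   defines 0.0 * infinity (either operand order, any sign combination)
+   as +/-2.0 instead of NaN, which is the value wanted by
+   Newton-Raphson reciprocal sequences.  Illustrative sketch
+   (INFINITY from <math.h> assumed):
+
+     vmulx_f16 (vdup_n_f16 (0.0), vdup_n_f16 (INFINITY))
+
+   yields a vector of 2.0 where vmul_f16 would yield NaN.  */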
+
+/* ARMv8.2-A FP16 reduction vector intrinsics. */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxv_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_reduc_smax_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxvq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_reduc_smax_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminv_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_reduc_smin_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminvq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_reduc_smin_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmv_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_reduc_smax_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmvq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_reduc_smax_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmv_f16 (float16x4_t __a)
+{
+ return __builtin_aarch64_reduc_smin_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmvq_f16 (float16x8_t __a)
+{
+ return __builtin_aarch64_reduc_smin_scal_v8hf (__a);
+}
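+
+/* Illustrative sketch: the across-lanes forms reduce a whole vector
+   to one element, e.g. taking the maximum of a buffer (hypothetical
+   helper; __n is assumed to be a positive multiple of 8):
+
+     float16_t
+     __buf_max_f16 (const float16_t *__p, int __n)
+     {
+       float16x8_t __m = vld1q_dup_f16 (__p);
+       for (int __i = 0; __i < __n; __i += 8)
+         __m = vmaxq_f16 (__m, vld1q_f16 (__p + __i));
+       return vmaxvq_f16 (__m);
+     }
+
+   The final vmaxvq_f16 maps to a single FMAXV instruction.  */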
+
+#pragma GCC pop_options
+
#undef __aarch64_vget_lane_any
#undef __aarch64_vdup_lane_any
+#undef __aarch64_vdup_lane_f16
#undef __aarch64_vdup_lane_f32
#undef __aarch64_vdup_lane_f64
#undef __aarch64_vdup_lane_p8
@@ -25544,6 +26813,7 @@ __INTERLEAVE_LIST (zip)
#undef __aarch64_vdup_lane_u16
#undef __aarch64_vdup_lane_u32
#undef __aarch64_vdup_lane_u64
+#undef __aarch64_vdup_laneq_f16
#undef __aarch64_vdup_laneq_f32
#undef __aarch64_vdup_laneq_f64
#undef __aarch64_vdup_laneq_p8
@@ -25556,6 +26826,7 @@ __INTERLEAVE_LIST (zip)
#undef __aarch64_vdup_laneq_u16
#undef __aarch64_vdup_laneq_u32
#undef __aarch64_vdup_laneq_u64
+#undef __aarch64_vdupq_lane_f16
#undef __aarch64_vdupq_lane_f32
#undef __aarch64_vdupq_lane_f64
#undef __aarch64_vdupq_lane_p8
@@ -25568,6 +26839,7 @@ __INTERLEAVE_LIST (zip)
#undef __aarch64_vdupq_lane_u16
#undef __aarch64_vdupq_lane_u32
#undef __aarch64_vdupq_lane_u64
+#undef __aarch64_vdupq_laneq_f16
#undef __aarch64_vdupq_laneq_f32
#undef __aarch64_vdupq_laneq_f64
#undef __aarch64_vdupq_laneq_p8
@@ -25581,6 +26853,4 @@ __INTERLEAVE_LIST (zip)
#undef __aarch64_vdupq_laneq_u32
#undef __aarch64_vdupq_laneq_u64
-#pragma GCC pop_options
-
#endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index ef48ffda6f9..5e8b0ad9cee 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -26,6 +26,9 @@
;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
(define_mode_iterator GPI [SI DI])
+;; Iterator for HI, SI and DI; some instructions can only work on these modes.
+(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
+
;; Iterator for QI and HI modes
(define_mode_iterator SHORT [QI HI])
@@ -38,6 +41,9 @@
;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
(define_mode_iterator GPF [SF DF])
+;; Iterator for all scalar floating point modes (HF, SF, DF)
+(define_mode_iterator GPF_F16 [(HF "AARCH64_ISA_F16") SF DF])
+
;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
(define_mode_iterator GPF_TF_F16 [HF SF DF TF])
@@ -88,11 +94,22 @@
;; Vector Float modes suitable for moving, loading and storing.
(define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
-;; Vector Float modes, barring HF modes.
+;; Vector Float modes.
(define_mode_iterator VDQF [V2SF V4SF V2DF])
+(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
+ (V8HF "TARGET_SIMD_F16INST")
+ V2SF V4SF V2DF])
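+
+;; As an illustration, a pattern parameterized on VHSDF, such as
+;; (neg<mode>2), also produces negv4hf2 and negv8hf2, with those two
+;; variants gated on TARGET_SIMD_F16INST; the V2SF, V4SF and V2DF
+;; variants remain unconditional.
+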
;; Vector Float modes, and DF.
(define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
+ (V8HF "TARGET_SIMD_F16INST")
+ V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
+ (V8HF "TARGET_SIMD_F16INST")
+ V2SF V4SF V2DF
+ (HF "TARGET_SIMD_F16INST")
+ SF DF])
;; Vector single Float modes.
(define_mode_iterator VDQSF [V2SF V4SF])
@@ -150,6 +167,8 @@
;; Vector modes except double int.
(define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
+ V4HF V8HF V2SF V4SF V2DF])
;; Vector modes for S type.
(define_mode_iterator VDQ_SI [V2SI V4SI])
@@ -157,9 +176,21 @@
;; Vector modes for S and D
(define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
+;; Vector modes for H, S and D.
+(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+ (V8HI "TARGET_SIMD_F16INST")
+ V2SI V4SI V2DI])
+
;; Scalar and Vector modes for S and D
(define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
+;; Scalar and Vector modes for S and D; Vector modes for H.
+(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+ (V8HI "TARGET_SIMD_F16INST")
+ V2SI V4SI V2DI
+ (HI "TARGET_SIMD_F16INST")
+ SI DI])
+
;; Vector modes for Q and H types.
(define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
@@ -199,7 +230,10 @@
(define_mode_iterator DX [DI DF])
;; Modes available for <f>mul lane operations.
-(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
+ (V4HF "TARGET_SIMD_F16INST")
+ (V8HF "TARGET_SIMD_F16INST")
+ V2SF V4SF V2DF])
;; Modes available for <f>mul lane operations changing lane count.
(define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
@@ -348,8 +382,8 @@
(define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
;; For unequal width int to float conversion
-(define_mode_attr w1 [(SF "w") (DF "x")])
-(define_mode_attr w2 [(SF "x") (DF "w")])
+(define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
+(define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
(define_mode_attr short_mask [(HI "65535") (QI "255")])
@@ -361,12 +395,13 @@
;; For scalar usage of vector/FP registers
(define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
- (SF "s") (DF "d")
+ (HF "h") (SF "s") (DF "d")
(V8QI "") (V16QI "")
(V4HI "") (V8HI "")
(V2SI "") (V4SI "")
(V2DI "") (V2SF "")
- (V4SF "") (V2DF "")])
+ (V4SF "") (V4HF "")
+ (V8HF "") (V2DF "")])
;; For scalar usage of vector/FP registers, narrowing
(define_mode_attr vn2 [(QI "") (HI "b") (SI "h") (DI "s")
@@ -391,7 +426,7 @@
(define_mode_attr vas [(DI "") (SI ".2s")])
;; Map a floating point mode to the appropriate register name prefix
-(define_mode_attr s [(SF "s") (DF "d")])
+(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
;; Give the length suffix letter for a sign- or zero-extension.
(define_mode_attr size [(QI "b") (HI "h") (SI "w")])
@@ -427,8 +462,8 @@
(V4SF ".4s") (V2DF ".2d")
(DI "") (SI "")
(HI "") (QI "")
- (TI "") (SF "")
- (DF "")])
+ (TI "") (HF "")
+ (SF "") (DF "")])
;; Register suffix narrowed modes for VQN.
(define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
@@ -443,10 +478,21 @@
(V2DI "d") (V4HF "h")
(V8HF "h") (V2SF "s")
(V4SF "s") (V2DF "d")
+ (HF "h")
(SF "s") (DF "d")
(QI "b") (HI "h")
(SI "s") (DI "d")])
+;; Vetype is used everywhere in scheduling type and assembly output,
+;; but the two are not always the same; for example, HF modes take
+;; the "h" assembly suffix yet schedule with the "s" types on some
+;; instructions.  stype is defined to represent the scheduling type
+;; more accurately.
+(define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
+ (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
+ (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
+ (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
+ (SI "s") (DI "d")])
+
;; Mode-to-bitwise operation type mapping.
(define_mode_attr Vbtype [(V8QI "8b") (V16QI "16b")
(V4HI "8b") (V8HI "16b")
@@ -604,7 +650,7 @@
(V4HF "V4HI") (V8HF "V8HI")
(V2SF "V2SI") (V4SF "V4SI")
(V2DF "V2DI") (DF "DI")
- (SF "SI")])
+ (SF "SI") (HF "HI")])
;; Lower case mode of results of comparison operations.
(define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
@@ -656,15 +702,19 @@
(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
(V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
- (SF "si") (DF "di") (SI "sf") (DI "df")])
+ (SF "si") (DF "di") (SI "sf") (DI "df")
+ (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
+ (V8HI "v8hf") (HF "hi") (HI "hf")])
(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
(V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
- (SF "SI") (DF "DI") (SI "SF") (DI "DF")])
+ (SF "SI") (DF "DI") (SI "SF") (DI "DF")
+ (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
+ (V8HI "V8HF") (HF "HI") (HI "HF")])
;; For the unequal width integer to fp conversions
-(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
-(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
+(define_mode_attr fcvt_iesize [(HF "di") (SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(HF "DI") (SF "DI") (DF "SI")])
(define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
(V4HI "V8HI") (V8HI "V4HI")
@@ -687,6 +737,7 @@
;; the 'x' constraint. All other modes may use the 'w' constraint.
(define_mode_attr h_con [(V2SI "w") (V4SI "w")
(V4HI "x") (V8HI "x")
+ (V4HF "w") (V8HF "w")
(V2SF "w") (V4SF "w")
(V2DF "w") (DF "w")])
@@ -695,6 +746,7 @@
(V4HI "") (V8HI "")
(V2SI "") (V4SI "")
(DI "") (V2DI "")
+ (V4HF "f") (V8HF "f")
(V2SF "f") (V4SF "f")
(V2DF "f") (DF "f")])
@@ -703,6 +755,7 @@
(V4HI "") (V8HI "")
(V2SI "") (V4SI "")
(DI "") (V2DI "")
+ (V4HF "_fp") (V8HF "_fp")
(V2SF "_fp") (V4SF "_fp")
(V2DF "_fp") (DF "_fp")
(SF "_fp")])
@@ -715,13 +768,14 @@
(V4HF "") (V8HF "_q")
(V2SF "") (V4SF "_q")
(V2DF "_q")
- (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
+ (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")])
(define_mode_attr vp [(V8QI "v") (V16QI "v")
(V4HI "v") (V8HI "v")
(V2SI "p") (V4SI "v")
- (V2DI "p") (V2DF "p")
- (V2SF "p") (V4SF "v")])
+ (V2DI "p") (V2DF "p")
+ (V2SF "p") (V4SF "v")
+ (V4HF "v") (V8HF "v")])
(define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
(define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ce614b453dc..a30231d7c03 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12957,7 +12957,10 @@ more feature modifiers. This option has the form
@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a} or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
+
+The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
+support for the ARMv8.2-A architecture extensions.
The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
support for the ARMv8.1 architecture extension. In particular, it
@@ -13064,6 +13067,8 @@ instructions. This is on by default for all possible values for options
@item lse
Enable Large System Extension instructions. This is on by default for
@option{-march=armv8.1-a}.
+@item fp16
+Enable FP16 extension. This also enables floating-point instructions.
@end table
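+
+For example, @option{-march=armv8.2-a+fp16} selects the ARMv8.2-A
+architecture with the half-precision extension enabled, while
+@option{-march=armv8.2-a+nofp} removes the floating-point instructions
+and, with them, the @samp{fp16} extension.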