author | Christophe Lyon <christophe.lyon@linaro.org> | 2016-08-25 15:38:54 +0200
---|---|---
committer | Yvan Roux <yvan.roux@linaro.org> | 2016-09-07 22:08:35 +0200
commit | e59b2ff1fdebf862212b8cefd8e58a7ee73fabe0 (patch) |
tree | 3fce536ab814a5871acd9655ca71137a60e98940 |
parent | 3046e9ae43d584f70c7d979634243fee50f7cecb (diff) |
gcc/
Backport from trunk r237956.
2016-07-04 Matthew Wahab <matthew.wahab@arm.com>
Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
(AARCH64_FL_F16): New.
(AARCH64_FL_FOR_ARCH8_2): New.
(AARCH64_ISA_8_2): New.
(AARCH64_ISA_F16): New.
(TARGET_FP_F16INST): New.
(TARGET_SIMD_F16INST): New.
* config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
("fp"): Disabling "fp" also disables "fp16".
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Conditionally
define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
* doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".
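As a rough illustration of how the new option and feature macros fit together (not part of the patch; the file and function names are invented), user code built with the documented `-march=armv8.2-a+fp16` can probe the macros this entry adds:

```c
/* probe.c -- hypothetical example; build with:
     gcc -march=armv8.2-a+fp16 -c probe.c  */
#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC) \
    && defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
/* Native __fp16 arithmetic is available.  */
__fp16 twice (__fp16 x) { return x + x; }
#else
#error "FP16 arithmetic unavailable; pass -march=armv8.2-a+fp16"
#endif
```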
gcc/
Backport from trunk r238715.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd.md
(aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): Use VALL_F16.
(aarch64_ext<mode>): Likewise.
(aarch64_rev<REVERSE:rev_op><mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp,
aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev): Support V4HFmode
and V8HFmode.
* config/aarch64/arm_neon.h (__INTERLEAVE_LIST): Support float16x4_t,
float16x8_t.
(__aarch64_vdup_lane_f16, __aarch64_vdup_laneq_f16,
__aarch64_vdupq_lane_f16, __aarch64_vdupq_laneq_f16, vbsl_f16,
vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16, vdup_laneq_f16,
vdupq_lane_f16, vdupq_laneq_f16, vduph_lane_f16, vduph_laneq_f16,
vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16, vrev64_f16, vrev64q_f16,
vtrn1_f16, vtrn1q_f16, vtrn2_f16, vtrn2q_f16, vtrn_f16, vtrnq_f16,
vuzp1_f16, vuzp1q_f16, vuzp2_f16, vuzp2q_f16, vzip1_f16, vzip2q_f16):
New.
(vmov_n_f16): Reimplement using vdup_n_f16.
(vmovq_n_f16): Reimplement using vdupq_n_f16.
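A minimal usage sketch for the new duplicate/permute intrinsics (not from the patch; the helper name is invented, and the example assumes compilation with `-march=armv8.2-a+fp16`):

```c
#include <arm_neon.h>

/* Broadcast a scalar into all lanes, then interleave with another vector.  */
float16x4_t
dup_and_zip (float16_t x, float16x4_t v)
{
  float16x4_t a = vdup_n_f16 (x);  /* duplicate x into all four lanes  */
  return vzip1_f16 (a, v);         /* interleave the low halves        */
}
```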
gcc/
Backport from trunk r238716.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF modes.
(neg<mode>2): Likewise.
(abs<mode>2): Likewise.
(<frint_pattern><mode>2): Likewise.
(l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2): Likewise.
(<optab><VDQF:mode><fcvt_target>2): Likewise.
(<fix_trunc_optab><VDQF:mode><fcvt_target>2): Likewise.
(ftrunc<VDQF:mode>2): Likewise.
(<optab><fcvt_target><VDQF:mode>2): Likewise.
(sqrt<mode>2): Likewise.
(*sqrt<mode>2): Likewise.
(aarch64_frecpe<mode>): Likewise.
(aarch64_cm<optab><mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return false for
HF, V4HF and V8HF.
* config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New.
(VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to HF modes.
(stype): New.
* config/aarch64/arm_neon.h (vdup_n_f16): New.
(vdupq_n_f16): Likewise.
(vld1_dup_f16): Use vdup_n_f16.
(vld1q_dup_f16): Use vdupq_n_f16.
(vabs_f16, vabsq_f16, vceqz_f16, vceqzq_f16, vcgez_f16, vcgezq_f16,
vcgtz_f16, vcgtzq_f16, vclez_f16, vclezq_f16, vcltz_f16, vcltzq_f16,
vcvt_f16_s16, vcvtq_f16_s16, vcvt_f16_u16, vcvtq_f16_u16, vcvt_s16_f16,
vcvtq_s16_f16, vcvt_u16_f16, vcvtq_u16_f16, vcvta_s16_f16,
vcvtaq_s16_f16, vcvta_u16_f16, vcvtaq_u16_f16, vcvtm_s16_f16,
vcvtmq_s16_f16, vcvtm_u16_f16, vcvtmq_u16_f16, vcvtn_s16_f16,
vcvtnq_s16_f16, vcvtn_u16_f16, vcvtnq_u16_f16, vcvtp_s16_f16,
vcvtpq_s16_f16, vcvtp_u16_f16, vcvtpq_u16_f16, vneg_f16, vnegq_f16,
vrecpe_f16, vrecpeq_f16, vrnd_f16, vrndq_f16, vrnda_f16, vrndaq_f16,
vrndi_f16, vrndiq_f16, vrndm_f16, vrndmq_f16, vrndn_f16, vrndnq_f16,
vrndp_f16, vrndpq_f16, vrndx_f16, vrndxq_f16, vrsqrte_f16, vrsqrteq_f16,
vsqrt_f16, vsqrtq_f16): New.
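A short sketch of the new vector conversion and rounding intrinsics (illustrative only; assumes they follow the usual AArch64 intrinsic/instruction correspondence):

```c
#include <arm_neon.h>

/* int16 -> float16, round to nearest-even, back to int16.  */
int16x4_t
roundtrip (int16x4_t v)
{
  float16x4_t f = vcvt_f16_s16 (v);  /* scvtf                                */
  f = vrndn_f16 (f);                 /* frintn: round to nearest, ties even  */
  return vcvt_s16_f16 (f);           /* fcvtzs: convert toward zero          */
}
```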
gcc/
Backport from trunk r238717.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md
(aarch64_rsqrts<mode>): Extend to HF modes.
(fabd<mode>3): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_SDF:mode>3): Likewise.
(<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_SDI:mode>3): Likewise.
(aarch64_<maxmin_uns>p<mode>): Likewise.
(<su><maxmin><mode>3): Likewise.
(<maxmin_uns><mode>3): Likewise.
(<fmaxmin><mode>3): Likewise.
(aarch64_faddp<mode>): Likewise.
(aarch64_fmulx<mode>): Likewise.
(aarch64_frecps<mode>): Likewise.
(*aarch64_fac<optab><mode>): Rename to aarch64_fac<optab><mode>.
(add<mode>3): Extend to HF modes.
(sub<mode>3): Likewise.
(mul<mode>3): Likewise.
(div<mode>3): Likewise.
(*div<mode>3): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_div): Return false for
HF, V4HF and V8HF.
* config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode iterators.
* config/aarch64/arm_neon.h (vadd_f16, vaddq_f16, vabd_f16, vabdq_f16,
vcage_f16, vcageq_f16, vcagt_f16, vcagtq_f16, vcale_f16, vcaleq_f16,
vcalt_f16, vcaltq_f16, vceq_f16, vceqq_f16, vcge_f16, vcgeq_f16,
vcgt_f16, vcgtq_f16, vcle_f16, vcleq_f16, vclt_f16, vcltq_f16,
vcvt_n_f16_s16, vcvtq_n_f16_s16, vcvt_n_f16_u16, vcvtq_n_f16_u16,
vcvt_n_s16_f16, vcvtq_n_s16_f16, vcvt_n_u16_f16, vcvtq_n_u16_f16,
vdiv_f16, vdivq_f16, vdup_lane_f16, vdup_laneq_f16, vdupq_lane_f16,
vdupq_laneq_f16, vdups_lane_f16, vdups_laneq_f16, vmax_f16, vmaxq_f16,
vmaxnm_f16, vmaxnmq_f16, vmin_f16, vminq_f16, vminnm_f16, vminnmq_f16,
vmul_f16, vmulq_f16, vmulx_f16, vmulxq_f16, vpadd_f16, vpaddq_f16,
vpmax_f16, vpmaxq_f16, vpmaxnm_f16, vpmaxnmq_f16, vpmin_f16, vpminq_f16,
vpminnm_f16, vpminnmq_f16, vrecps_f16, vrecpsq_f16, vrsqrts_f16,
vrsqrtsq_f16, vsub_f16, vsubq_f16): New.
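A sketch combining a few of the new binary arithmetic intrinsics (helper name invented):

```c
#include <arm_neon.h>

/* Multiply elementwise, then clamp the result to [lo, hi].  */
float16x4_t
clamped_mul (float16x4_t a, float16x4_t b, float16x4_t lo, float16x4_t hi)
{
  return vmin_f16 (vmax_f16 (vmul_f16 (a, b), lo), hi);
}
```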
gcc/
Backport from trunk r238718.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (fma<mode>4, fnma<mode>4): Extend to HF
modes.
* config/aarch64/arm_neon.h (vfma_f16, vfmaq_f16, vfms_f16,
vfmsq_f16): New.
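The fused multiply-accumulate intrinsics follow the usual NEON convention: vfma_f16 (a, b, c) yields a + b * c with a single rounding. A minimal sketch (helper name invented):

```c
#include <arm_neon.h>

float16x4_t
axpy (float16x4_t acc, float16x4_t x, float16x4_t y)
{
  return vfma_f16 (acc, x, y);   /* acc + x * y, fused (FMLA)  */
}
```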
gcc/
Backport from trunk r238719.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_to_64v2df): Rename to
"*aarch64_mulx_elt_from_dup<mode>".
(*aarch64_mul3_elt<mode>): Update schedule type.
(*aarch64_mul3_elt_from_dup<mode>): Likewise.
(*aarch64_fma4_elt_from_dup<mode>): Likewise.
(*aarch64_fnma4_elt_from_dup<mode>): Likewise.
* config/aarch64/iterators.md (VMUL): Support half-precision float modes.
(f, fp): Support HF modes.
* config/aarch64/arm_neon.h (vfma_lane_f16, vfmaq_lane_f16,
vfma_laneq_f16, vfmaq_laneq_f16, vfma_n_f16, vfmaq_n_f16, vfms_lane_f16,
vfmsq_lane_f16, vfms_laneq_f16, vfmsq_laneq_f16, vfms_n_f16,
vfmsq_n_f16, vmul_lane_f16, vmulq_lane_f16, vmul_laneq_f16,
vmulq_laneq_f16, vmul_n_f16, vmulq_n_f16, vmulx_lane_f16,
vmulxq_lane_f16, vmulx_laneq_f16, vmulxq_laneq_f16): New.
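A sketch of one of the new by-lane forms, which multiply every element of the first vector by a single lane of the second (helper name invented):

```c
#include <arm_neon.h>

float16x4_t
scale_by_lane (float16x4_t v, float16x4_t w)
{
  return vmul_lane_f16 (v, w, 0);  /* v * w[0] in every lane  */
}
```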
gcc/
Backport from trunk r238721.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd-builtins.def (reduc_smax_scal_,
reduc_smin_scal_): Use VDQIF_F16.
(reduc_smax_nan_scal_, reduc_smin_nan_scal_): Use VHSDF.
* config/aarch64/aarch64-simd.md (reduc_<maxmin_uns>_scal_<mode>):
Use VHSDF.
(aarch64_reduc_<maxmin_uns>_internal<mode>): Likewise.
* config/aarch64/iterators.md (VDQIF_F16): New.
(vp): Support HF modes.
* config/aarch64/arm_neon.h (vmaxv_f16, vmaxvq_f16, vminv_f16,
vminvq_f16, vmaxnmv_f16, vmaxnmvq_f16, vminnmv_f16, vminnmvq_f16): New.
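A sketch of an across-lanes reduction (helper name invented):

```c
#include <arm_neon.h>

float16_t
horizontal_max (float16x8_t v)
{
  return vmaxvq_f16 (v);   /* maximum of all eight lanes (FMAXV)  */
}
```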
gcc/
Backport from trunk r238722.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config.gcc (aarch64*-*-*): Install arm_fp16.h.
* config/aarch64/aarch64-builtins.c (hf_UP): New.
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte<mode>): Extend to HF
mode.
(aarch64_frecp<FRECP:frecp_suffix><mode>): Likewise.
(aarch64_cm<optab><mode>): Likewise.
* config/aarch64/aarch64.md (<frint_pattern><mode>2): Likewise.
(l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2): Likewise.
(fix_trunc<GPF:mode><GPI:mode>2): Likewise.
(sqrt<mode>2): Likewise.
(*sqrt<mode>2): Likewise.
(abs<mode>2): Likewise.
(<optab><mode>hf2): New pattern for HF mode.
(<optab>hihf2): Likewise.
* config/aarch64/arm_neon.h: Include arm_fp16.h.
* config/aarch64/iterators.md (GPF_F16, GPI_F16, VHSDF_HSDF): New.
(w1, w2, v, s, q, Vmtype, V_cmp_result, fcvt_iesize, FCVT_IESIZE):
Support HF mode.
* config/aarch64/arm_fp16.h: New file.
(vabsh_f16, vceqzh_f16, vcgezh_f16, vcgtzh_f16, vclezh_f16, vcltzh_f16,
vcvth_f16_s16, vcvth_f16_s32, vcvth_f16_s64, vcvth_f16_u16,
vcvth_f16_u32, vcvth_f16_u64, vcvth_s16_f16, vcvth_s32_f16,
vcvth_s64_f16, vcvth_u16_f16, vcvth_u32_f16, vcvth_u64_f16,
vcvtah_s16_f16, vcvtah_s32_f16, vcvtah_s64_f16, vcvtah_u16_f16,
vcvtah_u32_f16, vcvtah_u64_f16, vcvtmh_s16_f16, vcvtmh_s32_f16,
vcvtmh_s64_f16, vcvtmh_u16_f16, vcvtmh_u32_f16, vcvtmh_u64_f16,
vcvtnh_s16_f16, vcvtnh_s32_f16, vcvtnh_s64_f16, vcvtnh_u16_f16,
vcvtnh_u32_f16, vcvtnh_u64_f16, vcvtph_s16_f16, vcvtph_s32_f16,
vcvtph_s64_f16, vcvtph_u16_f16, vcvtph_u32_f16, vcvtph_u64_f16,
vnegh_f16, vrecpeh_f16, vrecpxh_f16, vrndh_f16, vrndah_f16, vrndih_f16,
vrndmh_f16, vrndnh_f16, vrndph_f16, vrndxh_f16, vrsqrteh_f16,
vsqrth_f16): New.
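A sketch using the new scalar header (helper name invented; assumes a toolchain with this backport applied):

```c
#include <arm_fp16.h>

float16_t
abs_then_sqrt (float16_t x)
{
  return vsqrth_f16 (vabsh_f16 (x));  /* fabs then fsqrt, entirely in HF  */
}
```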
gcc/
Backport from trunk r238723.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3):
New.
(<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3): Likewise.
(add<mode>3): Likewise.
(sub<mode>3): Likewise.
(mul<mode>3): Likewise.
(div<mode>3): Likewise.
(*div<mode>3): Likewise.
(<fmaxmin><mode>3): Extend to HF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts<mode>): Likewise.
(fabd<mode>3): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF_HSDF:mode>3): Likewise.
(<FCVT_FIXED2F:fcvt_fixed_insn><VHSDI_HSDI:mode>3): Likewise.
(aarch64_fmulx<mode>): Likewise.
(aarch64_fac<optab><mode>): Likewise.
(aarch64_frecps<mode>): Likewise.
(<FCVT_F2FIXED:fcvt_fixed_insn>hfhi3): New.
(<FCVT_FIXED2F:fcvt_fixed_insn>hihf3): Likewise.
* config/aarch64/iterators.md (VHSDF_SDF): Delete.
(VSDQ_HSDI): Support HI.
(fcvt_target, FCVT_TARGET): Likewise.
* config/aarch64/arm_fp16.h (vaddh_f16, vsubh_f16, vabdh_f16,
vcageh_f16, vcagth_f16, vcaleh_f16, vcalth_f16, vceqh_f16, vcgeh_f16,
vcgth_f16, vcleh_f16, vclth_f16, vcvth_n_f16_s16, vcvth_n_f16_s32,
vcvth_n_f16_s64, vcvth_n_f16_u16, vcvth_n_f16_u32, vcvth_n_f16_u64,
vcvth_n_s16_f16, vcvth_n_s32_f16, vcvth_n_s64_f16, vcvth_n_u16_f16,
vcvth_n_u32_f16, vcvth_n_u64_f16, vdivh_f16, vmaxh_f16, vmaxnmh_f16,
vminh_f16, vminnmh_f16, vmulh_f16, vmulxh_f16, vrecpsh_f16,
vrsqrtsh_f16): New.
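A sketch of the scalar binary intrinsics (helper name invented):

```c
#include <arm_fp16.h>

/* a * b + c with the product rounded separately (plain FMUL + FADD;
   the fused form arrives in the next backport).  */
float16_t
mul_then_add (float16_t a, float16_t b, float16_t c)
{
  return vaddh_f16 (vmulh_f16 (a, b), c);
}
```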
gcc/
Backport from trunk r238724.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Register new builtins.
* config/aarch64/aarch64.md (fma, fnma): Support HF.
* config/aarch64/arm_fp16.h (vfmah_f16, vfmsh_f16): New.
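Like its vector counterpart, vfmah_f16 (a, b, c) computes a + b * c with a single rounding. A minimal sketch (helper name invented):

```c
#include <arm_fp16.h>

float16_t
fused_mul_add (float16_t acc, float16_t x, float16_t y)
{
  return vfmah_f16 (acc, x, y);   /* single FMADD, one rounding  */
}
```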
gcc/
Backport from trunk r238725.
2016-07-25 Jiong Wang <jiong.wang@arm.com>
* config/aarch64/arm_neon.h (vfmah_lane_f16, vfmah_laneq_f16,
vfmsh_lane_f16, vfmsh_laneq_f16, vmulh_lane_f16, vmulh_laneq_f16,
vmulxh_lane_f16, vmulxh_laneq_f16): New.
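A sketch of a scalar-by-lane form (helper name invented):

```c
#include <arm_neon.h>

float16_t
scalar_times_lane (float16_t x, float16x4_t v)
{
  return vmulh_lane_f16 (x, v, 2);  /* x * v[2]  */
}
```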
Change-Id: I8118d32ccb84626ad42afc4181334258c7fc8e5b
-rw-r--r-- | gcc/config.gcc | 2
-rw-r--r-- | gcc/config/aarch64/aarch64-arches.def | 1
-rw-r--r-- | gcc/config/aarch64/aarch64-builtins.c | 5
-rw-r--r-- | gcc/config/aarch64/aarch64-c.c | 5
-rw-r--r-- | gcc/config/aarch64/aarch64-option-extensions.def | 8
-rw-r--r-- | gcc/config/aarch64/aarch64-simd-builtins.def | 161
-rw-r--r-- | gcc/config/aarch64/aarch64-simd.md | 366
-rw-r--r-- | gcc/config/aarch64/aarch64.c | 24
-rw-r--r-- | gcc/config/aarch64/aarch64.h | 11
-rw-r--r-- | gcc/config/aarch64/aarch64.md | 172
-rw-r--r-- | gcc/config/aarch64/arm_fp16.h | 579
-rw-r--r-- | gcc/config/aarch64/arm_neon.h | 1282
-rw-r--r-- | gcc/config/aarch64/iterators.md | 88
-rw-r--r-- | gcc/doc/invoke.texi | 7
14 files changed, 2414 insertions, 297 deletions
diff --git a/gcc/config.gcc b/gcc/config.gcc index 1f3da546892..0cbf84be1e2 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -307,7 +307,7 @@ m32c*-*-*) ;; aarch64*-*-*) cpu_type=aarch64 - extra_headers="arm_neon.h arm_acle.h" + extra_headers="arm_fp16.h arm_neon.h arm_acle.h" c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o" diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 1e9d90b1b66..7dcf140411f 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -32,4 +32,5 @@ AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8) AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) +AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 0d51bfa3ee8..1de325a0fc3 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -62,6 +62,7 @@ #define si_UP SImode #define sf_UP SFmode #define hi_UP HImode +#define hf_UP HFmode #define qi_UP QImode #define UP(X) X##_UP @@ -139,6 +140,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_unsigned }; #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_none }; +#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly }; #define TYPES_BINOPP (aarch64_types_binopp_qualifiers) diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index e64dc7676cc..3380ed6f2cd 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) else cpp_undef (pfile, "__ARM_FP"); + aarch64_def_or_undef (TARGET_FP_F16INST, + "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile); + aarch64_def_or_undef (TARGET_SIMD_F16INST, + "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile); + aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile); aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile); diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index e8706d1c2e7..a10ccf2254c 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -39,8 +39,8 @@ that are required. Their order is not important. */ /* Enabling "fp" just enables "fp". - Disabling "fp" also disables "simd", "crypto". */ -AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "fp") + Disabling "fp" also disables "simd", "crypto" and "fp16". */ +AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp") /* Enabling "simd" also enables "fp". Disabling "simd" also disables "crypto". */ @@ -55,3 +55,7 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32") /* Enabling or disabling "lse" only changes "lse". */ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics") + +/* Enabling "fp16" also enables "fp". + Disabling "fp16" just disables "fp16". 
*/ +AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16") diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index df0a7d8ae6e..bc5eda6d6dc 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,8 +41,8 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) - BUILTIN_VALLF (BINOP, fmulx, 0) - BUILTIN_VDQF_DF (UNOP, sqrt, 2) + BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0) + BUILTIN_VHSDF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) BUILTIN_VDQ_BHSI (UNOP, clrsb, 2) @@ -234,12 +234,12 @@ BUILTIN_VALL (UNOP, reduc_plus_scal_, 10) /* Implemented by reduc_<maxmin_uns>_scal_<mode> (producing scalar). */ - BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10) - BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10) + BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10) + BUILTIN_VDQIF_F16 (UNOP, reduc_smin_scal_, 10) BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10) BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10) - BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10) - BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10) + BUILTIN_VHSDF (UNOP, reduc_smax_nan_scal_, 10) + BUILTIN_VHSDF (UNOP, reduc_smin_nan_scal_, 10) /* Implemented by <maxmin><mode>3. smax variants map to fmaxnm, @@ -248,91 +248,131 @@ BUILTIN_VDQ_BHSI (BINOP, smin, 3) BUILTIN_VDQ_BHSI (BINOP, umax, 3) BUILTIN_VDQ_BHSI (BINOP, umin, 3) - BUILTIN_VDQF (BINOP, smax_nan, 3) - BUILTIN_VDQF (BINOP, smin_nan, 3) + BUILTIN_VHSDF (BINOP, smax_nan, 3) + BUILTIN_VHSDF (BINOP, smin_nan, 3) /* Implemented by <fmaxmin><mode>3. */ - BUILTIN_VDQF (BINOP, fmax, 3) - BUILTIN_VDQF (BINOP, fmin, 3) + BUILTIN_VHSDF (BINOP, fmax, 3) + BUILTIN_VHSDF (BINOP, fmin, 3) /* Implemented by aarch64_<maxmin_uns>p<mode>. */ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0) BUILTIN_VDQ_BHSI (BINOP, sminp, 0) BUILTIN_VDQ_BHSI (BINOP, umaxp, 0) BUILTIN_VDQ_BHSI (BINOP, uminp, 0) - BUILTIN_VDQF (BINOP, smaxp, 0) - BUILTIN_VDQF (BINOP, sminp, 0) - BUILTIN_VDQF (BINOP, smax_nanp, 0) - BUILTIN_VDQF (BINOP, smin_nanp, 0) + BUILTIN_VHSDF (BINOP, smaxp, 0) + BUILTIN_VHSDF (BINOP, sminp, 0) + BUILTIN_VHSDF (BINOP, smax_nanp, 0) + BUILTIN_VHSDF (BINOP, smin_nanp, 0) /* Implemented by <frint_pattern><mode>2. */ - BUILTIN_VDQF (UNOP, btrunc, 2) - BUILTIN_VDQF (UNOP, ceil, 2) - BUILTIN_VDQF (UNOP, floor, 2) - BUILTIN_VDQF (UNOP, nearbyint, 2) - BUILTIN_VDQF (UNOP, rint, 2) - BUILTIN_VDQF (UNOP, round, 2) - BUILTIN_VDQF_DF (UNOP, frintn, 2) + BUILTIN_VHSDF (UNOP, btrunc, 2) + BUILTIN_VHSDF (UNOP, ceil, 2) + BUILTIN_VHSDF (UNOP, floor, 2) + BUILTIN_VHSDF (UNOP, nearbyint, 2) + BUILTIN_VHSDF (UNOP, rint, 2) + BUILTIN_VHSDF (UNOP, round, 2) + BUILTIN_VHSDF_DF (UNOP, frintn, 2) + + VAR1 (UNOP, btrunc, 2, hf) + VAR1 (UNOP, ceil, 2, hf) + VAR1 (UNOP, floor, 2, hf) + VAR1 (UNOP, frintn, 2, hf) + VAR1 (UNOP, nearbyint, 2, hf) + VAR1 (UNOP, rint, 2, hf) + VAR1 (UNOP, round, 2, hf) /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2. 
*/ + VAR1 (UNOP, lbtruncv4hf, 2, v4hi) + VAR1 (UNOP, lbtruncv8hf, 2, v8hi) VAR1 (UNOP, lbtruncv2sf, 2, v2si) VAR1 (UNOP, lbtruncv4sf, 2, v4si) VAR1 (UNOP, lbtruncv2df, 2, v2di) + VAR1 (UNOPUS, lbtruncuv4hf, 2, v4hi) + VAR1 (UNOPUS, lbtruncuv8hf, 2, v8hi) VAR1 (UNOPUS, lbtruncuv2sf, 2, v2si) VAR1 (UNOPUS, lbtruncuv4sf, 2, v4si) VAR1 (UNOPUS, lbtruncuv2df, 2, v2di) + VAR1 (UNOP, lroundv4hf, 2, v4hi) + VAR1 (UNOP, lroundv8hf, 2, v8hi) VAR1 (UNOP, lroundv2sf, 2, v2si) VAR1 (UNOP, lroundv4sf, 2, v4si) VAR1 (UNOP, lroundv2df, 2, v2di) - /* Implemented by l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2. */ + /* Implemented by l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2. */ + BUILTIN_GPI_I16 (UNOP, lroundhf, 2) VAR1 (UNOP, lroundsf, 2, si) VAR1 (UNOP, lrounddf, 2, di) + VAR1 (UNOPUS, lrounduv4hf, 2, v4hi) + VAR1 (UNOPUS, lrounduv8hf, 2, v8hi) VAR1 (UNOPUS, lrounduv2sf, 2, v2si) VAR1 (UNOPUS, lrounduv4sf, 2, v4si) VAR1 (UNOPUS, lrounduv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOPUS, lrounduhf, 2) VAR1 (UNOPUS, lroundusf, 2, si) VAR1 (UNOPUS, lroundudf, 2, di) + VAR1 (UNOP, lceilv4hf, 2, v4hi) + VAR1 (UNOP, lceilv8hf, 2, v8hi) VAR1 (UNOP, lceilv2sf, 2, v2si) VAR1 (UNOP, lceilv4sf, 2, v4si) VAR1 (UNOP, lceilv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOP, lceilhf, 2) + VAR1 (UNOPUS, lceiluv4hf, 2, v4hi) + VAR1 (UNOPUS, lceiluv8hf, 2, v8hi) VAR1 (UNOPUS, lceiluv2sf, 2, v2si) VAR1 (UNOPUS, lceiluv4sf, 2, v4si) VAR1 (UNOPUS, lceiluv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOPUS, lceiluhf, 2) VAR1 (UNOPUS, lceilusf, 2, si) VAR1 (UNOPUS, lceiludf, 2, di) + VAR1 (UNOP, lfloorv4hf, 2, v4hi) + VAR1 (UNOP, lfloorv8hf, 2, v8hi) VAR1 (UNOP, lfloorv2sf, 2, v2si) VAR1 (UNOP, lfloorv4sf, 2, v4si) VAR1 (UNOP, lfloorv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOP, lfloorhf, 2) + VAR1 (UNOPUS, lflooruv4hf, 2, v4hi) + VAR1 (UNOPUS, lflooruv8hf, 2, v8hi) VAR1 (UNOPUS, lflooruv2sf, 2, v2si) VAR1 (UNOPUS, lflooruv4sf, 2, v4si) VAR1 (UNOPUS, lflooruv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOPUS, lflooruhf, 2) VAR1 (UNOPUS, lfloorusf, 2, si) VAR1 (UNOPUS, lfloorudf, 2, di) + VAR1 (UNOP, lfrintnv4hf, 2, v4hi) + VAR1 (UNOP, lfrintnv8hf, 2, v8hi) VAR1 (UNOP, lfrintnv2sf, 2, v2si) VAR1 (UNOP, lfrintnv4sf, 2, v4si) VAR1 (UNOP, lfrintnv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOP, lfrintnhf, 2) VAR1 (UNOP, lfrintnsf, 2, si) VAR1 (UNOP, lfrintndf, 2, di) + VAR1 (UNOPUS, lfrintnuv4hf, 2, v4hi) + VAR1 (UNOPUS, lfrintnuv8hf, 2, v8hi) VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si) VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si) VAR1 (UNOPUS, lfrintnuv2df, 2, v2di) + BUILTIN_GPI_I16 (UNOPUS, lfrintnuhf, 2) VAR1 (UNOPUS, lfrintnusf, 2, si) VAR1 (UNOPUS, lfrintnudf, 2, di) /* Implemented by <optab><fcvt_target><VDQF:mode>2. */ + VAR1 (UNOP, floatv4hi, 2, v4hf) + VAR1 (UNOP, floatv8hi, 2, v8hf) VAR1 (UNOP, floatv2si, 2, v2sf) VAR1 (UNOP, floatv4si, 2, v4sf) VAR1 (UNOP, floatv2di, 2, v2df) + VAR1 (UNOP, floatunsv4hi, 2, v4hf) + VAR1 (UNOP, floatunsv8hi, 2, v8hf) VAR1 (UNOP, floatunsv2si, 2, v2sf) VAR1 (UNOP, floatunsv4si, 2, v4sf) VAR1 (UNOP, floatunsv2di, 2, v2df) @@ -352,19 +392,19 @@ /* Implemented by aarch64_frecp<FRECP:frecp_suffix><mode>. */ - BUILTIN_GPF (UNOP, frecpe, 0) - BUILTIN_GPF (BINOP, frecps, 0) - BUILTIN_GPF (UNOP, frecpx, 0) + BUILTIN_GPF_F16 (UNOP, frecpe, 0) + BUILTIN_GPF_F16 (UNOP, frecpx, 0) BUILTIN_VDQ_SI (UNOP, urecpe, 0) - BUILTIN_VDQF (UNOP, frecpe, 0) - BUILTIN_VDQF (BINOP, frecps, 0) + BUILTIN_VHSDF (UNOP, frecpe, 0) + BUILTIN_VHSDF_HSDF (BINOP, frecps, 0) /* Implemented by a mixture of abs2 patterns. 
Note the DImode builtin is only ever used for the int64x1_t intrinsic, there is no scalar version. */ BUILTIN_VSDQ_I_DI (UNOP, abs, 0) - BUILTIN_VDQF (UNOP, abs, 2) + BUILTIN_VHSDF (UNOP, abs, 2) + VAR1 (UNOP, abs, 2, hf) BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) @@ -381,7 +421,11 @@ BUILTIN_VALL_F16 (STORE1, st1, 0) /* Implemented by fma<mode>4. */ - BUILTIN_VDQF (TERNOP, fma, 4) + BUILTIN_VHSDF (TERNOP, fma, 4) + VAR1 (TERNOP, fma, 4, hf) + /* Implemented by fnma<mode>4. */ + BUILTIN_VHSDF (TERNOP, fnma, 4) + VAR1 (TERNOP, fnma, 4, hf) /* Implemented by aarch64_simd_bsl<mode>. */ BUILTIN_VDQQH (BSL_P, simd_bsl, 0) @@ -451,19 +495,62 @@ BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0) /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3. */ - BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3) - BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3) - BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3) - BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3) + BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3) + BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3) + BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3) + BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3) + VAR1 (SHIFTIMM, scvtfsi, 3, hf) + VAR1 (SHIFTIMM, scvtfdi, 3, hf) + VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf) + VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf) + BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3) + BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3) /* Implemented by aarch64_rsqrte<mode>. */ - BUILTIN_VALLF (UNOP, rsqrte, 0) + BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0) /* Implemented by aarch64_rsqrts<mode>. */ - BUILTIN_VALLF (BINOP, rsqrts, 0) + BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0) /* Implemented by fabd<mode>3. */ - BUILTIN_VALLF (BINOP, fabd, 3) + BUILTIN_VHSDF_HSDF (BINOP, fabd, 3) /* Implemented by aarch64_faddp<mode>. */ - BUILTIN_VDQF (BINOP, faddp, 0) + BUILTIN_VHSDF (BINOP, faddp, 0) + + /* Implemented by aarch64_cm<optab><mode>. */ + BUILTIN_VHSDF_HSDF (BINOP_USS, cmeq, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, cmge, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, cmgt, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, cmle, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, cmlt, 0) + + /* Implemented by neg<mode>2. */ + BUILTIN_VHSDF_HSDF (UNOP, neg, 2) + + /* Implemented by aarch64_fac<optab><mode>. */ + BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0) + + /* Implemented by sqrt<mode>2. */ + VAR1 (UNOP, sqrt, 2, hf) + + /* Implemented by <optab><mode>hf2. */ + VAR1 (UNOP, floatdi, 2, hf) + VAR1 (UNOP, floatsi, 2, hf) + VAR1 (UNOP, floathi, 2, hf) + VAR1 (UNOPUS, floatunsdi, 2, hf) + VAR1 (UNOPUS, floatunssi, 2, hf) + VAR1 (UNOPUS, floatunshi, 2, hf) + BUILTIN_GPI_I16 (UNOP, fix_trunchf, 2) + BUILTIN_GPI (UNOP, fix_truncsf, 2) + BUILTIN_GPI (UNOP, fix_truncdf, 2) + BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2) + BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2) + BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2) + + /* Implemented by <fmaxmin><mode>3. 
*/ + VAR1 (BINOP, fmax, 3, hf) + VAR1 (BINOP, fmin, 3, hf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3f8289cf7dc..c6af9f36d76 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -351,7 +351,7 @@ operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2]))); return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]"; } - [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")] + [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")] ) (define_insn "*aarch64_mul3_elt_<vswap_width_name><mode>" @@ -379,25 +379,25 @@ (match_operand:VMUL 2 "register_operand" "w")))] "TARGET_SIMD" "<f>mul\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"; - [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")] + [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")] ) (define_insn "aarch64_rsqrte<mode>" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")] + [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w") + (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")] UNSPEC_RSQRTE))] "TARGET_SIMD" "frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>" - [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")]) + [(set_attr "type" "neon_fp_rsqrte_<stype><q>")]) (define_insn "aarch64_rsqrts<mode>" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w")] - UNSPEC_RSQRTS))] + [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w") + (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w") + (match_operand:VHSDF_HSDF 2 "register_operand" "w")] + UNSPEC_RSQRTS))] "TARGET_SIMD" "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>" - [(set_attr "type" "neon_fp_rsqrts_<Vetype><q>")]) + [(set_attr "type" "neon_fp_rsqrts_<stype><q>")]) (define_expand "rsqrt<mode>2" [(set (match_operand:VALLF 0 "register_operand" "=w") @@ -475,14 +475,14 @@ ) (define_insn "fabd<mode>3" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (abs:VALLF - (minus:VALLF - (match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w"))))] + [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w") + (abs:VHSDF_HSDF + (minus:VHSDF_HSDF + (match_operand:VHSDF_HSDF 1 "register_operand" "w") + (match_operand:VHSDF_HSDF 2 "register_operand" "w"))))] "TARGET_SIMD" "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>" - [(set_attr "type" "neon_fp_abd_<Vetype><q>")] + [(set_attr "type" "neon_fp_abd_<stype><q>")] ) (define_insn "and<mode>3" @@ -1062,10 +1062,10 @@ ;; Pairwise FP Max/Min operations. (define_insn "aarch64_<maxmin_uns>p<mode>" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")] - FMAXMINV))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + FMAXMINV))] "TARGET_SIMD" "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" [(set_attr "type" "neon_minmax<q>")] @@ -1474,36 +1474,36 @@ ;; FP arithmetic operations. 
(define_insn "add<mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (plus:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" "fadd\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_addsub_<Vetype><q>")] + [(set_attr "type" "neon_fp_addsub_<stype><q>")] ) (define_insn "sub<mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (minus:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" "fsub\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_addsub_<Vetype><q>")] + [(set_attr "type" "neon_fp_addsub_<stype><q>")] ) (define_insn "mul<mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (mult:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" "fmul\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_mul_<Vetype><q>")] + [(set_attr "type" "neon_fp_mul_<stype><q>")] ) (define_expand "div<mode>3" - [(set (match_operand:VDQF 0 "register_operand") - (div:VDQF (match_operand:VDQF 1 "general_operand") - (match_operand:VDQF 2 "register_operand")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" { if (aarch64_emit_approx_div (operands[0], operands[1], operands[2])) @@ -1513,38 +1513,38 @@ }) (define_insn "*div<mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (div:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" "fdiv\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_div_<Vetype><q>")] + [(set_attr "type" "neon_fp_div_<stype><q>")] ) (define_insn "neg<mode>2" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (neg:VDQF (match_operand:VDQF 1 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))] "TARGET_SIMD" "fneg\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_neg_<Vetype><q>")] + [(set_attr "type" "neon_fp_neg_<stype><q>")] ) (define_insn "abs<mode>2" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (abs:VDQF (match_operand:VDQF 1 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))] "TARGET_SIMD" "fabs\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_abs_<Vetype><q>")] + [(set_attr "type" "neon_fp_abs_<stype><q>")] ) (define_insn "fma<mode>4" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (fma:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w") - (match_operand:VDQF 3 
"register_operand" "0")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w") + (match_operand:VHSDF 3 "register_operand" "0")))] "TARGET_SIMD" "fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_mla_<Vetype><q>")] + [(set_attr "type" "neon_fp_mla_<stype><q>")] ) (define_insn "*aarch64_fma4_elt<mode>" @@ -1591,7 +1591,7 @@ (match_operand:VMUL 3 "register_operand" "0")))] "TARGET_SIMD" "fmla\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]" - [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")] + [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")] ) (define_insn "*aarch64_fma4_elt_to_64v2df" @@ -1611,15 +1611,15 @@ ) (define_insn "fnma<mode>4" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (fma:VDQF - (match_operand:VDQF 1 "register_operand" "w") - (neg:VDQF - (match_operand:VDQF 2 "register_operand" "w")) - (match_operand:VDQF 3 "register_operand" "0")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (fma:VHSDF + (match_operand:VHSDF 1 "register_operand" "w") + (neg:VHSDF + (match_operand:VHSDF 2 "register_operand" "w")) + (match_operand:VHSDF 3 "register_operand" "0")))] "TARGET_SIMD" - "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_mla_<Vetype><q>")] + "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" + [(set_attr "type" "neon_fp_mla_<stype><q>")] ) (define_insn "*aarch64_fnma4_elt<mode>" @@ -1669,7 +1669,7 @@ (match_operand:VMUL 3 "register_operand" "0")))] "TARGET_SIMD" "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]" - [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")] + [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")] ) (define_insn "*aarch64_fnma4_elt_to_64v2df" @@ -1692,24 +1692,50 @@ ;; Vector versions of the floating-point frint patterns. ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn. (define_insn "<frint_pattern><mode>2" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")] - FRINT))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")] + FRINT))] "TARGET_SIMD" "frint<frint_suffix>\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_round_<Vetype><q>")] + [(set_attr "type" "neon_fp_round_<stype><q>")] ) ;; Vector versions of the fcvt standard patterns. ;; Expands to lbtrunc, lround, lceil, lfloor -(define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2" +(define_insn "l<fcvt_pattern><su_optab><VHSDF:mode><fcvt_target>2" [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w") (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET> - [(match_operand:VDQF 1 "register_operand" "w")] + [(match_operand:VHSDF 1 "register_operand" "w")] FCVT)))] "TARGET_SIMD" "fcvt<frint_suffix><su>\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_to_int_<Vetype><q>")] + [(set_attr "type" "neon_fp_to_int_<stype><q>")] +) + +;; HF Scalar variants of related SIMD instructions. 
+(define_insn "l<fcvt_pattern><su_optab>hfhi2" + [(set (match_operand:HI 0 "register_operand" "=w") + (FIXUORS:HI (unspec:HF [(match_operand:HF 1 "register_operand" "w")] + FCVT)))] + "TARGET_SIMD_F16INST" + "fcvt<frint_suffix><su>\t%h0, %h1" + [(set_attr "type" "neon_fp_to_int_s")] +) + +(define_insn "<optab>_trunchfhi2" + [(set (match_operand:HI 0 "register_operand" "=w") + (FIXUORS:HI (match_operand:HF 1 "register_operand" "w")))] + "TARGET_SIMD_F16INST" + "fcvtz<su>\t%h0, %h1" + [(set_attr "type" "neon_fp_to_int_s")] +) + +(define_insn "<optab>hihf2" + [(set (match_operand:HF 0 "register_operand" "=w") + (FLOATUORS:HF (match_operand:HI 1 "register_operand" "w")))] + "TARGET_SIMD_F16INST" + "<su_optab>cvtf\t%h0, %h1" + [(set_attr "type" "neon_int_to_fp_s")] ) (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult" @@ -1732,36 +1758,36 @@ [(set_attr "type" "neon_fp_to_int_<Vetype><q>")] ) -(define_expand "<optab><VDQF:mode><fcvt_target>2" +(define_expand "<optab><VHSDF:mode><fcvt_target>2" [(set (match_operand:<FCVT_TARGET> 0 "register_operand") (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET> - [(match_operand:VDQF 1 "register_operand")] - UNSPEC_FRINTZ)))] + [(match_operand:VHSDF 1 "register_operand")] + UNSPEC_FRINTZ)))] "TARGET_SIMD" {}) -(define_expand "<fix_trunc_optab><VDQF:mode><fcvt_target>2" +(define_expand "<fix_trunc_optab><VHSDF:mode><fcvt_target>2" [(set (match_operand:<FCVT_TARGET> 0 "register_operand") (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET> - [(match_operand:VDQF 1 "register_operand")] - UNSPEC_FRINTZ)))] + [(match_operand:VHSDF 1 "register_operand")] + UNSPEC_FRINTZ)))] "TARGET_SIMD" {}) -(define_expand "ftrunc<VDQF:mode>2" - [(set (match_operand:VDQF 0 "register_operand") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand")] - UNSPEC_FRINTZ))] +(define_expand "ftrunc<VHSDF:mode>2" + [(set (match_operand:VHSDF 0 "register_operand") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")] + UNSPEC_FRINTZ))] "TARGET_SIMD" {}) -(define_insn "<optab><fcvt_target><VDQF:mode>2" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (FLOATUORS:VDQF +(define_insn "<optab><fcvt_target><VHSDF:mode>2" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (FLOATUORS:VHSDF (match_operand:<FCVT_TARGET> 1 "register_operand" "w")))] "TARGET_SIMD" "<su_optab>cvtf\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_int_to_fp_<Vetype><q>")] + [(set_attr "type" "neon_int_to_fp_<stype><q>")] ) ;; Conversions between vectors of floats and doubles. 
@@ -1783,24 +1809,26 @@ ;; Convert between fixed-point and floating-point (vector modes) -(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VDQF:mode>3" - [(set (match_operand:<VDQF:FCVT_TARGET> 0 "register_operand" "=w") - (unspec:<VDQF:FCVT_TARGET> [(match_operand:VDQF 1 "register_operand" "w") - (match_operand:SI 2 "immediate_operand" "i")] +(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3" + [(set (match_operand:<VHSDF:FCVT_TARGET> 0 "register_operand" "=w") + (unspec:<VHSDF:FCVT_TARGET> + [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] FCVT_F2FIXED))] "TARGET_SIMD" "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2" - [(set_attr "type" "neon_fp_to_int_<VDQF:Vetype><q>")] + [(set_attr "type" "neon_fp_to_int_<VHSDF:stype><q>")] ) -(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_SDI:mode>3" - [(set (match_operand:<VDQ_SDI:FCVT_TARGET> 0 "register_operand" "=w") - (unspec:<VDQ_SDI:FCVT_TARGET> [(match_operand:VDQ_SDI 1 "register_operand" "w") - (match_operand:SI 2 "immediate_operand" "i")] +(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3" + [(set (match_operand:<VDQ_HSDI:FCVT_TARGET> 0 "register_operand" "=w") + (unspec:<VDQ_HSDI:FCVT_TARGET> + [(match_operand:VDQ_HSDI 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] FCVT_FIXED2F))] "TARGET_SIMD" "<FCVT_FIXED2F:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2" - [(set_attr "type" "neon_int_to_fp_<VDQ_SDI:Vetype><q>")] + [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")] ) ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns @@ -1959,33 +1987,33 @@ ;; NaNs. (define_insn "<su><maxmin><mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")))] "TARGET_SIMD" "f<maxmin>nm\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_minmax_<Vetype><q>")] + [(set_attr "type" "neon_fp_minmax_<stype><q>")] ) (define_insn "<maxmin_uns><mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")] - FMAXMIN_UNS))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + FMAXMIN_UNS))] "TARGET_SIMD" "<maxmin_uns_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_minmax_<Vetype><q>")] + [(set_attr "type" "neon_fp_minmax_<stype><q>")] ) ;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions (define_insn "<fmaxmin><mode>3" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")] - FMAXMIN))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + FMAXMIN))] "TARGET_SIMD" "<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_minmax_<Vetype><q>")] + [(set_attr "type" "neon_fp_minmax_<stype><q>")] ) ;; 'across lanes' add. 
@@ -2005,13 +2033,13 @@ ) (define_insn "aarch64_faddp<mode>" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w") - (match_operand:VDQF 2 "register_operand" "w")] - UNSPEC_FADDV))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w")] + UNSPEC_FADDV))] "TARGET_SIMD" "faddp\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" - [(set_attr "type" "neon_fp_reduc_add_<Vetype><q>")] + [(set_attr "type" "neon_fp_reduc_add_<stype><q>")] ) (define_insn "aarch64_reduc_plus_internal<mode>" @@ -2085,8 +2113,8 @@ ;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code. (This is FP smax/smin). (define_expand "reduc_<maxmin_uns>_scal_<mode>" [(match_operand:<VEL> 0 "register_operand") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand")] - FMAXMINV)] + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")] + FMAXMINV)] "TARGET_SIMD" { rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0)); @@ -2133,12 +2161,12 @@ ) (define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")] - FMAXMINV))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")] + FMAXMINV))] "TARGET_SIMD" "<maxmin_uns_op><vp>\\t%<Vetype>0, %1.<Vtype>" - [(set_attr "type" "neon_fp_reduc_minmax_<Vetype><q>")] + [(set_attr "type" "neon_fp_reduc_minmax_<stype><q>")] ) ;; aarch64_simd_bsl may compile to any of bsl/bif/bit depending on register @@ -3007,13 +3035,14 @@ ;; fmulx. (define_insn "aarch64_fmulx<mode>" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w")] - UNSPEC_FMULX))] + [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w") + (unspec:VHSDF_HSDF + [(match_operand:VHSDF_HSDF 1 "register_operand" "w") + (match_operand:VHSDF_HSDF 2 "register_operand" "w")] + UNSPEC_FMULX))] "TARGET_SIMD" "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>" - [(set_attr "type" "neon_fp_mul_<Vetype>")] + [(set_attr "type" "neon_fp_mul_<stype>")] ) ;; vmulxq_lane_f32, and vmulx_laneq_f32 @@ -3055,20 +3084,18 @@ [(set_attr "type" "neon_fp_mul_<Vetype><q>")] ) -;; vmulxq_lane_f64 +;; vmulxq_lane -(define_insn "*aarch64_mulx_elt_to_64v2df" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (unspec:V2DF - [(match_operand:V2DF 1 "register_operand" "w") - (vec_duplicate:V2DF - (match_operand:DF 2 "register_operand" "w"))] +(define_insn "*aarch64_mulx_elt_from_dup<mode>" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF + [(match_operand:VHSDF 1 "register_operand" "w") + (vec_duplicate:VHSDF + (match_operand:<VEL> 2 "register_operand" "w"))] UNSPEC_FMULX))] "TARGET_SIMD" - { - return "fmulx\t%0.2d, %1.2d, %2.d[0]"; - } - [(set_attr "type" "neon_fp_mul_d_scalar_q")] + "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"; + [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")] ) ;; vmulxs_lane_f32, vmulxs_laneq_f32 @@ -4253,30 +4280,32 @@ [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w") (neg:<V_cmp_result> (COMPARISONS:<V_cmp_result> - (match_operand:VALLF 1 "register_operand" "w,w") - (match_operand:VALLF 2 "aarch64_simd_reg_or_zero" "w,YDz") + (match_operand:VHSDF_HSDF 1 "register_operand" "w,w") + (match_operand:VHSDF_HSDF 2 "aarch64_simd_reg_or_zero" "w,YDz") 
)))] "TARGET_SIMD" "@ fcm<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype> fcm<optab>\t%<v>0<Vmtype>, %<v>1<Vmtype>, 0" - [(set_attr "type" "neon_fp_compare_<Vetype><q>")] + [(set_attr "type" "neon_fp_compare_<stype><q>")] ) ;; fac(ge|gt) ;; Note we can also handle what would be fac(le|lt) by ;; generating fac(ge|gt). -(define_insn "*aarch64_fac<optab><mode>" +(define_insn "aarch64_fac<optab><mode>" [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w") (neg:<V_cmp_result> (FAC_COMPARISONS:<V_cmp_result> - (abs:VALLF (match_operand:VALLF 1 "register_operand" "w")) - (abs:VALLF (match_operand:VALLF 2 "register_operand" "w")) + (abs:VHSDF_HSDF + (match_operand:VHSDF_HSDF 1 "register_operand" "w")) + (abs:VHSDF_HSDF + (match_operand:VHSDF_HSDF 2 "register_operand" "w")) )))] "TARGET_SIMD" "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>" - [(set_attr "type" "neon_fp_compare_<Vetype><q>")] + [(set_attr "type" "neon_fp_compare_<stype><q>")] ) ;; addp @@ -4305,8 +4334,8 @@ ;; sqrt (define_expand "sqrt<mode>2" - [(set (match_operand:VDQF 0 "register_operand") - (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))] "TARGET_SIMD" { if (aarch64_emit_approx_sqrt (operands[0], operands[1], false)) @@ -4314,11 +4343,11 @@ }) (define_insn "*sqrt<mode>2" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))] "TARGET_SIMD" "fsqrt\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_sqrt_<Vetype><q>")] + [(set_attr "type" "neon_fp_sqrt_<stype><q>")] ) ;; Patterns for vector struct loads and stores. @@ -5176,10 +5205,10 @@ ) (define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>" - [(set (match_operand:VALL 0 "register_operand" "=w") - (unspec:VALL [(match_operand:VALL 1 "register_operand" "w") - (match_operand:VALL 2 "register_operand" "w")] - PERMUTE))] + [(set (match_operand:VALL_F16 0 "register_operand" "=w") + (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w") + (match_operand:VALL_F16 2 "register_operand" "w")] + PERMUTE))] "TARGET_SIMD" "<PERMUTE:perm_insn><PERMUTE:perm_hilo>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" [(set_attr "type" "neon_permute<q>")] @@ -5187,11 +5216,11 @@ ;; Note immediate (third) operand is lane index not byte index. 
(define_insn "aarch64_ext<mode>" - [(set (match_operand:VALL 0 "register_operand" "=w") - (unspec:VALL [(match_operand:VALL 1 "register_operand" "w") - (match_operand:VALL 2 "register_operand" "w") - (match_operand:SI 3 "immediate_operand" "i")] - UNSPEC_EXT))] + [(set (match_operand:VALL_F16 0 "register_operand" "=w") + (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w") + (match_operand:VALL_F16 2 "register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_EXT))] "TARGET_SIMD" { operands[3] = GEN_INT (INTVAL (operands[3]) @@ -5202,8 +5231,8 @@ ) (define_insn "aarch64_rev<REVERSE:rev_op><mode>" - [(set (match_operand:VALL 0 "register_operand" "=w") - (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")] + [(set (match_operand:VALL_F16 0 "register_operand" "=w") + (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")] REVERSE))] "TARGET_SIMD" "rev<REVERSE:rev_op>\\t%0.<Vtype>, %1.<Vtype>" @@ -5370,31 +5399,32 @@ ) (define_insn "aarch64_frecpe<mode>" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")] - UNSPEC_FRECPE))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")] + UNSPEC_FRECPE))] "TARGET_SIMD" "frecpe\\t%0.<Vtype>, %1.<Vtype>" - [(set_attr "type" "neon_fp_recpe_<Vetype><q>")] + [(set_attr "type" "neon_fp_recpe_<stype><q>")] ) (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>" - [(set (match_operand:GPF 0 "register_operand" "=w") - (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")] - FRECP))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")] + FRECP))] "TARGET_SIMD" "frecp<FRECP:frecp_suffix>\\t%<s>0, %<s>1" - [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF:Vetype><GPF:q>")] + [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF_F16:stype>")] ) (define_insn "aarch64_frecps<mode>" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w")] - UNSPEC_FRECPS))] + [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w") + (unspec:VHSDF_HSDF + [(match_operand:VHSDF_HSDF 1 "register_operand" "w") + (match_operand:VHSDF_HSDF 2 "register_operand" "w")] + UNSPEC_FRECPS))] "TARGET_SIMD" "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>" - [(set_attr "type" "neon_fp_recps_<Vetype><q>")] + [(set_attr "type" "neon_fp_recps_<stype><q>")] ) (define_insn "aarch64_urecpe<mode>" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f2424e9e1d3..ba0d3767ff0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7567,6 +7567,10 @@ bool aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp) { machine_mode mode = GET_MODE (dst); + + if (GET_MODE_INNER (mode) == HFmode) + return false; + machine_mode mmsk = mode_for_vector (int_mode_for_mode (GET_MODE_INNER (mode)), GET_MODE_NUNITS (mode)); @@ -7682,6 +7686,10 @@ bool aarch64_emit_approx_div (rtx quo, rtx num, rtx den) { machine_mode mode = GET_MODE (quo); + + if (GET_MODE_INNER (mode) == HFmode) + return false; + bool use_approx_division_p = (flag_mlow_precision_div || (aarch64_tune_params.approx_modes->division & AARCH64_APPROX_MODE (mode))); @@ -12365,6 +12373,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_trn2v4si; break; case V2SImode: gen = 
gen_aarch64_trn2v2si; break; case V2DImode: gen = gen_aarch64_trn2v2di; break; + case V4HFmode: gen = gen_aarch64_trn2v4hf; break; + case V8HFmode: gen = gen_aarch64_trn2v8hf; break; case V4SFmode: gen = gen_aarch64_trn2v4sf; break; case V2SFmode: gen = gen_aarch64_trn2v2sf; break; case V2DFmode: gen = gen_aarch64_trn2v2df; break; @@ -12383,6 +12393,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_trn1v4si; break; case V2SImode: gen = gen_aarch64_trn1v2si; break; case V2DImode: gen = gen_aarch64_trn1v2di; break; + case V4HFmode: gen = gen_aarch64_trn1v4hf; break; + case V8HFmode: gen = gen_aarch64_trn1v8hf; break; case V4SFmode: gen = gen_aarch64_trn1v4sf; break; case V2SFmode: gen = gen_aarch64_trn1v2sf; break; case V2DFmode: gen = gen_aarch64_trn1v2df; break; @@ -12448,6 +12460,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_uzp2v4si; break; case V2SImode: gen = gen_aarch64_uzp2v2si; break; case V2DImode: gen = gen_aarch64_uzp2v2di; break; + case V4HFmode: gen = gen_aarch64_uzp2v4hf; break; + case V8HFmode: gen = gen_aarch64_uzp2v8hf; break; case V4SFmode: gen = gen_aarch64_uzp2v4sf; break; case V2SFmode: gen = gen_aarch64_uzp2v2sf; break; case V2DFmode: gen = gen_aarch64_uzp2v2df; break; @@ -12466,6 +12480,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_uzp1v4si; break; case V2SImode: gen = gen_aarch64_uzp1v2si; break; case V2DImode: gen = gen_aarch64_uzp1v2di; break; + case V4HFmode: gen = gen_aarch64_uzp1v4hf; break; + case V8HFmode: gen = gen_aarch64_uzp1v8hf; break; case V4SFmode: gen = gen_aarch64_uzp1v4sf; break; case V2SFmode: gen = gen_aarch64_uzp1v2sf; break; case V2DFmode: gen = gen_aarch64_uzp1v2df; break; @@ -12536,6 +12552,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_zip2v4si; break; case V2SImode: gen = gen_aarch64_zip2v2si; break; case V2DImode: gen = gen_aarch64_zip2v2di; break; + case V4HFmode: gen = gen_aarch64_zip2v4hf; break; + case V8HFmode: gen = gen_aarch64_zip2v8hf; break; case V4SFmode: gen = gen_aarch64_zip2v4sf; break; case V2SFmode: gen = gen_aarch64_zip2v2sf; break; case V2DFmode: gen = gen_aarch64_zip2v2df; break; @@ -12554,6 +12572,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d) case V4SImode: gen = gen_aarch64_zip1v4si; break; case V2SImode: gen = gen_aarch64_zip1v2si; break; case V2DImode: gen = gen_aarch64_zip1v2di; break; + case V4HFmode: gen = gen_aarch64_zip1v4hf; break; + case V8HFmode: gen = gen_aarch64_zip1v8hf; break; case V4SFmode: gen = gen_aarch64_zip1v4sf; break; case V2SFmode: gen = gen_aarch64_zip1v2sf; break; case V2DFmode: gen = gen_aarch64_zip1v2df; break; @@ -12598,6 +12618,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d) case V8HImode: gen = gen_aarch64_extv8hi; break; case V2SImode: gen = gen_aarch64_extv2si; break; case V4SImode: gen = gen_aarch64_extv4si; break; + case V4HFmode: gen = gen_aarch64_extv4hf; break; + case V8HFmode: gen = gen_aarch64_extv8hf; break; case V2SFmode: gen = gen_aarch64_extv2sf; break; case V4SFmode: gen = gen_aarch64_extv4sf; break; case V2DImode: gen = gen_aarch64_extv2di; break; @@ -12673,6 +12695,8 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d) case V2SImode: gen = gen_aarch64_rev64v2si; break; case V4SFmode: gen = gen_aarch64_rev64v4sf; break; case V2SFmode: gen = gen_aarch64_rev64v2sf; break; + case V8HFmode: gen = gen_aarch64_rev64v8hf; break; + case V4HFmode: gen = gen_aarch64_rev64v4hf; break; default: return false; } diff --git 
a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index eb81a86e6d8..95b2c7da0e3 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -135,6 +135,9 @@ extern unsigned aarch64_architecture_version; /* ARMv8.1 architecture extensions. */ #define AARCH64_FL_LSE (1 << 4) /* Has Large System Extensions. */ #define AARCH64_FL_V8_1 (1 << 5) /* Has ARMv8.1 extensions. */ +/* ARMv8.2-A architecture extensions. */ +#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */ +#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */ /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -146,6 +149,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_FOR_ARCH8 (AARCH64_FL_FPSIMD) #define AARCH64_FL_FOR_ARCH8_1 \ (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1) +#define AARCH64_FL_FOR_ARCH8_2 \ + (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2) /* Macros to test ISA flags. */ @@ -155,6 +160,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_SIMD (aarch64_isa_flags & AARCH64_FL_SIMD) #define AARCH64_ISA_LSE (aarch64_isa_flags & AARCH64_FL_LSE) #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1) +#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2) +#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) @@ -165,6 +172,10 @@ extern unsigned aarch64_architecture_version; /* Atomic instructions that can be enabled through the +lse extension. */ #define TARGET_LSE (AARCH64_ISA_LSE) +/* ARMv8.2-A FP16 support that can be enabled through the +fp16 extension. */ +#define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16) +#define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16) + /* Make sure this is always defined so we don't have to check for ifdefs but rather use normal ifs. */ #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 02d71b99a5a..353278890dc 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4445,22 +4445,23 @@ ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn. (define_insn "<frint_pattern><mode>2" - [(set (match_operand:GPF 0 "register_operand" "=w") - (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")] FRINT))] "TARGET_FLOAT" "frint<frint_suffix>\\t%<s>0, %<s>1" - [(set_attr "type" "f_rint<s>")] + [(set_attr "type" "f_rint<stype>")] ) ;; frcvt floating-point round to integer and convert standard patterns. ;; Expands to lbtrunc, lceil, lfloor, lround. 
-(define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2" +(define_insn "l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") - (FIXUORS:GPI (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")] - FCVT)))] + (FIXUORS:GPI + (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")] + FCVT)))] "TARGET_FLOAT" - "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF:s>1" + "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF_F16:s>1" [(set_attr "type" "f_cvtf2i")] ) @@ -4486,23 +4487,24 @@ ;; fma - no throw (define_insn "fma<mode>4" - [(set (match_operand:GPF 0 "register_operand" "=w") - (fma:GPF (match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w") - (match_operand:GPF 3 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w") + (match_operand:GPF_F16 3 "register_operand" "w")))] "TARGET_FLOAT" "fmadd\\t%<s>0, %<s>1, %<s>2, %<s>3" - [(set_attr "type" "fmac<s>")] + [(set_attr "type" "fmac<stype>")] ) (define_insn "fnma<mode>4" - [(set (match_operand:GPF 0 "register_operand" "=w") - (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w")) - (match_operand:GPF 2 "register_operand" "w") - (match_operand:GPF 3 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (fma:GPF_F16 + (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")) + (match_operand:GPF_F16 2 "register_operand" "w") + (match_operand:GPF_F16 3 "register_operand" "w")))] "TARGET_FLOAT" "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3" - [(set_attr "type" "fmac<s>")] + [(set_attr "type" "fmac<stype>")] ) (define_insn "fms<mode>4" @@ -4588,19 +4590,11 @@ [(set_attr "type" "f_cvt")] ) -(define_insn "fix_trunc<GPF:mode><GPI:mode>2" +(define_insn "<optab>_trunc<GPF_F16:mode><GPI:mode>2" [(set (match_operand:GPI 0 "register_operand" "=r") - (fix:GPI (match_operand:GPF 1 "register_operand" "w")))] + (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))] "TARGET_FLOAT" - "fcvtzs\\t%<GPI:w>0, %<GPF:s>1" - [(set_attr "type" "f_cvtf2i")] -) - -(define_insn "fixuns_trunc<GPF:mode><GPI:mode>2" - [(set (match_operand:GPI 0 "register_operand" "=r") - (unsigned_fix:GPI (match_operand:GPF 1 "register_operand" "w")))] - "TARGET_FLOAT" - "fcvtzu\\t%<GPI:w>0, %<GPF:s>1" + "fcvtz<su>\t%<GPI:w>0, %<GPF_F16:s>1" [(set_attr "type" "f_cvtf2i")] ) @@ -4624,6 +4618,14 @@ [(set_attr "type" "f_cvti2f")] ) +(define_insn "<optab><mode>hf2" + [(set (match_operand:HF 0 "register_operand" "=w") + (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))] + "TARGET_FP_F16INST" + "<su_optab>cvtf\t%h0, %<w>1" + [(set_attr "type" "f_cvti2f")] +) + ;; Convert between fixed-point and floating-point (scalar modes) (define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3" @@ -4654,38 +4656,78 @@ (set_attr "simd" "*, yes")] ) +(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:HF 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + FCVT_F2FIXED))] + "TARGET_FP_F16INST" + "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<GPI:w>0, %h1, #%2" + [(set_attr "type" "f_cvtf2i")] +) + +(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3" + [(set (match_operand:HF 0 "register_operand" "=w") + (unspec:HF [(match_operand:GPI 1 "register_operand" "r") + (match_operand:SI 2 "immediate_operand" "i")] + FCVT_FIXED2F))] + 
"TARGET_FP_F16INST" + "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %<GPI:w>1, #%2" + [(set_attr "type" "f_cvti2f")] +) + +(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf3" + [(set (match_operand:HI 0 "register_operand" "=w") + (unspec:HI [(match_operand:HF 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + FCVT_F2FIXED))] + "TARGET_SIMD" + "<FCVT_F2FIXED:fcvt_fixed_insn>\t%h0, %h1, #%2" + [(set_attr "type" "neon_fp_to_int_s")] +) + +(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn>hi3" + [(set (match_operand:HF 0 "register_operand" "=w") + (unspec:HF [(match_operand:HI 1 "register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + FCVT_FIXED2F))] + "TARGET_SIMD" + "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %h1, #%2" + [(set_attr "type" "neon_int_to_fp_s")] +) + ;; ------------------------------------------------------------------- ;; Floating-point arithmetic ;; ------------------------------------------------------------------- (define_insn "add<mode>3" - [(set (match_operand:GPF 0 "register_operand" "=w") - (plus:GPF - (match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (plus:GPF_F16 + (match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w")))] "TARGET_FLOAT" "fadd\\t%<s>0, %<s>1, %<s>2" - [(set_attr "type" "fadd<s>")] + [(set_attr "type" "fadd<stype>")] ) (define_insn "sub<mode>3" - [(set (match_operand:GPF 0 "register_operand" "=w") - (minus:GPF - (match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (minus:GPF_F16 + (match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w")))] "TARGET_FLOAT" "fsub\\t%<s>0, %<s>1, %<s>2" - [(set_attr "type" "fadd<s>")] + [(set_attr "type" "fadd<stype>")] ) (define_insn "mul<mode>3" - [(set (match_operand:GPF 0 "register_operand" "=w") - (mult:GPF - (match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (mult:GPF_F16 + (match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w")))] "TARGET_FLOAT" "fmul\\t%<s>0, %<s>1, %<s>2" - [(set_attr "type" "fmul<s>")] + [(set_attr "type" "fmul<stype>")] ) (define_insn "*fnmul<mode>3" @@ -4709,9 +4751,9 @@ ) (define_expand "div<mode>3" - [(set (match_operand:GPF 0 "register_operand") - (div:GPF (match_operand:GPF 1 "general_operand") - (match_operand:GPF 2 "register_operand")))] + [(set (match_operand:GPF_F16 0 "register_operand") + (div:GPF_F16 (match_operand:GPF_F16 1 "general_operand") + (match_operand:GPF_F16 2 "register_operand")))] "TARGET_SIMD" { if (aarch64_emit_approx_div (operands[0], operands[1], operands[2])) @@ -4721,25 +4763,25 @@ }) (define_insn "*div<mode>3" - [(set (match_operand:GPF 0 "register_operand" "=w") - (div:GPF (match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (div:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w")))] "TARGET_FLOAT" "fdiv\\t%<s>0, %<s>1, %<s>2" - [(set_attr "type" "fdiv<s>")] + [(set_attr "type" "fdiv<stype>")] ) (define_insn "neg<mode>2" - [(set (match_operand:GPF 0 "register_operand" "=w") - (neg:GPF (match_operand:GPF 1 "register_operand" "w")))] + [(set 
(match_operand:GPF_F16 0 "register_operand" "=w") + (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))] "TARGET_FLOAT" "fneg\\t%<s>0, %<s>1" - [(set_attr "type" "ffarith<s>")] + [(set_attr "type" "ffarith<stype>")] ) (define_expand "sqrt<mode>2" - [(set (match_operand:GPF 0 "register_operand") - (sqrt:GPF (match_operand:GPF 1 "register_operand")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))] "TARGET_FLOAT" { if (aarch64_emit_approx_sqrt (operands[0], operands[1], false)) @@ -4747,19 +4789,19 @@ }) (define_insn "*sqrt<mode>2" - [(set (match_operand:GPF 0 "register_operand" "=w") - (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))] "TARGET_FLOAT" "fsqrt\\t%<s>0, %<s>1" - [(set_attr "type" "fsqrt<s>")] + [(set_attr "type" "fsqrt<stype>")] ) (define_insn "abs<mode>2" - [(set (match_operand:GPF 0 "register_operand" "=w") - (abs:GPF (match_operand:GPF 1 "register_operand" "w")))] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))] "TARGET_FLOAT" "fabs\\t%<s>0, %<s>1" - [(set_attr "type" "ffarith<s>")] + [(set_attr "type" "ffarith<stype>")] ) ;; Given that smax/smin do not specify the result when either input is NaN, @@ -4786,13 +4828,13 @@ ;; Scalar forms for the IEEE-754 fmax()/fmin() functions (define_insn "<fmaxmin><mode>3" - [(set (match_operand:GPF 0 "register_operand" "=w") - (unspec:GPF [(match_operand:GPF 1 "register_operand" "w") - (match_operand:GPF 2 "register_operand" "w")] + [(set (match_operand:GPF_F16 0 "register_operand" "=w") + (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w") + (match_operand:GPF_F16 2 "register_operand" "w")] FMAXMIN))] "TARGET_FLOAT" "<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2" - [(set_attr "type" "f_minmax<s>")] + [(set_attr "type" "f_minmax<stype>")] ) ;; For copysign (x, y), we want to generate: diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h new file mode 100644 index 00000000000..4b7c2dd3bcc --- /dev/null +++ b/gcc/config/aarch64/arm_fp16.h @@ -0,0 +1,579 @@ +/* ARM FP16 scalar intrinsics include file. + + Copyright (C) 2016 Free Software Foundation, Inc. + Contributed by ARM Ltd. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _AARCH64_FP16_H_ +#define _AARCH64_FP16_H_ + +#include <stdint.h> + +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+fp16") + +typedef __fp16 float16_t; + +/* ARMv8.2-A FP16 one operand scalar intrinsics. 
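A usage sketch for the new header (scale_abs is a hypothetical example function). Because every intrinsic in arm_fp16.h is declared __always_inline, the calling function itself must also be compiled for armv8.2-a+fp16, either via -march on the command line or, as below, a per-function target attribute:

    #include <arm_fp16.h>

    __attribute__ ((target ("arch=armv8.2-a+fp16")))
    float16_t
    scale_abs (float16_t x, float16_t s)
    {
      return vmulh_f16 (vabsh_f16 (x), s);  /* fabs + fmul on H registers */
    }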
*/ + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vabsh_f16 (float16_t __a) +{ + return __builtin_aarch64_abshf (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vceqzh_f16 (float16_t __a) +{ + return __builtin_aarch64_cmeqhf_uss (__a, 0.0f); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcgezh_f16 (float16_t __a) +{ + return __builtin_aarch64_cmgehf_uss (__a, 0.0f); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcgtzh_f16 (float16_t __a) +{ + return __builtin_aarch64_cmgthf_uss (__a, 0.0f); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vclezh_f16 (float16_t __a) +{ + return __builtin_aarch64_cmlehf_uss (__a, 0.0f); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcltzh_f16 (float16_t __a) +{ + return __builtin_aarch64_cmlthf_uss (__a, 0.0f); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_s16 (int16_t __a) +{ + return __builtin_aarch64_floathihf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_s32 (int32_t __a) +{ + return __builtin_aarch64_floatsihf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_s64 (int64_t __a) +{ + return __builtin_aarch64_floatdihf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_u16 (uint16_t __a) +{ + return __builtin_aarch64_floatunshihf_us (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_u32 (uint32_t __a) +{ + return __builtin_aarch64_floatunssihf_us (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_f16_u64 (uint64_t __a) +{ + return __builtin_aarch64_floatunsdihf_us (__a); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvth_s16_f16 (float16_t __a) +{ + return __builtin_aarch64_fix_trunchfhi (__a); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvth_s32_f16 (float16_t __a) +{ + return __builtin_aarch64_fix_trunchfsi (__a); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvth_s64_f16 (float16_t __a) +{ + return __builtin_aarch64_fix_trunchfdi (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvth_u16_f16 (float16_t __a) +{ + return __builtin_aarch64_fixuns_trunchfhi_us (__a); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvth_u32_f16 (float16_t __a) +{ + return __builtin_aarch64_fixuns_trunchfsi_us (__a); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvth_u64_f16 (float16_t __a) +{ + return __builtin_aarch64_fixuns_trunchfdi_us (__a); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvtah_s16_f16 (float16_t __a) +{ + return __builtin_aarch64_lroundhfhi (__a); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvtah_s32_f16 (float16_t __a) +{ + return __builtin_aarch64_lroundhfsi (__a); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvtah_s64_f16 (float16_t __a) +{ + return __builtin_aarch64_lroundhfdi (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvtah_u16_f16 (float16_t __a) +{ + return 
__builtin_aarch64_lrounduhfhi_us (__a); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvtah_u32_f16 (float16_t __a) +{ + return __builtin_aarch64_lrounduhfsi_us (__a); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvtah_u64_f16 (float16_t __a) +{ + return __builtin_aarch64_lrounduhfdi_us (__a); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvtmh_s16_f16 (float16_t __a) +{ + return __builtin_aarch64_lfloorhfhi (__a); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvtmh_s32_f16 (float16_t __a) +{ + return __builtin_aarch64_lfloorhfsi (__a); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvtmh_s64_f16 (float16_t __a) +{ + return __builtin_aarch64_lfloorhfdi (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvtmh_u16_f16 (float16_t __a) +{ + return __builtin_aarch64_lflooruhfhi_us (__a); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvtmh_u32_f16 (float16_t __a) +{ + return __builtin_aarch64_lflooruhfsi_us (__a); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvtmh_u64_f16 (float16_t __a) +{ + return __builtin_aarch64_lflooruhfdi_us (__a); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvtnh_s16_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnhfhi (__a); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvtnh_s32_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnhfsi (__a); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvtnh_s64_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnhfdi (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvtnh_u16_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnuhfhi_us (__a); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvtnh_u32_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnuhfsi_us (__a); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvtnh_u64_f16 (float16_t __a) +{ + return __builtin_aarch64_lfrintnuhfdi_us (__a); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvtph_s16_f16 (float16_t __a) +{ + return __builtin_aarch64_lceilhfhi (__a); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvtph_s32_f16 (float16_t __a) +{ + return __builtin_aarch64_lceilhfsi (__a); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvtph_s64_f16 (float16_t __a) +{ + return __builtin_aarch64_lceilhfdi (__a); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvtph_u16_f16 (float16_t __a) +{ + return __builtin_aarch64_lceiluhfhi_us (__a); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvtph_u32_f16 (float16_t __a) +{ + return __builtin_aarch64_lceiluhfsi_us (__a); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvtph_u64_f16 (float16_t __a) +{ + return __builtin_aarch64_lceiluhfdi_us (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vnegh_f16 (float16_t __a) +{ + return __builtin_aarch64_neghf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrecpeh_f16 (float16_t 
__a) +{ + return __builtin_aarch64_frecpehf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrecpxh_f16 (float16_t __a) +{ + return __builtin_aarch64_frecpxhf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndh_f16 (float16_t __a) +{ + return __builtin_aarch64_btrunchf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndah_f16 (float16_t __a) +{ + return __builtin_aarch64_roundhf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndih_f16 (float16_t __a) +{ + return __builtin_aarch64_nearbyinthf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndmh_f16 (float16_t __a) +{ + return __builtin_aarch64_floorhf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndnh_f16 (float16_t __a) +{ + return __builtin_aarch64_frintnhf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndph_f16 (float16_t __a) +{ + return __builtin_aarch64_ceilhf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrndxh_f16 (float16_t __a) +{ + return __builtin_aarch64_rinthf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrsqrteh_f16 (float16_t __a) +{ + return __builtin_aarch64_rsqrtehf (__a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vsqrth_f16 (float16_t __a) +{ + return __builtin_aarch64_sqrthf (__a); +} + +/* ARMv8.2-A FP16 two operands scalar intrinsics. */ + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vaddh_f16 (float16_t __a, float16_t __b) +{ + return __a + __b; +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vabdh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fabdhf (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcageh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_facgehf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcagth_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_facgthf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcaleh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_faclehf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcalth_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_faclthf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vceqh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_cmeqhf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcgeh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_cmgehf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcgth_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_cmgthf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcleh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_cmlehf_uss (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vclth_f16 (float16_t __a, float16_t __b) +{ + return 
__builtin_aarch64_cmlthf_uss (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_s16 (int16_t __a, const int __b) +{ + return __builtin_aarch64_scvtfhi (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_s32 (int32_t __a, const int __b) +{ + return __builtin_aarch64_scvtfsihf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_s64 (int64_t __a, const int __b) +{ + return __builtin_aarch64_scvtfdihf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_u16 (uint16_t __a, const int __b) +{ + return __builtin_aarch64_ucvtfhi_sus (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_u32 (uint32_t __a, const int __b) +{ + return __builtin_aarch64_ucvtfsihf_sus (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vcvth_n_f16_u64 (uint64_t __a, const int __b) +{ + return __builtin_aarch64_ucvtfdihf_sus (__a, __b); +} + +__extension__ static __inline int16_t __attribute__ ((__always_inline__)) +vcvth_n_s16_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzshf (__a, __b); +} + +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vcvth_n_s32_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzshfsi (__a, __b); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vcvth_n_s64_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzshfdi (__a, __b); +} + +__extension__ static __inline uint16_t __attribute__ ((__always_inline__)) +vcvth_n_u16_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzuhf_uss (__a, __b); +} + +__extension__ static __inline uint32_t __attribute__ ((__always_inline__)) +vcvth_n_u32_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzuhfsi_uss (__a, __b); +} + +__extension__ static __inline uint64_t __attribute__ ((__always_inline__)) +vcvth_n_u64_f16 (float16_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzuhfdi_uss (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vdivh_f16 (float16_t __a, float16_t __b) +{ + return __a / __b; +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vmaxh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fmaxhf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vmaxnmh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fmaxhf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vminh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fminhf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vminnmh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fminhf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vmulh_f16 (float16_t __a, float16_t __b) +{ + return __a * __b; +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vmulxh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_fmulxhf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrecpsh_f16 (float16_t __a, float16_t __b) +{ + return 
__builtin_aarch64_frecpshf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vrsqrtsh_f16 (float16_t __a, float16_t __b) +{ + return __builtin_aarch64_rsqrtshf (__a, __b); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vsubh_f16 (float16_t __a, float16_t __b) +{ + return __a - __b; +} + +/* ARMv8.2-A FP16 three operands scalar intrinsics. */ + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vfmah_f16 (float16_t __a, float16_t __b, float16_t __c) +{ + return __builtin_aarch64_fmahf (__b, __c, __a); +} + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c) +{ + return __builtin_aarch64_fnmahf (__b, __c, __a); +} + +#pragma GCC pop_options + +#endif diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 702cad69be1..d6e510c8bc4 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -466,6 +466,8 @@ typedef struct poly16x8x4_t #define __aarch64_vdup_lane_any(__size, __q, __a, __b) \ vdup##__q##_n_##__size (__aarch64_vget_lane_any (__a, __b)) +#define __aarch64_vdup_lane_f16(__a, __b) \ + __aarch64_vdup_lane_any (f16, , __a, __b) #define __aarch64_vdup_lane_f32(__a, __b) \ __aarch64_vdup_lane_any (f32, , __a, __b) #define __aarch64_vdup_lane_f64(__a, __b) \ @@ -492,6 +494,8 @@ typedef struct poly16x8x4_t __aarch64_vdup_lane_any (u64, , __a, __b) /* __aarch64_vdup_laneq internal macros. */ +#define __aarch64_vdup_laneq_f16(__a, __b) \ + __aarch64_vdup_lane_any (f16, , __a, __b) #define __aarch64_vdup_laneq_f32(__a, __b) \ __aarch64_vdup_lane_any (f32, , __a, __b) #define __aarch64_vdup_laneq_f64(__a, __b) \ @@ -518,6 +522,8 @@ typedef struct poly16x8x4_t __aarch64_vdup_lane_any (u64, , __a, __b) /* __aarch64_vdupq_lane internal macros. */ +#define __aarch64_vdupq_lane_f16(__a, __b) \ + __aarch64_vdup_lane_any (f16, q, __a, __b) #define __aarch64_vdupq_lane_f32(__a, __b) \ __aarch64_vdup_lane_any (f32, q, __a, __b) #define __aarch64_vdupq_lane_f64(__a, __b) \ @@ -544,6 +550,8 @@ typedef struct poly16x8x4_t __aarch64_vdup_lane_any (u64, q, __a, __b) /* __aarch64_vdupq_laneq internal macros. 
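One point worth noting in the vfmah_f16/vfmsh_f16 wrappers just above: the intrinsics take the accumulator first, computing __a + __b * __c (respectively __a - __b * __c), which is why __a is passed as the last (addend) operand of the underlying fma builtin. A minimal sketch, assuming compilation with -march=armv8.2-a+fp16:

    #include <arm_fp16.h>

    float16_t
    muladd (float16_t acc, float16_t x, float16_t y)
    {
      return vfmah_f16 (acc, x, y);   /* acc + x * y, a single fmadd */
    }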
*/ +#define __aarch64_vdupq_laneq_f16(__a, __b) \ + __aarch64_vdup_lane_any (f16, q, __a, __b) #define __aarch64_vdupq_laneq_f32(__a, __b) \ __aarch64_vdup_lane_any (f32, q, __a, __b) #define __aarch64_vdupq_laneq_f64(__a, __b) \ @@ -10369,6 +10377,12 @@ vaddvq_f64 (float64x2_t __a) /* vbsl */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c) +{ + return __builtin_aarch64_simd_bslv4hf_suss (__a, __b, __c); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c) { @@ -10444,6 +10458,12 @@ vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c) {__builtin_aarch64_simd_bsldi_uuuu (__a[0], __b[0], __c[0])}; } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c) +{ + return __builtin_aarch64_simd_bslv8hf_suss (__a, __b, __c); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c) { @@ -13007,6 +13027,12 @@ vcvtpq_u64_f64 (float64x2_t __a) /* vdup_n */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vdup_n_f16 (float16_t __a) +{ + return (float16x4_t) {__a, __a, __a, __a}; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vdup_n_f32 (float32_t __a) { @@ -13081,6 +13107,12 @@ vdup_n_u64 (uint64_t __a) /* vdupq_n */ +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vdupq_n_f16 (float16_t __a) +{ + return (float16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a}; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vdupq_n_f32 (float32_t __a) { @@ -13158,6 +13190,12 @@ vdupq_n_u64 (uint64_t __a) /* vdup_lane */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vdup_lane_f16 (float16x4_t __a, const int __b) +{ + return __aarch64_vdup_lane_f16 (__a, __b); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vdup_lane_f32 (float32x2_t __a, const int __b) { @@ -13232,6 +13270,12 @@ vdup_lane_u64 (uint64x1_t __a, const int __b) /* vdup_laneq */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vdup_laneq_f16 (float16x8_t __a, const int __b) +{ + return __aarch64_vdup_laneq_f16 (__a, __b); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vdup_laneq_f32 (float32x4_t __a, const int __b) { @@ -13305,6 +13349,13 @@ vdup_laneq_u64 (uint64x2_t __a, const int __b) } /* vdupq_lane */ + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vdupq_lane_f16 (float16x4_t __a, const int __b) +{ + return __aarch64_vdupq_lane_f16 (__a, __b); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vdupq_lane_f32 (float32x2_t __a, const int __b) { @@ -13378,6 +13429,13 @@ vdupq_lane_u64 (uint64x1_t __a, const int __b) } /* vdupq_laneq */ + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vdupq_laneq_f16 (float16x8_t __a, const int __b) +{ + return __aarch64_vdupq_laneq_f16 (__a, __b); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vdupq_laneq_f32 (float32x4_t __a, const int __b) { @@ -13470,6 +13528,13 @@ vdupb_lane_u8 (uint8x8_t __a, const int __b) } /* vduph_lane */ + +__extension__ static __inline float16_t 
__attribute__ ((__always_inline__)) +vduph_lane_f16 (float16x4_t __a, const int __b) +{ + return __aarch64_vget_lane_any (__a, __b); +} + __extension__ static __inline poly16_t __attribute__ ((__always_inline__)) vduph_lane_p16 (poly16x4_t __a, const int __b) { @@ -13489,6 +13554,7 @@ vduph_lane_u16 (uint16x4_t __a, const int __b) } /* vdups_lane */ + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vdups_lane_f32 (float32x2_t __a, const int __b) { @@ -13549,6 +13615,13 @@ vdupb_laneq_u8 (uint8x16_t __a, const int __b) } /* vduph_laneq */ + +__extension__ static __inline float16_t __attribute__ ((__always_inline__)) +vduph_laneq_f16 (float16x8_t __a, const int __b) +{ + return __aarch64_vget_lane_any (__a, __b); +} + __extension__ static __inline poly16_t __attribute__ ((__always_inline__)) vduph_laneq_p16 (poly16x8_t __a, const int __b) { @@ -13568,6 +13641,7 @@ vduph_laneq_u16 (uint16x8_t __a, const int __b) } /* vdups_laneq */ + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vdups_laneq_f32 (float32x4_t __a, const int __b) { @@ -13607,6 +13681,19 @@ vdupd_laneq_u64 (uint64x2_t __a, const int __b) /* vext */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vext_f16 (float16x4_t __a, float16x4_t __b, __const int __c) +{ + __AARCH64_LANE_CHECK (__a, __c); +#ifdef __AARCH64EB__ + return __builtin_shuffle (__b, __a, + (uint16x4_t) {4 - __c, 5 - __c, 6 - __c, 7 - __c}); +#else + return __builtin_shuffle (__a, __b, + (uint16x4_t) {__c, __c + 1, __c + 2, __c + 3}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vext_f32 (float32x2_t __a, float32x2_t __b, __const int __c) { @@ -13738,6 +13825,22 @@ vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c) return __a; } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vextq_f16 (float16x8_t __a, float16x8_t __b, __const int __c) +{ + __AARCH64_LANE_CHECK (__a, __c); +#ifdef __AARCH64EB__ + return __builtin_shuffle (__b, __a, + (uint16x8_t) {8 - __c, 9 - __c, 10 - __c, 11 - __c, + 12 - __c, 13 - __c, 14 - __c, + 15 - __c}); +#else + return __builtin_shuffle (__a, __b, + (uint16x8_t) {__c, __c + 1, __c + 2, __c + 3, + __c + 4, __c + 5, __c + 6, __c + 7}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vextq_f32 (float32x4_t __a, float32x4_t __b, __const int __c) { @@ -14373,8 +14476,7 @@ vld1q_u64 (const uint64_t *a) __extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) vld1_dup_f16 (const float16_t* __a) { - float16_t __f = *__a; - return (float16x4_t) { __f, __f, __f, __f }; + return vdup_n_f16 (*__a); } __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) @@ -14454,8 +14556,7 @@ vld1_dup_u64 (const uint64_t* __a) __extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) vld1q_dup_f16 (const float16_t* __a) { - float16_t __f = *__a; - return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f }; + return vdupq_n_f16 (*__a); } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) @@ -18058,6 +18159,12 @@ vmlsq_laneq_u32 (uint32x4_t __a, uint32x4_t __b, /* vmov_n_ */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmov_n_f16 (float16_t __a) +{ + return vdup_n_f16 (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vmov_n_f32 (float32_t __a) { @@ -18130,6 +18237,12 @@ vmov_n_u64 
(uint64_t __a) return (uint64x1_t) {__a}; } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vmovq_n_f16 (float16_t __a) +{ + return vdupq_n_f16 (__a); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vmovq_n_f32 (float32_t __a) { @@ -20887,6 +21000,12 @@ vrev32q_u16 (uint16x8_t a) return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 }); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrev64_f16 (float16x4_t __a) +{ + return __builtin_shuffle (__a, (uint16x4_t) { 3, 2, 1, 0 }); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vrev64_f32 (float32x2_t a) { @@ -20941,6 +21060,12 @@ vrev64_u32 (uint32x2_t a) return __builtin_shuffle (a, (uint32x2_t) { 1, 0 }); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrev64q_f16 (float16x8_t __a) +{ + return __builtin_shuffle (__a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 }); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vrev64q_f32 (float32x4_t a) { @@ -23893,6 +24018,16 @@ vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx) /* vtrn */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vtrn1_f16 (float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vtrn1_f32 (float32x2_t __a, float32x2_t __b) { @@ -23983,6 +24118,16 @@ vtrn1_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vtrn1q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7}); +#else + return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vtrn1q_f32 (float32x4_t __a, float32x4_t __b) { @@ -24109,6 +24254,16 @@ vtrn1q_u64 (uint64x2_t __a, uint64x2_t __b) #endif } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vtrn2_f16 (float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vtrn2_f32 (float32x2_t __a, float32x2_t __b) { @@ -24199,6 +24354,16 @@ vtrn2_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vtrn2q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6}); +#else + return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vtrn2q_f32 (float32x4_t __a, float32x4_t __b) { @@ -24325,6 +24490,12 @@ vtrn2q_u64 (uint64x2_t __a, uint64x2_t __b) #endif } +__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__)) +vtrn_f16 (float16x4_t __a, float16x4_t __b) +{ + return (float16x4x2_t) {vtrn1_f16 (__a, __b), vtrn2_f16 (__a, __b)}; +} + __extension__ static __inline float32x2x2_t 
__attribute__ ((__always_inline__)) vtrn_f32 (float32x2_t a, float32x2_t b) { @@ -24379,6 +24550,12 @@ vtrn_u32 (uint32x2_t a, uint32x2_t b) return (uint32x2x2_t) {vtrn1_u32 (a, b), vtrn2_u32 (a, b)}; } +__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__)) +vtrnq_f16 (float16x8_t __a, float16x8_t __b) +{ + return (float16x8x2_t) {vtrn1q_f16 (__a, __b), vtrn2q_f16 (__a, __b)}; +} + __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) vtrnq_f32 (float32x4_t a, float32x4_t b) { @@ -24627,6 +24804,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b) } #define __INTERLEAVE_LIST(op) \ + __DEFINTERLEAVE (op, float16x4x2_t, float16x4_t, f16,) \ __DEFINTERLEAVE (op, float32x2x2_t, float32x2_t, f32,) \ __DEFINTERLEAVE (op, poly8x8x2_t, poly8x8_t, p8,) \ __DEFINTERLEAVE (op, poly16x4x2_t, poly16x4_t, p16,) \ @@ -24636,6 +24814,7 @@ vuqaddd_s64 (int64_t __a, uint64_t __b) __DEFINTERLEAVE (op, uint8x8x2_t, uint8x8_t, u8,) \ __DEFINTERLEAVE (op, uint16x4x2_t, uint16x4_t, u16,) \ __DEFINTERLEAVE (op, uint32x2x2_t, uint32x2_t, u32,) \ + __DEFINTERLEAVE (op, float16x8x2_t, float16x8_t, f16, q) \ __DEFINTERLEAVE (op, float32x4x2_t, float32x4_t, f32, q) \ __DEFINTERLEAVE (op, poly8x16x2_t, poly8x16_t, p8, q) \ __DEFINTERLEAVE (op, poly16x8x2_t, poly16x8_t, p16, q) \ @@ -24648,6 +24827,16 @@ vuqaddd_s64 (int64_t __a, uint64_t __b) /* vuzp */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vuzp1_f16 (float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vuzp1_f32 (float32x2_t __a, float32x2_t __b) { @@ -24738,6 +24927,16 @@ vuzp1_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vuzp1q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7}); +#else + return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vuzp1q_f32 (float32x4_t __a, float32x4_t __b) { @@ -24864,6 +25063,16 @@ vuzp1q_u64 (uint64x2_t __a, uint64x2_t __b) #endif } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vuzp2_f16 (float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vuzp2_f32 (float32x2_t __a, float32x2_t __b) { @@ -24954,6 +25163,16 @@ vuzp2_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vuzp2q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6}); +#else + return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vuzp2q_f32 (float32x4_t __a, float32x4_t __b) { @@ -25084,6 +25303,16 @@ __INTERLEAVE_LIST (uzp) /* vzip */ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vzip1_f16 
(float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vzip1_f32 (float32x2_t __a, float32x2_t __b) { @@ -25174,6 +25403,18 @@ vzip1_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vzip1q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, + (uint16x8_t) {12, 4, 13, 5, 14, 6, 15, 7}); +#else + return __builtin_shuffle (__a, __b, + (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vzip1q_f32 (float32x4_t __a, float32x4_t __b) { @@ -25303,6 +25544,16 @@ vzip1q_u64 (uint64x2_t __a, uint64x2_t __b) #endif } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vzip2_f16 (float16x4_t __a, float16x4_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1}); +#else + return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7}); +#endif +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vzip2_f32 (float32x2_t __a, float32x2_t __b) { @@ -25393,6 +25644,18 @@ vzip2_u32 (uint32x2_t __a, uint32x2_t __b) #endif } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vzip2q_f16 (float16x8_t __a, float16x8_t __b) +{ +#ifdef __AARCH64EB__ + return __builtin_shuffle (__a, __b, + (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3}); +#else + return __builtin_shuffle (__a, __b, + (uint16x8_t) {4, 12, 5, 13, 6, 14, 7, 15}); +#endif +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vzip2q_f32 (float32x4_t __a, float32x4_t __b) { @@ -25529,9 +25792,1015 @@ __INTERLEAVE_LIST (zip) /* End of optimal implementations in approved order. */ +#pragma GCC pop_options + +/* ARMv8.2-A FP16 intrinsics. */ + +#include "arm_fp16.h" + +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+fp16") + +/* ARMv8.2-A FP16 one operand vector intrinsics. 
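The f16 permute intrinsics added above (the vtrn/vuzp/vzip variants and the new __INTERLEAVE_LIST entries) are pure data movement built on __builtin_shuffle, so they need only the baseline Advanced SIMD target, not the +fp16 extension. A small sketch of their little-endian lane semantics, with hypothetical function names:

    #include <arm_neon.h>

    float16x8_t
    interleave_low (float16x8_t a, float16x8_t b)
    {
      return vzip1q_f16 (a, b);       /* a0 b0 a1 b1 a2 b2 a3 b3 */
    }

    float16x4x2_t
    transpose_pairs (float16x4_t a, float16x4_t b)
    {
      return vtrn_f16 (a, b);         /* {a0 b0 a2 b2}, {a1 b1 a3 b3} */
    }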
*/ + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vabs_f16 (float16x4_t __a) +{ + return __builtin_aarch64_absv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vabsq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_absv8hf (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vceqz_f16 (float16x4_t __a) +{ + return __builtin_aarch64_cmeqv4hf_uss (__a, vdup_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vceqzq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_cmeqv8hf_uss (__a, vdupq_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcgez_f16 (float16x4_t __a) +{ + return __builtin_aarch64_cmgev4hf_uss (__a, vdup_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgezq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_cmgev8hf_uss (__a, vdupq_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcgtz_f16 (float16x4_t __a) +{ + return __builtin_aarch64_cmgtv4hf_uss (__a, vdup_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgtzq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_cmgtv8hf_uss (__a, vdupq_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vclez_f16 (float16x4_t __a) +{ + return __builtin_aarch64_cmlev4hf_uss (__a, vdup_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vclezq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_cmlev8hf_uss (__a, vdupq_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcltz_f16 (float16x4_t __a) +{ + return __builtin_aarch64_cmltv4hf_uss (__a, vdup_n_f16 (0.0f)); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcltzq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_cmltv8hf_uss (__a, vdupq_n_f16 (0.0f)); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_f16_s16 (int16x4_t __a) +{ + return __builtin_aarch64_floatv4hiv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvtq_f16_s16 (int16x8_t __a) +{ + return __builtin_aarch64_floatv8hiv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_f16_u16 (uint16x4_t __a) +{ + return __builtin_aarch64_floatunsv4hiv4hf ((int16x4_t) __a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvtq_f16_u16 (uint16x8_t __a) +{ + return __builtin_aarch64_floatunsv8hiv8hf ((int16x8_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vcvt_s16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lbtruncv4hfv4hi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtq_s16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lbtruncv8hfv8hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvt_u16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lbtruncuv4hfv4hi_us (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtq_u16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lbtruncuv8hfv8hi_us (__a); +} + +__extension__ static __inline int16x4_t 
__attribute__ ((__always_inline__)) +vcvta_s16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lroundv4hfv4hi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtaq_s16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lroundv8hfv8hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvta_u16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lrounduv4hfv4hi_us (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtaq_u16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lrounduv8hfv8hi_us (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vcvtm_s16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lfloorv4hfv4hi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtmq_s16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lfloorv8hfv8hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvtm_u16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lflooruv4hfv4hi_us (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtmq_u16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lflooruv8hfv8hi_us (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vcvtn_s16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lfrintnv4hfv4hi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtnq_s16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lfrintnv8hfv8hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvtn_u16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lfrintnuv4hfv4hi_us (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtnq_u16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lfrintnuv8hfv8hi_us (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vcvtp_s16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lceilv4hfv4hi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtpq_s16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lceilv8hfv8hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvtp_u16_f16 (float16x4_t __a) +{ + return __builtin_aarch64_lceiluv4hfv4hi_us (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtpq_u16_f16 (float16x8_t __a) +{ + return __builtin_aarch64_lceiluv8hfv8hi_us (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vneg_f16 (float16x4_t __a) +{ + return -__a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vnegq_f16 (float16x8_t __a) +{ + return -__a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrecpe_f16 (float16x4_t __a) +{ + return __builtin_aarch64_frecpev4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrecpeq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_frecpev8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrnd_f16 (float16x4_t __a) +{ + return __builtin_aarch64_btruncv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndq_f16 (float16x8_t __a) +{ + return 
__builtin_aarch64_btruncv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrnda_f16 (float16x4_t __a) +{ + return __builtin_aarch64_roundv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndaq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_roundv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrndi_f16 (float16x4_t __a) +{ + return __builtin_aarch64_nearbyintv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndiq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_nearbyintv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrndm_f16 (float16x4_t __a) +{ + return __builtin_aarch64_floorv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndmq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_floorv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrndn_f16 (float16x4_t __a) +{ + return __builtin_aarch64_frintnv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndnq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_frintnv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrndp_f16 (float16x4_t __a) +{ + return __builtin_aarch64_ceilv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndpq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_ceilv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrndx_f16 (float16x4_t __a) +{ + return __builtin_aarch64_rintv4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrndxq_f16 (float16x8_t __a) +{ + return __builtin_aarch64_rintv8hf (__a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrsqrte_f16 (float16x4_t a) +{ + return __builtin_aarch64_rsqrtev4hf (a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrsqrteq_f16 (float16x8_t a) +{ + return __builtin_aarch64_rsqrtev8hf (a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vsqrt_f16 (float16x4_t a) +{ + return __builtin_aarch64_sqrtv4hf (a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vsqrtq_f16 (float16x8_t a) +{ + return __builtin_aarch64_sqrtv8hf (a); +} + +/* ARMv8.2-A FP16 two operands vector intrinsics. 
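A sketch combining two of the one-operand vector intrinsics above (requires -march=armv8.2-a+fp16; vcvtaq_s16_f16 converts rounding to nearest, ties away from zero):

    #include <arm_neon.h>

    int16x8_t
    quantize (float16x8_t v)
    {
      return vcvtaq_s16_f16 (vabsq_f16 (v));   /* fabs.8h, then fcvtas.8h */
    }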
*/ + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vadd_f16 (float16x4_t __a, float16x4_t __b) +{ + return __a + __b; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vaddq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __a + __b; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vabd_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_fabdv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vabdq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_fabdv8hf (a, b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcage_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_facgev4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcageq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_facgev8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcagt_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_facgtv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcagtq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_facgtv8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcale_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_faclev4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcaleq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_faclev8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcalt_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_facltv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcaltq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_facltv8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vceq_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_cmeqv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vceqq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_cmeqv8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcge_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_cmgev4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgeq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_cmgev8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcgt_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_cmgtv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgtq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_cmgtv8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcle_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_cmlev4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcleq_f16 (float16x8_t __a, float16x8_t __b) +{ + 
return __builtin_aarch64_cmlev8hf_uss (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vclt_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_cmltv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcltq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_cmltv8hf_uss (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_n_f16_s16 (int16x4_t __a, const int __b) +{ + return __builtin_aarch64_scvtfv4hi (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvtq_n_f16_s16 (int16x8_t __a, const int __b) +{ + return __builtin_aarch64_scvtfv8hi (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_n_f16_u16 (uint16x4_t __a, const int __b) +{ + return __builtin_aarch64_ucvtfv4hi_sus (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvtq_n_f16_u16 (uint16x8_t __a, const int __b) +{ + return __builtin_aarch64_ucvtfv8hi_sus (__a, __b); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vcvt_n_s16_f16 (float16x4_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzsv4hf (__a, __b); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vcvtq_n_s16_f16 (float16x8_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzsv8hf (__a, __b); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcvt_n_u16_f16 (float16x4_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcvtq_n_u16_f16 (float16x8_t __a, const int __b) +{ + return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vdiv_f16 (float16x4_t __a, float16x4_t __b) +{ + return __a / __b; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vdivq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __a / __b; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmax_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_smax_nanv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vmaxq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_smax_nanv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmaxnm_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_fmaxv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vmaxnmq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_fmaxv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmin_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_smin_nanv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vminq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_smin_nanv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vminnm_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_fminv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t 
__attribute__ ((__always_inline__)) +vminnmq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_fminv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmul_f16 (float16x4_t __a, float16x4_t __b) +{ + return __a * __b; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vmulq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __a * __b; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vmulx_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_fmulxv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vmulxq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_fmulxv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vpadd_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_faddpv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vpaddq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_faddpv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vpmax_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_smax_nanpv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vpmaxq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_smax_nanpv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vpmaxnm_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_smaxpv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vpmaxnmq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_smaxpv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vpmin_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_smin_nanpv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vpminq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_smin_nanpv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vpminnm_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_sminpv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vpminnmq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_sminpv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrecps_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_frecpsv4hf (__a, __b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrecpsq_f16 (float16x8_t __a, float16x8_t __b) +{ + return __builtin_aarch64_frecpsv8hf (__a, __b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vrsqrts_f16 (float16x4_t a, float16x4_t b) +{ + return __builtin_aarch64_rsqrtsv4hf (a, b); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vrsqrtsq_f16 (float16x8_t a, float16x8_t b) +{ + return __builtin_aarch64_rsqrtsv8hf (a, b); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vsub_f16 (float16x4_t __a, float16x4_t __b) +{ + return __a - __b; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vsubq_f16 
+{
+  return __a - __b;
+}
+
+/* ARMv8.2-A FP16 three operands vector intrinsics.  */
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fmav8hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+{
+  return __builtin_aarch64_fnmav4hf (__b, __c, __a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+{
+  return __builtin_aarch64_fnmav8hf (__b, __c, __a);
+}
+
+/* ARMv8.2-A FP16 lane vector intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_lane_f16 (float16_t __a, float16_t __b,
+                float16x4_t __c, const int __lane)
+{
+  return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmah_laneq_f16 (float16_t __a, float16_t __b,
+                 float16x8_t __c, const int __lane)
+{
+  return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_lane_f16 (float16x4_t __a, float16x4_t __b,
+               float16x4_t __c, const int __lane)
+{
+  return vfma_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_lane_f16 (float16x8_t __a, float16x8_t __b,
+                float16x4_t __c, const int __lane)
+{
+  return vfmaq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_laneq_f16 (float16x4_t __a, float16x4_t __b,
+                float16x8_t __c, const int __lane)
+{
+  return vfma_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+                 float16x8_t __c, const int __lane)
+{
+  return vfmaq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfma_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+  return vfma_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmaq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+  return vfmaq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_lane_f16 (float16_t __a, float16_t __b,
+                float16x4_t __c, const int __lane)
+{
+  return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vfmsh_laneq_f16 (float16_t __a, float16_t __b,
+                 float16x8_t __c, const int __lane)
+{
+  return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_lane_f16 (float16x4_t __a, float16x4_t __b,
+               float16x4_t __c, const int __lane)
+{
+  return vfms_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_lane_f16 (float16x8_t __a, float16x8_t __b,
+                float16x4_t __c, const int __lane)
+{
+  return vfmsq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_laneq_f16 (float16x4_t __a, float16x4_t __b,
+                float16x8_t __c, const int __lane)
+{
+  return vfms_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_laneq_f16 (float16x8_t __a, float16x8_t __b,
+                 float16x8_t __c, const int __lane)
+{
+  return vfmsq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vfms_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+{
+  return vfms_f16 (__a, __b, vdup_n_f16 (__c));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vfmsq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+{
+  return vfmsq_f16 (__a, __b, vdupq_n_f16 (__c));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+  return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+  return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+  return __a * __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+  return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return vmul_lane_f16 (__a, vdup_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return vmulq_laneq_f16 (__a, vdupq_n_f16 (__b), 0);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulx_f16 (__a, __aarch64_vdup_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+{
+  return vmulxq_f16 (__a, __aarch64_vdupq_lane_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmulxh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulx_f16 (__a, __aarch64_vdup_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+{
+  return vmulxq_f16 (__a, __aarch64_vdupq_laneq_f16 (__b, __lane));
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_n_f16 (float16x4_t __a, float16_t __b)
+{
+  return vmulx_f16 (__a, vdup_n_f16 (__b));
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return vmulxq_f16 (__a, vdupq_n_f16 (__b));
+}
+
+/* ARMv8.2-A FP16 reduction vector intrinsics.  */
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smax_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smin_nan_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vmaxnmvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smax_scal_v8hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmv_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v4hf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vminnmvq_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_reduc_smin_scal_v8hf (__a);
+}
+
+#pragma GCC pop_options
+
 #undef __aarch64_vget_lane_any
 #undef __aarch64_vdup_lane_any
+#undef __aarch64_vdup_lane_f16
 #undef __aarch64_vdup_lane_f32
 #undef __aarch64_vdup_lane_f64
 #undef __aarch64_vdup_lane_p8
@@ -25544,6 +26813,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdup_lane_u16
 #undef __aarch64_vdup_lane_u32
 #undef __aarch64_vdup_lane_u64
+#undef __aarch64_vdup_laneq_f16
 #undef __aarch64_vdup_laneq_f32
 #undef __aarch64_vdup_laneq_f64
 #undef __aarch64_vdup_laneq_p8
@@ -25556,6 +26826,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdup_laneq_u16
 #undef __aarch64_vdup_laneq_u32
 #undef __aarch64_vdup_laneq_u64
+#undef __aarch64_vdupq_lane_f16
 #undef __aarch64_vdupq_lane_f32
 #undef __aarch64_vdupq_lane_f64
 #undef __aarch64_vdupq_lane_p8
@@ -25568,6 +26839,7 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdupq_lane_u16
 #undef __aarch64_vdupq_lane_u32
 #undef __aarch64_vdupq_lane_u64
+#undef __aarch64_vdupq_laneq_f16
 #undef __aarch64_vdupq_laneq_f32
 #undef __aarch64_vdupq_laneq_f64
 #undef __aarch64_vdupq_laneq_p8
@@ -25581,6 +26853,4 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vdupq_laneq_u32
 #undef __aarch64_vdupq_laneq_u64
-#pragma GCC pop_options
-
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index ef48ffda6f9..5e8b0ad9cee 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -26,6 +26,9 @@
 ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
 (define_mode_iterator GPI [SI DI])
 
+;; Iterator for HI, SI, DI, some instructions can only work on these modes.
+(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
+
 ;; Iterator for QI and HI modes
 (define_mode_iterator SHORT [QI HI])
 
@@ -38,6 +41,9 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])
 
+;; Iterator for all scalar floating point modes (HF, SF, DF)
+(define_mode_iterator GPF_F16 [(HF "AARCH64_ISA_F16") SF DF])
+
 ;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
 (define_mode_iterator GPF_TF_F16 [HF SF DF TF])
 
@@ -88,11 +94,22 @@
 ;; Vector Float modes suitable for moving, loading and storing.
 (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
 
-;; Vector Float modes, barring HF modes.
+;; Vector Float modes.
 (define_mode_iterator VDQF [V2SF V4SF V2DF])
+(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
+                             (V8HF "TARGET_SIMD_F16INST")
+                             V2SF V4SF V2DF])
 
 ;; Vector Float modes, and DF.
 (define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
+                                (V8HF "TARGET_SIMD_F16INST")
+                                V2SF V4SF V2DF DF])
+(define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
+                                  (V8HF "TARGET_SIMD_F16INST")
+                                  V2SF V4SF V2DF
+                                  (HF "TARGET_SIMD_F16INST")
+                                  SF DF])
 
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
@@ -150,6 +167,8 @@
 ;; Vector modes except double int.
 (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
+                                 V4HF V8HF V2SF V4SF V2DF])
 
 ;; Vector modes for S type.
 (define_mode_iterator VDQ_SI [V2SI V4SI])
@@ -157,9 +176,21 @@
 ;; Vector modes for S and D
 (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
 
+;; Vector modes for H, S and D
+(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+                                (V8HI "TARGET_SIMD_F16INST")
+                                V2SI V4SI V2DI])
+
 ;; Scalar and Vector modes for S and D
 (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
 
+;; Scalar and Vector modes for S and D, Vector modes for H.
+(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+                                 (V8HI "TARGET_SIMD_F16INST")
+                                 V2SI V4SI V2DI
+                                 (HI "TARGET_SIMD_F16INST")
+                                 SI DI])
+
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
 
@@ -199,7 +230,10 @@
 (define_mode_iterator DX [DI DF])
 
 ;; Modes available for <f>mul lane operations.
-(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
+(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
+                            (V4HF "TARGET_SIMD_F16INST")
+                            (V8HF "TARGET_SIMD_F16INST")
+                            V2SF V4SF V2DF])
 
 ;; Modes available for <f>mul lane operations changing lane count.
 (define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
@@ -348,8 +382,8 @@
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
 ;; For inequal width int to float conversion
-(define_mode_attr w1 [(SF "w") (DF "x")])
-(define_mode_attr w2 [(SF "x") (DF "w")])
+(define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
+(define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
 
 (define_mode_attr short_mask [(HI "65535") (QI "255")])
@@ -361,12 +395,13 @@
 
 ;; For scalar usage of vector/FP registers
 (define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
-                     (SF "s") (DF "d")
+                     (HF "h") (SF "s") (DF "d")
                      (V8QI "") (V16QI "")
                      (V4HI "") (V8HI "")
                      (V2SI "") (V4SI "")
                      (V2DI "") (V2SF "")
-                     (V4SF "") (V2DF "")])
+                     (V4SF "") (V4HF "")
+                     (V8HF "") (V2DF "")])
 
 ;; For scalar usage of vector/FP registers, narrowing
 (define_mode_attr vn2 [(QI "") (HI "b") (SI "h") (DI "s")
@@ -391,7 +426,7 @@
 (define_mode_attr vas [(DI "") (SI ".2s")])
 
 ;; Map a floating point mode to the appropriate register name prefix
-(define_mode_attr s [(SF "s") (DF "d")])
+(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
 
 ;; Give the length suffix letter for a sign- or zero-extension.
 (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
@@ -427,8 +462,8 @@
                          (V4SF ".4s") (V2DF ".2d")
                          (DI "") (SI "")
                          (HI "") (QI "")
-                         (TI "") (SF "")
-                         (DF "")])
+                         (TI "") (HF "")
+                         (SF "") (DF "")])
 
 ;; Register suffix narrowed modes for VQN.
 (define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
@@ -443,10 +478,21 @@
                           (V2DI "d") (V4HF "h")
                           (V8HF "h") (V2SF "s")
                           (V4SF "s") (V2DF "d")
+                          (HF "h")
                           (SF "s") (DF "d")
                           (QI "b") (HI "h")
                           (SI "s") (DI "d")])
 
+;; Vetype is used everywhere in scheduling type and assembly output,
+;; sometimes they are not the same, for example HF modes on some
+;; instructions.  stype is defined to represent scheduling type
+;; more accurately.
+(define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
+                         (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
+                         (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
+                         (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
+                         (SI "s") (DI "d")])
+
 ;; Mode-to-bitwise operation type mapping.
 (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
                           (V4HI "8b") (V8HI "16b")
@@ -604,7 +650,7 @@
                                  (V4HF "V4HI") (V8HF "V8HI")
                                  (V2SF "V2SI") (V4SF "V4SI")
                                  (V2DF "V2DI") (DF "DI")
-                                 (SF "SI")])
+                                 (SF "SI") (HF "HI")])
 
 ;; Lower case mode of results of comparison operations.
 (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
@@ -656,15 +702,19 @@
 (define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
                                (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
-                               (SF "si") (DF "di") (SI "sf") (DI "df")])
+                               (SF "si") (DF "di") (SI "sf") (DI "df")
+                               (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
+                               (V8HI "v8hf") (HF "hi") (HI "hf")])
 (define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
                                (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
-                               (SF "SI") (DF "DI") (SI "SF") (DI "DF")])
+                               (SF "SI") (DF "DI") (SI "SF") (DI "DF")
+                               (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
+                               (V8HI "V8HF") (HF "HI") (HI "HF")])
 
 ;; for the inequal width integer to fp conversions
-(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
-(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
+(define_mode_attr fcvt_iesize [(HF "di") (SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(HF "DI") (SF "DI") (DF "SI")])
 
 (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
                                (V4HI "V8HI") (V8HI  "V4HI")
@@ -687,6 +737,7 @@
 ;; the 'x' constraint.  All other modes may use the 'w' constraint.
 (define_mode_attr h_con [(V2SI "w") (V4SI "w")
                          (V4HI "x") (V8HI "x")
+                         (V4HF "w") (V8HF "w")
                          (V2SF "w") (V4SF "w")
                          (V2DF "w") (DF "w")])
@@ -695,6 +746,7 @@
                        (V4HI "") (V8HI "")
                        (V2SI "") (V4SI "")
                        (DI "") (V2DI "")
+                       (V4HF "f") (V8HF "f")
                        (V2SF "f") (V4SF "f")
                        (V2DF "f") (DF "f")])
@@ -703,6 +755,7 @@
                         (V4HI "") (V8HI "")
                         (V2SI "") (V4SI "")
                         (DI "") (V2DI "")
+                        (V4HF "_fp") (V8HF "_fp")
                         (V2SF "_fp") (V4SF "_fp")
                         (V2DF "_fp") (DF "_fp")
                         (SF "_fp")])
@@ -715,13 +768,14 @@
                      (V4HF "") (V8HF "_q")
                      (V2SF "") (V4SF  "_q")
                      (V2DF  "_q")
-                     (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
+                     (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")])
 
 (define_mode_attr vp [(V8QI "v") (V16QI "v")
                       (V4HI "v") (V8HI  "v")
                       (V2SI "p") (V4SI  "v")
-                      (V2DI "p") (V2DF  "p")
-                      (V2SF "p") (V4SF  "v")])
+                      (V2DI "p") (V2DF  "p")
+                      (V2SF "p") (V4SF  "v")
+                      (V4HF "v") (V8HF "v")])
 
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ce614b453dc..a30231d7c03 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12957,7 +12957,10 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a} or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
+
+The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
+support for the ARMv8.2-A architecture extensions.
 
 The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
 support for the ARMv8.1 architecture extension.  In particular, it
@@ -13064,6 +13067,8 @@ instructions.  This is on by default for all possible values for options
 @item lse
 Enable Large System Extension instructions.  This is on by default for
 @option{-march=armv8.1-a}.
+@item fp16
+Enable FP16 extension.  This also enables floating-point instructions.
@end table
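
For readers trying out this backport, the following stand-alone program (not part of the commit) sketches how a few of the intrinsics added above fit together. It assumes a toolchain built from this branch and an ARMv8.2-A target with the FP16 extension, compiled with something like "gcc -O2 -march=armv8.2-a+fp16 fp16-demo.c"; the file name fp16-demo.c is illustrative.

/* fp16-demo.c: minimal sketch of the new ARMv8.2-A FP16 vector
   intrinsics; assumes -march=armv8.2-a+fp16, under which
   __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined.  */
#include <arm_neon.h>
#include <stdio.h>

int
main (void)
{
  /* vdup_n_f16 broadcasts a scalar into all four half-precision lanes.  */
  float16x4_t a = vdup_n_f16 (2.0);
  float16x4_t b = vdup_n_f16 (3.0);
  float16x4_t c = vdup_n_f16 (0.5);

  float16x4_t q = vdiv_f16 (a, b);   /* Lane-wise a / b.  */
  float16x4_t f = vfma_f16 (a, b, c);   /* Fused a + b * c per lane.  */

  /* NaN-propagating maximum across the four lanes of f.  */
  float16_t m = vmaxv_f16 (f);

  printf ("q[0] = %f, f[0] = %f, max = %f\n",
          (double) vget_lane_f16 (q, 0),
          (double) vget_lane_f16 (f, 0),
          (double) m);
  return 0;
}

The same pattern extends to the other additions: vminv_f16/vmaxnmv_f16 give the remaining reductions, and the fixed-point conversions (vcvt_n_s16_f16 and friends) take a compile-time constant bit count as their second argument. In code that must also build for older targets, guard such uses with #ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.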