mirror of
https://github.com/opelly27/Stockfish.git
synced 2026-05-20 15:37:47 +00:00
Add support for ARM dot product instructions
The sdot instruction computes (and accumulates) a signed dot product, which is quite handy for Stockfish's NNUE code. The instruction is optional for Armv8.2 and Armv8.3, and mandatory for Armv8.4 and above.

The commit adds a new 'arm-dotprod' architecture with enabled dot product support. It also enables dot product support for the existing 'apple-silicon' architecture, which is at least Armv8.5.

The following local speed test was performed on an Apple M1 with ARCH=apple-silicon. I had to remove CPU pinning from the benchmark script. However, the results were still consistent: checking both binaries against themselves reported a speedup of +0.0000 and +0.0005, respectively.

```
Result of 100 runs
==================
base (...ish.037ef3e1) =    1917997  +/- 7152
test (...fish.dotprod) =    2159682  +/- 9066
diff                   =    +241684  +/- 2923

speedup        = +0.1260
P(speedup > 0) =  1.0000

CPU: 10 x arm
Hyperthreading: off
```

Fixes #4193

closes https://github.com/official-stockfish/Stockfish/pull/4400

No functional change
committed by Joost VandeVondele
parent 037ef3e18d
commit b4ad3a3c4b
@@ -346,6 +346,19 @@ namespace Stockfish::Simd {
 
 #endif
 
+#if defined (USE_NEON_DOTPROD)
+
+    [[maybe_unused]] static void dotprod_m128_add_dpbusd_epi32x2(
+        int32x4_t& acc,
+        int8x16_t a0, int8x16_t b0,
+        int8x16_t a1, int8x16_t b1) {
+
+        acc = vdotq_s32(acc, a0, b0);
+        acc = vdotq_s32(acc, a1, b1);
+    }
+
+#endif
+
 #if defined (USE_NEON)
 
     [[maybe_unused]] static int neon_m128_reduce_add_epi32(int32x4_t s) {
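To make the semantics of the two `vdotq_s32` calls in the hunk above concrete, here is a minimal scalar sketch of what the sdot instruction computes per 128-bit vector: each of the four 32-bit accumulator lanes gains the dot product of the corresponding group of four signed 8-bit elements. This is a portable illustration, not Stockfish code. The type aliases `I8x16` and `I32x4` and the functions `sdot_reference`, `add_dpbusd_epi32x2_reference` and `self_check` are hypothetical names introduced here, and the model assumes the sums stay within `int32_t` range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-ins for the NEON types int8x16_t and int32x4_t.
using I8x16 = std::array<std::int8_t, 16>;
using I32x4 = std::array<std::int32_t, 4>;

// Scalar model of vdotq_s32(acc, a, b): lane i of the result is
// acc[i] + a[4i]*b[4i] + a[4i+1]*b[4i+1] + a[4i+2]*b[4i+2] + a[4i+3]*b[4i+3].
// Assumption: inputs are small enough that no int32_t overflow occurs.
I32x4 sdot_reference(I32x4 acc, const I8x16& a, const I8x16& b) {
    for (std::size_t lane = 0; lane < 4; ++lane)
        for (std::size_t k = 0; k < 4; ++k)
            acc[lane] += std::int32_t(a[4 * lane + k]) * std::int32_t(b[4 * lane + k]);
    return acc;
}

// Mirrors the shape of the helper added in the diff: two accumulating
// dot products into the same accumulator.
void add_dpbusd_epi32x2_reference(I32x4& acc,
                                  const I8x16& a0, const I8x16& b0,
                                  const I8x16& a1, const I8x16& b1) {
    acc = sdot_reference(acc, a0, b0);
    acc = sdot_reference(acc, a1, b1);
}

// Small worked example: a = {-8, -7, ..., 7}, b = all 2, acc starts at {100, 0, 0, 0}.
inline void self_check() {
    I8x16 a{}, b{};
    for (std::size_t i = 0; i < 16; ++i) {
        a[i] = static_cast<std::int8_t>(static_cast<int>(i) - 8);
        b[i] = 2;
    }
    I32x4 acc{100, 0, 0, 0};
    acc = sdot_reference(acc, a, b);
    assert(acc[0] == 48);   // 100 + 2 * (-8 - 7 - 6 - 5)
    assert(acc[1] == -20);  // 2 * (-4 - 3 - 2 - 1)
    assert(acc[2] == 12);   // 2 * (0 + 1 + 2 + 3)
    assert(acc[3] == 44);   // 2 * (4 + 5 + 6 + 7)
}
```

On hardware with the dot product extension, a single sdot performs the four multiply-accumulate groups of `sdot_reference` in one instruction, which is presumably where the speedup reported in the commit message comes from.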