Commit Graph

73 Commits

Author SHA1 Message Date
Shawn Xu 5cf6f99177 Remove some incorrectly marked const qualifiers
closes https://github.com/official-stockfish/Stockfish/pull/5744

No functional change
2025-01-06 00:43:49 +01:00
MinetaS 2680c9c799 Small speedup in incremental accumulator updates
Instead of updating at most two accumulators, update all accumulators
during incremental updates. Tests have shown that this change yields a
small speedup of at least 0.5%, and up to 1% with shorter TC.
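A minimal sketch of the control flow, assuming a hypothetical stack of per-ply accumulators in which each entry records the feature indices added and removed by the move leading to it (`AccumulatorState` and `update_all_incremental` are illustrative names, not Stockfish's). Starting from the most recent computed accumulator, every later one is updated in turn, so intermediate plies are filled in instead of being skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified accumulator: one int32 per output neuron
struct AccumulatorState {
    std::vector<std::int32_t> values;
    bool computed = false;
    std::vector<int> added;    // feature indices activated by the move into this ply
    std::vector<int> removed;  // feature indices deactivated by that move
};

// weights[f] is the column for feature f; all columns have the same width
using Weights = std::vector<std::vector<std::int32_t>>;

// Bring stack.back() up to date by updating every accumulator between the
// last computed one and the top of the stack
void update_all_incremental(std::vector<AccumulatorState>& stack, const Weights& weights) {
    assert(!stack.empty() && stack[0].computed);  // the root must be computed
    std::size_t last = stack.size() - 1;
    while (!stack[last].computed)
        --last;

    for (std::size_t i = last + 1; i < stack.size(); ++i) {
        AccumulatorState& cur = stack[i];
        cur.values = stack[i - 1].values;  // start from the parent ply
        for (int f : cur.removed)
            for (std::size_t j = 0; j < cur.values.size(); ++j)
                cur.values[j] -= weights[f][j];
        for (int f : cur.added)
            for (std::size_t j = 0; j < cur.values.size(); ++j)
                cur.values[j] += weights[f][j];
        cur.computed = true;  // intermediate states become reusable as well
    }
}
```

The real code works per perspective on SIMD vectors and may have to refresh instead when the king moves; those details are left out here.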

Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 54368 W: 14179 L: 13842 D: 26347
Ptnml(0-2): 173, 6122, 14262, 6449, 178
https://tests.stockfishchess.org/tests/view/66db038a9de3e7f9b33d1ad9

Passed 5+0.05:
LLR: 2.98 (-2.94,2.94) <0.00,2.00>
Total: 55040 W: 14682 L: 14322 D: 26036
Ptnml(0-2): 303, 6364, 13856, 6664, 333
https://tests.stockfishchess.org/tests/view/66dbc325dc53972b68218ba7

Passed non-regression LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 57390 W: 14555 L: 14376 D: 28459
Ptnml(0-2): 37, 5876, 16683, 6069, 30
https://tests.stockfishchess.org/tests/view/66dbc30adc53972b68218ba5

closes https://github.com/official-stockfish/Stockfish/pull/5576

No functional change
2024-09-09 18:02:32 +02:00
Shawn Xu bc80ece6c7 Improve Comments for Pairwise Multiplication Optimization
closes https://github.com/official-stockfish/Stockfish/pull/5524

no functional change
2024-08-20 20:47:46 +02:00
Stéphane Nicolet 7e72b37e4c Clean up comments in code
- Capitalize comments
- Reformat multi-lines comments to equalize the widths of the lines
- Try to keep the width of comments around 85 characters
- Remove periods at the end of single-line comments

closes https://github.com/official-stockfish/Stockfish/pull/5469

No functional change
2024-07-11 07:29:33 +02:00
cj5716 c6a1e7fd42 Optimise pairwise multiplication
This speedup was first inspired by a comment by @AndyGrant on my recent
PR "If mullo_epi16 would preserve the signedness, then this could be
used to remove 50% of the max operations during the halfkp-pairwise
mat-mul relu deal."

That got me thinking, because although mullo_epi16 did not preserve the
signedness, mulhi_epi16 did, and so we could shift left and then use
mulhi_epi16, instead of shifting right after the mullo.

However, due to some issues with shifting into the sign bit, the FT
weights and biases had to be multiplied by 2 for the optimisation to
work.
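The arithmetic behind the trick can be checked with a scalar model. The sketch below emulates one 16-bit lane of `_mm_mulhi_epi16` (signed multiply, keep the high 16 bits) and assumes, for illustration, that the old path clamped both operands to [0, 127] and computed `(a * b) >> 7`, while the new path clamps doubled operands to at most 254, shifts one of them left by 7 and keeps the high half. Shifting 254 left by 7 gives 32512, which still fits in a signed 16-bit lane; undoubled values would need a shift of 9, and 127 << 9 = 65024 overflows into the sign bit. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of one lane of _mm_mulhi_epi16: signed 16x16 -> 32-bit product, high 16 bits
std::int16_t mulhi16(std::int16_t x, std::int16_t y) {
    return static_cast<std::int16_t>((static_cast<std::int32_t>(x) * y) >> 16);
}

// Old formulation (assumed): clamp both operands to [0, 127], multiply, shift right by 7
int pairwise_old(int a, int b) {
    a = std::clamp(a, 0, 127);
    b = std::clamp(b, 0, 127);
    return (a * b) >> 7;
}

// New formulation on doubled values: only the first operand needs max(0).
// A negative product stays negative under mulhi and is zeroed afterwards.
int pairwise_new(int a2, int b2) {
    a2 = std::clamp(a2, 0, 254);
    b2 = std::min(b2, 254);  // no lower clamp for this operand
    const int hi = mulhi16(static_cast<std::int16_t>(a2 << 7), static_cast<std::int16_t>(b2));
    return std::max(hi, 0);  // stands in for the unsigned saturation of _mm_packus_epi16
}
```

Since (2a << 7) * 2b = 512ab, taking the high 16 bits yields exactly ab >> 7, so both formulations agree for every input.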

Speedup on "Arch=x86-64-bmi2 COMP=clang", courtesy of @Torom
Result of 50 runs
base (...es/stockfish) =     962946  +/- 1202
test (...ise-max-less) =     979696  +/- 1084
diff                   =     +16750  +/- 1794

speedup        = +0.0174
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Hyperthreading: on

Also a speedup on "COMP=gcc", courtesy of Torom once again
Result of 50 runs
base (...tockfish_gcc) =     966033  +/- 1574
test (...max-less_gcc) =     983319  +/- 1513
diff                   =     +17286  +/- 2515

speedup        = +0.0179
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Hyperthreading: on

Passed STC:
LLR: 2.96 (-2.94,2.94) <0.00,2.00>
Total: 67712 W: 17715 L: 17358 D: 32639
Ptnml(0-2): 225, 7472, 18140, 7759, 260
https://tests.stockfishchess.org/tests/view/664c1d75830eb9f886616906

closes https://github.com/official-stockfish/Stockfish/pull/5282

No functional change
2024-05-23 21:37:46 +02:00
xoto10 2682c2127d Use 5% less time on first move
Stockfish appears to take too much time on the first move of a game and
then not enough on moves 2, 3, 4... This is probably because most of the
factors that increase thinking time usually apply on the first move.

Attempts to give more time to the subsequent moves have not worked so
far, but this change to simply reduce first move time by 5% worked.
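A minimal sketch of the adjustment, assuming a time manager that already computes an optimum time per move in milliseconds and knows whether this is the engine's first move; the helper name and the choice to scale the optimum time directly are illustrative assumptions, not Stockfish's actual code. Integer arithmetic avoids floating-point rounding surprises.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: on the first move of the game use 95% of the optimum
// time computed by the usual formula; later moves are left unchanged
std::int64_t first_move_adjusted(std::int64_t optimumMs, bool isFirstMove) {
    return isFirstMove ? optimumMs * 95 / 100 : optimumMs;
}
```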

STC 10+0.1 :
LLR: 2.96 (-2.94,2.94) <0.00,2.00>
Total: 78496 W: 20516 L: 20135 D: 37845
Ptnml(0-2): 340, 8859, 20456, 9266, 327
https://tests.stockfishchess.org/tests/view/663d47bf507ebe1c0e9200ba

LTC 60+0.6 :
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 94872 W: 24179 L: 23751 D: 46942
Ptnml(0-2): 61, 9743, 27405, 10161, 66
https://tests.stockfishchess.org/tests/view/663e779cbb28828150dd9089

closes https://github.com/official-stockfish/Stockfish/pull/5235

Bench: 1876282
2024-05-15 16:09:30 +02:00
mstembera e608eab8dd Optimize update_accumulator_refresh_cache()
STC https://tests.stockfishchess.org/tests/view/664105df26ac5f9b286d30e6
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 178528 W: 46235 L: 45750 D: 86543
Ptnml(0-2): 505, 17792, 52142, 18363, 462

Combo of two yellow speedups
https://tests.stockfishchess.org/tests/view/6640abf9d163897c63214f5c
LLR: -2.93 (-2.94,2.94) <0.00,2.00>
Total: 355744 W: 91714 L: 91470 D: 172560
Ptnml(0-2): 913, 36233, 103384, 36381, 961

https://tests.stockfishchess.org/tests/view/6628ce073fe04ce4cefc739c
LLR: -2.93 (-2.94,2.94) <0.00,2.00>
Total: 627040 W: 162001 L: 161339 D: 303700
Ptnml(0-2): 2268, 72379, 163532, 73105, 2236

closes https://github.com/official-stockfish/Stockfish/pull/5239

No functional change
2024-05-13 07:32:32 +02:00
cj5716 61f12a4c38 Simplify accumulator refreshes
Passed Non-Regression STC:
https://tests.stockfishchess.org/tests/view/6631f5d5d01fb9ac9bcdc7d0
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 57472 W: 14979 L: 14784 D: 27709
Ptnml(0-2): 185, 6486, 15192, 6695, 178

closes https://github.com/official-stockfish/Stockfish/pull/5207

No functional change
2024-05-05 15:11:37 +02:00
cj5716 8ee9905d8b Remove PSQT-only mode
Passed STC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 94208 W: 24270 L: 24112 D: 45826
Ptnml(0-2): 286, 11186, 24009, 11330, 293
https://tests.stockfishchess.org/tests/view/6635ddd773559a8aa8582826

Passed LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 114960 W: 29107 L: 28982 D: 56871
Ptnml(0-2): 37, 12683, 31924, 12790, 46
https://tests.stockfishchess.org/tests/view/663604a973559a8aa85881ed

closes #5214

Bench: 1653939
2024-05-05 12:36:20 +02:00
mstembera be142337d8 Accumulator cache bugfix and cleanup
STC:
https://tests.stockfishchess.org/tests/view/663068913a05f1bf7a511dc2
LLR: 2.98 (-2.94,2.94) <-1.75,0.25>
Total: 70304 W: 18211 L: 18026 D: 34067
Ptnml(0-2): 232, 7966, 18582, 8129, 243

1) Fixes a bug introduced in
   https://github.com/official-stockfish/Stockfish/pull/5194. Only one
   psqtOnly flag was used for two perspectives which was causing
   wrong entries to be cleared and marked.
2) The finny caches should be cleared like histories and not at the
   start of every search.

closes https://github.com/official-stockfish/Stockfish/pull/5203

No functional change
2024-05-01 14:17:32 +02:00
cj5716 6a9b8a0c7b Optimise NNUE Accumulator updates
Passed STC:
https://tests.stockfishchess.org/tests/view/662e3c6a5e9274400985a741
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 86176 W: 22284 L: 21905 D: 41987
Ptnml(0-2): 254, 9572, 23051, 9963, 248

closes https://github.com/official-stockfish/Stockfish/pull/5202

No functional change
2024-05-01 14:10:57 +02:00
mstembera a129c0695b Combine remove and add in update_accumulator_refresh_cache()
Combine remove and add in update_accumulator_refresh_cache().
Move remove before add to match other parts of the code.

STC:
https://tests.stockfishchess.org/tests/view/662d96dc6115ff6764c7f4ca
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 364032 W: 94421 L: 93624 D: 175987
Ptnml(0-2): 1261, 41983, 94811, 42620, 1341

closes https://github.com/official-stockfish/Stockfish/pull/5194

Bench: 1836777
2024-04-28 21:35:48 +02:00
mstembera 940a3a7383 Cache small net w/ psqtOnly support
Caching the small net in the same way as the big net allows them to
share the same code path and completely removes
update_accumulator_refresh().

STC:
https://tests.stockfishchess.org/tests/view/662bfb5ed46f72253dcfed85
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 151712 W: 39252 L: 39158 D: 73302
Ptnml(0-2): 565, 17474, 39683, 17570, 564

closes https://github.com/official-stockfish/Stockfish/pull/5194

Bench: 1836777
2024-04-28 21:30:19 +02:00
gab8192 49ef4c935a Implement accumulator refresh table
For each thread persist an accumulator cache for the network, where each
cache contains multiple entries for each of the possible king squares.
When the accumulator needs to be refreshed, the cached entry is used to more
efficiently update the accumulator, instead of rebuilding it from scratch.
This idea was first described by Luecx (author of Koivisto) and
is commonly referred to as "Finny Tables".

When the accumulator needs to be refreshed, instead of filling it with
biases and adding every piece from scratch, we...

1. Take the `AccumulatorRefreshEntry` associated with the new king bucket
2. Calculate the features to activate and deactivate (from differences
   between bitboards in the entry and bitboards of the actual position)
3. Apply the updates on the refresh entry
4. Copy the content of the refresh entry accumulator to the accumulator
   we were refreshing
5. Copy the bitboards from the position to the refresh entry, to match
   the newly updated accumulator
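The steps above can be sketched with a toy model. This assumes a hypothetical feature set of one feature per (piece kind, square), a 4-wide accumulator and a single cache entry (step 1, picking the entry for the king bucket, is left out); none of the names are Stockfish's. The cached result is checked against a from-scratch rebuild.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int Width = 4;        // toy accumulator width (the real one is much wider)
constexpr int PieceKinds = 12;  // 2 colours x 6 piece types

using Bitboard = std::uint64_t;
using Acc = std::array<std::int32_t, Width>;
using Boards = std::array<Bitboard, PieceKinds>;
using Weights = std::vector<Acc>;  // one column per feature, PieceKinds * 64 features

// Hypothetical cache entry for one king bucket: an accumulator and the pieces it reflects
struct RefreshEntry {
    Acc acc{};        // initialise to the biases, i.e. the accumulator of an empty board
    Boards pieces{};  // empty board, matching a bias-only accumulator
};

int feature_index(int kind, int sq) { return kind * 64 + sq; }

void apply(Acc& acc, const Acc& column, int sign) {
    for (int j = 0; j < Width; ++j)
        acc[j] += sign * column[j];
}

// Steps 2-5: diff cached boards against the position, update the entry, copy it out
void refresh_with_cache(RefreshEntry& entry, const Boards& pos, const Weights& w, Acc& out) {
    for (int k = 0; k < PieceKinds; ++k) {
        const Bitboard removed = entry.pieces[k] & ~pos[k];  // cached but no longer present
        const Bitboard added = pos[k] & ~entry.pieces[k];    // present but not cached
        for (int sq = 0; sq < 64; ++sq) {
            if ((removed >> sq) & 1) apply(entry.acc, w[feature_index(k, sq)], -1);
            if ((added >> sq) & 1) apply(entry.acc, w[feature_index(k, sq)], +1);
        }
    }
    out = entry.acc;     // step 4: copy the entry's accumulator out
    entry.pieces = pos;  // step 5: the entry now matches the position
}

// Reference: rebuild from the biases by adding every piece
Acc refresh_from_scratch(const Acc& biases, const Boards& pos, const Weights& w) {
    Acc acc = biases;
    for (int k = 0; k < PieceKinds; ++k)
        for (int sq = 0; sq < 64; ++sq)
            if ((pos[k] >> sq) & 1) apply(acc, w[feature_index(k, sq)], +1);
    return acc;
}
```

The saving comes from positions that share most pieces with the cached board for the same bucket: only the differing squares are touched.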

Results at STC:
https://tests.stockfishchess.org/tests/view/662301573fe04ce4cefc1386
(first version)
https://tests.stockfishchess.org/tests/view/6627fa063fe04ce4cefc6560
(final)

Non-Regression between first and final:
https://tests.stockfishchess.org/tests/view/662801e33fe04ce4cefc660a

STC SMP:
https://tests.stockfishchess.org/tests/view/662808133fe04ce4cefc667c

closes https://github.com/official-stockfish/Stockfish/pull/5183

No functional change
2024-04-24 18:38:20 +02:00
Gahtan Nahdi d0e72c19fa fix clang compiler warning for avx512 build
Initialize variable in constexpr function to get rid of clang compiler warning for avx512 build.

closes https://github.com/official-stockfish/Stockfish/pull/5176

Non-functional change
2024-04-21 14:38:16 +02:00
mstembera 94484db6e8 Avoid permuting inputs during transform()
Avoid permuting inputs during transform() and instead do it once at load time.
Affects AVX2 and newer Intel architectures only.

https://tests.stockfishchess.org/tests/view/661306613eb00c8ccc0033c7
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 108480 W: 28319 L: 27898 D: 52263
Ptnml(0-2): 436, 12259, 28438, 12662, 445

Speedups were measured, e.g.:

```
Result of 100 runs
==================
base (./stockfish.master       ) =    1241128  +/- 3757
test (./stockfish.patch        ) =    1247713  +/- 3689
diff                             =      +6585  +/- 2583

speedup        = +0.0053
P(speedup > 0) =  1.0000
```

closes https://github.com/official-stockfish/Stockfish/pull/5160

No functional change
2024-04-11 22:38:38 +02:00
mstembera 5001d49f42 Update nnue_feature_transformer.h
Unroll update_accumulator_refresh to process two
active indices simultaneously.

The compiler might not unroll effectively because
the number of active indices isn't known at
compile time.
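The benefit comes from halving the number of passes over the accumulator: each element is loaded once, receives two weight columns, and is written back. A scalar sketch, assuming plain vectors in place of SIMD registers (`add_active_features` is a hypothetical name), with a tail step for an odd number of active features:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add the weight columns of all active features to acc, two features per pass
void add_active_features(std::vector<std::int16_t>& acc, const std::vector<int>& active,
                         const std::vector<std::vector<std::int16_t>>& weights) {
    const std::size_t n = acc.size();
    std::size_t i = 0;
    for (; i + 1 < active.size(); i += 2) {
        const auto& w0 = weights[active[i]];
        const auto& w1 = weights[active[i + 1]];
        for (std::size_t j = 0; j < n; ++j)
            acc[j] = static_cast<std::int16_t>(acc[j] + w0[j] + w1[j]);
    }
    if (i < active.size()) {  // odd count: one leftover feature
        const auto& w0 = weights[active[i]];
        for (std::size_t j = 0; j < n; ++j)
            acc[j] = static_cast<std::int16_t>(acc[j] + w0[j]);
    }
}
```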

STC https://tests.stockfishchess.org/tests/view/65faa8850ec64f0526c4fca9
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 130464 W: 33882 L: 33431 D: 63151
Ptnml(0-2): 539, 14591, 34501, 15082, 519

closes https://github.com/official-stockfish/Stockfish/pull/5125

No functional change
2024-03-26 18:06:49 +01:00
mstembera 7831131591 Only evaluate the PSQT part of the small net for large evals.
Thanks to Viren6 for suggesting to set complexity to 0.

STC https://tests.stockfishchess.org/tests/view/65d7d6709b2da0226a5a203f
LLR: 2.92 (-2.94,2.94) <0.00,2.00>
Total: 328384 W: 85316 L: 84554 D: 158514
Ptnml(0-2): 1414, 39076, 82486, 39766, 1450

LTC https://tests.stockfishchess.org/tests/view/65dce6d290f639b028a54d2e
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 165162 W: 41918 L: 41330 D: 81914
Ptnml(0-2): 102, 18332, 45124, 18922, 101

closes https://github.com/official-stockfish/Stockfish/pull/5083

bench: 1504003
2024-03-03 15:29:58 +01:00
FauziAkram 59691d46a1 Assorted trivial cleanups
Rename the doubleExtensions variable to multiExtensions, since we now also have triple extensions.

Some extra cleanups.

Recent tests used to measure the elo worth:
https://tests.stockfishchess.org/tests/view/659fd0c379aa8af82b96abc3
https://tests.stockfishchess.org/tests/view/65a8f3da79aa8af82b9751e3
https://tests.stockfishchess.org/tests/view/65b51824c865510db0272740
https://tests.stockfishchess.org/tests/view/65b58fbfc865510db0272f5b

closes https://github.com/official-stockfish/Stockfish/pull/5032

No functional change
2024-02-09 19:06:24 +01:00
Linmiao Xu 584d9efedc Dual NNUE with L1-128 smallnet
Credit goes to @mstembera for:
- writing the code enabling dual NNUE:
  https://github.com/official-stockfish/Stockfish/pull/4898
- the idea of trying L1-128 trained exclusively on high simple eval
  positions

The L1-128 smallnet is:
- epoch 399 of a single-stage training from scratch
- trained only on positions from filtered data with high material
  difference
  - defined by abs(simple_eval) > 1000
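A minimal sketch of the filter criterion, assuming `simple_eval` is a plain material balance from the side to move's point of view. The piece values are illustrative assumptions, not taken from this message; the actual filtering used the tools linked in this commit message.

```cpp
#include <cassert>
#include <cstdlib>

// Assumed piece values; index order: pawn, knight, bishop, rook, queen
constexpr int PieceValue[5] = {208, 781, 825, 1276, 2538};

// counts[c][pt] = number of pieces of type pt for colour c (0 = side to move)
int simple_eval(const int counts[2][5]) {
    int score = 0;
    for (int pt = 0; pt < 5; ++pt)
        score += PieceValue[pt] * (counts[0][pt] - counts[1][pt]);
    return score;
}

// Keep a training position for the smallnet only if the material imbalance is large
bool keep_for_smallnet(const int counts[2][5]) {
    return std::abs(simple_eval(counts)) > 1000;
}
```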

```yaml
experiment-name: 128--S1-only-hse-v2

training-dataset:
  - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack

  - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-1k.binpack
  - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-1k.binpack

  - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack

  # T80 2022
  - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-1k.binpack

  # T80 2023
  - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-1k.binpack
  - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-1k.binpack

start-from-engine-test-net: False

nnue-pytorch-branch: linrock/nnue-pytorch/L1-128
engine-test-branch: linrock/Stockfish/L1-128-nolazy
engine-base-branch: linrock/Stockfish/L1-128

num-epochs: 500
lambda: 1.0
```

Experiment yaml configs converted to easy_train.sh commands with:
https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py

Binpacks interleaved at training time with:
https://github.com/official-stockfish/nnue-pytorch/pull/259

Data filtered for high simple eval positions with:
https://github.com/linrock/nnue-data/blob/32d6a68/filter_high_simple_eval_plain.py
https://github.com/linrock/Stockfish/blob/61dbfe/src/tools/transform.cpp#L626-L655

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move of
L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data:
nn-epoch399.nnue : -318.1 +/- 2.1

Passed STC:
https://tests.stockfishchess.org/tests/view/6574cb9d95ea6ba1fcd49e3b
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 62432 W: 15875 L: 15521 D: 31036
Ptnml(0-2): 177, 7331, 15872, 7633, 203

Passed LTC:
https://tests.stockfishchess.org/tests/view/6575da2d4d789acf40aaac6e
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 64830 W: 16118 L: 15738 D: 32974
Ptnml(0-2): 43, 7129, 17697, 7497, 49

closes https://github.com/official-stockfish/Stockfish/pulls

Bench: 1330050

Co-Authored-By: mstembera <5421953+mstembera@users.noreply.github.com>
2024-01-07 21:15:52 +01:00
Disservin 444f03ee95 Update copyright year
closes https://github.com/official-stockfish/Stockfish/pull/4954

No functional change
2024-01-04 15:47:10 +01:00
FauziAkram 833a2e2bc0 Cleanup comments
Tests used to derive some Elo worth comments:
https://tests.stockfishchess.org/tests/view/656a7f4e136acbc573555a31
https://tests.stockfishchess.org/tests/view/6585fb455457644dc984620f

closes https://github.com/official-stockfish/Stockfish/pull/4945

No functional change
2023-12-31 19:54:27 +01:00
Joost VandeVondele ec02714b62 Cleanup comments and some code reorg.
passed STC:
https://tests.stockfishchess.org/tests/view/6536dc7dcc309ae83955b04d
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 58048 W: 14693 L: 14501 D: 28854
Ptnml(0-2): 200, 6399, 15595, 6669, 161

closes https://github.com/official-stockfish/Stockfish/pull/4846

No functional change
2023-10-24 17:43:05 +02:00
Disservin 2d0237db3f add clang-format
This introduces clang-format to enforce a consistent code style for Stockfish.

Having a documented and consistent style across the code will make contributing easier
for new developers, and will make larger changes to the codebase easier to make.

To facilitate formatting, this PR includes a Makefile target (`make format`) to format the code;
this requires clang-format (currently version 17) to be installed locally.

Installing clang-format is straightforward on most operating systems and distributions
(e.g. via https://apt.llvm.org/ or `brew install clang-format`), as it is part of a commonly
used suite of tools and compilers (llvm / clang).

Additionally, a CI action verifies whether the code requires formatting
and comments on the PR as needed. Initially, correct formatting is not required; it will be
done by maintainers as part of the merge or in later commits, though formatting beforehand is encouraged.

fixes https://github.com/official-stockfish/Stockfish/issues/3608
closes https://github.com/official-stockfish/Stockfish/pull/4790

Co-Authored-By: Joost VandeVondele <Joost.VandeVondele@gmail.com>
2023-10-22 16:06:27 +02:00
mstembera d3d0c69dc1 Remove outdated Tile naming.
Clean up variable naming after #4816.

closes #4833

No functional change
2023-10-21 10:28:55 +02:00
mstembera c17a657b04 Optimize the most common update accumulator cases w/o tiling
In the most common case where we only update a single state
it's faster to not use temporary accumulation registers and tiling.
(Also includes a couple of small cleanups.)

passed STC
https://tests.stockfishchess.org/tests/view/651918e3cff46e538ee0023b
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 34944 W: 8989 L: 8687 D: 17268
Ptnml(0-2): 88, 3743, 9512, 4037, 92

A simpler version
https://tests.stockfishchess.org/tests/view/65190dfacff46e538ee00155
also passed but this version is stronger still
https://tests.stockfishchess.org/tests/view/6519b95fcff46e538ee00fa2

closes https://github.com/official-stockfish/Stockfish/pull/4816

No functional change
2023-10-08 07:42:39 +02:00
mstembera 8a912951de Remove handcrafted MMX code
too small a benefit to maintain this old target

closes https://github.com/official-stockfish/Stockfish/pull/4804

No functional change
2023-10-08 07:37:01 +02:00
mstembera 95fe2b9a9d Reduce SIMD register count from 32 to 16
in the case of avx512 and vnni512 archs.

Up to 17% speedup, depending on the compiler, e.g.

```
AMD pro 7840u (zen4 phoenix apu 4nm)
bash bench_parallel.sh ./stockfish_avx512_gcc13 ./stockfish_avx512_pr_gcc13 20 10
sf_base =  1077737 +/-   8446 (95%)
sf_test =  1264268 +/-   8543 (95%)
diff    =   186531 +/-   4280 (95%)
speedup =  17.308% +/- 0.397% (95%)
```

Prior to this patch, it appears gcc spills registers.

closes https://github.com/official-stockfish/Stockfish/pull/4796

No functional change
2023-09-22 19:15:34 +02:00
mstembera 97f706ecc1 Sparse impl of affine_transform_non_ssse3()
deal with the general case

About an 8.6% speedup (for the general arch)

Results for 200 tests for each version:

            Base      Test      Diff
    Mean    141741    153998    -12257
    StDev   2990      3042      3742

p-value: 0.999
speedup: 0.086
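The idea of a sparse affine transform is to visit only the non-zero inputs, which are typically common after a clipped ReLU, and add the matching weight column for each. A scalar sketch, assuming a column-major layout (`weights[i * outDims + o]`), `uint8_t` inputs and `int8_t` weights; the layout and the name `affine_sparse` are assumptions, not the actual `affine_transform_non_ssse3()`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// out[o] = bias[o] + sum_i in[i] * W[i][o], skipping inputs that are zero.
// All outputs for input i are stored contiguously, so one column is read per non-zero input.
std::vector<std::int32_t> affine_sparse(const std::vector<std::uint8_t>& in,
                                        const std::vector<std::int8_t>& weights,
                                        const std::vector<std::int32_t>& bias) {
    const std::size_t outDims = bias.size();
    assert(weights.size() == in.size() * outDims);
    std::vector<std::int32_t> out = bias;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == 0)
            continue;  // a zero input contributes nothing: skip its whole column
        const std::int8_t* col = &weights[i * outDims];
        for (std::size_t o = 0; o < outDims; ++o)
            out[o] += static_cast<std::int32_t>(in[i]) * col[o];
    }
    return out;
}
```

In the table above, Diff is Base minus Test (hence negative), and 153998 / 141741 is about 1.086, matching the quoted speedup.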

closes https://github.com/official-stockfish/Stockfish/pull/4786

No functional change
2023-09-22 19:03:47 +02:00
Disservin 3c0e86a91e Cleanup includes
Reorder a few includes, include "position.h" where it was previously missing
and apply include-what-you-use suggestions. Also make the order of the includes
consistent, in the following way:

1. Related header (for .cpp files)
2. A blank line
3. C/C++ headers
4. A blank line
5. All other header files

closes https://github.com/official-stockfish/Stockfish/pull/4763
fixes https://github.com/official-stockfish/Stockfish/issues/4707

No functional change
2023-09-03 08:24:51 +02:00
maxim a46087ee30 Compressed network parameters
Implemented LEB128 (de)compression for the feature transformer.
Reduces embedded network size from 70 MiB to 39 MiB.

The new nn-78bacfcee510.nnue corresponds to the master net compressed.
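Signed LEB128 stores an integer in 7-bit groups, least significant group first, with the high bit of each byte marking that more bytes follow, so values in [-64, 63] take one byte instead of two. A self-contained sketch of signed LEB128 for `int16_t` values; it covers only the encoding itself and assumes nothing about Stockfish's file layout (headers, length prefixes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the signed LEB128 encoding of value to out
void leb128_encode(std::int16_t value, std::vector<std::uint8_t>& out) {
    std::int32_t v = value;
    while (true) {
        std::uint8_t byte = v & 0x7f;
        v >>= 7;  // arithmetic shift keeps the sign
        const bool done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
        if (!done)
            byte |= 0x80;  // continuation bit
        out.push_back(byte);
        if (done)
            return;
    }
}

// Decode one signed LEB128 value starting at pos and advance pos past it
std::int16_t leb128_decode(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::int32_t result = 0;
    int shift = 0;
    std::uint8_t byte;
    do {
        byte = in[pos++];
        result |= static_cast<std::int32_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 32 && (byte & 0x40))
        result |= -(1 << shift);  // sign-extend from the last group
    return static_cast<std::int16_t>(result);
}
```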

closes https://github.com/official-stockfish/Stockfish/pull/4617

No functional change
2023-06-19 21:37:23 +02:00
pb00067 f0556dcbe3 Small cleanups
Remove some unneeded assignments, fix typos and incorrect comments, add an AUTHORS entry.

closes https://github.com/official-stockfish/Stockfish/pull/4417

no functional change
2023-03-14 08:38:02 +01:00
Sebastian Buchwald 564456a6a8 Unify type alias declarations
The commit unifies the declaration of type aliases by replacing all
typedefs with corresponding using statements.
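For illustration (these aliases are examples, not a list of what the commit touched), the two spellings declare the same type, while only the `using` form extends to alias templates:

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Before: typedef-style alias (hypothetical name)
typedef std::uint64_t BitboardOld;

// After: alias declaration with `using`, read left to right like an assignment
using Bitboard = std::uint64_t;

// Alias template: expressible with `using`, not with typedef
template<typename T>
using Pair = std::pair<T, T>;

static_assert(std::is_same<BitboardOld, Bitboard>::value, "both spell the same type");
```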

closing https://github.com/official-stockfish/Stockfish/pull/4412

No functional change
2023-02-27 08:29:47 +01:00
Sebastian Buchwald 29b5ad5dea Fix typo in method name
closes https://github.com/official-stockfish/Stockfish/pull/4404

No functional change
2023-02-24 20:12:53 +01:00
Joost VandeVondele 08385527dd Introduce a function to compute NNUE accumulator
This patch introduces `hint_common_parent_position()` to signal that potentially several child nodes will require an NNUE eval. By explicitly populating the accumulator, these subsequent evaluations can be performed more efficiently.

This was based on the observation that calculating the evaluation in an excluded move position yielded a significant Elo gain, even though the evaluation itself was already available (work by pb00067).

Sopel wrote the code to perform just the accumulator update. This PR is based on cleaned up code that

passed STC:
https://tests.stockfishchess.org/tests/view/63f62f9be74a12625bcd4aa0
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53

and an earlier (equivalent) version

passed STC:
https://tests.stockfishchess.org/tests/view/63f3c3fee74a12625bcce2a6
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 47552 W: 12786 L: 12467 D: 22299
Ptnml(0-2): 120, 5107, 12997, 5438, 114

passed LTC:
https://tests.stockfishchess.org/tests/view/63f45cc2e74a12625bccfa63
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 110368 W: 29607 L: 29167 D: 51594
Ptnml(0-2): 41, 10551, 33572, 10967, 53

closes https://github.com/official-stockfish/Stockfish/pull/4402

Bench: 3726250
2023-02-23 13:25:35 +01:00
Sebastian Buchwald b60f9cc451 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/4315

No functional change
2023-01-02 19:07:38 +01:00
mstembera 93f71ecfe1 Optimize make_index() using templates and lookup tables.
https://tests.stockfishchess.org/tests/view/634517e54bc7650f07542f99
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 642672 W: 171819 L: 170658 D: 300195
Ptnml(0-2): 2278, 68077, 179416, 69336, 2229

This also introduces `-flto-partition=one`, as suggested by MinetaS (Syine Mineta),
to avoid linking errors due to LTO on 32-bit MinGW. This change was also tested in isolation:

https://tests.stockfishchess.org/tests/view/634aacf84bc7650f0755188b
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 119352 W: 31986 L: 31862 D: 55504
Ptnml(0-2): 439, 12624, 33400, 12800, 413

closes https://github.com/official-stockfish/Stockfish/pull/4199

No functional change
2022-10-16 11:42:19 +02:00
Giacomo Lorenzetti f7d1491b3d Assorted small cleanups
closes https://github.com/official-stockfish/Stockfish/pull/3973

No functional change
2022-05-29 18:42:48 +02:00
Ben Chaney 270a0e737f Generalize the feature transform to use vec_t macros
This commit generalizes the feature transform to use vec_t macros
that are defined per architecture, instead of using a separate code path for each one.

It should make some old architectures (MMX, including improvements by Fanael) faster
and make further such improvements easier in the future.

Includes some corrections to CI for mingw.

closes https://github.com/official-stockfish/Stockfish/pull/3955
closes https://github.com/official-stockfish/Stockfish/pull/3928

No functional change
2022-03-02 23:39:08 +01:00
mstembera 5f781d366e Clean up and simplify some nnue code.
Remove some unnecessary code and its execution during inference. Also, the change on line 49 in nnue_architecture.h results in a more efficient SIMD code path through ClippedReLU::propagate().

passed STC:
https://tests.stockfishchess.org/tests/view/6217d3bfda649bba32ef25d5
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 12056 W: 3281 L: 3092 D: 5683
Ptnml(0-2): 55, 1213, 3312, 1384, 64

passed STC SMP:
https://tests.stockfishchess.org/tests/view/6217f344da649bba32ef295e
LLR: 2.94 (-2.94,2.94) <-2.25,0.25>
Total: 27376 W: 7295 L: 7137 D: 12944
Ptnml(0-2): 52, 2859, 7715, 3003, 59

closes https://github.com/official-stockfish/Stockfish/pull/3944

No functional change

bench: 6820724
2022-02-25 08:37:57 +01:00
Tomasz Sobczyk cb9c2594fc Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue.
Architecture:

The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png

The most important architectural changes are the following:

* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer therefore has half as many inputs, which we compensate for by having twice as many outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and allows possibly an additional stage of lazy evaluation to be introduced in the future.
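A float sketch of the data flow described above, with toy sizes. Assumptions: the "not quite pairwise" product multiplies element `j` of each perspective's activated accumulator with element `j + H/2` of the same perspective (check the linked diagram for the exact pairing), the clipped ReLU is modelled as a clamp to [0, 1], and the quantization scaling of the 1-wide output is omitted. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Clipped ReLU to [0, 1], a float stand-in for the quantized clamp
float crelu(float x) { return std::clamp(x, 0.0f, 1.0f); }

// Pairwise product within one perspective: H inputs -> H/2 outputs
Vec pairwise(const Vec& acc) {
    const std::size_t half = acc.size() / 2;
    Vec out(half);
    for (std::size_t j = 0; j < half; ++j)
        out[j] = crelu(acc[j]) * crelu(acc[j + half]);
    return out;
}

// Dense layer: out = W * in + b, W stored row-major [out][in]
Vec linear(const Vec& in, const std::vector<Vec>& W, const Vec& b) {
    Vec out = b;
    for (std::size_t o = 0; o < out.size(); ++o)
        for (std::size_t i = 0; i < in.size(); ++i)
            out[o] += W[o][i] * in[i];
    return out;
}

struct FirstLayerResult {
    Vec mainPath;  // 15-wide in SFNNv4: activated, then sent through further layers
    float skip;    // 1-wide output: added to the final network output
};

// us/them: the two perspectives' accumulators (1024 each in SFNNv4)
FirstLayerResult first_layer(const Vec& us, const Vec& them, const std::vector<Vec>& W, const Vec& b) {
    Vec x = pairwise(us);
    const Vec t = pairwise(them);
    x.insert(x.end(), t.begin(), t.end());  // 2 * H/2 = H inputs to the next layer
    Vec y = linear(x, W, b);                // 1024 -> 16 in SFNNv4
    const float skip = y.back();            // split off the 1-wide output
    y.pop_back();
    for (float& v : y) v = crelu(v);        // only the 15-wide part is activated
    return {y, skip};
}
```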

Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation without ugly hacks. This is hopefully better overall.

First session:

The first session was training a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.

The training was done using the following command:

```
python3 train.py \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 4 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --gamma=0.992 \
    --lr=8.75e-4 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2
```

Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.

The dataset can be found at https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view

Second session:

The second training session was done starting from the best network (as determined by strength testing) from the first session. It is important that training is resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py.

The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. Training then ran for 800 epochs instead of 400 (though it is possibly mostly noise after around epoch 600).
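To see how much the new settings flatten the curve, assume the learning rate decays geometrically once per epoch, lr(e) = lr0 * gamma^e (an assumption about the trainer's scheduler, not stated in this message). A small helper to evaluate both schedules:

```cpp
#include <cassert>
#include <cmath>

// Learning rate after `epoch` epochs under per-epoch exponential decay (assumed schedule)
double lr_at(double lr0, double gamma, int epoch) {
    return lr0 * std::pow(gamma, epoch);
}
```

Under this assumption the first session ends near 3.5e-5 at epoch 400, the second session is still near 5.9e-5 at epoch 400, and it only drops to about 7.9e-6 by epoch 800.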

The training was done using the following command:

```
python3 train.py \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        --gpus "$3," \
        --threads 4 \
        --num-workers 4 \
        --batch-size 16384 \
        --progress_bar_refresh_rate 20 \
        --random-fen-skipping 3 \
        --features=HalfKAv2_hm^ \
        --lambda=1.0 \
        --gamma=0.995 \
        --lr=4.375e-4 \
        --max_epochs=800 \
        --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
        --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id
```

In particular, note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were saved every 20 epochs. In total 16 runs were made with these settings and the best nets were chosen according to playing strength at 25k nodes per move with pure NNUE evaluation; these are the 4 nets that have been put on fishtest.

The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (the download might be painfully slow because it is hosted in China) or can be assembled in the following way:

1. Get the https://github.com/official-stockfish/Stockfish/blob/5640ad48ae5881223b868362c1cbeb042947f7b4/script/interleave_binpacks.py script.
2. Download T60T70wIsRightFarseer.binpack: https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
3. Download farseerT74.binpack: http://trainingdata.farseer.org/T74-May13-End.7z
4. Download farseerT75.binpack: http://trainingdata.farseer.org/T75-June3rd-End.7z
5. Download farseerT76.binpack: http://trainingdata.farseer.org/T76-Nov10th-End.7z
6. Run `python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack`

Tests:

STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131

LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22

closes https://github.com/official-stockfish/Stockfish/pull/3927

Bench: 4919707
2022-02-10 19:54:31 +01:00
Brad Knox ad926d34c0 Update copyright years
Happy New Year!

closes https://github.com/official-stockfish/Stockfish/pull/3881

No functional change
2022-01-06 15:45:45 +01:00
Tomasz Sobczyk 4766dfc395 Optimize FT activation and affine transform for NEON.
This patch optimizes the NEON implementation in two ways.

1. The activation layer after the feature transformer is rewritten to make it easier for the compiler to see through dependencies and unroll. On its own this is a minimal but positive improvement, and other architectures could benefit from it in the future. This is not an algorithmic change.
2. The affine transform for large matrices (the first layer after the FT) on NEON now uses the same optimized code path as >=SSSE3, which makes memory accesses more sequential and makes better use of the available registers, allowing code with longer dependency chains.
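For readers unfamiliar with the activation layer mentioned in point 1: after the feature transformer, each 16-bit accumulator value is clamped into the range 0..127 (a clipped ReLU) and both perspectives are concatenated as input to the next layer. The scalar sketch below shows only that arithmetic, assuming the side to move's perspective comes first; the function names are hypothetical. The NEON patch itself concerns how the compiler unrolls and vectorizes this loop, which a Python sketch cannot show.

```python
def clipped_relu(accumulation):
    """Clamp each 16-bit accumulator value into the 8-bit range [0, 127]."""
    return [max(0, min(127, v)) for v in accumulation]

def transform(acc_white, acc_black, white_to_move):
    """Concatenate both perspectives, side to move first, after activation."""
    us, them = (acc_white, acc_black) if white_to_move else (acc_black, acc_white)
    return clipped_relu(us) + clipped_relu(them)
```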

Benchmarks from Redshift#161, profile-build with Apple clang:

george@Georges-MacBook-Air nets % ./stockfish-b82d93 bench 2>&1 | tail -4 (current master)
===========================
Total time (ms) : 2167
Nodes searched  : 4667742
Nodes/second    : 2154011
george@Georges-MacBook-Air nets % ./stockfish-7377b8 bench 2>&1 | tail -4 (this patch)
===========================
Total time (ms) : 1842
Nodes searched  : 4667742
Nodes/second    : 2534061

This is a solid 18% improvement overall, and it is larger in an NNUE-only bench than in the mixed one.
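The 18% figure follows directly from the two bench runs above: both search the same 4667742 nodes, so the ratio of nodes/second (equivalently, of total time) is the speedup. A small helper, with a hypothetical name, that extracts those numbers from bench output and checks this:

```python
import re

def parse_bench(text):
    """Extract the three summary numbers from `stockfish bench` output."""
    fields = {}
    for key in ("Total time (ms)", "Nodes searched", "Nodes/second"):
        m = re.search(re.escape(key) + r"\s*:\s*(\d+)", text)
        if m is None:
            raise ValueError(f"missing field: {key}")
        fields[key] = int(m.group(1))
    return fields

before = parse_bench("""Total time (ms) : 2167
Nodes searched  : 4667742
Nodes/second    : 2154011""")
after = parse_bench("""Total time (ms) : 1842
Nodes searched  : 4667742
Nodes/second    : 2534061""")

# Same node count means the search itself is unchanged, only faster.
speedup = after["Nodes/second"] / before["Nodes/second"]
print(f"{(speedup - 1) * 100:.1f}%")  # 17.6%
```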

An improvement of around 5% is also observed on armv7-neon (Raspberry Pi and older phones).

No changes for architectures other than NEON.

closes https://github.com/official-stockfish/Stockfish/pull/3837

No functional changes.
2021-12-07 18:08:54 +01:00
mstembera 644f6d4790 Simplify away ValueListInserter
plus minor cleanups

STC: https://tests.stockfishchess.org/tests/view/616f059b40f619782fd4f73f
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 84992 W: 21244 L: 21197 D: 42551
Ptnml(0-2): 279, 9005, 23868, 9078, 266

closes https://github.com/official-stockfish/Stockfish/pull/3749

No functional change
2021-10-23 12:21:17 +02:00
Tomasz Sobczyk 900f249f59 Reduce the number of accumulator states
Reduce from 3 to 2. Make the intent of the states clearer.
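The commit does not list the states. The sketch below assumes the two remaining ones are simply "computed" and "not computed" per accumulator, and shows how such a flag lets evaluation walk back to the nearest valid ancestor and replay the move updates forward, refreshing from scratch only when no valid ancestor exists. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Position:
    active: List[int]                                 # all active feature indices
    added: List[int] = field(default_factory=list)    # switched on by the last move
    removed: List[int] = field(default_factory=list)  # switched off by the last move
    parent: Optional["Position"] = None
    computed: bool = False    # two states: valid, or still needs computing
    acc: List[int] = field(default_factory=list)

def refresh(pos, weights, bias):
    """Build an accumulator from scratch out of every active feature."""
    acc = list(bias)
    for f in pos.active:
        acc = [a + w for a, w in zip(acc, weights[f])]
    return acc

def get_accumulator(pos, weights, bias):
    # Collect positions back to the nearest one with a valid accumulator.
    pending = []
    cur = pos
    while cur is not None and not cur.computed:
        pending.append(cur)
        cur = cur.parent
    if cur is None:
        # No valid ancestor: rebuild the oldest pending position from scratch.
        cur = pending.pop()
        cur.acc, cur.computed = refresh(cur, weights, bias), True
    # Replay the remaining moves incrementally, oldest first.
    for nxt in reversed(pending):
        acc = list(cur.acc)
        for f in nxt.removed:
            acc = [a - w for a, w in zip(acc, weights[f])]
        for f in nxt.added:
            acc = [a + w for a, w in zip(acc, weights[f])]
        nxt.acc, nxt.computed = acc, True
        cur = nxt
    return pos.acc
```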

STC: https://tests.stockfishchess.org/tests/view/60c50111457376eb8bcaad03
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 61888 W: 5007 L: 4944 D: 51937
Ptnml(0-2): 164, 3947, 22649, 4030, 154

LTC: https://tests.stockfishchess.org/tests/view/60c52b1c457376eb8bcaad2c
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 20248 W: 688 L: 618 D: 18942
Ptnml(0-2): 7, 551, 8946, 605, 15

closes https://github.com/official-stockfish/Stockfish/pull/3548

No functional change.
2021-06-14 11:22:08 +02:00
Tomasz Sobczyk ce4c523ad3 Register count for feature transformer
Compute the optimal register count for feature transformer accumulation dynamically.
This also introduces a change where AVX512 only uses 8 registers instead of 16
(now possible due to a 2x increase in feature transformer size).
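One way to compute such a register count, sketched under the assumption that the goal is the largest number of SIMD registers that fits a fixed budget while dividing the accumulator into equal-sized tiles. The function name and parameters are illustrative, not the Stockfish source.

```python
def best_register_count(register_bytes, lane_bytes, num_lanes, max_registers):
    """Number of SIMD registers to use per accumulation tile.

    The ideal count covers the whole accumulator in one pass; if that exceeds
    the register budget, fall back to the largest divisor of the ideal count
    that fits, so the accumulator splits into equal-sized tiles.
    """
    total_bytes = num_lanes * lane_bytes
    assert register_bytes % lane_bytes == 0
    assert total_bytes % register_bytes == 0
    ideal = total_bytes // register_bytes
    if ideal <= max_registers:
        return ideal
    for divisor in range(max_registers, 1, -1):
        if ideal % divisor == 0:
            return divisor
    return 1
```

For example, 1024 int16 lanes with 64-byte registers would need 32 registers for a single pass; with a budget of 16, the sketch picks 16 and processes the accumulator in two tiles.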

closes https://github.com/official-stockfish/Stockfish/pull/3543

No functional change
2021-06-13 13:10:56 +02:00
Tomasz Sobczyk b84fa04db6 Read NNUE net faster
Load feature transformer weights in bulk on little-endian machines.
This is particularly useful for testing new nets with c-chess-cli,
see https://github.com/lucasart/c-chess-cli/issues/44

```
$ time ./stockfish.exe uci

Before : 0m0.914s
After  : 0m0.483s
```
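The idea behind the speedup: on a little-endian host, the in-memory layout of the weights already matches the file, so they can be copied in one operation instead of decoded one value at a time. A Python analogue of the two approaches, assuming 16-bit signed little-endian weights (function names are hypothetical):

```python
import array
import struct
import sys

def read_int16_one_by_one(buf, count):
    # Decode each little-endian value separately.
    return [struct.unpack_from("<h", buf, 2 * i)[0] for i in range(count)]

def read_int16_bulk(buf, count):
    # Copy all bytes at once; swap only if the host is big-endian.
    values = array.array("h")
    assert values.itemsize == 2
    values.frombytes(buf[: 2 * count])
    if sys.byteorder == "big":
        values.byteswap()
    return values.tolist()
```

Both functions return the same values; the bulk version avoids per-element decoding, which is where the load-time saving comes from.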

No functional change
2021-06-13 09:39:03 +02:00
Stéphane Nicolet 8f081c86f7 Clean SIMD code a bit
Cleaner vector code structure in feature transformer. This patch just
regroups the parts of the inner loop for each SIMD instruction set.
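The inner loop in question applies feature removals and additions to the accumulator one tile of lanes at a time. A scalar sketch of that structure, where `tile` stands in for lanes per register times registers per tile; it shows why a single loop body can, in principle, serve every instruction set once the tile size is known. Names are hypothetical.

```python
def update_accumulator_tiled(acc, weights, removed, added, tile):
    """Apply feature removals/additions one tile of lanes at a time."""
    n = len(acc)
    assert n % tile == 0
    out = list(acc)
    for start in range(0, n, tile):
        regs = out[start:start + tile]            # "load" the tile into registers
        for f in removed:
            col = weights[f][start:start + tile]
            regs = [r - w for r, w in zip(regs, col)]
        for f in added:
            col = weights[f][start:start + tile]
            regs = [r + w for r, w in zip(regs, col)]
        out[start:start + tile] = regs            # "store" it back
    return out
```

Keeping each tile in registers while all feature columns are applied means the accumulator is read and written only once per update.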

Tested for non-regression:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 115760 W: 9835 L: 9831 D: 96094
Ptnml(0-2): 326, 7776, 41715, 7694, 369
https://tests.stockfishchess.org/tests/view/60b96b39457376eb8bcaa26e

It would be nice if a future patch could use some of the macros at
the top of the file to unify the code between the distinct SIMD
instruction sets (of course, unifying the ReLU will be the challenge).

closes https://github.com/official-stockfish/Stockfish/pull/3506

No functional change
2021-06-04 14:07:46 +02:00
Tomasz Sobczyk 5448cad29e Fix export of the feature transformer.
PSQT export was missing.
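An export bug like this arises when the writer and reader disagree on which blocks a file contains. A minimal round-trip sketch, assuming int16 biases and weights followed by int32 PSQT weights written back to back in little-endian order; the layout and names are assumptions for illustration.

```python
import io
import struct

def write_params(stream, biases, weights, psqt_weights):
    stream.write(struct.pack(f"<{len(biases)}h", *biases))
    stream.write(struct.pack(f"<{len(weights)}h", *weights))
    # The block that must not be forgotten on export.
    stream.write(struct.pack(f"<{len(psqt_weights)}i", *psqt_weights))

def read_params(stream, n_biases, n_weights, n_psqt):
    def read(fmt, n, size):
        data = stream.read(n * size)
        if len(data) != n * size:
            raise EOFError("truncated parameter block")
        return list(struct.unpack(f"<{n}{fmt}", data))
    return read("h", n_biases, 2), read("h", n_weights, 2), read("i", n_psqt, 4)
```

A file written without the PSQT block fails to load with `EOFError` instead of silently producing garbage.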

fixes #3507

closes https://github.com/official-stockfish/Stockfish/pull/3508

No functional change
2021-05-30 21:31:58 +02:00
Stéphane Nicolet f193778446 Do not use lazy evaluation inside NNUE
This simplification patch implements two changes:

1. it simplifies away the so-called "lazy" path in the NNUE evaluation internals,
   where we trusted the psqt head alone to avoid the costly "positional" head in
   some cases;
2. it slightly raises NNUEThreshold1 in evaluate.cpp (from 682 to 800), which
   raises the limit at which we switch from NNUE eval to classical eval.

Both effects increase the number of positional evaluations done by our new net
architecture, but the results of our tests below seem to indicate that the loss
of speed is compensated by the gain in eval quality.
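A hypothetical illustration of the second change: if the choice between the two evaluations compares some measure of imbalance against NNUEThreshold1, raising the threshold from 682 to 800 sends fewer positions to the classical eval. The function and the sample imbalance values below are assumptions, not the code in evaluate.cpp.

```python
def use_classical(imbalance, threshold):
    """Hypothetical dispatch: very unbalanced positions go to the classical eval."""
    return abs(imbalance) > threshold

samples = [-900, -750, -300, 0, 450, 700, 820, 1200]
old = sum(use_classical(x, 682) for x in samples)  # threshold before the patch
new = sum(use_classical(x, 800) for x in samples)  # threshold after the patch
print(old, new)  # 5 3
```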

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 26280 W: 2244 L: 2137 D: 21899
Ptnml(0-2): 72, 1755, 9405, 1810, 98
https://tests.stockfishchess.org/tests/view/60ae73f112066fd299795a51

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 20592 W: 750 L: 677 D: 19165
Ptnml(0-2): 9, 614, 8980, 681, 12
https://tests.stockfishchess.org/tests/view/60ae88e812066fd299795a82

closes https://github.com/official-stockfish/Stockfish/pull/3503

Bench: 3817907
2021-05-27 01:21:56 +02:00