Compare commits

..

80 Commits

Author SHA1 Message Date
Joost VandeVondele 7262fd5d14 Stockfish 14.1
Official release version of Stockfish 14.1

Bench: 6334068

---

Today, we have the pleasure to announce Stockfish 14.1.

As usual, downloads will be freely available at stockfishchess.org/download [1].

With Stockfish 14.1 our users get access to the strongest chess engine
available today. In the period leading up to this release, Stockfish
convincingly won several chess engine tournaments, including the TCEC 21
superfinal, the TCEC Cup 9, and the Computer Chess Championship for
Fischer Random Chess (Chess960). In the latter tournament, Stockfish
was undefeated in 599 out of 600 games played.

Compared to Stockfish 14, this release introduces a more advanced NNUE
architecture and various search improvements. In self play testing, using
a book of balanced openings, Stockfish 14.1 wins three times more game
pairs than it loses [2]. At this high level, draws are very common, so the
Elo difference to Stockfish 14 is about 17 Elo. The NNUE evaluation method,
introduced to top level chess with Stockfish 12 about one year ago [3],
has now been adopted by several other strong CPU based chess engines.

The Stockfish project builds on a thriving community of enthusiasts
(thanks everybody!) that contribute their expertise, time, and resources
to build a free and open-source chess engine that is robust,
widely available, and very strong. We invite our chess fans to join the
fishtest testing framework and programmers to contribute to the project [4].

Stay safe and enjoy chess!

The Stockfish team

[1] https://stockfishchess.org/download/
[2] https://tests.stockfishchess.org/tests/view/6175c320af70c2be1788fa2b
[3] https://github.com/official-stockfish/Stockfish/discussions/3628
[4] https://stockfishchess.org/get-involved/
2021-10-28 07:38:19 +02:00
mstembera 385deefd80 Fix sometimes incorrect key for prefetches
STC
https://tests.stockfishchess.org/tests/view/61737b4f6ce927be32558401
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 138712 W: 34914 L: 34942 D: 68856
Ptnml(0-2): 421, 14817, 38894, 14817, 407

Very minor tweak since Position::key() depends on the 50 move rule counter.
Comments: https://github.com/mstembera/Stockfish/commit/cddde31eed505cdf0c4fc8ff96b89f6e39c797e1

closes https://github.com/official-stockfish/Stockfish/pull/3759

No functional change
2021-10-25 12:26:44 +02:00
Joost VandeVondele 2c86ae196d Adjust ButterflyHistory decay parameter
passed STC:
LLR: 2.98 (-2.94,2.94) <-0.50,2.50>
Total: 26680 W: 6807 L: 6593 D: 13280
Ptnml(0-2): 73, 3007, 6989, 3175, 96
https://tests.stockfishchess.org/tests/view/6174094e6ce927be32558441

passed LTC:
LLR: 2.98 (-2.94,2.94) <0.50,3.50>
Total: 21104 W: 5403 L: 5185 D: 10516
Ptnml(0-2): 8, 2160, 6001, 2372, 11
https://tests.stockfishchess.org/tests/view/61744927351812fe5f969864

closes https://github.com/official-stockfish/Stockfish/pull/3761

Bench: 6334068
2021-10-24 22:17:55 +02:00
Stefan Geschwentner 8557f35aa5 Double extend search even more via LMR
Allow now for the first five moves a two plies deeper LMR search.

STC:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 99608 W: 25143 L: 25115 D: 49350
Ptnml(0-2): 291, 11444, 26328, 11428, 313
https://tests.stockfishchess.org/tests/view/61718c9438cb9784038af8d7

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 52064 W: 13234 L: 13145 D: 25685
Ptnml(0-2): 35, 5431, 15014, 5514, 38
https://tests.stockfishchess.org/tests/view/6171e13e38cb9784038af928

closes https://github.com/official-stockfish/Stockfish/pull/3760

Bench: 7222293
2021-10-24 22:13:47 +02:00
bmc4 1163d972a9 Simplify LMR multiThread condition
STC (8 threads):
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 110584 W: 27818 L: 27807 D: 54959
Ptnml(0-2): 156, 12089, 30791, 12100, 156
https://tests.stockfishchess.org/tests/view/6172ef436ce927be325583a9

LTC (8 threads):
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 23632 W: 6025 L: 5903 D: 11704
Ptnml(0-2): 5, 2292, 7100, 2414, 5
https://tests.stockfishchess.org/tests/view/6173cf096ce927be32558412

closes https://github.com/official-stockfish/Stockfish/pull/3757

No functional change (in the single-threaded case)
Bench: 6689428
2021-10-24 22:08:28 +02:00
FauziAkram fc8213c7df Tuning of a Null Move Parameter
STC:
LLR: 2.99 (-2.94,2.94) <-0.50,2.50>
Total: 78744 W: 19956 L: 19664 D: 39124
Ptnml(0-2): 259, 9005, 20573, 9255, 280
https://tests.stockfishchess.org/tests/view/6172017a38cb9784038af947

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 68528 W: 17309 L: 16964 D: 34255
Ptnml(0-2): 41, 7194, 19455, 7527, 47
https://tests.stockfishchess.org/tests/view/6172994d38cb9784038af983

closes https://github.com/official-stockfish/Stockfish/pull/3756

bench: 6689428
2021-10-23 12:27:32 +02:00
bmc4 927a84d310 Increase TTdepth acceptance some Threads
Increase TTdepth acceptance only on half of the Threads

STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 19272 W: 4956 L: 4766 D: 9550
Ptnml(0-2): 25, 1989, 5423, 2169, 30
https://tests.stockfishchess.org/tests/view/6172be6238cb9784038af9a7

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 23688 W: 6111 L: 5897 D: 11680
Ptnml(0-2): 2, 2275, 7081, 2479, 7
https://tests.stockfishchess.org/tests/view/6172e32938cb9784038af9c7

closes https://github.com/official-stockfish/Stockfish/pull/3754

No functional change in the single-threaded case
2021-10-23 12:23:29 +02:00
Stefano Cardanobile 2214fcecf7 Rewrite NNUE evaluation adjustments
Make the eval code in the evaluate_nnue.cpp more similar to the rest of the codebase:

* remove multiple variable assignment
* make if conditions explicit and indent on multiple lines

passed STC
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 59032 W: 14834 L: 14751 D: 29447
Ptnml(0-2): 176, 6310, 16459, 6397, 174
https://tests.stockfishchess.org/tests/view/616f250540f619782fd4f76d

closes https://github.com/official-stockfish/Stockfish/pull/3753

No functional change
2021-10-23 12:22:02 +02:00
mstembera 644f6d4790 Simplify away ValueListInserter
plus minor cleanups

STC: https://tests.stockfishchess.org/tests/view/616f059b40f619782fd4f73f
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 84992 W: 21244 L: 21197 D: 42551
Ptnml(0-2): 279, 9005, 23868, 9078, 266

closes https://github.com/official-stockfish/Stockfish/pull/3749

No functional change
2021-10-23 12:21:17 +02:00
Stefan Geschwentner 8a8640a761 Double extend more often via LMR
Allow for first three moves always a two plies deeper LMR search.

STC:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 206096 W: 51966 L: 52093 D: 102037
Ptnml(0-2): 664, 23817, 54293, 23530, 744
https://tests.stockfishchess.org/tests/view/616f197d40f619782fd4f75a

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 62384 W: 15567 L: 15492 D: 31325
Ptnml(0-2): 40, 6633, 17777, 6696, 46
https://tests.stockfishchess.org/tests/view/616ffa1b4f0b65a0e231e682

closes https://github.com/official-stockfish/Stockfish/pull/3752

Bench: 6154836
2021-10-21 12:42:30 +02:00
bmc4 42a895d9c9 Simplify null move search condition
Remove `ss->ttPv` condition on null move search condition

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 80832 W: 20276 L: 20221 D: 40335
Ptnml(0-2): 267, 9335, 21168, 9368, 278
https://tests.stockfishchess.org/tests/view/616ed4a0942d40685e3237c6

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 54184 W: 13464 L: 13377 D: 27343
Ptnml(0-2): 37, 5758, 15435, 5805, 57
https://tests.stockfishchess.org/tests/view/616ef71f40f619782fd4f72d

closes https://github.com/official-stockfish/Stockfish/pull/3750

bench: 6201607
2021-10-21 08:43:43 +02:00
bmc4 4af1ae82c6 Adjust TTdepth acceptance on early cutoff
STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 63784 W: 16185 L: 15917 D: 31682
Ptnml(0-2): 231, 7309, 16531, 7603, 218
https://tests.stockfishchess.org/tests/view/616ed03a942d40685e3237c0

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 12728 W: 3268 L: 3072 D: 6388
Ptnml(0-2): 8, 1298, 3563, 1480, 15
https://tests.stockfishchess.org/tests/view/616ef156942d40685e32380a

closes https://github.com/official-stockfish/Stockfish/pull/3748

bench: 7050445
2021-10-19 22:14:39 +02:00
bmc4 b37054c310 Simplify evaluate condition on search
Remove condition for MOVE_NULL on search.

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 47544 W: 11968 L: 11864 D: 23712
Ptnml(0-2): 150, 5535, 12318, 5599, 170
https://tests.stockfishchess.org/tests/view/616e37143799eb91f1f071ee

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 67472 W: 16938 L: 16870 D: 33664
Ptnml(0-2): 49, 7119, 19331, 7189, 48
https://tests.stockfishchess.org/tests/view/616e3fab3799eb91f1f071f1

closes https://github.com/official-stockfish/Stockfish/pull/3746

bench: 5255771
2021-10-19 22:09:47 +02:00
bmc4 67d0616483 Simplify probCutCount away
Simplify away the limitation in number of moves in probCut.

STC:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 286768 W: 71888 L: 72133 D: 142747
Ptnml(0-2): 983, 33084, 75471, 32887, 959
https://tests.stockfishchess.org/tests/view/616c9b9b90e1312a3cd0ef0a

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 69312 W: 17243 L: 17176 D: 34893
Ptnml(0-2): 42, 7452, 19614, 7493, 55
https://tests.stockfishchess.org/tests/view/616cebbf4f95b438f7a85f93

closes https://github.com/official-stockfish/Stockfish/pull/3745

bench: 5005810
2021-10-18 21:00:08 +02:00
Stefano Cardanobile f7494961de Reformat Eval::evaluate()
Non functional simplification: the goal of this patch is to make
the style in the evaluate() function similar to the rest of the code.

passed STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 95608 W: 24058 L: 24026 D: 47524
Ptnml(0-2): 292, 10379, 26396, 10479, 258
https://tests.stockfishchess.org/tests/view/616c64fd99b580bf37797e4f

closes https://github.com/official-stockfish/Stockfish/pull/3744

Non-functional change
2021-10-18 20:45:47 +02:00
Stéphane Nicolet 8a74c08928 Remove noLMRExtension flag
This simplification patch removes the noLMRExtension flag. It was introduced in June
(see following link for that commit), but does not seem to be necessary anymore.
Link: https://github.com/official-stockfish/Stockfish/commit/e1f181ee643dcaa92c606b74b3abd23dede136cd

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 21200 W: 5369 L: 5228 D: 10603
Ptnml(0-2): 67, 2355, 5616, 2494, 68
https://tests.stockfishchess.org/tests/view/616c03d299b580bf37797dcb

LTC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 37536 W: 9387 L: 9278 D: 18871
Ptnml(0-2): 23, 3988, 10643, 4085, 29
https://tests.stockfishchess.org/tests/view/616c10f499b580bf37797ddd

closes https://github.com/official-stockfish/Stockfish/pull/3743

Bench: 4792969
2021-10-17 17:54:39 +02:00
Stéphane Nicolet 6847be2c75 Allow some LMR double extensions
Allow some LMR double extensions for the second and third sons of each node.

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 170320 W: 42608 L: 42187 D: 85525
Ptnml(0-2): 516, 19635, 44422, 20086, 501
https://tests.stockfishchess.org/tests/view/616a9e3899b580bf37797cf4

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 74400 W: 18783 L: 18423 D: 37194
Ptnml(0-2): 46, 7812, 21129, 8162, 51
https://tests.stockfishchess.org/tests/view/616b378499b580bf37797d61

closes https://github.com/official-stockfish/Stockfish/pull/3742

Bench: 4877152
2021-10-17 12:29:11 +02:00
Stefano Cardanobile 4231d99ab4 Smooth improving
Smooth dependency on improvement margin in null move search.

STC
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 17384 W: 4468 L: 4272 D: 8644
Ptnml(0-2): 42, 1919, 4592, 2079, 60
https://tests.stockfishchess.org/tests/view/61689b8a1e5f6627cc1c0fdc

LTC
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 45648 W: 11525 L: 11243 D: 22880
Ptnml(0-2): 26, 4731, 13036, 4997, 34
https://tests.stockfishchess.org/tests/view/6168a12c1e5f6627cc1c0fe3

It would be interesting to test if the other pruning/reduction heuristics
in master which are using the improving variable (ie the sign of improvement)
could benefit from a smooth function of the improvement value (or maybe a
Relu of the improvement value).

closes https://github.com/official-stockfish/Stockfish/pull/3740

Bench: 4916775
2021-10-15 14:57:01 +02:00
Joost VandeVondele 580698e5e5 Compute ttCapture earlier
Compute ttCapture earlier, and reuse.

passed STC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 74128 W: 18640 L: 18578 D: 36910
Ptnml(0-2): 224, 7970, 20649, 7962, 259
https://tests.stockfishchess.org/tests/view/615dd9fa1a32f4036ac7fc4d

closes https://github.com/official-stockfish/Stockfish/pull/3734

No functional change
2021-10-14 09:58:03 +02:00
bmc4 0bddd942b4 Simplify ttHitAverage away
Simplify ttHitAverage away, which was introduced in the following commit:
[here](https://github.com/BM123499/Stockfish/commit/fe124896b241b4791454fd151da10101ad48f6d7)

A few tweaks with Elo gaining bounds have been tried to keep the code,
but they all failed:
https://tests.stockfishchess.org/tests/view/61656f7683dd501a05b0b292
https://tests.stockfishchess.org/tests/view/6165c0ca83dd501a05b0b2ca
https://tests.stockfishchess.org/tests/view/6165bf9683dd501a05b0b2c8
https://tests.stockfishchess.org/tests/view/6165719483dd501a05b0b29b
https://tests.stockfishchess.org/tests/view/6166c7fd83dd501a05b0b353
https://tests.stockfishchess.org/tests/view/6166c63b83dd501a05b0b350

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 58504 W: 14781 L: 14694 D: 29029
Ptnml(0-2): 175, 6718, 15426, 6711, 222
https://tests.stockfishchess.org/tests/view/6165112c83dd501a05b0b257

LTC:
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 33480 W: 8448 L: 8332 D: 16700
Ptnml(0-2): 21, 3569, 9447, 3679, 24
https://tests.stockfishchess.org/tests/view/61656fcf83dd501a05b0b294

change https://github.com/official-stockfish/Stockfish/pull/3739

bench: 4540339
2021-10-14 09:47:20 +02:00
Joseph Ellis 673841301b Simplify multi-cut condition
Now that the multi-cut condition is safer, we can avoid the cost of the sub-search.

STC:
https://tests.stockfishchess.org/tests/view/6165fd9283dd501a05b0b2fe
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 18648 W: 4745 L: 4600 D: 9303
Ptnml(0-2): 47, 2111, 4887, 2208, 71

LTC:
https://tests.stockfishchess.org/tests/view/616629ea83dd501a05b0b320
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 41704 W: 10407 L: 10302 D: 20995
Ptnml(0-2): 35, 4425, 11823, 4538, 31

closes https://github.com/official-stockfish/Stockfish/pull/3738

Bench: 5905086
2021-10-13 23:34:23 +02:00
Michael Chaly c8459b18ba Reduce more if multiple moves exceed alpha
Idea of this patch is the following: in case we already have four moves that
exceeded alpha in the current node, the probability of finding fifth should
be reasonably low. Note that four is completely arbitrary - there could and
probably should be some tweaks, both in tweaking best move count threshold
for more reductions and tweaking how they work - for example making more
reductions with best move count linearly.

passed STC:
https://tests.stockfishchess.org/tests/view/615f614783dd501a05b0aee2
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 141816 W: 36056 L: 35686 D: 70074
Ptnml(0-2): 499, 15131, 39273, 15511, 494

passed LTC:
https://tests.stockfishchess.org/tests/view/615fdff683dd501a05b0af35
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 68536 W: 17221 L: 16891 D: 34424
Ptnml(0-2): 38, 6573, 20725, 6885, 47

closes https://github.com/official-stockfish/Stockfish/pull/3736

Bench: 6131513
2021-10-09 09:59:33 +02:00
xoto10 f21a66f70d Small clean-up, Sept 2021
Closes https://github.com/official-stockfish/Stockfish/pull/3485

No functional change
2021-10-07 09:41:57 +02:00
Stéphane Nicolet 54a989930e Capping stat bonus at 2000
This patch updates the stat_bonus() function (used in the history tables to
help move ordering), keeping the same quadratic for small depths but changing
the values for depth >= 9:

The old bonus formula was increasing from zero at depth 1 to 4100 at depth 14,
then used the strange, small value of 73 for all depths >= 15.

The new bonus formula increases from 0 at depth 1 to 2000 at depth 8, then
keeps 2000 for all depths >= 8.

passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 169624 W: 42875 L: 42454 D: 84295
Ptnml(0-2): 585, 19340, 44557, 19729, 601
https://tests.stockfishchess.org/tests/view/615bd69e9d256038a969b97c

passed LTC:
LLR: 3.07 (-2.94,2.94) <0.50,3.50>
Total: 37336 W: 9456 L: 9191 D: 18689
Ptnml(0-2): 20, 3810, 10747, 4067, 24
https://tests.stockfishchess.org/tests/view/615c75d99d256038a969b9b2

closes https://github.com/official-stockfish/Stockfish/pull/3731

Bench: 6261865
2021-10-06 12:04:35 +02:00
Joost VandeVondele 329bdbd9cf Improve the Chess960 correction for cornered bishops
As Chess960 patches can not be tested on fishtest, this was locally tuned
and tested:

Elo: 2.36 +- 1.07
LOS: 0.999992

closes https://github.com/official-stockfish/Stockfish/pull/3730

Bench: 5714575
2021-10-06 11:57:34 +02:00
J. Oster 371b522e9e Time-management fix in MultiPV mode.
When playing games in MultiPV mode we must take care to only track the
best move changing for the first PV line. Otherwise, SF will spend most
of its time for the initial moves after the book exit.

This has been observed and reported on Discord, but can also be seen in
games played in Stefan Pohl's MultiPV experiment.

Tested with MultiPV=4.

STC:
https://tests.stockfishchess.org/tests/view/615c24b59d256038a969b990
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 1744 W: 694 L: 447 D: 603
Ptnml(0-2): 32, 125, 358, 278, 79

LTC:
https://tests.stockfishchess.org/tests/view/615c31769d256038a969b993
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 2048 W: 723 L: 525 D: 800
Ptnml(0-2): 10, 158, 511, 314, 31

closes https://github.com/official-stockfish/Stockfish/pull/3729

Bench: 5714575
2021-10-06 11:53:33 +02:00
Michael Chaly 135caee606 Increase reductions with thread count
Respin of multi-thread idea that was simplified away recently: basically doing
more reductions with thread count since Lazy SMP naturally widens search. With
drawish book this idea got simplified away but with less drawish book it again
gains elo, maybe trying to reinstall other ideas that were simplified away
previously can be beneficial.

passed STC
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 39736 W: 10205 L: 9986 D: 19545
Ptnml(0-2): 45, 4254, 11064, 4447, 58
https://tests.stockfishchess.org/tests/view/615750702d02f48db3961b00

passed LTC
LLR: 2.97 (-2.94,2.94) <0.50,3.50>
Total: 60352 W: 15530 L: 15218 D: 29604
Ptnml(0-2): 24, 5900, 18016, 6212, 24
https://tests.stockfishchess.org/tests/view/6157d8935488e26ea5eace7f

closes https://github.com/official-stockfish/Stockfish/pull/3724

Bench 5714575
2021-10-03 11:28:19 +02:00
Michael Chaly 21ad356c09 Extend quiet tt moves at PvNodes
Idea is to extend some quiet ttMoves if a lot of things indicate that
the transposition table move is going to be a good move:

1) move being a killer - so being the best move in nearby node;
2) reply continuation history is really good.

This is basically saying that move is good "in general" in this position,
that it is a good reply to the opponent move and that it was the best in
this position somewhere in search - so extending it makes a lot of sense.
In general in past year we had a lot of extensions of different types,
maybe there is something more in it :)

passed STC
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 42944 W: 10932 L: 10695 D: 21317
Ptnml(0-2): 141, 4869, 11210, 5116, 136
https://tests.stockfishchess.org/tests/view/614cca8e7bdc23e77ceb89f0

passed LTC
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 156848 W: 39473 L: 38893 D: 78482
Ptnml(0-2): 125, 16327, 44913, 16961, 98
https://tests.stockfishchess.org/tests/view/614cf93d7bdc23e77ceb8a13

closes https://github.com/official-stockfish/Stockfish/pull/3719

Bench: 5714575
2021-09-26 06:58:14 +02:00
Stéphane Nicolet 919da65d70 Reduction instead of cutoff
In master, during singular move analysis, when both the transposition value
and a reduced search for the other moves seem to indicate a fail high, we
heuristically prune the whole subtree and return an fail high score.

This patch is a little bit more cautious in this case, and instead of the
risky cutoff, we now search the ttMove with a reduced depth (by two plies).

STC:
https://tests.stockfishchess.org/tests/view/614dafe07bdc23e77ceb8a89
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 46728 W: 11909 L: 11666 D: 23153
Ptnml(0-2): 181, 5288, 12168, 5561, 166

LTC:
https://tests.stockfishchess.org/tests/view/614dc84abe4c07e0ecac3c95
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 74520 W: 18809 L: 18450 D: 37261
Ptnml(0-2): 45, 7735, 21346, 8084, 50

closes https://github.com/official-stockfish/Stockfish/pull/3718

Bench: 5499262
2021-09-25 22:12:17 +02:00
OfekShochat 00e34a758f Range reductions
adding reductions for when the delta between the static eval and the child's eval is consistently low.

passed STC
https://tests.stockfishchess.org/html/live_elo.html?614d7b3c7bdc23e77ceb8a5d
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 88872 W: 22672 L: 22366 D: 43834
Ptnml(0-2): 343, 10150, 23117, 10510, 316

passed LTC
https://tests.stockfishchess.org/html/live_elo.html?614daf3e7bdc23e77ceb8a82
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 24368 W: 6153 L: 5928 D: 12287
Ptnml(0-2): 13, 2503, 6937, 2708, 23

closes https://github.com/official-stockfish/Stockfish/pull/3717

Bench: 5443950
2021-09-24 23:17:48 +02:00
Stéphane Nicolet ff3fa0c664 Tweak doubly singular condition (Topo's patch)
This patch relax a little bit the condition for doubly singular moves
(ie moves that are so forced that we think that they deserve a local
double extension of the search). We lower the margin and allow up to
six such double extensions in the path between the root and the critical
node.

Original idea by Siad Daboul (@TopoIogist) in PR #3709

Tested with the previous commit:

passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 33048 W: 8458 L: 8236 D: 16354
Ptnml(0-2): 120, 3701, 8660, 3923, 120
https://tests.stockfishchess.org/tests/view/614b24347bdc23e77ceb88fe

passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 54176 W: 13712 L: 13406 D: 27058
Ptnml(0-2): 36, 5653, 15399, 5969, 31
https://tests.stockfishchess.org/tests/view/614b3b727bdc23e77ceb8911

closes https://github.com/official-stockfish/Stockfish/pull/3714

Bench: 5792377
2021-09-23 23:24:28 +02:00
Stéphane Nicolet 73018a0337 Detect search explosions
This patch detects some search explosions (due to double extensions in
search.cpp) which can happen in some pathological positions, and takes
measures to ensure progress in search even for these pathological situations.

While a small number of double extensions can be useful during search
(for example to resolve a tactical sequence), a sustained regime of
double extensions leads to search explosion and a non-finishing search.
See the discussion in https://github.com/official-stockfish/Stockfish/pull/3544
and the issue https://github.com/official-stockfish/Stockfish/issues/3532 .

The implemented algorithm is the following:

a) at each node during search, store the current depth in the stack.
   Double extensions are by definition levels of the stack where the
   depth at ply N is strictly higher than depth at ply N-1.

b) during search, calculate for each thread a running average of the
   number of double extensions in the last 4096 visited nodes.

c) if one thread has more than 2% of double extensions for a sustained
   period of time (6 millions consecutive nodes, or about 4 seconds on
   my iMac), we decide that this thread is in an explosion state and
   we calm down this thread by preventing it to do any double extension
   for the next 6 millions nodes.

To calculate the running averages, we also introduced a auxiliary class
generalizing the computations of ttHitAverage variable we already had in
code. The implementation uses an exponential moving average of period 4096
and resolution 1/1024, and all computations are done with integers for
efficiency.

-----------

Example where the patch solves a search explosion:

```
   ./stockfish
   ucinewgame
   position fen 8/Pk6/8/1p6/8/P1K5/8/6B1 w - - 37 130
   go infinite
```

This algorithm does not affect search in normal, non-pathological positions.
We verified, for instance, that the usual bench is unchanged up to depth 20
at least, and that the node numbers are unchanged for a search of the starting
position at depth 32.

-------------

See https://github.com/official-stockfish/Stockfish/pull/3714

Bench: 5575265
2021-09-23 23:19:06 +02:00
Michael Chaly e8788d1b32 Combo of various parameter tweaks
Combination of parameter tweaks in search, evaluation and time management.
Original patches by snicolet xoto10 lonfom169 and Vizvezdenec.

Includes:

* Use bigger grain of positional evaluation more frequently (up to 1 exchange difference in non-pawn-material);
* More extra time according to increment;
* Increase margin for singular extensions;
* Do more aggresive parent node futility pruning.

Passed STC
https://tests.stockfishchess.org/tests/view/6147deab3733d0e0dd9f313d
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 45488 W: 11691 L: 11450 D: 22347
Ptnml(0-2): 145, 5208, 11824, 5395, 172

Passed LTC
https://tests.stockfishchess.org/tests/view/6147f1d53733d0e0dd9f3141
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 62520 W: 15808 L: 15482 D: 31230
Ptnml(0-2): 43, 6439, 17960, 6785, 33

closes https://github.com/official-stockfish/Stockfish/pull/3710

bench 5575265
2021-09-21 19:48:40 +02:00
xoto10 5b47b4e6c0 Increase optimumTime by 10%
STC 10+0.1 :
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 47032 W: 12078 L: 11841 D: 23113
Ptnml(0-2): 159, 5098, 12746, 5373, 140
https://tests.stockfishchess.org/tests/view/613f9df1f29dda16fcca8731

LTC 60+0.6 :
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 66248 W: 16631 L: 16301 D: 33316
Ptnml(0-2): 44, 6560, 19578, 6906, 36
https://tests.stockfishchess.org/tests/view/6140603d7315e7c73204a4c1

Non-regression tests with other time control styles:

Moves/Time 40/10+0 :
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 51640 W: 13350 L: 13254 D: 25036
Ptnml(0-2): 183, 5770, 13797, 5908, 162
https://tests.stockfishchess.org/tests/view/6141592b7315e7c73204a599

TCEC Style 10+0.01 :
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 20592 W: 5300 L: 5157 D: 10135
Ptnml(0-2): 81, 2240, 5544, 2317, 114
https://tests.stockfishchess.org/tests/view/61425bb27315e7c73204a6a2

Sudden death 15+0 :
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 127104 W: 32728 L: 32741 D: 61635
Ptnml(0-2): 735, 13973, 34149, 13960, 735
https://tests.stockfishchess.org/tests/view/614256a77315e7c73204a699

The first 3 tests were run with an initial version of the code, which was then modified to make the amount of extra time dependent on the size of increment. No increment gives no extra time, and the extra time given increases until an increment of 1% or more of remaining time gives 10% extra thinking time.

closes https://github.com/official-stockfish/Stockfish/pull/3702

Bench 6658747
2021-09-17 08:14:36 +02:00
SFisGOD 723f48dec0 Update default net to nn-13406b1dcbe0.nnue
SPSA 1: https://tests.stockfishchess.org/tests/view/6134abc425b9b35584838572
Parameters: A total of 64 net biases were tuned (hidden layer 1)
Base net: nn-6762d36ad265.nnue
New net: nn-c9fdeea14cb2.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/61355b7e25b9b3558483860e
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-c9fdeea14cb2.nnue
New net: nn-0ddc28184f4c.nnue

SPSA 3: https://tests.stockfishchess.org/tests/view/613737be0cd98ab40c0c9e4e
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-0ddc28184f4c.nnue
New net: nn-2419828bb394.nnue

SPSA 4: https://tests.stockfishchess.org/tests/view/613966ff689039fce12e0fe7
Parameters: A total of 64 net biases were tuned (hidden layer 1)
Base net: nn-2419828bb394.nnue
New net: nn-05d9b1ee3037.nnue

SPSA 5: https://tests.stockfishchess.org/tests/view/613b4a38689039fce12e1209
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-05d9b1ee3037.nnue
New net: nn-98c6ce0fc15f.nnue

SPSA 6: https://tests.stockfishchess.org/tests/view/613e331515591e7c9ebc3fe9
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-98c6ce0fc15f.nnue
New net: nn-13406b1dcbe0.nnue

STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 82008 W: 21044 L: 20752 D: 40212
Ptnml(0-2): 264, 9341, 21525, 9587, 287
https://tests.stockfishchess.org/tests/view/613f7c6cf29dda16fcca870c

LTC:
LLR: 2.96 (-2.94,2.94) <0.50,3.50>
Total: 182928 W: 46258 L: 45602 D: 91068
Ptnml(0-2): 107, 19448, 51712, 20076, 121
https://tests.stockfishchess.org/tests/view/613fccb97315e7c73204a48c

Closes #3703

Bench: 6658747
2021-09-15 17:50:20 +02:00
xoto10 fd5e77950e Update 2 search parameters after tune.
A tuning run on 3 search parameters was done with 200k games, narrow ranges (50-150%) and a small value for A (3% of total games) :
https://tests.stockfishchess.org/tests/view/613b5f4b689039fce12e1220

STC 10+0.1 :
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 73112 W: 18800 L: 18520 D: 35792
Ptnml(0-2): 205, 8395, 19115, 8597, 244
https://tests.stockfishchess.org/tests/view/613cb8d2689039fce12e1308

LTC 60+0.6 :
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 45616 W: 11604 L: 11321 D: 22691
Ptnml(0-2): 24, 4769, 12946, 5038, 31
https://tests.stockfishchess.org/tests/view/613d07048253e53e97b55b32

closes https://github.com/official-stockfish/Stockfish/pull/3698

Bench 6504816
2021-09-12 18:03:56 +02:00
Michael Chaly 30fdbf4328 Decrease depth for cutnodes with no tt move
By analogy to existing logic of decreasing depth for PvNodes w/o tt move
do the same for cutNodes.

Passed STC
https://tests.stockfishchess.org/tests/view/613abf5a689039fce12e1155
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 90336 W: 23108 L: 22804 D: 44424
Ptnml(0-2): 286, 10316, 23642, 10656, 268

Passed LTC
https://tests.stockfishchess.org/tests/view/613ae330689039fce12e1172
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 37736 W: 9607 L: 9346 D: 18783
Ptnml(0-2): 21, 3917, 10730, 4180, 20

closes https://github.com/official-stockfish/Stockfish/pull/3697

bench 5891181
2021-09-10 11:50:43 +02:00
Stefan Geschwentner b7b6b4ba18 Further improve history updates
Now even double history updates if a search failed low at an expected PV or CUT node.

STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 30736 W: 7891 L: 7674 D: 15171
Ptnml(0-2): 90, 3477, 8017, 3694, 90
https://tests.stockfishchess.org/tests/view/61364ae30cd98ab40c0c9da5

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 73600 W: 18684 L: 18326 D: 36590
Ptnml(0-2): 41, 7734, 20899, 8078, 48
https://tests.stockfishchess.org/tests/view/6136940f0cd98ab40c0c9df3

closes https://github.com/official-stockfish/Stockfish/pull/3694

Bench: 6030657
2021-09-07 19:59:14 +02:00
Stefan Geschwentner c31fc8d163 Improve history updates
If a search failed low at an expected PV or CUT node do greater history updates.

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 95112 W: 24293 L: 23982 D: 46837
Ptnml(0-2): 285, 10893, 24906, 11170, 302
https://tests.stockfishchess.org/tests/view/6132aa1a2ffb3c36aceb926f

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 116352 W: 29450 L: 28975 D: 57927
Ptnml(0-2): 93, 12263, 32984, 12748, 88
https://tests.stockfishchess.org/tests/view/613394d12ffb3c36aceb92f4

closes https://github.com/official-stockfish/Stockfish/pull/3693

Bench: 6130736
2021-09-06 14:19:47 +02:00
SFisGOD be63ce1bb5 Update default net to nn-6762d36ad265.nnue
SPSA 1: https://tests.stockfishchess.org/tests/view/612cdb1fbb4956d8b78eb5ab
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-fe433fd8c7f6.nnue
New net: nn-5f134823db04.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/612fcde645091e810014af19
Parameters: A total of 64 net biases were tuned (hidden layer 1)
Base net: nn-5f134823db04.nnue
New net: nn-8eca5dd4e3f7.nnue

SPSA 3: https://tests.stockfishchess.org/tests/view/6130822345091e810014af61
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-8eca5dd4e3f7.nnue
New net: nn-4556108e4f00.nnue

SPSA 4: https://tests.stockfishchess.org/tests/view/613287652ffb3c36aceb923c
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-4556108e4f00.nnue
New net: nn-6762d36ad265.nnue

STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 162776 W: 41220 L: 40807 D: 80749
Ptnml(0-2): 517, 18800, 42359, 19177, 535
https://tests.stockfishchess.org/tests/view/6134107125b9b35584838559

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 41056 W: 10428 L: 10156 D: 20472
Ptnml(0-2): 30, 4288, 11618, 4564, 28
https://tests.stockfishchess.org/tests/view/6134ad6525b9b3558483857a

closes https://github.com/official-stockfish/Stockfish/pull/3691

Bench: 5812158
2021-09-06 14:08:22 +02:00
Michael Chaly e404a7d97c Extend captures and promotions
This patch introduces extension for captures and promotions. Every capture or
promotion that is not the first move in the list gets extended at PvNodes and
cutNodes. Special thanks to @locutus2 - all my previous attepmts that failed
on this idea were done only for PvNodes - idea to include also cutNodes was
based on his latest passed patch.

STC
https://tests.stockfishchess.org/tests/view/6134abf325b9b35584838574
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 188920 W: 47754 L: 47304 D: 93862
Ptnml(0-2): 595, 21754, 49344, 22140, 627

LTC
https://tests.stockfishchess.org/tests/view/613521de25b9b355848385d7
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 8768 W: 2283 L: 2098 D: 4387
Ptnml(0-2): 7, 866, 2452, 1053, 6

closes https://github.com/official-stockfish/Stockfish/pull/3692

bench: 5564555
2021-09-06 13:59:17 +02:00
SFisGOD 2807dcfab6 Update default net to nn-735bba95dec0.nnue
SPSA 1: https://tests.stockfishchess.org/tests/view/61286d8b62d20cf82b5ad1bd
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-33495fe25081.nnue
New net: nn-83e3cf2af92b.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/6129cf2162d20cf82b5ad25f
Parameters: A total of 64 net biases were tuned (hidden layer 1)
Base net: nn-83e3cf2af92b.nnue
New net: nn-69a528eaef35.nnue

SPSA 3: https://tests.stockfishchess.org/tests/view/612a0dcb62d20cf82b5ad2a0
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-69a528eaef35.nnue
New net: nn-735bba95dec0.nnue

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 95144 W: 24310 L: 23999 D: 46835
Ptnml(0-2): 232, 11059, 24748, 11232, 301
https://tests.stockfishchess.org/tests/view/612bb3be0fdf40644b4b9996

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 33632 W: 8522 L: 8271 D: 16839
Ptnml(0-2): 18, 3511, 9516, 3744, 27
https://tests.stockfishchess.org/tests/view/612ce5b9bb4956d8b78eb5b3

Closes https://github.com/official-stockfish/Stockfish/pull/3685

Bench: 5600615
2021-08-31 12:56:19 +02:00
VoyagerOne ad357e147a CMH Pruning Tweak
Tweak pruning formula by adding up CMH values.

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 14608 W: 3837 L: 3641 D: 7130
Ptnml(0-2): 27, 1681, 3723, 1815, 58
https://tests.stockfishchess.org/tests/view/612792f362d20cf82b5ad156

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 53520 W: 13580 L: 13276 D: 26664
Ptnml(0-2): 28, 5610, 15183, 5908, 31
https://tests.stockfishchess.org/tests/view/6127d27062d20cf82b5ad191

closes https://github.com/official-stockfish/Stockfish/pull/3682

Bench: 5186641
2021-08-27 21:41:32 +02:00
SFisGOD 69eede7d08 Update default net to nn-33495fe25081.nnue
STC:
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 37368 W: 9621 L: 9391 D: 18356
Ptnml(0-2): 117, 4287, 9664, 4481, 135
https://tests.stockfishchess.org/tests/view/612768165318138ee1204977

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 13328 W: 3446 L: 3246 D: 6636
Ptnml(0-2): 11, 1383, 3682, 1571, 17
https://tests.stockfishchess.org/tests/view/6127dc8d62d20cf82b5ad196

Closes https://github.com/official-stockfish/Stockfish/pull/3679

Bench: 5179347
2021-08-27 07:51:26 +02:00
ppigazzini f30f231cbf Use "pedantic" flag also for mingw
This will avoid to run in fishtest a test where the linux machines exit from
the building process and only the windows machines run the test.

See:
https://tests.stockfishchess.org/tests/view/61122d732a8a49ac5be79996
https://github.com/SFisGOD/Stockfish/commit/4e422577d6ebd1f6ecf606189190b8f6fb03f6c9#comments

closes https://github.com/official-stockfish/Stockfish/pull/3671

No functional change.
2021-08-27 07:49:26 +02:00
Joost VandeVondele af0d82792e Fix empty EvalFile option
some GUIs send an empty string for EvalFile, in that case explicitly try the default name

fixes https://github.com/official-stockfish/Stockfish/issues/3675

closes https://github.com/official-stockfish/Stockfish/pull/3678

No functional change.
2021-08-27 07:48:18 +02:00
bmc4 d754ea50a8 Simplify Declaration on Pawn Move Generation
Removes possible micro-optimization in favor of readability.

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 75432 W: 5824 L: 5777 D: 63831
Ptnml(0-2): 178, 4648, 28036, 4657, 197
https://tests.stockfishchess.org/tests/view/611fa7f84977aa1525c9cb75

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 41200 W: 1156 L: 1106 D: 38938
Ptnml(0-2): 13, 981, 18562, 1031, 13
https://tests.stockfishchess.org/tests/view/611fcc694977aa1525c9cb9b

Closes https://github.com/official-stockfish/Stockfish/pull/3669

No functional change
2021-08-22 09:15:19 +02:00
SFisGOD 590447d7a1 Update default net to nn-517c4f68b5df.nnue
SPSA: https://tests.stockfishchess.org/tests/view/611cf0da4977aa1525c9ca03
Parameters: 256 net weights and 8 net biases (output layer)
Base net: nn-ac5605a608d6.nnue
New net: nn-517c4f68b5df.nnue

STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 11600 W: 998 L: 851 D: 9751
Ptnml(0-2): 30, 705, 4186, 846, 33
https://tests.stockfishchess.org/tests/view/611f84524977aa1525c9cb5b

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 9360 W: 338 L: 243 D: 8779
Ptnml(0-2): 0, 220, 4151, 303, 6
https://tests.stockfishchess.org/tests/view/611f8c5b4977aa1525c9cb64

closes https://github.com/official-stockfish/Stockfish/pull/3667

Bench: 4844618
2021-08-22 09:09:58 +02:00
candirufish 939ffe454d do more LMR extensions for PV nodes
LMR Pv and depth 6 Extension tweak:

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 52488 W: 1542 L: 1394 D: 49552
Ptnml(0-2): 18, 1253, 23552, 1405, 16
https://tests.stockfishchess.org/tests/view/611e49c34977aa1525c9caa7

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 76216 W: 6000 L: 5784 D: 64432
Ptnml(0-2): 204, 4745, 28006, 4937, 216
https://tests.stockfishchess.org/tests/view/611e0e254977aa1525c9ca89

closes https://github.com/official-stockfish/Stockfish/pull/3666

Bench: 5046381
2021-08-22 09:05:53 +02:00
bmc4 e57d2d9d47 Simplify Null Move Search Reduction
slightly simpler formula for reduction computation.

first round of tests:
STC:
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 15632 W: 1319 L: 1204 D: 13109
Ptnml(0-2): 33, 956, 5733, 1051, 43
https://tests.stockfishchess.org/tests/view/60bd03c7457376eb8bcaa600

LTC:
LLR: 3.37 (-2.94,2.94) <-2.50,0.50>
Total: 86296 W: 2814 L: 2779 D: 80703
Ptnml(0-2): 33, 2500, 38039, 2551, 25
https://tests.stockfishchess.org/tests/view/60bd1ff0457376eb8bcaa653

recent tests:
STC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 23936 W: 1895 L: 1793 D: 20248
Ptnml(0-2): 40, 1470, 8869, 1526, 63
https://tests.stockfishchess.org/tests/view/611f9b7d4977aa1525c9cb6b

LTC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 62568 W: 1750 L: 1713 D: 59105
Ptnml(0-2): 19, 1560, 28085, 1605, 15
https://tests.stockfishchess.org/tests/view/611fa4814977aa1525c9cb71

functional on high depth

closes https://github.com/official-stockfish/Stockfish/pull/3535

Bench: 5375286
2021-08-22 09:00:15 +02:00
Tomasz Sobczyk 18dcf1f097 Optimize and tidy up affine transform code.
The new network caused some issues initially due to the very narrow neuron set between the first two FC layers. Necessary changes were hacked together to make it work. This patch is a mature approach to make the affine transform code faster, more readable, and easier to maintain should the layer sizes change again.

The following changes were made:

* ClippedReLU always produces a multiple of 32 outputs. This is about as good of a solution for AffineTransform's SIMD requirements as it can get without a bigger rewrite.

* All self-contained simd helpers are moved to a separate file (simd.h). Inline asm is utilized to work around GCC's issues with code generation and register assignment. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693, https://godbolt.org/z/da76fY1n7

* AffineTransform has 2 specializations. While it's more lines of code due to the boilerplate, the logic in both is significantly reduced, as these two are impossible to nicely combine into one.
 1) The first specialization is for cases when there's >=128 inputs. It uses a different approach to perform the affine transform and can make full use of AVX512 without any edge cases. Furthermore, it has higher theoretical throughput because less loads are needed in the hot path, requiring only a fixed amount of instructions for horizontal additions at the end, which are amortized by the large number of inputs.
 2) The second specialization is made to handle smaller layers where performance is still necessary but edge cases need to be handled. AVX512 implementation for this was ommited by mistake, a remnant from the temporary implementation for the new... This could be easily reintroduced if needed. A slightly more detailed description of both implementations is in the code.

Overall it should be a minor speedup, as shown on fishtest:

passed STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 51520 W: 4074 L: 3888 D: 43558
Ptnml(0-2): 111, 3136, 19097, 3288, 128

and various tests shown in the pull request

closes https://github.com/official-stockfish/Stockfish/pull/3663

No functional change
2021-08-20 08:50:25 +02:00
Tomasz Sobczyk ccf0239bc4 Improve handling of the debug log file.
Fix handling of empty strings in uci options and reassigning of the log file

Fixes https://github.com/official-stockfish/Stockfish/issues/3650

Closes https://github.com/official-stockfish/Stockfish/pull/3655

No functional change
2021-08-20 07:57:09 +02:00
Torsten Hellwig 1946a67567 Update default net to nn-ac5605a608d6.nnue
This net was created with the nnue-pytorch trainer, it used the previous master net as a starting point.

The training data includes all T60 data (https://drive.google.com/drive/folders/1rzZkgIgw7G5vQMLr2hZNiUXOp7z80613), all T74 data (https://drive.google.com/drive/folders/1aFUv3Ih3-A8Vxw9064Kw_FU4sNhMHZU-) and the wrongNNUE_02_d9.binpack (https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq). The Leela data were randomly named and then concatenated. All data was merged into one binpack using interleave_binpacks.py.

python3 train.py \
    ../data/t60_t74_wrong.binpack \
    ../data/t60_t74_wrong.binpack \
    --resume-from-model ../data/nn-e8321e467bf6.pt \
    --gpus 1 \
    --threads 4 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 300 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=600 \
    --seed $RANDOM \
    --default_root_dir ../output/exp_24

STC:
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 15320 W: 1415 L: 1257 D: 12648
Ptnml(0-2): 50, 1002, 5402, 1152, 54
https://tests.stockfishchess.org/tests/view/611c404a4977aa1525c9c97f

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 9440 W: 345 L: 248 D: 8847
Ptnml(0-2): 3, 222, 4175, 315, 5
https://tests.stockfishchess.org/tests/view/611c6c7d4977aa1525c9c996

LTC with UHO_XXL_+0.90_+1.19.epd:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 6232 W: 1638 L: 1459 D: 3135
Ptnml(0-2): 5, 592, 1744, 769, 6
https://tests.stockfishchess.org/tests/view/611c9b214977aa1525c9c9cb

closes https://github.com/official-stockfish/Stockfish/pull/3664

Bench: 5375286
2021-08-18 09:17:22 +02:00
Joost VandeVondele f10ebc2bdf Regenerate dependencies on code change
fixes https://github.com/official-stockfish/Stockfish/issues/3658

dependencies are now regenerated for each code change, this adds some 1s overhead in compile time, but avoids potential miscompilations or build problems.

closes https://github.com/official-stockfish/Stockfish/pull/3659

No functional change
2021-08-17 21:08:34 +02:00
Tomasz Sobczyk d61d38586e New NNUE architecture and net
Introduces a new NNUE network architecture and associated network parameters

The summary of the changes:

* Position for each perspective mirrored such that the king is on e..h files. Cuts the feature transformer size in half, while preserving enough knowledge to be good. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.b40q4rb1w7on.
* The number of neurons after the feature transformer increased two-fold, to 1024x2. This is possibly mostly due to the now very optimized feature transformer update code.
* The number of neurons after the second layer is reduced from 16 to 8, to reduce the speed impact. This, perhaps surprisingly, doesn't harm the strength much. See https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit#heading=h.6qkocr97fezq

The AffineTransform code did not work out-of-the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Also additional 0 padding is added to the output for some archs that cannot process inputs by <=8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However GCC is particularily bad at optimization here (and perhaps why the current way the affine transform is done even passed sprt) (see https://docs.google.com/document/d/1gTlrr02qSNKiXNZ_SuO4-RjK4MXBiFlLE6jvNqqMkAY/edit# for details) and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.

The network was trained with a slightly modified version of the pytorch trainer (https://github.com/glinscott/nnue-pytorch); the changes are in https://github.com/glinscott/nnue-pytorch/pull/143

The training utilized 2 datasets.

    dataset A - https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
    dataset B - as described in https://github.com/official-stockfish/Stockfish/commit/ba01f4b95448bcb324755f4dd2a632a57c6e67bc

The training process was as following:

    train on dataset A for 350 epochs, take the best net in terms of elo at 20k nodes per move (it's fine to take anything from later stages of training).
    convert the .ckpt to .pt
    --resume-from-model from the .pt file, train on dataset B for <600 epochs, take the best net. Lambda=0.8, applied before the loss function.

The first training command:

python3 train.py \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    ../nnue-pytorch-training/data/large_gensfen_multipvdiff_100_d9.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --max_epochs=600 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

The second training command:

python3 serialize.py \
    --features=HalfKAv2_hm^ \
    ../nnue-pytorch-training/experiment_131/run_6/default/version_0/checkpoints/epoch-499.ckpt \
    ../nnue-pytorch-training/experiment_$1/base/base.pt

python3 train.py \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    ../nnue-pytorch-training/data/michael_commit_b94a65.binpack \
    --gpus "$3," \
    --threads 1 \
    --num-workers 1 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --smart-fen-skipping \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=0.8 \
    --max_epochs=600 \
    --resume-from-model ../nnue-pytorch-training/experiment_$1/base/base.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4

LLR: 2.97 (-2.94,2.94) <-0.50,2.50>
Total: 22480 W: 2434 L: 2251 D: 17795
Ptnml(0-2): 101, 1736, 7410, 1865, 128

LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea

LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 9776 W: 442 L: 333 D: 9001
Ptnml(0-2): 5, 295, 4180, 402, 6

closes https://github.com/official-stockfish/Stockfish/pull/3646

bench: 5189338
2021-08-15 12:05:43 +02:00
Joost VandeVondele dabaf2220f Revert futility pruning patches
reverts 09b6d28391 and
dbd7f602d3 that significantly impact mate
finding capabilities. For example on ChestUCI_23102018.epd, at 1M nodes,
the number of mates found is nearly reduced 2x without these depth conditions:

       sf6  2091
       sf7  2093
       sf8  2107
       sf9  2062
      sf10  2208
      sf11  2552
      sf12  2563
      sf13  2509
      sf14  2427
    master  1246
   patched  2467

(script for testing at https://github.com/official-stockfish/Stockfish/files/6936412/matecheck.zip)

closes https://github.com/official-stockfish/Stockfish/pull/3641

fixes https://github.com/official-stockfish/Stockfish/issues/3627

Bench: 5467570
2021-08-05 16:41:07 +02:00
VoyagerOne a1a83f3869 SEE simplification
Simplified SEE formula by removing std::min. Should also be easier to tune.

STC:
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 22656 W: 1836 L: 1729 D: 19091
Ptnml(0-2): 54, 1426, 8267, 1521, 60
https://tests.stockfishchess.org/tests/view/610ae62f2a8a49ac5be79449

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 26248 W: 806 L: 744 D: 24698
Ptnml(0-2): 6, 668, 11715, 728, 7
https://tests.stockfishchess.org/tests/view/610b17ad2a8a49ac5be79466

closes https://github.com/official-stockfish/Stockfish/pull/3643

bench:  4915145
2021-08-05 16:32:07 +02:00
SFisGOD 73ef5b8c4a Update default net to nn-46832cfbead3.nnue
SPSA 1: https://tests.stockfishchess.org/tests/view/6100e7f096b86d98abf6a832
Parameters: A total of 256 net weights and 8 net biases were tuned (output layer)
Base net: nn-56a5f1c4173a.nnue
New net: nn-ec3c8e029926.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/610733caafad2da4f4ae3da7
Parameters: A total of 256 net biases were tuned (hidden layer 2)
Base net: nn-ec3c8e029926.nnue
New net: nn-46832cfbead3.nnue

STC:
LLR: 2.98 (-2.94,2.94) <-0.50,2.50>
Total: 50520 W: 3953 L: 3765 D: 42802
Ptnml(0-2): 138, 3063, 18678, 3235, 146
https://tests.stockfishchess.org/tests/view/610a79692a8a49ac5be793f4

LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 57256 W: 1723 L: 1566 D: 53967
Ptnml(0-2): 12, 1442, 25568, 1589, 17
https://tests.stockfishchess.org/tests/view/610ac5bb2a8a49ac5be79434

Closes https://github.com/official-stockfish/Stockfish/pull/3642

Bench: 5359314
2021-08-05 08:52:07 +02:00
Stefan Geschwentner 5cd42f6b0b Simplify new cmh pruning thresholds by using directly a quadratic formula.
This decouples also the stat bonus updates from the threshold which creates less dependencies for tuning of stat bonus parameters.
Perhaps a further fine tuning of the now separated coefficients for constHist[0] and constHist[1] could give further gains.

STC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 78384 W: 6134 L: 6090 D: 66160
Ptnml(0-2): 207, 5013, 28705, 5063, 204
https://tests.stockfishchess.org/tests/view/6106d235afad2da4f4ae3d4b

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 38176 W: 1149 L: 1095 D: 35932
Ptnml(0-2): 6, 1000, 17030, 1038, 14
https://tests.stockfishchess.org/tests/view/6107a080afad2da4f4ae3def

closes https://github.com/official-stockfish/Stockfish/pull/3639

Bench: 5098146
2021-08-05 08:47:33 +02:00
VoyagerOne 31ebd918ea Futile pruning simplification
Remove CMH conditions in futile pruning.

STC:
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 93520 W: 7165 L: 7138 D: 79217
Ptnml(0-2): 222, 5923, 34427, 5982, 206
https://tests.stockfishchess.org/tests/view/61083104e50a153c346ef8df

LTC:
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 59072 W: 1746 L: 1706 D: 55620
Ptnml(0-2): 13, 1562, 26353, 1588, 20
https://tests.stockfishchess.org/tests/view/610894f2e50a153c346ef913

closes https://github.com/official-stockfish/Stockfish/pull/3638

Bench: 5229673
2021-08-05 08:44:38 +02:00
VoyagerOne a0fca67da4 CMH Pruning Tweak
replace CounterMovePruneThreshold by a depth dependent threshold

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 35512 W: 2718 L: 2552 D: 30242
Ptnml(0-2): 66, 2138, 13194, 2280, 78
https://tests.stockfishchess.org/tests/view/6104442fafad2da4f4ae3b94

LTC:
LLR: 2.96 (-2.94,2.94) <0.50,3.50>
Total: 36536 W: 1150 L: 1019 D: 34367
Ptnml(0-2): 10, 920, 16278, 1049, 11
https://tests.stockfishchess.org/tests/view/6104b033afad2da4f4ae3bbc

closes https://github.com/official-stockfish/Stockfish/pull/3636

Bench: 5848718
2021-07-31 15:29:19 +02:00
Tomasz Sobczyk 26edf9534a Avoid unnecessary stores in the affine transform
This patch improves the codegen in the AffineTransform::forward function for architectures >=SSSE3. Current code works directly on memory and the compiler cannot see that the stores through outptr do not alias the loads through weights and input32. The solution implemented is to perform the affine transform with local variables as accumulators and only store the result to memory at the end. The number of accumulators required is OutputDimensions / OutputSimdWidth, which means that for the 1024->16 affine transform it requires 4 registers with SSSE3, 2 with AVX2, 1 with AVX512. It also cuts the number of stores required by NumRegs * 256 for each node evaluated. The local accumulators are expected to be assigned to registers, but even if this cannot be done in some case due to register pressure it will help the compiler to see that there is no aliasing between the loads and stores and may still result in better codegen.

See https://godbolt.org/z/59aTKbbYc for codegen comparison.

passed STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140328 W: 10635 L: 10358 D: 119335
Ptnml(0-2): 302, 8339, 52636, 8554, 333

closes https://github.com/official-stockfish/Stockfish/pull/3634

No functional change
2021-07-30 17:15:52 +02:00
SFisGOD e973eee919 Update default net to nn-56a5f1c4173a.nnue
SPSA 1: https://tests.stockfishchess.org/tests/view/60fd24efd8a6b65b2f3a796e
Parameters: A total of 256 net biases were tuned (hidden layer 2)
New best values: Half of the changes from the tuning run
New net: nn-5992d3ba79f3.nnue

SPSA 2: https://tests.stockfishchess.org/tests/view/60fec7d6d8a6b65b2f3a7aa2
Parameters: A total of 128 net biases were tuned (hidden layer 1)
New best values: Half of the changes from the tuning run
New net: nn-56a5f1c4173a.nnue

STC:
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 140392 W: 10863 L: 10578 D: 118951
Ptnml(0-2): 347, 8754, 51718, 9021, 356
https://tests.stockfishchess.org/tests/view/610037e396b86d98abf6a79e

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 14216 W: 454 L: 355 D: 13407
Ptnml(0-2): 4, 323, 6356, 420, 5
https://tests.stockfishchess.org/tests/view/61019995afad2da4f4ae3a3c

Closes #3633

Bench: 4801359
2021-07-29 07:35:13 +02:00
SFisGOD 237ed1ef8f Update default net to nn-26abeed38351.nnue
SPSA: https://tests.stockfishchess.org/tests/view/60fba335d8a6b65b2f3a7891

New best values: Half of the changes from the tuning run.
Setting: nodestime=300 with 10+0.1 (approximate real TC is 2.5 seconds)
The rest is the same as described in #3593

The change from nodestime=600 to 300 was suggested by gekkehenker to prevent time losses for some slow workers
SFisGOD@94cd757#commitcomment-53324840

STC:
LLR: 2.96 (-2.94,2.94) <-0.50,2.50>
Total: 67448 W: 5241 L: 5036 D: 57171
Ptnml(0-2): 151, 4198, 24827, 4391, 157
https://tests.stockfishchess.org/tests/view/60fd50f2d8a6b65b2f3a798e

LTC:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 48752 W: 1504 L: 1358 D: 45890
Ptnml(0-2): 13, 1226, 21754, 1368, 15
https://tests.stockfishchess.org/tests/view/60fd7bb2d8a6b65b2f3a79a9

Closes https://github.com/official-stockfish/Stockfish/pull/3630

Bench:  5124774
2021-07-26 07:52:59 +02:00
Giacomo Lorenzetti 910d26b5c3 Simplification in LMR
This commit removes the `!captureOrPromotion` condition from ttCapture reduction and from good/bad history reduction (similar to #3619).

passed STC:
https://tests.stockfishchess.org/tests/view/60fc734ad8a6b65b2f3a7922
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 48680 W: 3855 L: 3776 D: 41049
Ptnml(0-2): 118, 3145, 17744, 3206, 127

passed LTC:
https://tests.stockfishchess.org/tests/view/60fce7d5d8a6b65b2f3a794c
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 86528 W: 2471 L: 2450 D: 81607
Ptnml(0-2): 28, 2203, 38777, 2232, 24

closes https://github.com/official-stockfish/Stockfish/pull/3629

Bench: 4951406
2021-07-26 07:48:58 +02:00
MichaelB7 b939c80513 Update the default net to nn-76a8a7ffb820.nnue.
combined work by Serio Vieri, Michael Byrne, and Jonathan D (aka SFisGod) based on top of previous developments, by restarts from good nets.

Sergio generated the net https://tests.stockfishchess.org/api/nn/nn-d8609abe8caf.nnue:

The initial net nn-d8609abe8caf.nnue is trained by generating around 16B of training data from the last master net nn-9e3c6298299a.nnue, then trained, continuing from the master net, with lambda=0.2 and sampling ratio of 1. Starting with LR=2e-3, dropping LR with a factor of 0.5 until it reaches LR=5e-4. in_scaling is set to 361. No other significant changes made to the pytorch trainer.

Training data gen command (generates in chunks of 200k positions):

generate_training_data min_depth 9 max_depth 11 count 200000 random_move_count 10 random_move_max_ply 80 random_multi_pv 12 random_multi_pv_diff 100 random_multi_pv_depth 8 write_min_ply 10 eval_limit 1500 book noob_3moves.epd output_file_name gendata/$(date +"%Y%m%d-%H%M")_${HOSTNAME}.binpack

PyTorch trainer command (Note that this only trains for 20 epochs, repeatedly train until convergence):

python train.py --features "HalfKAv2^" --max_epochs 20 --smart-fen-skipping --random-fen-skipping 500 --batch-size 8192 --default_root_dir $dir --seed $RANDOM --threads 4 --num-workers 32 --gpus $gpuids --track_grad_norm 2 --gradient_clip_val 0.05 --lambda 0.2 --log_every_n_steps 50 $resumeopt $data $val

See https://github.com/sergiovieri/Stockfish/tree/tools_mod/rl for the scripts used to generate data.

Based on that Michael generated nn-76a8a7ffb820.nnue in the following way:

The net being submitted was trained with the pytorch trainer: https://github.com/glinscott/nnue-pytorch

python train.py i:/bin/all.binpack i:/bin/all.binpack --gpus 1 --threads 4 --num-workers 30 --batch-size 16384 --progress_bar_refresh_rate 30 --smart-fen-skipping --random-fen-skipping 3 --features=HalfKAv2^ --auto_lr_find True --lambda=1.0 --max_epochs=240 --seed %random%%random% --default_root_dir exp/run_109 --resume-from-model ./pt/nn-d8609abe8caf.pt

This run is thus started from Segio Vieri's net nn-d8609abe8caf.nnue

all.binpack equaled 4 parts Wrong_NNUE_2.binpack https://drive.google.com/file/d/1seGNOqcVdvK_vPNq98j-zV3XPE5zWAeq/view?usp=sharing plus two parts of Training_Data.binpack https://drive.google.com/file/d/1RFkQES3DpsiJqsOtUshENtzPfFgUmEff/view?usp=sharing
Each set was concatenated together - making one large Wrong_NNUE 2 binpack and one large Training so the were approximately equal in size. They were then interleaved together. The idea was to give Wrong_NNUE.binpack closer to equal weighting with the Training_Data binpack

model.py modifications:
loss = torch.pow(torch.abs(p - q), 2.6).mean()
LR = 8.0e-5 calculated as follows: 1.5e-3*(.992^360) - the idea here was to take a highly trained net and just use all.binpack as a finishing micro refinement touch for the last 2 Elo or so. This net was discovered on the 59th epoch.
optimizer = ranger.Ranger(train_params, betas=(.90, 0.999), eps=1.0e-7, gc_loc=False, use_gc=False)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.992)
For this micro optimization, I had set the period to "5" in train.py. This changes the checkpoint output so that every 5th checkpoint file is created

The final touches were to adjust the NNUE scale, as was done by Jonathan in tests running at the same time.

passed LTC
https://tests.stockfishchess.org/tests/view/60fa45aed8a6b65b2f3a77a4
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 53040 W: 1732 L: 1575 D: 49733
Ptnml(0-2): 14, 1432, 23474, 1583, 17

passed STC
https://tests.stockfishchess.org/tests/view/60f9fee2d8a6b65b2f3a7775
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 37928 W: 3178 L: 3001 D: 31749
Ptnml(0-2): 100, 2446, 13695, 2623, 100.

closes https://github.com/official-stockfish/Stockfish/pull/3626

Bench: 5169957
2021-07-24 18:04:59 +02:00
Giacomo Lorenzetti a85928e7ec Apply good/bad history reduction also when inCheck
Main idea is that, in some cases, 'in check' situations are not so different from 'not in check' ones.
Trying to use piece count in order to select only a few 'in check' situations have failed LTC testing.
It could be interesting to apply one of those ideas in other parts of the search function.

passed STC:
https://tests.stockfishchess.org/tests/view/60f1b68dd1189bed71812d40
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 53472 W: 4078 L: 4008 D: 45386
Ptnml(0-2): 127, 3297, 19795, 3413, 104

passed LTC:
https://tests.stockfishchess.org/tests/view/60f291e6d1189bed71812de3
LLR: 2.92 (-2.94,2.94) <-2.50,0.50>
Total: 89712 W: 2651 L: 2632 D: 84429
Ptnml(0-2): 60, 2261, 40188, 2294, 53

closes https://github.com/official-stockfish/Stockfish/pull/3619

Bench: 5185789
2021-07-23 19:02:58 +02:00
pb00067 760b7462bc Simplify lowply-history scoring logic
STC:
https://tests.stockfishchess.org/tests/view/60eee559d1189bed71812b16
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 33976 W: 2523 L: 2431 D: 29022
Ptnml(0-2): 66, 2030, 12730, 2070, 92

LTC:
https://tests.stockfishchess.org/tests/view/60eefa12d1189bed71812b24
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 107240 W: 3053 L: 3046 D: 101141
Ptnml(0-2): 56, 2668, 48154, 2697, 45

closes https://github.com/official-stockfish/Stockfish/pull/3616

bench: 5199177
2021-07-23 18:53:03 +02:00
Vizvezdenec d957179df7 Prune illegal moves in qsearch earlier
The main idea is that illegal moves influencing search or
qsearch obviously can't be any sort of good. The only reason
why initially legality checks for search and qsearch were done
after they actually can influence some heuristics is because
legality check is expensive computationally. Eventually in
search it was moved to the place where it makes sure that
illegal moves can't influence search.

This patch shows that the same can be done for qsearch + it
passed STC with elo-gaining bounds + it removes 3 lines of code
because one no longer needs to increment/decrement movecount
on illegal moves.

passed STC with elo-gaining bounds
https://tests.stockfishchess.org/tests/view/60f20aefd1189bed71812da0
LLR: 2.94 (-2.94,2.94) <-0.50,2.50>
Total: 61512 W: 4688 L: 4492 D: 52332
Ptnml(0-2): 139, 3730, 22848, 3874, 165

The same version functionally but with moving condition ever earlier
passed LTC with simplification bounds.
https://tests.stockfishchess.org/tests/view/60f292cad1189bed71812de9
LLR: 2.98 (-2.94,2.94) <-2.50,0.50>
Total: 60944 W: 1724 L: 1685 D: 57535
Ptnml(0-2): 11, 1556, 27298, 1597, 10

closes https://github.com/official-stockfish/Stockfish/pull/3618

bench 4709569
2021-07-23 18:47:30 +02:00
Liam Keegan bc654257e7 Add macOS and windows to CI
- macOS
  - system clang
  - gcc
- windows / msys2
  - mingw 64-bit gcc
  - mingw 32-bit gcc
- minor code fixes to get new CI jobs to pass
  - code: suppress unused-parameter warning on 32-bit windows
  - Makefile: if arch=any on macos, don't specify arch at all

fixes https://github.com/official-stockfish/Stockfish/issues/2958

closes https://github.com/official-stockfish/Stockfish/pull/3623

No functional change
2021-07-23 18:16:05 +02:00
VoyagerOne 36f8d3806b Don't save excluded move eval in TT
STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 17544 W: 1384 L: 1236 D: 14924
Ptnml(0-2): 37, 1031, 6499, 1157, 48
https://tests.stockfishchess.org/tests/view/60ec8d9bd1189bed71812999

LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.50>
Total: 26136 W: 823 L: 707 D: 24606
Ptnml(0-2): 6, 643, 11656, 755, 8
https://tests.stockfishchess.org/tests/view/60ecb11ed1189bed718129ba

closes https://github.com/official-stockfish/Stockfish/pull/3614

Bench: 5505251
2021-07-13 17:35:20 +02:00
Vizvezdenec dbd7f602d3 Remove second futility pruning depth limit
This patch removes futility pruning lmrDepth limit for futility pruning at parent nodes.
Since it's already capped by margin that is a function of lmrDepth there is no need to extra cap it with lmrDepth.

passed STC
https://tests.stockfishchess.org/tests/view/60e9b5dfd1189bed71812777
LLR: 2.97 (-2.94,2.94) <-2.50,0.50>
Total: 14872 W: 1264 L: 1145 D: 12463
Ptnml(0-2): 37, 942, 5369, 1041, 47

passed LTC
https://tests.stockfishchess.org/tests/view/60e9c635d1189bed71812790
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 40336 W: 1280 L: 1225 D: 37831
Ptnml(0-2): 24, 1057, 17960, 1094, 33

closes https://github.com/official-stockfish/Stockfish/pull/3612

bench: 5064969
2021-07-13 17:33:20 +02:00
pb00067 f4986f4596 SEE: simplify stm variable initialization
Pull #3458 removed the only usage of pos.see_ge() moving pieces that
don't belong to the side to move, so we can simplify this, adding an assert.

closes https://github.com/official-stockfish/Stockfish/pull/3607

No functional change
2021-07-13 17:31:15 +02:00
Vizvezdenec 09b6d28391 Remove futility pruning depth limit
This patch removes futility pruning depth limit for child node futility pruning.
In current master it was double capped by depth and by futility margin, which is also a function of depth, which didn't make much sense.

passed STC
https://tests.stockfishchess.org/tests/view/60e2418f9ea99d7c2d693e64
LLR: 2.95 (-2.94,2.94) <-2.50,0.50>
Total: 116168 W: 9100 L: 9097 D: 97971
Ptnml(0-2): 319, 7496, 42476, 7449, 344

passed LTC
https://tests.stockfishchess.org/tests/view/60e3374f9ea99d7c2d693f20
LLR: 2.96 (-2.94,2.94) <-2.50,0.50>
Total: 43304 W: 1282 L: 1231 D: 40791
Ptnml(0-2): 8, 1126, 19335, 1173, 10

closes https://github.com/official-stockfish/Stockfish/pull/3606

bench 4965493
2021-07-13 17:23:30 +02:00
SFisGOD 8fc297c506 Update default net to nn-9e3c6298299a.nnue
Optimization of nn-956480d8378f.nnue using SPSA
https://tests.stockfishchess.org/tests/view/60da2bf63beab81350ac9fe7

Same method as described in PR #3593

STC:
LLR: 2.93 (-2.94,2.94) <-0.50,2.50>
Total: 17792 W: 1525 L: 1372 D: 14895
Ptnml(0-2): 28, 1156, 6401, 1257, 54
https://tests.stockfishchess.org/tests/view/60deffc59ea99d7c2d693c19

LTC:
LLR: 2.96 (-2.94,2.94) <0.50,3.50>
Total: 36544 W: 1245 L: 1109 D: 34190
Ptnml(0-2): 12, 988, 16139, 1118, 15
https://tests.stockfishchess.org/tests/view/60df11339ea99d7c2d693c22

closes https://github.com/official-stockfish/Stockfish/pull/3601

Bench: 4687476
2021-07-03 10:03:32 +02:00
Paul Mulders 516ad1c9bf Allow passing RTLIB=compiler-rt to make
Not all linux users will have libatomic installed.
When using clang as the system compiler with compiler-rt as the default
runtime library instead of libgcc, atomic builtins may be provided by compiler-rt.
This change allows such users to pass RTLIB=compiler-rt to make sure
the build doesn't error out on the missing (unnecessary) libatomic.

closes https://github.com/official-stockfish/Stockfish/pull/3597

No functional change
2021-07-03 09:51:03 +02:00
candirufish ec8dfe7315 no cut node reduction for killer moves.
stc:
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 44344 W: 3474 L: 3294 D: 37576
Ptnml(0-2): 117, 2710, 16338, 2890, 117
https://tests.stockfishchess.org/tests/view/60d8ea673beab81350ac9eb8

ltc:
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 82600 W: 2638 L: 2441 D: 77521
Ptnml(0-2): 38, 2147, 36749, 2312, 54
https://tests.stockfishchess.org/tests/view/60d9048f3beab81350ac9eed

closes https://github.com/official-stockfish/Stockfish/pull/3600

Bench: 5160239
2021-07-03 09:44:05 +02:00
xoto10 d297d1d8a7 Simplify lazy_skip.
Small speedup by removing operations in lazy_skip.

STC 10+0.1 :
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 55088 W: 4553 L: 4482 D: 46053
Ptnml(0-2): 163, 3546, 20045, 3637, 153
https://tests.stockfishchess.org/tests/view/60daa2cb3beab81350aca04d

LTC 60+0.6 :
LLR: 2.93 (-2.94,2.94) <-2.50,0.50>
Total: 46136 W: 1457 L: 1407 D: 43272
Ptnml(0-2): 10, 1282, 20442, 1316, 18
https://tests.stockfishchess.org/tests/view/60db0e753beab81350aca08e

closes https://github.com/official-stockfish/Stockfish/pull/3599

Bench 5122403
2021-07-03 09:26:58 +02:00
Stéphane Nicolet b51b094419 Simplify format_cp_aligned_dot()
closes https://github.com/official-stockfish/Stockfish/pull/3583

No functional change
2021-07-03 09:25:16 +02:00
Joost VandeVondele 7cfc1f9b15 Restore development version
No functional change
2021-07-03 09:20:06 +02:00
25 changed files with 1218 additions and 627 deletions
+79 -3
View File
@@ -25,29 +25,88 @@ jobs:
os: ubuntu-20.04,
compiler: g++,
comp: gcc,
run_expensive_tests: true
run_expensive_tests: true,
run_32bit_tests: true,
run_64bit_tests: true,
shell: 'bash {0}'
}
- {
name: "Ubuntu 20.04 Clang",
os: ubuntu-20.04,
compiler: clang++,
comp: clang,
run_expensive_tests: false
run_expensive_tests: false,
run_32bit_tests: true,
run_64bit_tests: true,
shell: 'bash {0}'
}
- {
name: "MacOS 10.15 Apple Clang",
os: macos-10.15,
compiler: clang++,
comp: clang,
run_expensive_tests: false,
run_32bit_tests: false,
run_64bit_tests: true,
shell: 'bash {0}'
}
- {
name: "MacOS 10.15 GCC 10",
os: macos-10.15,
compiler: g++-10,
comp: gcc,
run_expensive_tests: false,
run_32bit_tests: false,
run_64bit_tests: true,
shell: 'bash {0}'
}
- {
name: "Windows 2019 Mingw-w64 GCC x86_64",
os: windows-2019,
compiler: g++,
comp: gcc,
run_expensive_tests: false,
run_32bit_tests: false,
run_64bit_tests: true,
msys_sys: 'mingw64',
msys_env: 'x86_64',
shell: 'msys2 {0}'
}
- {
name: "Windows 2019 Mingw-w64 GCC i686",
os: windows-2019,
compiler: g++,
comp: gcc,
run_expensive_tests: false,
run_32bit_tests: true,
run_64bit_tests: false,
msys_sys: 'mingw32',
msys_env: 'i686',
shell: 'msys2 {0}'
}
defaults:
run:
working-directory: src
shell: ${{ matrix.config.shell }}
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Download required packages
- name: Download required linux packages
if: runner.os == 'Linux'
run: |
sudo apt update
sudo apt install expect valgrind g++-multilib
- name: Setup msys and install required packages
if: runner.os == 'Windows'
uses: msys2/setup-msys2@v2
with:
msystem: ${{matrix.config.msys_sys}}
install: mingw-w64-${{matrix.config.msys_env}}-gcc make git expect
- name: Download the used network from the fishtest framework
run: |
make net
@@ -68,6 +127,7 @@ jobs:
# x86-32 tests
- name: Test debug x86-32 build
if: ${{ matrix.config.run_32bit_tests }}
run: |
export CXXFLAGS="-Werror -D_GLIBCXX_DEBUG"
make clean
@@ -75,24 +135,28 @@ jobs:
../tests/signature.sh $benchref
- name: Test x86-32 build
if: ${{ matrix.config.run_32bit_tests }}
run: |
make clean
make -j2 ARCH=x86-32 build
../tests/signature.sh $benchref
- name: Test x86-32-sse41-popcnt build
if: ${{ matrix.config.run_32bit_tests }}
run: |
make clean
make -j2 ARCH=x86-32-sse41-popcnt build
../tests/signature.sh $benchref
- name: Test x86-32-sse2 build
if: ${{ matrix.config.run_32bit_tests }}
run: |
make clean
make -j2 ARCH=x86-32-sse2 build
../tests/signature.sh $benchref
- name: Test general-32 build
if: ${{ matrix.config.run_32bit_tests }}
run: |
make clean
make -j2 ARCH=general-32 build
@@ -101,6 +165,7 @@ jobs:
# x86-64 tests
- name: Test debug x86-64-modern build
if: ${{ matrix.config.run_64bit_tests }}
run: |
export CXXFLAGS="-Werror -D_GLIBCXX_DEBUG"
make clean
@@ -108,30 +173,35 @@ jobs:
../tests/signature.sh $benchref
- name: Test x86-64-modern build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-modern build
../tests/signature.sh $benchref
- name: Test x86-64-ssse3 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-ssse3 build
../tests/signature.sh $benchref
- name: Test x86-64-sse3-popcnt build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-sse3-popcnt build
../tests/signature.sh $benchref
- name: Test x86-64 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64 build
../tests/signature.sh $benchref
- name: Test general-64 build
if: matrix.config.run_64bit_tests
run: |
make clean
make -j2 ARCH=general-64 build
@@ -140,26 +210,31 @@ jobs:
# x86-64 with newer extensions tests
- name: Compile x86-64-avx2 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-avx2 build
- name: Compile x86-64-bmi2 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-bmi2 build
- name: Compile x86-64-avx512 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-avx512 build
- name: Compile x86-64-vnni512 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-vnni512 build
- name: Compile x86-64-vnni256 build
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-vnni256 build
@@ -167,6 +242,7 @@ jobs:
# Other tests
- name: Check perft and search reproducibility
if: ${{ matrix.config.run_64bit_tests }}
run: |
make clean
make -j2 ARCH=x86-64-modern build
+6 -1
View File
@@ -1,4 +1,4 @@
# List of authors for Stockfish, as of June 14, 2021
# List of authors for Stockfish
# Founders of the Stockfish project and fishtest infrastructure
Tord Romstad (romstad)
@@ -21,6 +21,7 @@ Alexander Kure
Alexander Pagel (Lolligerhans)
Alfredo Menezes (lonfom169)
Ali AlZhrani (Cooffe)
Andrei Vetrov (proukornew)
Andrew Grant (AndyGrant)
Andrey Neporada (nepal)
Andy Duplain
@@ -69,6 +70,7 @@ gamander
Gary Heckman (gheckman)
George Sobala (gsobala)
gguliash
Giacomo Lorenzetti (G-Lorenz)
Gian-Carlo Pascutto (gcp)
Gontran Lemaire (gonlem)
Goodkov Vasiliy Aleksandrovich (goodkov)
@@ -107,6 +109,7 @@ Kojirion
Krystian Kuzniarek (kuzkry)
Leonardo Ljubičić (ICCF World Champion)
Leonid Pechenik (lp--)
Liam Keegan (lkeegan)
Linus Arver (listx)
loco-loco
Lub van den Berg (ElbertoOne)
@@ -141,6 +144,7 @@ Nikolay Kostov (NikolayIT)
Nguyen Pham (nguyenpham)
Norman Schmidt (FireFather)
notruck
Ofek Shochat (OfekShochat, ghostway)
Ondrej Mosnáček (WOnder93)
Oskar Werkelin Ahlin
Pablo Vazquez
@@ -184,6 +188,7 @@ Tom Truscott
Tom Vijlbrief (tomtor)
Tomasz Sobczyk (Sopel97)
Torsten Franz (torfranz, tfranzer)
Torsten Hellwig (Torom)
Tracey Emery (basepr1me)
tttak
Unai Corzo (unaiic)
+14 -8
View File
@@ -41,7 +41,7 @@ endif
SRCS = benchmark.cpp bitbase.cpp bitboard.cpp endgame.cpp evaluate.cpp main.cpp \
material.cpp misc.cpp movegen.cpp movepick.cpp pawns.cpp position.cpp psqt.cpp \
search.cpp thread.cpp timeman.cpp tt.cpp uci.cpp ucioption.cpp tune.cpp syzygy/tbprobe.cpp \
nnue/evaluate_nnue.cpp nnue/features/half_ka_v2.cpp
nnue/evaluate_nnue.cpp nnue/features/half_ka_v2_hm.cpp
OBJS = $(notdir $(SRCS:.cpp=.o))
@@ -88,7 +88,7 @@ endif
# at the end of the line for flag values.
#
# Example of use for these flags:
# make build ARCH=x86-64-avx512 debug=on sanitize="address undefined"
# make build ARCH=x86-64-avx512 debug=yes sanitize="address undefined"
### 2.1. General and architecture defaults
@@ -368,7 +368,7 @@ ifeq ($(COMP),mingw)
CXX=g++
endif
CXXFLAGS += -Wextra -Wshadow
CXXFLAGS += -pedantic -Wextra -Wshadow
LDFLAGS += -static
endif
@@ -386,10 +386,12 @@ ifeq ($(COMP),clang)
ifneq ($(KERNEL),Darwin)
ifneq ($(KERNEL),OpenBSD)
ifneq ($(KERNEL),FreeBSD)
ifneq ($(RTLIB),compiler-rt)
LDFLAGS += -latomic
endif
endif
endif
endif
ifeq ($(arch),$(filter $(arch),armv7 armv8))
ifeq ($(OS),Android)
@@ -403,8 +405,12 @@ ifeq ($(COMP),clang)
endif
ifeq ($(KERNEL),Darwin)
CXXFLAGS += -arch $(arch) -mmacosx-version-min=10.14
LDFLAGS += -arch $(arch) -mmacosx-version-min=10.14
CXXFLAGS += -mmacosx-version-min=10.14
LDFLAGS += -mmacosx-version-min=10.14
ifneq ($(arch),any)
CXXFLAGS += -arch $(arch)
LDFLAGS += -arch $(arch)
endif
XCRUN = xcrun
endif
@@ -511,7 +517,7 @@ ifeq ($(bits),64)
CXXFLAGS += -DIS_64BIT
endif
### 3.5 prefetch
### 3.5 prefetch and popcount
ifeq ($(prefetch),yes)
ifeq ($(sse),yes)
CXXFLAGS += -msse
@@ -520,7 +526,6 @@ else
CXXFLAGS += -DNO_PREFETCH
endif
### 3.6 popcnt
ifeq ($(popcnt),yes)
ifeq ($(arch),$(filter $(arch),ppc64 armv7 armv8 arm64))
CXXFLAGS += -DUSE_POPCNT
@@ -531,6 +536,7 @@ ifeq ($(popcnt),yes)
endif
endif
### 3.6 SIMD architectures
ifeq ($(avx2),yes)
CXXFLAGS += -DUSE_AVX2
ifeq ($(comp),$(filter $(comp),gcc clang mingw))
@@ -906,7 +912,7 @@ icc-profile-use:
EXTRACXXFLAGS='-prof_use -prof_dir ./profdir' \
all
.depend:
.depend: $(SRCS)
-@$(CXX) $(DEPENDFLAGS) -MM $(SRCS) > $@ 2> /dev/null
-include .depend
+31 -44
View File
@@ -61,7 +61,7 @@ namespace Stockfish {
namespace Eval {
bool useNNUE;
string eval_file_loaded = "None";
string currentEvalFileName = "None";
/// NNUE::init() tries to load a NNUE network at startup time, or when the engine
/// receives a UCI command "setoption name EvalFile value nn-[a-z0-9]{12}.nnue"
@@ -78,6 +78,8 @@ namespace Eval {
return;
string eval_file = string(Options["EvalFile"]);
if (eval_file.empty())
eval_file = EvalFileDefaultName;
#if defined(DEFAULT_NNUE_DIRECTORY)
#define stringify2(x) #x
@@ -88,13 +90,13 @@ namespace Eval {
#endif
for (string directory : dirs)
if (eval_file_loaded != eval_file)
if (currentEvalFileName != eval_file)
{
if (directory != "<internal>")
{
ifstream stream(directory + eval_file, ios::binary);
if (load_eval(eval_file, stream))
eval_file_loaded = eval_file;
currentEvalFileName = eval_file;
}
if (directory == "<internal>" && eval_file == EvalFileDefaultName)
@@ -109,7 +111,7 @@ namespace Eval {
istream stream(&buffer);
if (load_eval(eval_file, stream))
eval_file_loaded = eval_file;
currentEvalFileName = eval_file;
}
}
}
@@ -118,16 +120,16 @@ namespace Eval {
void NNUE::verify() {
string eval_file = string(Options["EvalFile"]);
if (eval_file.empty())
eval_file = EvalFileDefaultName;
if (useNNUE && eval_file_loaded != eval_file)
if (useNNUE && currentEvalFileName != eval_file)
{
UCI::OptionsMap defaults;
UCI::init(defaults);
string msg1 = "If the UCI option \"Use NNUE\" is set to true, network evaluation parameters compatible with the engine must be available.";
string msg2 = "The option is set to true, but the network file " + eval_file + " was not loaded successfully.";
string msg3 = "The UCI option EvalFile might need to specify the full path, including the directory name, to the network file.";
string msg4 = "The default net can be downloaded from: https://tests.stockfishchess.org/api/nn/" + string(defaults["EvalFile"]);
string msg4 = "The default net can be downloaded from: https://tests.stockfishchess.org/api/nn/" + std::string(EvalFileDefaultName);
string msg5 = "The engine will be terminated now.";
sync_cout << "info string ERROR: " << msg1 << sync_endl;
@@ -190,8 +192,8 @@ using namespace Trace;
namespace {
// Threshold for lazy and space evaluation
constexpr Value LazyThreshold1 = Value(1565);
constexpr Value LazyThreshold2 = Value(1102);
constexpr Value LazyThreshold1 = Value(3130);
constexpr Value LazyThreshold2 = Value(2204);
constexpr Value SpaceThreshold = Value(11551);
// KingAttackWeights[PieceType] contains king attack weights by piece type
@@ -986,7 +988,7 @@ namespace {
// Early exit if score is high
auto lazy_skip = [&](Value lazyThreshold) {
return abs(mg_value(score) + eg_value(score)) / 2 > lazyThreshold + pos.non_pawn_material() / 64;
return abs(mg_value(score) + eg_value(score)) > lazyThreshold + pos.non_pawn_material() / 32;
};
if (lazy_skip(LazyThreshold1))
@@ -1051,26 +1053,22 @@ make_v:
if ( pos.piece_on(SQ_A1) == W_BISHOP
&& pos.piece_on(SQ_B2) == W_PAWN)
correction += !pos.empty(SQ_B3) ? -CorneredBishop * 4
: -CorneredBishop * 3;
correction -= CorneredBishop;
if ( pos.piece_on(SQ_H1) == W_BISHOP
&& pos.piece_on(SQ_G2) == W_PAWN)
correction += !pos.empty(SQ_G3) ? -CorneredBishop * 4
: -CorneredBishop * 3;
correction -= CorneredBishop;
if ( pos.piece_on(SQ_A8) == B_BISHOP
&& pos.piece_on(SQ_B7) == B_PAWN)
correction += !pos.empty(SQ_B6) ? CorneredBishop * 4
: CorneredBishop * 3;
correction += CorneredBishop;
if ( pos.piece_on(SQ_H8) == B_BISHOP
&& pos.piece_on(SQ_G7) == B_PAWN)
correction += !pos.empty(SQ_G6) ? CorneredBishop * 4
: CorneredBishop * 3;
correction += CorneredBishop;
return pos.side_to_move() == WHITE ? Value(correction)
: -Value(correction);
return pos.side_to_move() == WHITE ? Value(5 * correction)
: -Value(5 * correction);
}
} // namespace Eval
@@ -1083,33 +1081,22 @@ Value Eval::evaluate(const Position& pos) {
Value v;
if (!Eval::useNNUE)
v = Evaluation<NO_TRACE>(pos).value();
// Deciding between classical and NNUE eval: for high PSQ imbalance we use classical,
// but we switch to NNUE during long shuffling or with high material on the board.
if ( !useNNUE
|| abs(eg_value(pos.psq_score())) * 5 > (850 + pos.non_pawn_material() / 64) * (5 + pos.rule50_count()))
v = Evaluation<NO_TRACE>(pos).value(); // classical
else
{
// Scale and shift NNUE for compatibility with search and classical evaluation
auto adjusted_NNUE = [&]()
{
int scale = 903
+ 32 * pos.count<PAWN>()
+ 32 * pos.non_pawn_material() / 1024;
int scale = 883
+ 32 * pos.count<PAWN>()
+ 32 * pos.non_pawn_material() / 1024;
Value nnue = NNUE::evaluate(pos, true) * scale / 1024;
v = NNUE::evaluate(pos, true) * scale / 1024; // NNUE
if (pos.is_chess960())
nnue += fix_FRC(pos);
return nnue;
};
// If there is PSQ imbalance we use the classical eval, but we switch to
// NNUE eval faster when shuffling or if the material on the board is high.
int r50 = pos.rule50_count();
Value psq = Value(abs(eg_value(pos.psq_score())));
bool classical = psq * 5 > (750 + pos.non_pawn_material() / 64) * (5 + r50);
v = classical ? Evaluation<NO_TRACE>(pos).value() // classical
: adjusted_NNUE(); // NNUE
if (pos.is_chess960())
v += fix_FRC(pos);
}
// Damp down the evaluation linearly when shuffling
+2 -2
View File
@@ -34,12 +34,12 @@ namespace Eval {
Value evaluate(const Position& pos);
extern bool useNNUE;
extern std::string eval_file_loaded;
extern std::string currentEvalFileName;
// The default net name MUST follow the format nn-[SHA256 first 12 digits].nnue
// for the build process (profile-build and fishtest) to work. Do not change the
// name of the macro, as it is used in the Makefile.
#define EvalFileDefaultName "nn-3475407dc199.nnue"
#define EvalFileDefaultName "nn-13406b1dcbe0.nnue"
namespace NNUE {
+10 -8
View File
@@ -67,7 +67,7 @@ namespace {
/// Version number. If Version is left empty, then compile date in the format
/// DD-MM-YY and show in engine_info.
const string Version = "14";
const string Version = "14.1";
/// Our fancy logging facility. The trick here is to replace cin.rdbuf() and
/// cout.rdbuf() with two Tie objects that tie cin and cout to a file stream. We
@@ -110,7 +110,14 @@ public:
static Logger l;
if (!fname.empty() && !l.file.is_open())
if (l.file.is_open())
{
cout.rdbuf(l.out.buf);
cin.rdbuf(l.in.buf);
l.file.close();
}
if (!fname.empty())
{
l.file.open(fname, ifstream::out);
@@ -123,12 +130,6 @@ public:
cin.rdbuf(&l.in);
cout.rdbuf(&l.out);
}
else if (fname.empty() && l.file.is_open())
{
cout.rdbuf(l.out.buf);
cin.rdbuf(l.in.buf);
l.file.close();
}
}
};
@@ -378,6 +379,7 @@ void std_aligned_free(void* ptr) {
static void* aligned_large_pages_alloc_windows(size_t allocSize) {
#if !defined(_WIN64)
(void)allocSize; // suppress unused-parameter compiler warning
return nullptr;
#else
+23 -13
View File
@@ -85,19 +85,30 @@ static inline const union { uint32_t i; char c[4]; } Le = { 0x01020304 };
static inline const bool IsLittleEndian = (Le.c[0] == 4);
template <typename T>
class ValueListInserter {
public:
ValueListInserter(T* v, std::size_t& s) :
values(v),
size(&s)
{
}
// RunningAverage : a class to calculate a running average of a series of values.
// For efficiency, all computations are done with integers.
class RunningAverage {
public:
void push_back(const T& value) { values[(*size)++] = value; }
private:
T* values;
std::size_t* size;
// Constructor
RunningAverage() {}
// Reset the running average to rational value p / q
void set(int64_t p, int64_t q)
{ average = p * PERIOD * RESOLUTION / q; }
// Update average with value v
void update(int64_t v)
{ average = RESOLUTION * v + (PERIOD - 1) * average / PERIOD; }
// Test if average is strictly greater than rational a / b
bool is_greater(int64_t a, int64_t b)
{ return b * average > a * PERIOD * RESOLUTION ; }
private :
static constexpr int64_t PERIOD = 4096;
static constexpr int64_t RESOLUTION = 1024;
int64_t average;
};
template <typename T, std::size_t MaxSize>
@@ -113,7 +124,6 @@ public:
const T& operator[](std::size_t index) const { return values_[index]; }
const T* begin() const { return values_; }
const T* end() const { return values_ + size_; }
operator ValueListInserter<T>() { return ValueListInserter(values_, size_); }
void swap(ValueList& other) {
const std::size_t maxSize = std::max(size_, other.size_);
+3 -3
View File
@@ -52,9 +52,9 @@ namespace {
constexpr Direction UpRight = (Us == WHITE ? NORTH_EAST : SOUTH_WEST);
constexpr Direction UpLeft = (Us == WHITE ? NORTH_WEST : SOUTH_EAST);
const Bitboard emptySquares = Type == QUIETS || Type == QUIET_CHECKS ? target : ~pos.pieces();
const Bitboard enemies = Type == EVASIONS ? pos.checkers()
: Type == CAPTURES ? target : pos.pieces(Them);
const Bitboard emptySquares = ~pos.pieces();
const Bitboard enemies = Type == EVASIONS ? pos.checkers()
: pos.pieces(Them);
Bitboard pawnsOn7 = pos.pieces(Us, PAWN) & TRank7BB;
Bitboard pawnsNotOn7 = pos.pieces(Us, PAWN) & ~TRank7BB;
+1 -1
View File
@@ -111,7 +111,7 @@ void MovePicker::score() {
+ (*continuationHistory[1])[pos.moved_piece(m)][to_sq(m)]
+ (*continuationHistory[3])[pos.moved_piece(m)][to_sq(m)]
+ (*continuationHistory[5])[pos.moved_piece(m)][to_sq(m)]
+ (ply < MAX_LPH ? std::min(4, depth / 3) * (*lowPlyHistory)[ply][from_to(m)] : 0);
+ (ply < MAX_LPH ? 6 * (*lowPlyHistory)[ply][from_to(m)] : 0);
else // Type == EVASIONS
{
+3 -3
View File
@@ -86,11 +86,11 @@ enum StatsType { NoCaptures, Captures };
/// unsuccessful during the current search, and is used for reduction and move
/// ordering decisions. It uses 2 tables (one for each color) indexed by
/// the move's from and to squares, see www.chessprogramming.org/Butterfly_Boards
typedef Stats<int16_t, 13365, COLOR_NB, int(SQUARE_NB) * int(SQUARE_NB)> ButterflyHistory;
typedef Stats<int16_t, 14365, COLOR_NB, int(SQUARE_NB) * int(SQUARE_NB)> ButterflyHistory;
/// At higher depths LowPlyHistory records successful quiet moves near the root
/// and quiet moves which are/were in the PV (ttPv). It is cleared with each new
/// search and filled during iterative deepening.
/// and quiet moves which are/were in the PV (ttPv). LowPlyHistory is populated during
/// iterative deepening and at each new search the data is shifted down by 2 plies
constexpr int MAX_LPH = 4;
typedef Stats<int16_t, 10692, MAX_LPH, int(SQUARE_NB) * int(SQUARE_NB)> LowPlyHistory;
+30 -58
View File
@@ -143,6 +143,7 @@ namespace Stockfish::Eval::NNUE {
// overaligning stack variables with alignas() doesn't work correctly.
constexpr uint64_t alignment = CacheLineSize;
int delta = 7;
#if defined(ALIGNAS_ON_STACK_VARIABLES_BROKEN)
TransformedFeatureType transformedFeaturesUnaligned[
@@ -162,20 +163,14 @@ namespace Stockfish::Eval::NNUE {
const std::size_t bucket = (pos.count<ALL_PIECES>() - 1) / 4;
const auto psqt = featureTransformer->transform(pos, transformedFeatures, bucket);
const auto output = network[bucket]->propagate(transformedFeatures, buffer);
const auto positional = network[bucket]->propagate(transformedFeatures, buffer)[0];
int materialist = psqt;
int positional = output[0];
int delta_npm = abs(pos.non_pawn_material(WHITE) - pos.non_pawn_material(BLACK));
int entertainment = (adjusted && delta_npm <= BishopValueMg - KnightValueMg ? 7 : 0);
int A = 128 - entertainment;
int B = 128 + entertainment;
int sum = (A * materialist + B * positional) / 128;
return static_cast<Value>( sum / OutputScale );
// Give more value to positional evaluation when material is balanced
if ( adjusted
&& abs(pos.non_pawn_material(WHITE) - pos.non_pawn_material(BLACK)) <= RookValueMg - BishopValueMg)
return static_cast<Value>(((128 - delta) * psqt + (128 + delta) * positional) / 128 / OutputScale);
else
return static_cast<Value>((psqt + positional) / OutputScale);
}
struct NnueEvalTrace {
@@ -227,69 +222,46 @@ namespace Stockfish::Eval::NNUE {
static const std::string PieceToChar(" PNBRQK pnbrqk");
// Requires the buffer to have capacity for at least 5 values
// format_cp_compact() converts a Value into (centi)pawns and writes it in a buffer.
// The buffer must have capacity for at least 5 chars.
static void format_cp_compact(Value v, char* buffer) {
buffer[0] = (v < 0 ? '-' : v > 0 ? '+' : ' ');
int cp = std::abs(100 * v / PawnValueEg);
if (cp >= 10000)
{
buffer[1] = '0' + cp / 10000; cp %= 10000;
buffer[2] = '0' + cp / 1000; cp %= 1000;
buffer[3] = '0' + cp / 100; cp %= 100;
buffer[4] = ' ';
buffer[1] = '0' + cp / 10000; cp %= 10000;
buffer[2] = '0' + cp / 1000; cp %= 1000;
buffer[3] = '0' + cp / 100; cp %= 100;
buffer[4] = ' ';
}
else if (cp >= 1000)
{
buffer[1] = '0' + cp / 1000; cp %= 1000;
buffer[2] = '0' + cp / 100; cp %= 100;
buffer[3] = '.';
buffer[4] = '0' + cp / 10;
buffer[1] = '0' + cp / 1000; cp %= 1000;
buffer[2] = '0' + cp / 100; cp %= 100;
buffer[3] = '.';
buffer[4] = '0' + cp / 10;
}
else
{
buffer[1] = '0' + cp / 100; cp %= 100;
buffer[2] = '.';
buffer[3] = '0' + cp / 10; cp %= 10;
buffer[4] = '0' + cp / 1;
buffer[1] = '0' + cp / 100; cp %= 100;
buffer[2] = '.';
buffer[3] = '0' + cp / 10; cp %= 10;
buffer[4] = '0' + cp / 1;
}
}
// Requires the buffer to have capacity for at least 7 values
// format_cp_aligned_dot() converts a Value into (centi)pawns and writes it in a buffer,
// always keeping two decimals. The buffer must have capacity for at least 7 chars.
static void format_cp_aligned_dot(Value v, char* buffer) {
buffer[0] = (v < 0 ? '-' : v > 0 ? '+' : ' ');
int cp = std::abs(100 * v / PawnValueEg);
if (cp >= 10000)
{
buffer[1] = '0' + cp / 10000; cp %= 10000;
buffer[2] = '0' + cp / 1000; cp %= 1000;
buffer[3] = '0' + cp / 100; cp %= 100;
buffer[4] = '.';
buffer[5] = '0' + cp / 10; cp %= 10;
buffer[6] = '0' + cp;
}
else if (cp >= 1000)
{
buffer[1] = ' ';
buffer[2] = '0' + cp / 1000; cp %= 1000;
buffer[3] = '0' + cp / 100; cp %= 100;
buffer[4] = '.';
buffer[5] = '0' + cp / 10; cp %= 10;
buffer[6] = '0' + cp;
}
else
{
buffer[1] = ' ';
buffer[2] = ' ';
buffer[3] = '0' + cp / 100; cp %= 100;
buffer[4] = '.';
buffer[5] = '0' + cp / 10; cp %= 10;
buffer[6] = '0' + cp / 1;
}
double cp = 1.0 * std::abs(int(v)) / PawnValueEg;
sprintf(&buffer[1], "%6.2f", cp);
}
@@ -419,7 +391,7 @@ namespace Stockfish::Eval::NNUE {
actualFilename = filename.value();
else
{
if (eval_file_loaded != EvalFileDefaultName)
if (currentEvalFileName != EvalFileDefaultName)
{
msg = "Failed to export a net. A non-embedded net can only be saved if the filename is specified";
@@ -16,31 +16,32 @@
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
//Definition of input features HalfKAv2 of NNUE evaluation function
//Definition of input features HalfKAv2_hm of NNUE evaluation function
#include "half_ka_v2.h"
#include "half_ka_v2_hm.h"
#include "../../position.h"
namespace Stockfish::Eval::NNUE::Features {
// Orient a square according to perspective (rotates by 180 for black)
inline Square HalfKAv2::orient(Color perspective, Square s) {
return Square(int(s) ^ (bool(perspective) * 56));
inline Square HalfKAv2_hm::orient(Color perspective, Square s, Square ksq) {
return Square(int(s) ^ (bool(perspective) * SQ_A8) ^ ((file_of(ksq) < FILE_E) * SQ_H1));
}
// Index of a feature for a given king position and another piece on some square
inline IndexType HalfKAv2::make_index(Color perspective, Square s, Piece pc, Square ksq) {
return IndexType(orient(perspective, s) + PieceSquareIndex[perspective][pc] + PS_NB * ksq);
inline IndexType HalfKAv2_hm::make_index(Color perspective, Square s, Piece pc, Square ksq) {
Square o_ksq = orient(perspective, ksq, ksq);
return IndexType(orient(perspective, s, ksq) + PieceSquareIndex[perspective][pc] + PS_NB * KingBuckets[o_ksq]);
}
// Get a list of indices for active features
void HalfKAv2::append_active_indices(
void HalfKAv2_hm::append_active_indices(
const Position& pos,
Color perspective,
ValueListInserter<IndexType> active
IndexList& active
) {
Square ksq = orient(perspective, pos.square<KING>(perspective));
Square ksq = pos.square<KING>(perspective);
Bitboard bb = pos.pieces();
while (bb)
{
@@ -52,33 +53,30 @@ namespace Stockfish::Eval::NNUE::Features {
// append_changed_indices() : get a list of indices for recently changed features
void HalfKAv2::append_changed_indices(
void HalfKAv2_hm::append_changed_indices(
Square ksq,
StateInfo* st,
const DirtyPiece& dp,
Color perspective,
ValueListInserter<IndexType> removed,
ValueListInserter<IndexType> added
IndexList& removed,
IndexList& added
) {
const auto& dp = st->dirtyPiece;
Square oriented_ksq = orient(perspective, ksq);
for (int i = 0; i < dp.dirty_num; ++i) {
Piece pc = dp.piece[i];
if (dp.from[i] != SQ_NONE)
removed.push_back(make_index(perspective, dp.from[i], pc, oriented_ksq));
removed.push_back(make_index(perspective, dp.from[i], dp.piece[i], ksq));
if (dp.to[i] != SQ_NONE)
added.push_back(make_index(perspective, dp.to[i], pc, oriented_ksq));
added.push_back(make_index(perspective, dp.to[i], dp.piece[i], ksq));
}
}
int HalfKAv2::update_cost(StateInfo* st) {
int HalfKAv2_hm::update_cost(const StateInfo* st) {
return st->dirtyPiece.dirty_num;
}
int HalfKAv2::refresh_cost(const Position& pos) {
int HalfKAv2_hm::refresh_cost(const Position& pos) {
return pos.count<ALL_PIECES>();
}
bool HalfKAv2::requires_refresh(StateInfo* st, Color perspective) {
bool HalfKAv2_hm::requires_refresh(const StateInfo* st, Color perspective) {
return st->dirtyPiece.piece[0] == make_piece(perspective, KING);
}
@@ -18,8 +18,8 @@
//Definition of input features HalfKP of NNUE evaluation function
#ifndef NNUE_FEATURES_HALF_KA_V2_H_INCLUDED
#define NNUE_FEATURES_HALF_KA_V2_H_INCLUDED
#ifndef NNUE_FEATURES_HALF_KA_V2_HM_H_INCLUDED
#define NNUE_FEATURES_HALF_KA_V2_HM_H_INCLUDED
#include "../nnue_common.h"
@@ -32,9 +32,9 @@ namespace Stockfish {
namespace Stockfish::Eval::NNUE::Features {
// Feature HalfKAv2: Combination of the position of own king
// and the position of pieces
class HalfKAv2 {
// Feature HalfKAv2_hm: Combination of the position of own king
// and the position of pieces. Position mirrored such that king always on e..h files.
class HalfKAv2_hm {
// unique number for each piece type on each square
enum {
@@ -50,7 +50,7 @@ namespace Stockfish::Eval::NNUE::Features {
PS_W_QUEEN = 8 * SQUARE_NB,
PS_B_QUEEN = 9 * SQUARE_NB,
PS_KING = 10 * SQUARE_NB,
PS_NB = 11 * SQUARE_NB
PS_NB = 11 * SQUARE_NB
};
static constexpr IndexType PieceSquareIndex[COLOR_NB][PIECE_NB] = {
@@ -63,49 +63,62 @@ namespace Stockfish::Eval::NNUE::Features {
};
// Orient a square according to perspective (rotates by 180 for black)
static Square orient(Color perspective, Square s);
static Square orient(Color perspective, Square s, Square ksq);
// Index of a feature for a given king position and another piece on some square
static IndexType make_index(Color perspective, Square s, Piece pc, Square ksq);
public:
// Feature name
static constexpr const char* Name = "HalfKAv2(Friend)";
static constexpr const char* Name = "HalfKAv2_hm(Friend)";
// Hash value embedded in the evaluation file
static constexpr std::uint32_t HashValue = 0x5f234cb8u;
static constexpr std::uint32_t HashValue = 0x7f234cb8u;
// Number of feature dimensions
static constexpr IndexType Dimensions =
static_cast<IndexType>(SQUARE_NB) * static_cast<IndexType>(PS_NB);
static_cast<IndexType>(SQUARE_NB) * static_cast<IndexType>(PS_NB) / 2;
static constexpr int KingBuckets[64] = {
-1, -1, -1, -1, 31, 30, 29, 28,
-1, -1, -1, -1, 27, 26, 25, 24,
-1, -1, -1, -1, 23, 22, 21, 20,
-1, -1, -1, -1, 19, 18, 17, 16,
-1, -1, -1, -1, 15, 14, 13, 12,
-1, -1, -1, -1, 11, 10, 9, 8,
-1, -1, -1, -1, 7, 6, 5, 4,
-1, -1, -1, -1, 3, 2, 1, 0
};
// Maximum number of simultaneously active features.
static constexpr IndexType MaxActiveDimensions = 32;
using IndexList = ValueList<IndexType, MaxActiveDimensions>;
// Get a list of indices for active features
static void append_active_indices(
const Position& pos,
Color perspective,
ValueListInserter<IndexType> active);
IndexList& active);
// Get a list of indices for recently changed features
static void append_changed_indices(
Square ksq,
StateInfo* st,
const DirtyPiece& dp,
Color perspective,
ValueListInserter<IndexType> removed,
ValueListInserter<IndexType> added);
IndexList& removed,
IndexList& added
);
// Returns the cost of updating one perspective, the most costly one.
// Assumes no refresh needed.
static int update_cost(StateInfo* st);
static int update_cost(const StateInfo* st);
static int refresh_cost(const Position& pos);
// Returns whether the change stored in this StateInfo means that
// a full accumulator refresh is required.
static bool requires_refresh(StateInfo* st, Color perspective);
static bool requires_refresh(const StateInfo* st, Color perspective);
};
} // namespace Stockfish::Eval::NNUE::Features
#endif // #ifndef NNUE_FEATURES_HALF_KA_V2_H_INCLUDED
#endif // #ifndef NNUE_FEATURES_HALF_KA_V2_HM_H_INCLUDED
+427 -308
View File
@@ -22,13 +22,357 @@
#define NNUE_LAYERS_AFFINE_TRANSFORM_H_INCLUDED
#include <iostream>
#include <algorithm>
#include <type_traits>
#include "../nnue_common.h"
#include "../../simd.h"
/*
This file contains the definition for a fully connected layer (aka affine transform).
Two approaches are employed, depending on the sizes of the transform.
Approach 1:
- used when the PaddedInputDimensions >= 128
- uses AVX512 if possible
- processes inputs in batches of 2*InputSimdWidth
- so in batches of 128 for AVX512
- the weight blocks of size InputSimdWidth are transposed such that
access is sequential
- N columns of the weight matrix are processed a time, where N
depends on the architecture (the amount of registers)
- accumulate + hadd is used
Approach 2:
- used when the PaddedInputDimensions < 128
- does not use AVX512
- expected use-case is for when PaddedInputDimensions == 32 and InputDimensions <= 32.
- that's why AVX512 is hard to implement
- expected use-case is small layers
- not optimized as well as the approach 1
- inputs are processed in chunks of 4, weights are respectively transposed
- accumulation happens directly to int32s
*/
namespace Stockfish::Eval::NNUE::Layers {
// Affine transformation layer
// Fallback implementation for older/other architectures.
// Identical for both approaches. Requires the input to be padded to at least 16 values.
#if !defined(USE_SSSE3)
template <IndexType InputDimensions, IndexType PaddedInputDimensions, IndexType OutputDimensions>
static void affine_transform_non_ssse3(std::int32_t* output, const std::int8_t* weights, const std::int32_t* biases, const std::uint8_t* input)
{
# if defined(USE_SSE2)
// At least a multiple of 16, with SSE2.
static_assert(PaddedInputDimensions % 16 == 0);
constexpr IndexType NumChunks = PaddedInputDimensions / 16;
const __m128i Zeros = _mm_setzero_si128();
const auto inputVector = reinterpret_cast<const __m128i*>(input);
# elif defined(USE_MMX)
static_assert(InputDimensions % 8 == 0);
constexpr IndexType NumChunks = InputDimensions / 8;
const __m64 Zeros = _mm_setzero_si64();
const auto inputVector = reinterpret_cast<const __m64*>(input);
# elif defined(USE_NEON)
static_assert(PaddedInputDimensions % 16 == 0);
constexpr IndexType NumChunks = PaddedInputDimensions / 16;
const auto inputVector = reinterpret_cast<const int8x8_t*>(input);
# endif
for (IndexType i = 0; i < OutputDimensions; ++i) {
const IndexType offset = i * PaddedInputDimensions;
# if defined(USE_SSE2)
__m128i sumLo = _mm_cvtsi32_si128(biases[i]);
__m128i sumHi = Zeros;
const auto row = reinterpret_cast<const __m128i*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
__m128i row_j = _mm_load_si128(&row[j]);
__m128i input_j = _mm_load_si128(&inputVector[j]);
__m128i extendedRowLo = _mm_srai_epi16(_mm_unpacklo_epi8(row_j, row_j), 8);
__m128i extendedRowHi = _mm_srai_epi16(_mm_unpackhi_epi8(row_j, row_j), 8);
__m128i extendedInputLo = _mm_unpacklo_epi8(input_j, Zeros);
__m128i extendedInputHi = _mm_unpackhi_epi8(input_j, Zeros);
__m128i productLo = _mm_madd_epi16(extendedRowLo, extendedInputLo);
__m128i productHi = _mm_madd_epi16(extendedRowHi, extendedInputHi);
sumLo = _mm_add_epi32(sumLo, productLo);
sumHi = _mm_add_epi32(sumHi, productHi);
}
__m128i sum = _mm_add_epi32(sumLo, sumHi);
__m128i sumHigh_64 = _mm_shuffle_epi32(sum, _MM_SHUFFLE(1, 0, 3, 2));
sum = _mm_add_epi32(sum, sumHigh_64);
__m128i sum_second_32 = _mm_shufflelo_epi16(sum, _MM_SHUFFLE(1, 0, 3, 2));
sum = _mm_add_epi32(sum, sum_second_32);
output[i] = _mm_cvtsi128_si32(sum);
# elif defined(USE_MMX)
__m64 sumLo = _mm_cvtsi32_si64(biases[i]);
__m64 sumHi = Zeros;
const auto row = reinterpret_cast<const __m64*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
__m64 row_j = row[j];
__m64 input_j = inputVector[j];
__m64 extendedRowLo = _mm_srai_pi16(_mm_unpacklo_pi8(row_j, row_j), 8);
__m64 extendedRowHi = _mm_srai_pi16(_mm_unpackhi_pi8(row_j, row_j), 8);
__m64 extendedInputLo = _mm_unpacklo_pi8(input_j, Zeros);
__m64 extendedInputHi = _mm_unpackhi_pi8(input_j, Zeros);
__m64 productLo = _mm_madd_pi16(extendedRowLo, extendedInputLo);
__m64 productHi = _mm_madd_pi16(extendedRowHi, extendedInputHi);
sumLo = _mm_add_pi32(sumLo, productLo);
sumHi = _mm_add_pi32(sumHi, productHi);
}
__m64 sum = _mm_add_pi32(sumLo, sumHi);
sum = _mm_add_pi32(sum, _mm_unpackhi_pi32(sum, sum));
output[i] = _mm_cvtsi64_si32(sum);
# elif defined(USE_NEON)
int32x4_t sum = {biases[i]};
const auto row = reinterpret_cast<const int8x8_t*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
int16x8_t product = vmull_s8(inputVector[j * 2], row[j * 2]);
product = vmlal_s8(product, inputVector[j * 2 + 1], row[j * 2 + 1]);
sum = vpadalq_s16(sum, product);
}
output[i] = sum[0] + sum[1] + sum[2] + sum[3];
# else
std::int32_t sum = biases[i];
for (IndexType j = 0; j < InputDimensions; ++j) {
sum += weights[offset + j] * input[j];
}
output[i] = sum;
# endif
}
# if defined(USE_MMX)
_mm_empty();
# endif
}
#endif
template <typename PreviousLayer, IndexType OutDims, typename Enabled = void>
class AffineTransform;
// A specialization for large inputs.
template <typename PreviousLayer, IndexType OutDims>
class AffineTransform {
class AffineTransform<PreviousLayer, OutDims, std::enable_if_t<(PreviousLayer::OutputDimensions >= 2*64-1)>> {
public:
// Input/output type
using InputType = typename PreviousLayer::OutputType;
using OutputType = std::int32_t;
static_assert(std::is_same<InputType, std::uint8_t>::value, "");
// Number of input/output dimensions
static constexpr IndexType InputDimensions = PreviousLayer::OutputDimensions;
static constexpr IndexType OutputDimensions = OutDims;
static constexpr IndexType PaddedInputDimensions =
ceil_to_multiple<IndexType>(InputDimensions, MaxSimdWidth);
static_assert(PaddedInputDimensions >= 128, "Something went wrong. This specialization should not have been chosen.");
#if defined (USE_AVX512)
static constexpr const IndexType InputSimdWidth = 64;
static constexpr const IndexType MaxNumOutputRegs = 16;
#elif defined (USE_AVX2)
static constexpr const IndexType InputSimdWidth = 32;
static constexpr const IndexType MaxNumOutputRegs = 8;
#elif defined (USE_SSSE3)
static constexpr const IndexType InputSimdWidth = 16;
static constexpr const IndexType MaxNumOutputRegs = 8;
#else
// The fallback implementation will not have permuted weights.
// We define these to avoid a lot of ifdefs later.
static constexpr const IndexType InputSimdWidth = 1;
static constexpr const IndexType MaxNumOutputRegs = 1;
#endif
// A big block is a region in the weight matrix of the size [PaddedInputDimensions, NumOutputRegs].
// A small block is a region of size [InputSimdWidth, 1]
static constexpr const IndexType NumOutputRegs = std::min(MaxNumOutputRegs, OutputDimensions);
static constexpr const IndexType SmallBlockSize = InputSimdWidth;
static constexpr const IndexType BigBlockSize = NumOutputRegs * PaddedInputDimensions;
static constexpr const IndexType NumSmallBlocksInBigBlock = BigBlockSize / SmallBlockSize;
static constexpr const IndexType NumSmallBlocksPerOutput = PaddedInputDimensions / SmallBlockSize;
static constexpr const IndexType NumBigBlocks = OutputDimensions / NumOutputRegs;
static_assert(OutputDimensions % NumOutputRegs == 0);
// Size of forward propagation buffer used in this layer
static constexpr std::size_t SelfBufferSize =
ceil_to_multiple(OutputDimensions * sizeof(OutputType), CacheLineSize);
// Size of the forward propagation buffer used from the input layer to this layer
static constexpr std::size_t BufferSize =
PreviousLayer::BufferSize + SelfBufferSize;
// Hash value embedded in the evaluation file
static constexpr std::uint32_t get_hash_value() {
std::uint32_t hashValue = 0xCC03DAE4u;
hashValue += OutputDimensions;
hashValue ^= PreviousLayer::get_hash_value() >> 1;
hashValue ^= PreviousLayer::get_hash_value() << 31;
return hashValue;
}
/*
Transposes the small blocks within a block.
Effectively means that weights can be traversed sequentially during inference.
*/
static IndexType get_weight_index(IndexType i)
{
const IndexType smallBlock = (i / SmallBlockSize) % NumSmallBlocksInBigBlock;
const IndexType smallBlockCol = smallBlock / NumSmallBlocksPerOutput;
const IndexType smallBlockRow = smallBlock % NumSmallBlocksPerOutput;
const IndexType bigBlock = i / BigBlockSize;
const IndexType rest = i % SmallBlockSize;
const IndexType idx =
bigBlock * BigBlockSize
+ smallBlockRow * SmallBlockSize * NumOutputRegs
+ smallBlockCol * SmallBlockSize
+ rest;
return idx;
}
// Read network parameters
bool read_parameters(std::istream& stream) {
if (!previousLayer.read_parameters(stream)) return false;
for (std::size_t i = 0; i < OutputDimensions; ++i)
biases[i] = read_little_endian<BiasType>(stream);
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i)
weights[get_weight_index(i)] = read_little_endian<WeightType>(stream);
return !stream.fail();
}
// Write network parameters
bool write_parameters(std::ostream& stream) const {
if (!previousLayer.write_parameters(stream)) return false;
for (std::size_t i = 0; i < OutputDimensions; ++i)
write_little_endian<BiasType>(stream, biases[i]);
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i)
write_little_endian<WeightType>(stream, weights[get_weight_index(i)]);
return !stream.fail();
}
// Forward propagation
const OutputType* propagate(
const TransformedFeatureType* transformedFeatures, char* buffer) const {
const auto input = previousLayer.propagate(
transformedFeatures, buffer + SelfBufferSize);
OutputType* output = reinterpret_cast<OutputType*>(buffer);
#if defined (USE_AVX512)
using vec_t = __m512i;
#define vec_setzero _mm512_setzero_si512
#define vec_set_32 _mm512_set1_epi32
#define vec_add_dpbusd_32 Simd::m512_add_dpbusd_epi32
#define vec_add_dpbusd_32x2 Simd::m512_add_dpbusd_epi32x2
#define vec_hadd Simd::m512_hadd
#define vec_haddx4 Simd::m512_haddx4
#elif defined (USE_AVX2)
using vec_t = __m256i;
#define vec_setzero _mm256_setzero_si256
#define vec_set_32 _mm256_set1_epi32
#define vec_add_dpbusd_32 Simd::m256_add_dpbusd_epi32
#define vec_add_dpbusd_32x2 Simd::m256_add_dpbusd_epi32x2
#define vec_hadd Simd::m256_hadd
#define vec_haddx4 Simd::m256_haddx4
#elif defined (USE_SSSE3)
using vec_t = __m128i;
#define vec_setzero _mm_setzero_si128
#define vec_set_32 _mm_set1_epi32
#define vec_add_dpbusd_32 Simd::m128_add_dpbusd_epi32
#define vec_add_dpbusd_32x2 Simd::m128_add_dpbusd_epi32x2
#define vec_hadd Simd::m128_hadd
#define vec_haddx4 Simd::m128_haddx4
#endif
#if defined (USE_SSSE3)
const vec_t* invec = reinterpret_cast<const vec_t*>(input);
// Perform accumulation to registers for each big block
for (IndexType bigBlock = 0; bigBlock < NumBigBlocks; ++bigBlock)
{
vec_t acc[NumOutputRegs] = { vec_setzero() };
// Each big block has NumOutputRegs small blocks in each "row", one per register.
// We process two small blocks at a time to save on one addition without VNNI.
for (IndexType smallBlock = 0; smallBlock < NumSmallBlocksPerOutput; smallBlock += 2)
{
const vec_t* weightvec =
reinterpret_cast<const vec_t*>(
weights
+ bigBlock * BigBlockSize
+ smallBlock * SmallBlockSize * NumOutputRegs);
const vec_t in0 = invec[smallBlock + 0];
const vec_t in1 = invec[smallBlock + 1];
for (IndexType k = 0; k < NumOutputRegs; ++k)
vec_add_dpbusd_32x2(acc[k], in0, weightvec[k], in1, weightvec[k + NumOutputRegs]);
}
// Horizontally add all accumulators.
if constexpr (NumOutputRegs % 4 == 0)
{
__m128i* outputvec = reinterpret_cast<__m128i*>(output);
const __m128i* biasvec = reinterpret_cast<const __m128i*>(biases);
for (IndexType k = 0; k < NumOutputRegs; k += 4)
{
const IndexType idx = (bigBlock * NumOutputRegs + k) / 4;
outputvec[idx] = vec_haddx4(acc[k+0], acc[k+1], acc[k+2], acc[k+3], biasvec[idx]);
}
}
else
{
for (IndexType k = 0; k < NumOutputRegs; ++k)
{
const IndexType idx = (bigBlock * NumOutputRegs + k);
output[idx] = vec_hadd(acc[k], biases[idx]);
}
}
}
# undef vec_setzero
# undef vec_set_32
# undef vec_add_dpbusd_32
# undef vec_add_dpbusd_32x2
# undef vec_hadd
# undef vec_haddx4
#else
// Use old implementation for the other architectures.
affine_transform_non_ssse3<
InputDimensions,
PaddedInputDimensions,
OutputDimensions>(output, weights, biases, input);
#endif
return output;
}
private:
using BiasType = OutputType;
using WeightType = std::int8_t;
PreviousLayer previousLayer;
alignas(CacheLineSize) BiasType biases[OutputDimensions];
alignas(CacheLineSize) WeightType weights[OutputDimensions * PaddedInputDimensions];
};
template <typename PreviousLayer, IndexType OutDims>
class AffineTransform<PreviousLayer, OutDims, std::enable_if_t<(PreviousLayer::OutputDimensions < 2*64-1)>> {
public:
// Input/output type
using InputType = typename PreviousLayer::OutputType;
@@ -41,19 +385,21 @@ namespace Stockfish::Eval::NNUE::Layers {
static constexpr IndexType OutputDimensions = OutDims;
static constexpr IndexType PaddedInputDimensions =
ceil_to_multiple<IndexType>(InputDimensions, MaxSimdWidth);
#if defined (USE_AVX512)
static constexpr const IndexType OutputSimdWidth = SimdWidth / 2;
#elif defined (USE_SSSE3)
static_assert(PaddedInputDimensions < 128, "Something went wrong. This specialization should not have been chosen.");
#if defined (USE_SSSE3)
static constexpr const IndexType OutputSimdWidth = SimdWidth / 4;
static constexpr const IndexType InputSimdWidth = SimdWidth;
#endif
// Size of forward propagation buffer used in this layer
static constexpr std::size_t SelfBufferSize =
ceil_to_multiple(OutputDimensions * sizeof(OutputType), CacheLineSize);
ceil_to_multiple(OutputDimensions * sizeof(OutputType), CacheLineSize);
// Size of the forward propagation buffer used from the input layer to this layer
static constexpr std::size_t BufferSize =
PreviousLayer::BufferSize + SelfBufferSize;
PreviousLayer::BufferSize + SelfBufferSize;
// Hash value embedded in the evaluation file
static constexpr std::uint32_t get_hash_value() {
@@ -64,21 +410,30 @@ namespace Stockfish::Eval::NNUE::Layers {
return hashValue;
}
static IndexType get_weight_index_scrambled(IndexType i)
{
return
(i / 4) % (PaddedInputDimensions / 4) * OutputDimensions * 4 +
i / PaddedInputDimensions * 4 +
i % 4;
}
static IndexType get_weight_index(IndexType i)
{
#if defined (USE_SSSE3)
return get_weight_index_scrambled(i);
#else
return i;
#endif
}
// Read network parameters
bool read_parameters(std::istream& stream) {
if (!previousLayer.read_parameters(stream)) return false;
for (std::size_t i = 0; i < OutputDimensions; ++i)
biases[i] = read_little_endian<BiasType>(stream);
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i)
#if !defined (USE_SSSE3)
weights[i] = read_little_endian<WeightType>(stream);
#else
weights[
(i / 4) % (PaddedInputDimensions / 4) * OutputDimensions * 4 +
i / PaddedInputDimensions * 4 +
i % 4
] = read_little_endian<WeightType>(stream);
#endif
weights[get_weight_index(i)] = read_little_endian<WeightType>(stream);
return !stream.fail();
}
@@ -87,334 +442,98 @@ namespace Stockfish::Eval::NNUE::Layers {
bool write_parameters(std::ostream& stream) const {
if (!previousLayer.write_parameters(stream)) return false;
for (std::size_t i = 0; i < OutputDimensions; ++i)
write_little_endian<BiasType>(stream, biases[i]);
#if !defined (USE_SSSE3)
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i)
write_little_endian<WeightType>(stream, weights[i]);
#else
std::unique_ptr<WeightType[]> unscrambledWeights = std::make_unique<WeightType[]>(OutputDimensions * PaddedInputDimensions);
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i) {
unscrambledWeights[i] =
weights[
(i / 4) % (PaddedInputDimensions / 4) * OutputDimensions * 4 +
i / PaddedInputDimensions * 4 +
i % 4
];
}
write_little_endian<BiasType>(stream, biases[i]);
for (std::size_t i = 0; i < OutputDimensions * PaddedInputDimensions; ++i)
write_little_endian<WeightType>(stream, unscrambledWeights[i]);
#endif
write_little_endian<WeightType>(stream, weights[get_weight_index(i)]);
return !stream.fail();
}
// Forward propagation
const OutputType* propagate(
const TransformedFeatureType* transformedFeatures, char* buffer) const {
const auto input = previousLayer.propagate(
transformedFeatures, buffer + SelfBufferSize);
transformedFeatures, buffer + SelfBufferSize);
const auto output = reinterpret_cast<OutputType*>(buffer);
#if defined (USE_AVX512)
[[maybe_unused]] const __m512i Ones512 = _mm512_set1_epi16(1);
[[maybe_unused]] auto m512_hadd = [](__m512i sum, int bias) -> int {
return _mm512_reduce_add_epi32(sum) + bias;
};
[[maybe_unused]] auto m512_add_dpbusd_epi32 = [=](__m512i& acc, __m512i a, __m512i b) {
#if defined (USE_VNNI)
acc = _mm512_dpbusd_epi32(acc, a, b);
#else
__m512i product0 = _mm512_maddubs_epi16(a, b);
product0 = _mm512_madd_epi16(product0, Ones512);
acc = _mm512_add_epi32(acc, product0);
#endif
};
[[maybe_unused]] auto m512_add_dpbusd_epi32x4 = [=](__m512i& acc, __m512i a0, __m512i b0, __m512i a1, __m512i b1,
__m512i a2, __m512i b2, __m512i a3, __m512i b3) {
#if defined (USE_VNNI)
acc = _mm512_dpbusd_epi32(acc, a0, b0);
acc = _mm512_dpbusd_epi32(acc, a1, b1);
acc = _mm512_dpbusd_epi32(acc, a2, b2);
acc = _mm512_dpbusd_epi32(acc, a3, b3);
#else
__m512i product0 = _mm512_maddubs_epi16(a0, b0);
__m512i product1 = _mm512_maddubs_epi16(a1, b1);
__m512i product2 = _mm512_maddubs_epi16(a2, b2);
__m512i product3 = _mm512_maddubs_epi16(a3, b3);
product0 = _mm512_adds_epi16(product0, product1);
product0 = _mm512_madd_epi16(product0, Ones512);
product2 = _mm512_adds_epi16(product2, product3);
product2 = _mm512_madd_epi16(product2, Ones512);
acc = _mm512_add_epi32(acc, _mm512_add_epi32(product0, product2));
#endif
};
#endif
#if defined (USE_AVX2)
[[maybe_unused]] const __m256i Ones256 = _mm256_set1_epi16(1);
[[maybe_unused]] auto m256_hadd = [](__m256i sum, int bias) -> int {
__m128i sum128 = _mm_add_epi32(_mm256_castsi256_si128(sum), _mm256_extracti128_si256(sum, 1));
sum128 = _mm_add_epi32(sum128, _mm_shuffle_epi32(sum128, _MM_PERM_BADC));
sum128 = _mm_add_epi32(sum128, _mm_shuffle_epi32(sum128, _MM_PERM_CDAB));
return _mm_cvtsi128_si32(sum128) + bias;
};
[[maybe_unused]] auto m256_add_dpbusd_epi32 = [=](__m256i& acc, __m256i a, __m256i b) {
#if defined (USE_VNNI)
acc = _mm256_dpbusd_epi32(acc, a, b);
#else
__m256i product0 = _mm256_maddubs_epi16(a, b);
product0 = _mm256_madd_epi16(product0, Ones256);
acc = _mm256_add_epi32(acc, product0);
#endif
};
[[maybe_unused]] auto m256_add_dpbusd_epi32x4 = [=](__m256i& acc, __m256i a0, __m256i b0, __m256i a1, __m256i b1,
__m256i a2, __m256i b2, __m256i a3, __m256i b3) {
#if defined (USE_VNNI)
acc = _mm256_dpbusd_epi32(acc, a0, b0);
acc = _mm256_dpbusd_epi32(acc, a1, b1);
acc = _mm256_dpbusd_epi32(acc, a2, b2);
acc = _mm256_dpbusd_epi32(acc, a3, b3);
#else
__m256i product0 = _mm256_maddubs_epi16(a0, b0);
__m256i product1 = _mm256_maddubs_epi16(a1, b1);
__m256i product2 = _mm256_maddubs_epi16(a2, b2);
__m256i product3 = _mm256_maddubs_epi16(a3, b3);
product0 = _mm256_adds_epi16(product0, product1);
product0 = _mm256_madd_epi16(product0, Ones256);
product2 = _mm256_adds_epi16(product2, product3);
product2 = _mm256_madd_epi16(product2, Ones256);
acc = _mm256_add_epi32(acc, _mm256_add_epi32(product0, product2));
#endif
};
#endif
#if defined (USE_SSSE3)
[[maybe_unused]] const __m128i Ones128 = _mm_set1_epi16(1);
[[maybe_unused]] auto m128_hadd = [](__m128i sum, int bias) -> int {
sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, 0x4E)); //_MM_PERM_BADC
sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, 0xB1)); //_MM_PERM_CDAB
return _mm_cvtsi128_si32(sum) + bias;
};
[[maybe_unused]] auto m128_add_dpbusd_epi32 = [=](__m128i& acc, __m128i a, __m128i b) {
__m128i product0 = _mm_maddubs_epi16(a, b);
product0 = _mm_madd_epi16(product0, Ones128);
acc = _mm_add_epi32(acc, product0);
};
[[maybe_unused]] auto m128_add_dpbusd_epi32x4 = [=](__m128i& acc, __m128i a0, __m128i b0, __m128i a1, __m128i b1,
__m128i a2, __m128i b2, __m128i a3, __m128i b3) {
__m128i product0 = _mm_maddubs_epi16(a0, b0);
__m128i product1 = _mm_maddubs_epi16(a1, b1);
__m128i product2 = _mm_maddubs_epi16(a2, b2);
__m128i product3 = _mm_maddubs_epi16(a3, b3);
product0 = _mm_adds_epi16(product0, product1);
product0 = _mm_madd_epi16(product0, Ones128);
product2 = _mm_adds_epi16(product2, product3);
product2 = _mm_madd_epi16(product2, Ones128);
acc = _mm_add_epi32(acc, _mm_add_epi32(product0, product2));
};
#endif
#if defined (USE_AVX512)
using vec_t = __m512i;
#define vec_setzero _mm512_setzero_si512
#define vec_set_32 _mm512_set1_epi32
auto& vec_add_dpbusd_32 = m512_add_dpbusd_epi32;
auto& vec_add_dpbusd_32x4 = m512_add_dpbusd_epi32x4;
auto& vec_hadd = m512_hadd;
#elif defined (USE_AVX2)
using vec_t = __m256i;
#define vec_setzero _mm256_setzero_si256
#define vec_set_32 _mm256_set1_epi32
auto& vec_add_dpbusd_32 = m256_add_dpbusd_epi32;
auto& vec_add_dpbusd_32x4 = m256_add_dpbusd_epi32x4;
auto& vec_hadd = m256_hadd;
#define vec_add_dpbusd_32 Simd::m256_add_dpbusd_epi32
#define vec_add_dpbusd_32x2 Simd::m256_add_dpbusd_epi32x2
#define vec_add_dpbusd_32x4 Simd::m256_add_dpbusd_epi32x4
#define vec_hadd Simd::m256_hadd
#define vec_haddx4 Simd::m256_haddx4
#elif defined (USE_SSSE3)
using vec_t = __m128i;
#define vec_setzero _mm_setzero_si128
#define vec_set_32 _mm_set1_epi32
auto& vec_add_dpbusd_32 = m128_add_dpbusd_epi32;
auto& vec_add_dpbusd_32x4 = m128_add_dpbusd_epi32x4;
auto& vec_hadd = m128_hadd;
#define vec_add_dpbusd_32 Simd::m128_add_dpbusd_epi32
#define vec_add_dpbusd_32x2 Simd::m128_add_dpbusd_epi32x2
#define vec_add_dpbusd_32x4 Simd::m128_add_dpbusd_epi32x4
#define vec_hadd Simd::m128_hadd
#define vec_haddx4 Simd::m128_haddx4
#endif
#if defined (USE_SSSE3)
// Different layout, we process 4 inputs at a time, always.
static_assert(InputDimensions % 4 == 0);
const auto output = reinterpret_cast<OutputType*>(buffer);
const auto inputVector = reinterpret_cast<const vec_t*>(input);
static_assert(InputDimensions % 8 == 0);
static_assert(OutputDimensions % OutputSimdWidth == 0 || OutputDimensions == 1);
// OutputDimensions is either 1 or a multiple of SimdWidth
// because then it is also an input dimension.
if constexpr (OutputDimensions % OutputSimdWidth == 0)
{
constexpr IndexType NumChunks = InputDimensions / 4;
constexpr IndexType NumChunks = InputDimensions / 4;
constexpr IndexType NumRegs = OutputDimensions / OutputSimdWidth;
const auto input32 = reinterpret_cast<const std::int32_t*>(input);
vec_t* outptr = reinterpret_cast<vec_t*>(output);
std::memcpy(output, biases, OutputDimensions * sizeof(OutputType));
const auto input32 = reinterpret_cast<const std::int32_t*>(input);
const vec_t* biasvec = reinterpret_cast<const vec_t*>(biases);
vec_t acc[NumRegs];
for (IndexType k = 0; k < NumRegs; ++k)
acc[k] = biasvec[k];
for (int i = 0; i < (int)NumChunks - 3; i += 4)
{
const vec_t in0 = vec_set_32(input32[i + 0]);
const vec_t in1 = vec_set_32(input32[i + 1]);
const vec_t in2 = vec_set_32(input32[i + 2]);
const vec_t in3 = vec_set_32(input32[i + 3]);
const auto col0 = reinterpret_cast<const vec_t*>(&weights[(i + 0) * OutputDimensions * 4]);
const auto col1 = reinterpret_cast<const vec_t*>(&weights[(i + 1) * OutputDimensions * 4]);
const auto col2 = reinterpret_cast<const vec_t*>(&weights[(i + 2) * OutputDimensions * 4]);
const auto col3 = reinterpret_cast<const vec_t*>(&weights[(i + 3) * OutputDimensions * 4]);
for (int j = 0; j * OutputSimdWidth < OutputDimensions; ++j)
vec_add_dpbusd_32x4(outptr[j], in0, col0[j], in1, col1[j], in2, col2[j], in3, col3[j]);
}
for (IndexType i = 0; i < NumChunks; i += 2)
{
const vec_t in0 = vec_set_32(input32[i + 0]);
const vec_t in1 = vec_set_32(input32[i + 1]);
const auto col0 = reinterpret_cast<const vec_t*>(&weights[(i + 0) * OutputDimensions * 4]);
const auto col1 = reinterpret_cast<const vec_t*>(&weights[(i + 1) * OutputDimensions * 4]);
for (IndexType k = 0; k < NumRegs; ++k)
vec_add_dpbusd_32x2(acc[k], in0, col0[k], in1, col1[k]);
}
vec_t* outptr = reinterpret_cast<vec_t*>(output);
for (IndexType k = 0; k < NumRegs; ++k)
outptr[k] = acc[k];
}
else if constexpr (OutputDimensions == 1)
{
#if defined (USE_AVX512)
if constexpr (PaddedInputDimensions % (SimdWidth * 2) != 0)
{
constexpr IndexType NumChunks = PaddedInputDimensions / SimdWidth;
const auto inputVector256 = reinterpret_cast<const __m256i*>(input);
constexpr IndexType NumChunks = PaddedInputDimensions / SimdWidth;
vec_t sum0 = vec_setzero();
const auto row0 = reinterpret_cast<const vec_t*>(&weights[0]);
__m256i sum0 = _mm256_setzero_si256();
const auto row0 = reinterpret_cast<const __m256i*>(&weights[0]);
for (int j = 0; j < (int)NumChunks; ++j)
{
const __m256i in = inputVector256[j];
m256_add_dpbusd_epi32(sum0, in, row0[j]);
}
output[0] = m256_hadd(sum0, biases[0]);
}
else
#endif
{
#if defined (USE_AVX512)
constexpr IndexType NumChunks = PaddedInputDimensions / (SimdWidth * 2);
#else
constexpr IndexType NumChunks = PaddedInputDimensions / SimdWidth;
#endif
vec_t sum0 = vec_setzero();
const auto row0 = reinterpret_cast<const vec_t*>(&weights[0]);
for (int j = 0; j < (int)NumChunks; ++j)
{
const vec_t in = inputVector[j];
vec_add_dpbusd_32(sum0, in, row0[j]);
}
output[0] = vec_hadd(sum0, biases[0]);
}
for (int j = 0; j < (int)NumChunks; ++j)
{
const vec_t in = inputVector[j];
vec_add_dpbusd_32(sum0, in, row0[j]);
}
output[0] = vec_hadd(sum0, biases[0]);
}
# undef vec_setzero
# undef vec_set_32
# undef vec_add_dpbusd_32
# undef vec_add_dpbusd_32x2
# undef vec_add_dpbusd_32x4
# undef vec_hadd
# undef vec_haddx4
#else
// Use old implementation for the other architectures.
auto output = reinterpret_cast<OutputType*>(buffer);
#if defined(USE_SSE2)
// At least a multiple of 16, with SSE2.
static_assert(InputDimensions % SimdWidth == 0);
constexpr IndexType NumChunks = InputDimensions / SimdWidth;
const __m128i Zeros = _mm_setzero_si128();
const auto inputVector = reinterpret_cast<const __m128i*>(input);
#elif defined(USE_MMX)
static_assert(InputDimensions % SimdWidth == 0);
constexpr IndexType NumChunks = InputDimensions / SimdWidth;
const __m64 Zeros = _mm_setzero_si64();
const auto inputVector = reinterpret_cast<const __m64*>(input);
#elif defined(USE_NEON)
static_assert(InputDimensions % SimdWidth == 0);
constexpr IndexType NumChunks = InputDimensions / SimdWidth;
const auto inputVector = reinterpret_cast<const int8x8_t*>(input);
#endif
for (IndexType i = 0; i < OutputDimensions; ++i) {
const IndexType offset = i * PaddedInputDimensions;
#if defined(USE_SSE2)
__m128i sumLo = _mm_cvtsi32_si128(biases[i]);
__m128i sumHi = Zeros;
const auto row = reinterpret_cast<const __m128i*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
__m128i row_j = _mm_load_si128(&row[j]);
__m128i input_j = _mm_load_si128(&inputVector[j]);
__m128i extendedRowLo = _mm_srai_epi16(_mm_unpacklo_epi8(row_j, row_j), 8);
__m128i extendedRowHi = _mm_srai_epi16(_mm_unpackhi_epi8(row_j, row_j), 8);
__m128i extendedInputLo = _mm_unpacklo_epi8(input_j, Zeros);
__m128i extendedInputHi = _mm_unpackhi_epi8(input_j, Zeros);
__m128i productLo = _mm_madd_epi16(extendedRowLo, extendedInputLo);
__m128i productHi = _mm_madd_epi16(extendedRowHi, extendedInputHi);
sumLo = _mm_add_epi32(sumLo, productLo);
sumHi = _mm_add_epi32(sumHi, productHi);
}
__m128i sum = _mm_add_epi32(sumLo, sumHi);
__m128i sumHigh_64 = _mm_shuffle_epi32(sum, _MM_SHUFFLE(1, 0, 3, 2));
sum = _mm_add_epi32(sum, sumHigh_64);
__m128i sum_second_32 = _mm_shufflelo_epi16(sum, _MM_SHUFFLE(1, 0, 3, 2));
sum = _mm_add_epi32(sum, sum_second_32);
output[i] = _mm_cvtsi128_si32(sum);
#elif defined(USE_MMX)
__m64 sumLo = _mm_cvtsi32_si64(biases[i]);
__m64 sumHi = Zeros;
const auto row = reinterpret_cast<const __m64*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
__m64 row_j = row[j];
__m64 input_j = inputVector[j];
__m64 extendedRowLo = _mm_srai_pi16(_mm_unpacklo_pi8(row_j, row_j), 8);
__m64 extendedRowHi = _mm_srai_pi16(_mm_unpackhi_pi8(row_j, row_j), 8);
__m64 extendedInputLo = _mm_unpacklo_pi8(input_j, Zeros);
__m64 extendedInputHi = _mm_unpackhi_pi8(input_j, Zeros);
__m64 productLo = _mm_madd_pi16(extendedRowLo, extendedInputLo);
__m64 productHi = _mm_madd_pi16(extendedRowHi, extendedInputHi);
sumLo = _mm_add_pi32(sumLo, productLo);
sumHi = _mm_add_pi32(sumHi, productHi);
}
__m64 sum = _mm_add_pi32(sumLo, sumHi);
sum = _mm_add_pi32(sum, _mm_unpackhi_pi32(sum, sum));
output[i] = _mm_cvtsi64_si32(sum);
#elif defined(USE_NEON)
int32x4_t sum = {biases[i]};
const auto row = reinterpret_cast<const int8x8_t*>(&weights[offset]);
for (IndexType j = 0; j < NumChunks; ++j) {
int16x8_t product = vmull_s8(inputVector[j * 2], row[j * 2]);
product = vmlal_s8(product, inputVector[j * 2 + 1], row[j * 2 + 1]);
sum = vpadalq_s16(sum, product);
}
output[i] = sum[0] + sum[1] + sum[2] + sum[3];
#else
OutputType sum = biases[i];
for (IndexType j = 0; j < InputDimensions; ++j) {
sum += weights[offset + j] * input[j];
}
output[i] = sum;
#endif
}
#if defined(USE_MMX)
_mm_empty();
#endif
// Use old implementation for the other architectures.
affine_transform_non_ssse3<
InputDimensions,
PaddedInputDimensions,
OutputDimensions>(output, weights, biases, input);
#endif
return output;
+12 -2
View File
@@ -35,9 +35,10 @@ namespace Stockfish::Eval::NNUE::Layers {
static_assert(std::is_same<InputType, std::int32_t>::value, "");
// Number of input/output dimensions
static constexpr IndexType InputDimensions =
PreviousLayer::OutputDimensions;
static constexpr IndexType InputDimensions = PreviousLayer::OutputDimensions;
static constexpr IndexType OutputDimensions = InputDimensions;
static constexpr IndexType PaddedOutputDimensions =
ceil_to_multiple<IndexType>(OutputDimensions, 32);
// Size of forward propagation buffer used in this layer
static constexpr std::size_t SelfBufferSize =
@@ -179,6 +180,15 @@ namespace Stockfish::Eval::NNUE::Layers {
output[i] = static_cast<OutputType>(
std::max(0, std::min(127, input[i] >> WeightScaleBits)));
}
// Affine transform layers expect that there is at least
// ceil_to_multiple(OutputDimensions, 32) initialized values.
// We cannot do this in the affine transform because it requires
// preallocating space here.
for (IndexType i = OutputDimensions; i < PaddedOutputDimensions; ++i) {
output[i] = 0;
}
return output;
}
+4 -4
View File
@@ -23,7 +23,7 @@
#include "nnue_common.h"
#include "features/half_ka_v2.h"
#include "features/half_ka_v2_hm.h"
#include "layers/input_slice.h"
#include "layers/affine_transform.h"
@@ -32,10 +32,10 @@
namespace Stockfish::Eval::NNUE {
// Input features used in evaluation function
using FeatureSet = Features::HalfKAv2;
using FeatureSet = Features::HalfKAv2_hm;
// Number of input feature dimensions after conversion
constexpr IndexType TransformedFeatureDimensions = 512;
constexpr IndexType TransformedFeatureDimensions = 1024;
constexpr IndexType PSQTBuckets = 8;
constexpr IndexType LayerStacks = 8;
@@ -43,7 +43,7 @@ namespace Stockfish::Eval::NNUE {
// Define network structure
using InputLayer = InputSlice<TransformedFeatureDimensions * 2>;
using HiddenLayer1 = ClippedReLU<AffineTransform<InputLayer, 16>>;
using HiddenLayer1 = ClippedReLU<AffineTransform<InputLayer, 8>>;
using HiddenLayer2 = ClippedReLU<AffineTransform<HiddenLayer1, 32>>;
using OutputLayer = AffineTransform<HiddenLayer2, 1>;
+4 -5
View File
@@ -370,7 +370,6 @@ namespace Stockfish::Eval::NNUE {
// That might depend on the feature set and generally relies on the
// feature set's update cost calculation to be correct and never
// allow updates with more added/removed features than MaxActiveDimensions.
using IndexList = ValueList<IndexType, FeatureSet::MaxActiveDimensions>;
#ifdef VECTOR
// Gcc-10.2 unnecessarily spills AVX2 registers if this array
@@ -404,12 +403,12 @@ namespace Stockfish::Eval::NNUE {
// Gather all features to be updated.
const Square ksq = pos.square<KING>(perspective);
IndexList removed[2], added[2];
FeatureSet::IndexList removed[2], added[2];
FeatureSet::append_changed_indices(
ksq, next, perspective, removed[0], added[0]);
ksq, next->dirtyPiece, perspective, removed[0], added[0]);
for (StateInfo *st2 = pos.state(); st2 != next; st2 = st2->previous)
FeatureSet::append_changed_indices(
ksq, st2, perspective, removed[1], added[1]);
ksq, st2->dirtyPiece, perspective, removed[1], added[1]);
// Mark the accumulators as computed.
next->accumulator.computed[perspective] = true;
@@ -534,7 +533,7 @@ namespace Stockfish::Eval::NNUE {
// Refresh the accumulator
auto& accumulator = pos.state()->accumulator;
accumulator.computed[perspective] = true;
IndexList active;
FeatureSet::IndexList active;
FeatureSet::append_active_indices(pos, perspective, active);
#ifdef VECTOR
+3 -2
View File
@@ -1013,9 +1013,9 @@ void Position::do_null_move(StateInfo& newSt) {
}
st->key ^= Zobrist::side;
++st->rule50;
prefetch(TT.first_entry(key()));
++st->rule50;
st->pliesFromNull = 0;
sideToMove = ~sideToMove;
@@ -1080,8 +1080,9 @@ bool Position::see_ge(Move m, Value threshold) const {
if (swap <= 0)
return true;
assert(color_of(piece_on(from)) == sideToMove);
Bitboard occupied = pieces() ^ from ^ to;
Color stm = color_of(piece_on(from));
Color stm = sideToMove;
Bitboard attackers = attackers_to(to, occupied);
Bitboard stmAttackers, bb;
int res = 1;
+157 -118
View File
@@ -61,9 +61,6 @@ namespace {
// Different node types, used as a template parameter
enum NodeType { NonPV, PV, Root };
constexpr uint64_t TtHitAverageWindow = 4096;
constexpr uint64_t TtHitAverageResolution = 1024;
// Futility margin
Value futility_margin(Depth d, bool improving) {
return Value(214 * (d - improving));
@@ -72,9 +69,9 @@ namespace {
// Reductions lookup table, initialized at startup
int Reductions[MAX_MOVES]; // [depth or moveNumber]
Depth reduction(bool i, Depth d, int mn) {
Depth reduction(bool i, Depth d, int mn, bool rangeReduction) {
int r = Reductions[d] * Reductions[mn];
return (r + 534) / 1024 + (!i && r > 904);
return (r + 534) / 1024 + (!i && r > 904) + rangeReduction;
}
constexpr int futility_move_count(bool improving, Depth depth) {
@@ -83,7 +80,7 @@ namespace {
// History and stats update bonus, based on depth
int stat_bonus(Depth d) {
return d > 14 ? 73 : 6 * d * d + 229 * d - 215;
return std::min((6 * d + 229) * d - 215 , 2000);
}
// Add a small random component to draw evaluations to avoid 3-fold blindness
@@ -91,6 +88,30 @@ namespace {
return VALUE_DRAW + Value(2 * (thisThread->nodes & 1) - 1);
}
// Check if the current thread is in a search explosion
ExplosionState search_explosion(Thread* thisThread) {
uint64_t nodesNow = thisThread->nodes;
bool explosive = thisThread->doubleExtensionAverage[WHITE].is_greater(2, 100)
|| thisThread->doubleExtensionAverage[BLACK].is_greater(2, 100);
if (explosive)
thisThread->nodesLastExplosive = nodesNow;
else
thisThread->nodesLastNormal = nodesNow;
if ( explosive
&& thisThread->state == EXPLOSION_NONE
&& nodesNow - thisThread->nodesLastNormal > 6000000)
thisThread->state = MUST_CALM_DOWN;
if ( thisThread->state == MUST_CALM_DOWN
&& nodesNow - thisThread->nodesLastExplosive > 6000000)
thisThread->state = EXPLOSION_NONE;
return thisThread->state;
}
// Skill structure is used to implement strength limit
struct Skill {
explicit Skill(int l) : level(l) {}
@@ -152,7 +173,7 @@ namespace {
void Search::init() {
for (int i = 1; i < MAX_MOVES; ++i)
Reductions[i] = int(21.9 * std::log(i));
Reductions[i] = int((21.9 + std::log(Threads.size()) / 2) * std::log(i));
}
@@ -310,8 +331,13 @@ void Thread::search() {
multiPV = std::max(multiPV, (size_t)4);
multiPV = std::min(multiPV, rootMoves.size());
ttHitAverage = TtHitAverageWindow * TtHitAverageResolution / 2;
doubleExtensionAverage[WHITE].set(0, 100); // initialize the running average at 0%
doubleExtensionAverage[BLACK].set(0, 100); // initialize the running average at 0%
nodesLastExplosive = nodes;
nodesLastNormal = nodes;
state = EXPLOSION_NONE;
trend = SCORE_ZERO;
int searchAgainCounter = 0;
@@ -518,6 +544,14 @@ namespace {
template <NodeType nodeType>
Value search(Position& pos, Stack* ss, Value alpha, Value beta, Depth depth, bool cutNode) {
Thread* thisThread = pos.this_thread();
// Step 0. Limit search explosion
if ( ss->ply > 10
&& search_explosion(thisThread) == MUST_CALM_DOWN
&& depth > (ss-1)->depth)
depth = (ss-1)->depth;
constexpr bool PvNode = nodeType != NonPV;
constexpr bool rootNode = nodeType == Root;
const Depth maxNextDepth = rootNode ? depth : depth + 1;
@@ -556,14 +590,13 @@ namespace {
bool captureOrPromotion, doFullDepthSearch, moveCountPruning,
ttCapture, singularQuietLMR;
Piece movedPiece;
int moveCount, captureCount, quietCount;
int moveCount, captureCount, quietCount, bestMoveCount, improvement;
// Step 1. Initialize node
Thread* thisThread = pos.this_thread();
ss->inCheck = pos.checkers();
priorCapture = pos.captured_piece();
Color us = pos.side_to_move();
moveCount = captureCount = quietCount = ss->moveCount = 0;
moveCount = bestMoveCount = captureCount = quietCount = ss->moveCount = 0;
bestValue = -VALUE_INFINITE;
maxValue = VALUE_INFINITE;
@@ -602,8 +635,12 @@ namespace {
(ss+1)->excludedMove = bestMove = MOVE_NONE;
(ss+2)->killers[0] = (ss+2)->killers[1] = MOVE_NONE;
ss->doubleExtensions = (ss-1)->doubleExtensions;
ss->depth = depth;
Square prevSq = to_sq((ss-1)->currentMove);
// Update the running average statistics for double extensions
thisThread->doubleExtensionAverage[us].update(ss->depth > (ss-1)->depth);
// Initialize statScore to zero for the grandchildren of the current position.
// So statScore is shared between all grandchildren and only the first grandchild
// starts with statScore = 0. Later grandchildren start with the last calculated
@@ -621,6 +658,7 @@ namespace {
ttValue = ss->ttHit ? value_from_tt(tte->value(), ss->ply, pos.rule50_count()) : VALUE_NONE;
ttMove = rootNode ? thisThread->rootMoves[thisThread->pvIdx].pv[0]
: ss->ttHit ? tte->move() : MOVE_NONE;
ttCapture = ttMove && pos.capture_or_promotion(ttMove);
if (!excludedMove)
ss->ttPv = PvNode || (ss->ttHit && tte->is_pv());
@@ -632,14 +670,10 @@ namespace {
&& is_ok((ss-1)->currentMove))
thisThread->lowPlyHistory[ss->ply - 1][from_to((ss-1)->currentMove)] << stat_bonus(depth - 5);
// thisThread->ttHitAverage can be used to approximate the running average of ttHit
thisThread->ttHitAverage = (TtHitAverageWindow - 1) * thisThread->ttHitAverage / TtHitAverageWindow
+ TtHitAverageResolution * ss->ttHit;
// At non-PV nodes we check for an early TT cutoff
if ( !PvNode
&& ss->ttHit
&& tte->depth() >= depth
&& tte->depth() > depth - (thisThread->id() % 2 == 1)
&& ttValue != VALUE_NONE // Possible in case of TT access race
&& (ttValue >= beta ? (tte->bound() & BOUND_LOWER)
: (tte->bound() & BOUND_UPPER)))
@@ -650,7 +684,7 @@ namespace {
if (ttValue >= beta)
{
// Bonus for a quiet ttMove that fails high
if (!pos.capture_or_promotion(ttMove))
if (!ttCapture)
update_quiet_stats(pos, ss, ttMove, stat_bonus(depth), depth);
// Extra penalty for early quiet moves of the previous ply
@@ -658,7 +692,7 @@ namespace {
update_continuation_histories(ss-1, pos.piece_on(prevSq), prevSq, -stat_bonus(depth + 1));
}
// Penalty for a quiet ttMove that fails low
else if (!pos.capture_or_promotion(ttMove))
else if (!ttCapture)
{
int penalty = -stat_bonus(depth);
thisThread->mainHistory[us][from_to(ttMove)] << penalty;
@@ -732,6 +766,7 @@ namespace {
// Skip early pruning when in check
ss->staticEval = eval = VALUE_NONE;
improving = false;
improvement = 0;
goto moves_loop;
}
else if (ss->ttHit)
@@ -752,15 +787,11 @@ namespace {
}
else
{
// In case of null move search use previous static eval with a different sign
// and addition of two tempos
if ((ss-1)->currentMove != MOVE_NULL)
ss->staticEval = eval = evaluate(pos);
else
ss->staticEval = eval = -(ss-1)->staticEval;
ss->staticEval = eval = evaluate(pos);
// Save static evaluation into transposition table
tte->save(posKey, VALUE_NONE, ss->ttPv, BOUND_NONE, DEPTH_NONE, MOVE_NONE, eval);
if (!excludedMove)
tte->save(posKey, VALUE_NONE, ss->ttPv, BOUND_NONE, DEPTH_NONE, MOVE_NONE, eval);
}
// Use static evaluation difference to improve quiet move ordering
@@ -770,15 +801,18 @@ namespace {
thisThread->mainHistory[~us][from_to((ss-1)->currentMove)] << bonus;
}
// Set up improving flag that is used in various pruning heuristics
// We define position as improving if static evaluation of position is better
// Than the previous static evaluation at our turn
// In case of us being in check at our previous move we look at move prior to it
improving = (ss-2)->staticEval == VALUE_NONE
? ss->staticEval > (ss-4)->staticEval || (ss-4)->staticEval == VALUE_NONE
: ss->staticEval > (ss-2)->staticEval;
// Set up the improvement variable, which is the difference between the current
// static evaluation and the previous static evaluation at our turn (if we were
// in check at our previous move we look at the move prior to it). The improvement
// margin and the improving flag are used in various pruning heuristics.
improvement = (ss-2)->staticEval != VALUE_NONE ? ss->staticEval - (ss-2)->staticEval
: (ss-4)->staticEval != VALUE_NONE ? ss->staticEval - (ss-4)->staticEval
: 200;
// Step 7. Futility pruning: child node (~50 Elo)
improving = improvement > 0;
// Step 7. Futility pruning: child node (~50 Elo).
// The depth condition is important for mate finding.
if ( !PvNode
&& depth < 9
&& eval - futility_margin(depth, improving) >= beta
@@ -791,7 +825,7 @@ namespace {
&& (ss-1)->statScore < 23767
&& eval >= beta
&& eval >= ss->staticEval
&& ss->staticEval >= beta - 20 * depth - 22 * improving + 168 * ss->ttPv + 159
&& ss->staticEval >= beta - 20 * depth - improvement / 15 + 204
&& !excludedMove
&& pos.non_pawn_material(us)
&& (ss->ply >= thisThread->nmpMinPly || us != thisThread->nmpColor))
@@ -799,7 +833,7 @@ namespace {
assert(eval - beta >= 0);
// Null move dynamic reduction based on depth and value
Depth R = (1090 + 81 * depth) / 256 + std::min(int(eval - beta) / 205, 3);
Depth R = std::min(int(eval - beta) / 205, 3) + depth / 3 + 4;
ss->currentMove = MOVE_NULL;
ss->continuationHistory = &thisThread->continuationHistory[0][0][NO_PIECE][0];
@@ -855,19 +889,16 @@ namespace {
assert(probCutBeta < VALUE_INFINITE);
MovePicker mp(pos, ttMove, probCutBeta - ss->staticEval, &captureHistory);
int probCutCount = 0;
bool ttPv = ss->ttPv;
ss->ttPv = false;
while ( (move = mp.next_move()) != MOVE_NONE
&& probCutCount < 2 + 2 * cutNode)
while ((move = mp.next_move()) != MOVE_NONE)
if (move != excludedMove && pos.legal(move))
{
assert(pos.capture_or_promotion(move));
assert(depth >= 5);
captureOrPromotion = true;
probCutCount++;
ss->currentMove = move;
ss->continuationHistory = &thisThread->continuationHistory[ss->inCheck]
@@ -901,15 +932,20 @@ namespace {
ss->ttPv = ttPv;
}
// Step 10. If the position is not in TT, decrease depth by 2
// Step 10. If the position is not in TT, decrease depth by 2 or 1 depending on node type
if ( PvNode
&& depth >= 6
&& !ttMove)
depth -= 2;
moves_loop: // When in check, search starts from here
if ( cutNode
&& depth >= 9
&& !ttMove)
depth--;
ttCapture = ttMove && pos.capture_or_promotion(ttMove);
moves_loop: // When in check, search starts here
int rangeReduction = 0;
// Step 11. A small Probcut idea, when we are in check
probCutBeta = beta + 409;
@@ -942,7 +978,6 @@ moves_loop: // When in check, search starts from here
value = bestValue;
singularQuietLMR = moveCountPruning = false;
bool doubleExtension = false;
// Indicate PvNodes that will probably fail low if the node was searched
// at a depth equal or greater than the current depth, and the result of this search was a fail low.
@@ -989,7 +1024,7 @@ moves_loop: // When in check, search starts from here
// Calculate new depth for this move
newDepth = depth - 1;
// Step 13. Pruning at shallow depth (~200 Elo)
// Step 13. Pruning at shallow depth (~200 Elo). Depth conditions are important for mate finding.
if ( !rootNode
&& pos.non_pawn_material(us)
&& bestValue > VALUE_TB_LOSS_IN_MAX_PLY)
@@ -998,7 +1033,7 @@ moves_loop: // When in check, search starts from here
moveCountPruning = moveCount >= futility_move_count(improving, depth);
// Reduced depth of the next LMR search
int lmrDepth = std::max(newDepth - reduction(improving, depth, moveCount), 0);
int lmrDepth = std::max(newDepth - reduction(improving, depth, moveCount, rangeReduction > 2), 0);
if ( captureOrPromotion
|| givesCheck)
@@ -1016,23 +1051,20 @@ moves_loop: // When in check, search starts from here
else
{
// Continuation history based pruning (~20 Elo)
if ( lmrDepth < 5
&& (*contHist[0])[movedPiece][to_sq(move)] < CounterMovePruneThreshold
&& (*contHist[1])[movedPiece][to_sq(move)] < CounterMovePruneThreshold)
if (lmrDepth < 5
&& (*contHist[0])[movedPiece][to_sq(move)]
+ (*contHist[1])[movedPiece][to_sq(move)]
+ (*contHist[3])[movedPiece][to_sq(move)] < -3000 * depth + 3000)
continue;
// Futility pruning: parent node (~5 Elo)
if ( lmrDepth < 7
&& !ss->inCheck
&& ss->staticEval + 174 + 157 * lmrDepth <= alpha
&& (*contHist[0])[movedPiece][to_sq(move)]
+ (*contHist[1])[movedPiece][to_sq(move)]
+ (*contHist[3])[movedPiece][to_sq(move)]
+ (*contHist[5])[movedPiece][to_sq(move)] / 3 < 28255)
if ( !ss->inCheck
&& lmrDepth < 8
&& ss->staticEval + 172 + 145 * lmrDepth <= alpha)
continue;
// Prune moves with negative SEE (~20 Elo)
if (!pos.see_ge(move, Value(-(30 - std::min(lmrDepth, 18)) * lmrDepth * lmrDepth)))
if (!pos.see_ge(move, Value(-21 * lmrDepth * lmrDepth - 21 * lmrDepth)))
continue;
}
}
@@ -1053,7 +1085,7 @@ moves_loop: // When in check, search starts from here
&& (tte->bound() & BOUND_LOWER)
&& tte->depth() >= depth - 3)
{
Value singularBeta = ttValue - 2 * depth;
Value singularBeta = ttValue - 3 * depth;
Depth singularDepth = (depth - 1) / 2;
ss->excludedMove = move;
@@ -1065,14 +1097,11 @@ moves_loop: // When in check, search starts from here
extension = 1;
singularQuietLMR = !ttCapture;
// Avoid search explosion by limiting the number of double extensions to at most 3
// Avoid search explosion by limiting the number of double extensions
if ( !PvNode
&& value < singularBeta - 93
&& ss->doubleExtensions < 3)
{
&& value < singularBeta - 75
&& ss->doubleExtensions <= 6)
extension = 2;
doubleExtension = true;
}
}
// Multi-cut pruning
@@ -1083,21 +1112,28 @@ moves_loop: // When in check, search starts from here
else if (singularBeta >= beta)
return singularBeta;
// If the eval of ttMove is greater than beta we try also if there is another
// move that pushes it over beta, if so also produce a cutoff.
// If the eval of ttMove is greater than beta, we reduce it (negative extension)
else if (ttValue >= beta)
{
ss->excludedMove = move;
value = search<NonPV>(pos, ss, beta - 1, beta, (depth + 3) / 2, cutNode);
ss->excludedMove = MOVE_NONE;
if (value >= beta)
return beta;
}
extension = -2;
}
// Capture extensions for PvNodes and cutNodes
else if ( (PvNode || cutNode)
&& captureOrPromotion
&& moveCount != 1)
extension = 1;
// Check extensions
else if ( givesCheck
&& depth > 6
&& abs(ss->staticEval) > Value(100))
&& abs(ss->staticEval) > 100)
extension = 1;
// Quiet ttMove extensions
else if ( PvNode
&& move == ttMove
&& move == ss->killers[0]
&& (*contHist[0])[movedPiece][to_sq(move)] >= 10000)
extension = 1;
// Add extension to new depth
@@ -1123,18 +1159,15 @@ moves_loop: // When in check, search starts from here
// cases where we extend a son if it has good chances to be "interesting".
if ( depth >= 3
&& moveCount > 1 + 2 * rootNode
&& ( !captureOrPromotion
|| (cutNode && (ss-1)->moveCount > 1)
|| !ss->ttPv)
&& (!PvNode || ss->ply > 1 || thisThread->id() % 4 != 3))
&& ( !ss->ttPv
|| !captureOrPromotion
|| (cutNode && (ss-1)->moveCount > 1)))
{
Depth r = reduction(improving, depth, moveCount);
Depth r = reduction(improving, depth, moveCount, rangeReduction > 2);
if (PvNode)
r--;
// Decrease reduction if the ttHit running average is large (~0 Elo)
if (thisThread->ttHitAverage > 537 * TtHitAverageResolution * TtHitAverageWindow / 1024)
// Decrease reduction if on the PV (~2 Elo)
if ( PvNode
&& bestMoveCount <= 3)
r--;
// Decrease reduction if position is or has been on the PV
@@ -1157,33 +1190,38 @@ moves_loop: // When in check, search starts from here
r--;
// Increase reduction for cut nodes (~3 Elo)
if (cutNode)
r += 1 + !captureOrPromotion;
if (cutNode && move != ss->killers[0])
r += 2;
if (!captureOrPromotion)
{
// Increase reduction if ttMove is a capture (~3 Elo)
if (ttCapture)
r++;
// Increase reduction if ttMove is a capture (~3 Elo)
if (ttCapture)
r++;
ss->statScore = thisThread->mainHistory[us][from_to(move)]
+ (*contHist[0])[movedPiece][to_sq(move)]
+ (*contHist[1])[movedPiece][to_sq(move)]
+ (*contHist[3])[movedPiece][to_sq(move)]
- 4923;
ss->statScore = thisThread->mainHistory[us][from_to(move)]
+ (*contHist[0])[movedPiece][to_sq(move)]
+ (*contHist[1])[movedPiece][to_sq(move)]
+ (*contHist[3])[movedPiece][to_sq(move)]
- 4923;
// Decrease/increase reduction for moves with a good/bad history (~30 Elo)
if (!ss->inCheck)
r -= ss->statScore / 14721;
}
// Decrease/increase reduction for moves with a good/bad history (~30 Elo)
r -= ss->statScore / 14721;
// In general we want to cap the LMR depth search at newDepth. But if
// reductions are really negative and movecount is low, we allow this move
// to be searched deeper than the first move, unless ttMove was extended by 2.
Depth d = std::clamp(newDepth - r, 1, newDepth + (r < -1 && moveCount <= 5 && !doubleExtension));
// In general we want to cap the LMR depth search at newDepth. But if reductions
// are really negative and movecount is low, we allow this move to be searched
// deeper than the first move (this may lead to hidden double extensions).
int deeper = r >= -1 ? 0
: moveCount <= 5 ? 2
: PvNode && depth > 6 ? 1
: 0;
Depth d = std::clamp(newDepth - r, 1, newDepth + deeper);
value = -search<NonPV>(pos, ss+1, -(alpha+1), -alpha, d, true);
// Range reductions (~3 Elo)
if (ss->staticEval - value < 30 && depth > 7)
rangeReduction++;
// If the son is reduced and fails high it will be re-searched at full depth
doFullDepthSearch = value > alpha && d < newDepth;
didLMR = true;
@@ -1250,9 +1288,11 @@ moves_loop: // When in check, search starts from here
for (Move* m = (ss+1)->pv; *m != MOVE_NONE; ++m)
rm.pv.push_back(*m);
// We record how often the best move has been changed in each
// iteration. This information is used for time management and LMR
if (moveCount > 1)
// We record how often the best move has been changed in each iteration.
// This information is used for time management and LMR. In MultiPV mode,
// we must take care to only do this for the first PV line.
if ( moveCount > 1
&& !thisThread->pvIdx)
++thisThread->bestMoveChanges;
}
else
@@ -1274,7 +1314,10 @@ moves_loop: // When in check, search starts from here
update_pv(ss->pv, move, (ss+1)->pv);
if (PvNode && value < beta) // Update alpha! Always alpha < beta
{
alpha = value;
bestMoveCount++;
}
else
{
assert(value >= beta); // Fail high
@@ -1322,7 +1365,7 @@ moves_loop: // When in check, search starts from here
// Bonus for prior countermove that caused the fail low
else if ( (depth >= 3 || PvNode)
&& !priorCapture)
update_continuation_histories(ss-1, pos.piece_on(prevSq), prevSq, stat_bonus(depth));
update_continuation_histories(ss-1, pos.piece_on(prevSq), prevSq, stat_bonus(depth) * (1 + (PvNode || cutNode)));
if (PvNode)
bestValue = std::min(bestValue, maxValue);
@@ -1433,7 +1476,6 @@ moves_loop: // When in check, search starts from here
}
else
// In case of null move search use previous static eval with a different sign
// and addition of two tempos
ss->staticEval = bestValue =
(ss-1)->currentMove != MOVE_NULL ? evaluate(pos)
: -(ss-1)->staticEval;
@@ -1473,6 +1515,10 @@ moves_loop: // When in check, search starts from here
{
assert(is_ok(move));
// Check for legality
if (!pos.legal(move))
continue;
givesCheck = pos.gives_check(move);
captureOrPromotion = pos.capture_or_promotion(move);
@@ -1511,13 +1557,6 @@ moves_loop: // When in check, search starts from here
// Speculative prefetch as early as possible
prefetch(TT.first_entry(pos.key_after(move)));
// Check for legality just before making the move
if (!pos.legal(move))
{
moveCount--;
continue;
}
ss->currentMove = move;
ss->continuationHistory = &thisThread->continuationHistory[ss->inCheck]
[captureOrPromotion]
@@ -1646,8 +1685,8 @@ moves_loop: // When in check, search starts from here
PieceType captured = type_of(pos.piece_on(to_sq(bestMove)));
bonus1 = stat_bonus(depth + 1);
bonus2 = bestValue > beta + PawnValueMg ? bonus1 // larger bonus
: std::min(bonus1, stat_bonus(depth)); // smaller bonus
bonus2 = bestValue > beta + PawnValueMg ? bonus1 // larger bonus
: stat_bonus(depth); // smaller bonus
if (!pos.capture_or_promotion(bestMove))
{
+1
View File
@@ -47,6 +47,7 @@ struct Stack {
Move excludedMove;
Move killers[2];
Value staticEval;
Depth depth;
int statScore;
int moveCount;
bool inCheck;
+341
View File
@@ -0,0 +1,341 @@
/*
Stockfish, a UCI chess playing engine derived from Glaurung 2.1
Copyright (C) 2004-2021 The Stockfish developers (see AUTHORS file)
Stockfish is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Stockfish is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef STOCKFISH_SIMD_H_INCLUDED
#define STOCKFISH_SIMD_H_INCLUDED
#if defined(USE_AVX2)
# include <immintrin.h>
#elif defined(USE_SSE41)
# include <smmintrin.h>
#elif defined(USE_SSSE3)
# include <tmmintrin.h>
#elif defined(USE_SSE2)
# include <emmintrin.h>
#elif defined(USE_MMX)
# include <mmintrin.h>
#elif defined(USE_NEON)
# include <arm_neon.h>
#endif
// The inline asm is only safe for GCC, where it is necessary to get good codegen.
// See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693
// Clang does fine without it.
// Play around here: https://godbolt.org/z/7EWqrYq51
#if (defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER))
#define USE_INLINE_ASM
#endif
namespace Stockfish::Simd {
#if defined (USE_AVX512)
[[maybe_unused]] static int m512_hadd(__m512i sum, int bias) {
return _mm512_reduce_add_epi32(sum) + bias;
}
/*
Parameters:
sum0 = [zmm0.i128[0], zmm0.i128[1], zmm0.i128[2], zmm0.i128[3]]
sum1 = [zmm1.i128[0], zmm1.i128[1], zmm1.i128[2], zmm1.i128[3]]
sum2 = [zmm2.i128[0], zmm2.i128[1], zmm2.i128[2], zmm2.i128[3]]
sum3 = [zmm3.i128[0], zmm3.i128[1], zmm3.i128[2], zmm3.i128[3]]
Returns:
ret = [
reduce_add_epi32(zmm0.i128[0]), reduce_add_epi32(zmm1.i128[0]), reduce_add_epi32(zmm2.i128[0]), reduce_add_epi32(zmm3.i128[0]),
reduce_add_epi32(zmm0.i128[1]), reduce_add_epi32(zmm1.i128[1]), reduce_add_epi32(zmm2.i128[1]), reduce_add_epi32(zmm3.i128[1]),
reduce_add_epi32(zmm0.i128[2]), reduce_add_epi32(zmm1.i128[2]), reduce_add_epi32(zmm2.i128[2]), reduce_add_epi32(zmm3.i128[2]),
reduce_add_epi32(zmm0.i128[3]), reduce_add_epi32(zmm1.i128[3]), reduce_add_epi32(zmm2.i128[3]), reduce_add_epi32(zmm3.i128[3])
]
*/
[[maybe_unused]] static __m512i m512_hadd128x16_interleave(
__m512i sum0, __m512i sum1, __m512i sum2, __m512i sum3) {
__m512i sum01a = _mm512_unpacklo_epi32(sum0, sum1);
__m512i sum01b = _mm512_unpackhi_epi32(sum0, sum1);
__m512i sum23a = _mm512_unpacklo_epi32(sum2, sum3);
__m512i sum23b = _mm512_unpackhi_epi32(sum2, sum3);
__m512i sum01 = _mm512_add_epi32(sum01a, sum01b);
__m512i sum23 = _mm512_add_epi32(sum23a, sum23b);
__m512i sum0123a = _mm512_unpacklo_epi64(sum01, sum23);
__m512i sum0123b = _mm512_unpackhi_epi64(sum01, sum23);
return _mm512_add_epi32(sum0123a, sum0123b);
}
[[maybe_unused]] static __m128i m512_haddx4(
__m512i sum0, __m512i sum1, __m512i sum2, __m512i sum3,
__m128i bias) {
__m512i sum = m512_hadd128x16_interleave(sum0, sum1, sum2, sum3);
__m256i sum256lo = _mm512_castsi512_si256(sum);
__m256i sum256hi = _mm512_extracti64x4_epi64(sum, 1);
sum256lo = _mm256_add_epi32(sum256lo, sum256hi);
__m128i sum128lo = _mm256_castsi256_si128(sum256lo);
__m128i sum128hi = _mm256_extracti128_si256(sum256lo, 1);
return _mm_add_epi32(_mm_add_epi32(sum128lo, sum128hi), bias);
}
[[maybe_unused]] static void m512_add_dpbusd_epi32(
__m512i& acc,
__m512i a,
__m512i b) {
# if defined (USE_VNNI)
# if defined (USE_INLINE_ASM)
asm(
"vpdpbusd %[b], %[a], %[acc]\n\t"
: [acc]"+v"(acc)
: [a]"v"(a), [b]"vm"(b)
);
# else
acc = _mm512_dpbusd_epi32(acc, a, b);
# endif
# else
# if defined (USE_INLINE_ASM)
__m512i tmp = _mm512_maddubs_epi16(a, b);
asm(
"vpmaddwd %[tmp], %[ones], %[tmp]\n\t"
"vpaddd %[acc], %[tmp], %[acc]\n\t"
: [acc]"+v"(acc), [tmp]"+&v"(tmp)
: [ones]"v"(_mm512_set1_epi16(1))
);
# else
__m512i product0 = _mm512_maddubs_epi16(a, b);
product0 = _mm512_madd_epi16(product0, _mm512_set1_epi16(1));
acc = _mm512_add_epi32(acc, product0);
# endif
# endif
}
[[maybe_unused]] static void m512_add_dpbusd_epi32x2(
__m512i& acc,
__m512i a0, __m512i b0,
__m512i a1, __m512i b1) {
# if defined (USE_VNNI)
# if defined (USE_INLINE_ASM)
asm(
"vpdpbusd %[b0], %[a0], %[acc]\n\t"
"vpdpbusd %[b1], %[a1], %[acc]\n\t"
: [acc]"+v"(acc)
: [a0]"v"(a0), [b0]"vm"(b0), [a1]"v"(a1), [b1]"vm"(b1)
);
# else
acc = _mm512_dpbusd_epi32(acc, a0, b0);
acc = _mm512_dpbusd_epi32(acc, a1, b1);
# endif
# else
# if defined (USE_INLINE_ASM)
__m512i tmp0 = _mm512_maddubs_epi16(a0, b0);
__m512i tmp1 = _mm512_maddubs_epi16(a1, b1);
asm(
"vpaddsw %[tmp0], %[tmp1], %[tmp0]\n\t"
"vpmaddwd %[tmp0], %[ones], %[tmp0]\n\t"
"vpaddd %[acc], %[tmp0], %[acc]\n\t"
: [acc]"+v"(acc), [tmp0]"+&v"(tmp0)
: [tmp1]"v"(tmp1), [ones]"v"(_mm512_set1_epi16(1))
);
# else
__m512i product0 = _mm512_maddubs_epi16(a0, b0);
__m512i product1 = _mm512_maddubs_epi16(a1, b1);
product0 = _mm512_adds_epi16(product0, product1);
product0 = _mm512_madd_epi16(product0, _mm512_set1_epi16(1));
acc = _mm512_add_epi32(acc, product0);
# endif
# endif
}
#endif
#if defined (USE_AVX2)
[[maybe_unused]] static int m256_hadd(__m256i sum, int bias) {
__m128i sum128 = _mm_add_epi32(_mm256_castsi256_si128(sum), _mm256_extracti128_si256(sum, 1));
sum128 = _mm_add_epi32(sum128, _mm_shuffle_epi32(sum128, _MM_PERM_BADC));
sum128 = _mm_add_epi32(sum128, _mm_shuffle_epi32(sum128, _MM_PERM_CDAB));
return _mm_cvtsi128_si32(sum128) + bias;
}
[[maybe_unused]] static __m128i m256_haddx4(
__m256i sum0, __m256i sum1, __m256i sum2, __m256i sum3,
__m128i bias) {
sum0 = _mm256_hadd_epi32(sum0, sum1);
sum2 = _mm256_hadd_epi32(sum2, sum3);
sum0 = _mm256_hadd_epi32(sum0, sum2);
__m128i sum128lo = _mm256_castsi256_si128(sum0);
__m128i sum128hi = _mm256_extracti128_si256(sum0, 1);
return _mm_add_epi32(_mm_add_epi32(sum128lo, sum128hi), bias);
}
[[maybe_unused]] static void m256_add_dpbusd_epi32(
__m256i& acc,
__m256i a,
__m256i b) {
# if defined (USE_VNNI)
# if defined (USE_INLINE_ASM)
asm(
"vpdpbusd %[b], %[a], %[acc]\n\t"
: [acc]"+v"(acc)
: [a]"v"(a), [b]"vm"(b)
);
# else
acc = _mm256_dpbusd_epi32(acc, a, b);
# endif
# else
# if defined (USE_INLINE_ASM)
__m256i tmp = _mm256_maddubs_epi16(a, b);
asm(
"vpmaddwd %[tmp], %[ones], %[tmp]\n\t"
"vpaddd %[acc], %[tmp], %[acc]\n\t"
: [acc]"+v"(acc), [tmp]"+&v"(tmp)
: [ones]"v"(_mm256_set1_epi16(1))
);
# else
__m256i product0 = _mm256_maddubs_epi16(a, b);
product0 = _mm256_madd_epi16(product0, _mm256_set1_epi16(1));
acc = _mm256_add_epi32(acc, product0);
# endif
# endif
}
[[maybe_unused]] static void m256_add_dpbusd_epi32x2(
__m256i& acc,
__m256i a0, __m256i b0,
__m256i a1, __m256i b1) {
# if defined (USE_VNNI)
# if defined (USE_INLINE_ASM)
asm(
"vpdpbusd %[b0], %[a0], %[acc]\n\t"
"vpdpbusd %[b1], %[a1], %[acc]\n\t"
: [acc]"+v"(acc)
: [a0]"v"(a0), [b0]"vm"(b0), [a1]"v"(a1), [b1]"vm"(b1)
);
# else
acc = _mm256_dpbusd_epi32(acc, a0, b0);
acc = _mm256_dpbusd_epi32(acc, a1, b1);
# endif
# else
# if defined (USE_INLINE_ASM)
__m256i tmp0 = _mm256_maddubs_epi16(a0, b0);
__m256i tmp1 = _mm256_maddubs_epi16(a1, b1);
asm(
"vpaddsw %[tmp0], %[tmp1], %[tmp0]\n\t"
"vpmaddwd %[tmp0], %[ones], %[tmp0]\n\t"
"vpaddd %[acc], %[tmp0], %[acc]\n\t"
: [acc]"+v"(acc), [tmp0]"+&v"(tmp0)
: [tmp1]"v"(tmp1), [ones]"v"(_mm256_set1_epi16(1))
);
# else
__m256i product0 = _mm256_maddubs_epi16(a0, b0);
__m256i product1 = _mm256_maddubs_epi16(a1, b1);
product0 = _mm256_adds_epi16(product0, product1);
product0 = _mm256_madd_epi16(product0, _mm256_set1_epi16(1));
acc = _mm256_add_epi32(acc, product0);
# endif
# endif
}
#endif
#if defined (USE_SSSE3)
[[maybe_unused]] static int m128_hadd(__m128i sum, int bias) {
sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, 0x4E)); //_MM_PERM_BADC
sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, 0xB1)); //_MM_PERM_CDAB
return _mm_cvtsi128_si32(sum) + bias;
}
[[maybe_unused]] static __m128i m128_haddx4(
__m128i sum0, __m128i sum1, __m128i sum2, __m128i sum3,
__m128i bias) {
sum0 = _mm_hadd_epi32(sum0, sum1);
sum2 = _mm_hadd_epi32(sum2, sum3);
sum0 = _mm_hadd_epi32(sum0, sum2);
return _mm_add_epi32(sum0, bias);
}
[[maybe_unused]] static void m128_add_dpbusd_epi32(
__m128i& acc,
__m128i a,
__m128i b) {
# if defined (USE_INLINE_ASM)
__m128i tmp = _mm_maddubs_epi16(a, b);
asm(
"pmaddwd %[ones], %[tmp]\n\t"
"paddd %[tmp], %[acc]\n\t"
: [acc]"+v"(acc), [tmp]"+&v"(tmp)
: [ones]"v"(_mm_set1_epi16(1))
);
# else
__m128i product0 = _mm_maddubs_epi16(a, b);
product0 = _mm_madd_epi16(product0, _mm_set1_epi16(1));
acc = _mm_add_epi32(acc, product0);
# endif
}
[[maybe_unused]] static void m128_add_dpbusd_epi32x2(
__m128i& acc,
__m128i a0, __m128i b0,
__m128i a1, __m128i b1) {
# if defined (USE_INLINE_ASM)
__m128i tmp0 = _mm_maddubs_epi16(a0, b0);
__m128i tmp1 = _mm_maddubs_epi16(a1, b1);
asm(
"paddsw %[tmp1], %[tmp0]\n\t"
"pmaddwd %[ones], %[tmp0]\n\t"
"paddd %[tmp0], %[acc]\n\t"
: [acc]"+v"(acc), [tmp0]"+&v"(tmp0)
: [tmp1]"v"(tmp1), [ones]"v"(_mm_set1_epi16(1))
);
# else
__m128i product0 = _mm_maddubs_epi16(a0, b0);
__m128i product1 = _mm_maddubs_epi16(a1, b1);
product0 = _mm_adds_epi16(product0, product1);
product0 = _mm_madd_epi16(product0, _mm_set1_epi16(1));
acc = _mm_add_epi32(acc, product0);
# endif
}
#endif
}
#endif // STOCKFISH_SIMD_H_INCLUDED
+5 -2
View File
@@ -60,10 +60,13 @@ public:
Pawns::Table pawnsTable;
Material::Table materialTable;
size_t pvIdx, pvLast;
uint64_t ttHitAverage;
RunningAverage doubleExtensionAverage[COLOR_NB];
uint64_t nodesLastExplosive;
uint64_t nodesLastNormal;
std::atomic<uint64_t> nodes, tbHits, bestMoveChanges;
int selDepth, nmpMinPly;
Color nmpColor;
std::atomic<uint64_t> nodes, tbHits, bestMoveChanges;
ExplosionState state;
Position rootPos;
StateInfo rootState;
+7 -3
View File
@@ -68,6 +68,9 @@ void TimeManagement::init(Search::LimitsType& limits, Color us, int ply) {
TimePoint timeLeft = std::max(TimePoint(1),
limits.time[us] + limits.inc[us] * (mtg - 1) - moveOverhead * (2 + mtg));
// Use extra time with larger increments
double optExtra = std::clamp(1.0 + 12.0 * limits.inc[us] / limits.time[us], 1.0, 1.12);
// A user may scale time usage by setting UCI option "Slow Mover"
// Default is 100 and changing this value will probably lose elo.
timeLeft = slowMover * timeLeft / 100;
@@ -78,15 +81,16 @@ void TimeManagement::init(Search::LimitsType& limits, Color us, int ply) {
if (limits.movestogo == 0)
{
optScale = std::min(0.0084 + std::pow(ply + 3.0, 0.5) * 0.0042,
0.2 * limits.time[us] / double(timeLeft));
0.2 * limits.time[us] / double(timeLeft))
* optExtra;
maxScale = std::min(7.0, 4.0 + ply / 12.0);
}
// x moves in y seconds (+ z increment)
else
{
optScale = std::min((0.8 + ply / 128.0) / mtg,
0.8 * limits.time[us] / double(timeLeft));
optScale = std::min((0.88 + ply / 116.4) / mtg,
0.88 * limits.time[us] / double(timeLeft));
maxScale = std::min(6.3, 1.5 + 0.11 * mtg);
}
+5
View File
@@ -173,6 +173,11 @@ enum Bound {
BOUND_EXACT = BOUND_UPPER | BOUND_LOWER
};
enum ExplosionState {
EXPLOSION_NONE,
MUST_CALM_DOWN
};
enum Value : int {
VALUE_ZERO = 0,
VALUE_DRAW = 0,
+1 -1
View File
@@ -164,7 +164,7 @@ Option& Option::operator=(const string& v) {
assert(!type.empty());
if ( (type != "button" && v.empty())
if ( (type != "button" && type != "string" && v.empty())
|| (type == "check" && v != "true" && v != "false")
|| (type == "spin" && (stof(v) < min || stof(v) > max)))
return *this;