Avoid shifting negative signed integers and use typed
enum to avoids decrementing a variable beyond its defined
range, like:
for (Rank r = RANK_8; r >= RANK_1; --r)
Changes were tested individually and passed SPRT[-3, 1].
With this patch gcc --sanitize builds cleanly.
No functional change.
Instead of outputting "info nodes ... time ..." when the last
iteration is interrupted, simply call UCI::pv() to output the PV.
I thought about calling UCI:pv() with bounds -VALUE_INFINITE, VALUE_INFINITE
to avoid "lowerbound" or "upperbound" appearing in it, but I'm not sure that
would be any better.
This patch fixes rare inconsistencies between the first move of
the last PV output and the bestmove played. It also makes sure
that all the latest statistics are sent to the GUI (not only nodes
and time but also nps, tbhits, hashfull).
No functional change.
Restore original behaviour to reset
the counter before a new move search.
Also fixed some warnings and added const
qualifier to a couple of functions, as
suggested by m_stembera.
Thanks to Werner Bergmans for reporting
the regression.
No functional change.
Use a per-thread counter to reduce contention
with many cores and endgame positions.
Measured around 1% speed-up on a 12 core and 8%
on 28 cores with 6-men, searching on:
7R/1p3k2/2p2P2/3nR1P1/8/3b1P2/7K/r7 b - - 3 38
Also retire the unused set_nodes_searched() and fix
a couple of return types and naming conventions.
No functional change.
For a default bench, this fixes the last valgrind
error (jump on uninitialised value).
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 187869 W: 33303 L: 33463 D: 121103
No functional change.
We reference (ss-1)->currentMove, i.e. we peek
current move of the parent node, so currentMove
should be valid in the main move loop, when we
search() the subtree, but outside of main loop
it is useless.
No functional change.
Also a speedup since we don't need to recalculate SEE
for extensions...as it already determined to be positive.
Results for 12 tests for each version:
Base Test Diff
Mean 2132395 2191002 -58607
StDev 128058 85917 134239
p-value: 0.669
speedup: 0.027
Non functional change.
The target:
Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:
float-abi flag for GCC If an FPU is present and supported by the installed os then passed value need to be hard.
I didn't find any better solution than using readelf to check for the availibilty of Tag_ABI_VFP_args which sould indicate support for the FPU. The check is only done if the arch is arm and if readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found) but it will not break and will continue with the default softfp value. Outputing the error is not really acceptable but I wanted some feedback on the
check itself.
-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow to differentiate Android.
m32 flag
My understanding is that outside of Android the flag is generating errors on armv7.
These modifications should introduce change only for non Android armv7 build.
No functional change.
The target:
Odroid U3 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=g138745696275)
Debian Jessie
As listed in #550 and #638 three modifications are needed for compilation to work:
float-abi flag for GCC If an FPU is present and supported by the installed os then passed value need to be hard.
I didn't find any better solution than using readelf to check for the availibilty of Tag_ABI_VFP_args which sould indicate support for the FPU. The check is only done if the arch is arm and if readelf is not present
on the system, there will be an error (/bin/sh: 1: readelf: not found) but it will not break and will continue with the default softfp value. Outputing the error is not really acceptable but I wanted some feedback on the
check itself.
-lpthread is needed on armv7 outside of Android
I replaced UNAME with KERNEL and OS to allow to differentiate Android.
m32 flag
My understanding is that outside of Android the flag is generating errors on armv7.
These modifications should introduce change only for non Android armv7 build.
No functional change.
Stephane's patch removes the only usage of Position::see, where the
returned value isn't immediately compared with a value. So I replaced
this function by its optimised and more specific version see_ge. This
function also supersedes the function Position::see_sign.
bool Position::see_ge(Move m, Value v) const;
This function tests if the SEE of a move is greater or equal than a
given value. We use forward iteration on captures instread of backward
one, therefore we don't need the swapList array. Also we stop as soon
as we have enough information to obtain the result, avoiding unnecessary
calls to the min_attacker function.
Speed tests (Windows 7), 20 runs for each engine:
Test engine: mean 866648, st. dev. 5964
Base engine: mean 846751, st. dev. 22846
Speedup: 1.023
Speed test by Stephane Nicolet
Fishtest STC test:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26040 W: 4675 L: 4442 D: 16923
http://tests.stockfishchess.org/tests/view/57f648990ebc59038170fa03
No functional change.
This is a bit tricky because we don't want
to prune the only legal evasions, even if
with negative SEE. So add an assert to avoid
this subtle bug to slip in later.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14140 W: 2625 L: 2421 D: 9094
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 11558 W: 1555 L: 1379 D: 8624
bench: 5256717
Rename shift_bb() to shift(), and DELTA_S to SOUTH, etc.
to improve code readability, especially in evaluate.cpp
when they are used together:
old b = shift_bb<DELTA_S>(pos.pieces(PAWN))
new b = shift<SOUTH>(pos.pieces(PAWN))
While there fix some small code style issues.
No functional change.
Drop useless condition
abs(ttValue) < VALUE_KNOWN_WIN
And extend singular extension search to cases when ttValue
stores a mate score. This improves mate finding and does
not introduce any regression.
Yery tested this patch against current master on the 6500+
Chest mate suite with 200K fixed nodes:
shortest mates found: master: 1206 patch:1205
any mate found: master: 1903 patch: 2003
with 1 sec time:
shortest mates found: master: 2667 patch: 2628
any mate found: master: 3585 patch: 3646
Verified for no regression:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25655 W: 4578 L: 4465 D: 16612
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 66247 W: 8618 L: 8557 D: 49072
bench: 6335042
Both Tablebases::filter_root_moves() and
extract_ponder_from_tt(9 were unable to handle
a mate/stalemate position.
Spotted and reported by Dann Corbit.
Added some mate/stalemate positions to bench so
to early catch this regression in the future.
No functional change.
Use the following transformations:
- to check that A is included in B, testing "(A & ~B) == 0" is faster
than "(A & B) == A"
- to remove the intersection of A and B from A, doing "A &= ~B;" is as
fast as "if (A & B) A &= ~B;" but is simpler.
Overall, the simpler patch version is 0.3% than current master.
No functional change.
Correct pinners calculation and fix bug with pinned
pieces giving check. With this patch 'pinners' only
returns sliders with exactly one defensive piece between
the slider and the attacked square (in other words, pinners
returns exact pinners).
This was a co-operation between Marco Costalba,
Stphane Nicolet and me.
Special thanks to Ronald de Man for reporting the bug with
pinned pieces giving check, discussed here:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/S_4E_Xs5HaE
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 132118 W: 23578 L: 23645 D: 84895
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36424 W: 4770 L: 4670 D: 26984
bench: 6272231
Instrumentation shows that in make_score(mg, eg) calls, the mg value is
zero in 25,9% of the calls while the eg value is zero in 36,8% of the
calls.
Swapping the internal fields of mg and eg in the internal
representation of Score allows the compiler to optimize away the shift
in (eg << 16) + mg in more cases, thus resulting in a 0.3% speed-up
overall.
No functional change
Rescales the king danger variables in evaluate_king() to
suppress the KingDanger[] array. This avoids the cost of
the memory accesses to the array and simplifies the non-linear
transformation used.
Full credits to "hxim" for the seminal idea and implementation,
see pull request #786.
https://github.com/official-stockfish/Stockfish/pull/786
Passed STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 9649 W: 1829 L: 1689 D: 6131
Passed LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 53494 W: 7254 L: 7178 D: 39062
Bench: 6116200
Drops a scalability bottleneck due to memory contention
of a single shared table across threads. The effect starts
to be sensible with a high number of threads. Specifically
we have a small regression with 7 threads both at 60 and
180 seconds TC:
10000 @ 60+0.6 th 7
ELO: -2.46 +-3.2 (95%) LOS: 6.5%
Total: 9896 W: 1037 L: 1107 D: 7752
5000 @ 180+0.6 th 7
ELO: -1.95 +-4.1 (95%) LOS: 17.7%
Total: 5000 W: 444 L: 472 D: 4084
We have a regression because counterMoveHistory table is
quite big and it takes time for a single thread to fill it.
Sharing the table yields to a higher fill rate and better
quality of moves and up to 7 threads the benefits of sharing
more then compensate the loss in speed due to contention.
Interestingly even with a 3X longer TC, so with more time
for the single thread to catch up, the improvment is quite
limited and below noise level. It seems we really need much
longer TC to saturate the table.
When we move to high threads number it's another story:
5000 @ 60+0.6 th 22
ELO: 3.49 +-4.3 (95%) LOS: 94.6%
Total: 4880 W: 490 L: 441 D: 3949
2000 @ 60+0.6 th 32
ELO: 8.34 +-6.9 (95%) LOS: 99.1%
Total: 2000 W: 229 L: 181 D: 1590
As expected the speed-up more than compensates the filling
rate, and we expect that with tournament TC, where single
thread is able to saturate the table, the difference will
be even stronger. For instance for TCEC 9 super-final time
control will be 180 minutes + 15 seconds and this scalability
improvement seems definitely the way to go.
So, summarizing:
GOOD:
Measured big improvement in high core scenario
Suitable for TCEC 9 superfinal (big hardware, very long TC)
Consistent and natural patch that extends to counterMoveHistory
what we already do for remaining history tables, that are all per-thread
Non functional change for the common case of a single core
Very simple (just 6 lines modified, no added ones)
BAD:
Small regression (within 2-3 ELO) with few threads and short TC
bench: 5341477
Measured bench speed up goes from 0,7% to 2%,
given the unreliable measure a reverse simmplification
test was done on fishtest:
master vs patch
LLR: -2.94 (-2.94,2.94) [-3.00,1.00]
Total: 15499 W: 2685 L: 2867 D: 9947
Test result is positive, master is weaker.
No functional change.
This is the most compact and neatest version
is was able to produce.
On normal builds I have a small slowdown:
normal builds base vs. simplification (gcc 4.8.1 Win7-64 i7-3770 @ 3.4GHz x86-64-modern)
Results for 20 tests for each version:
Base Test Diff
Mean 1974744 1969333 5411
StDev 11825 10281 5874
p-value: 0,178
speedup: -0,003
On pgo-builds however I measure a nice 1.1% speedup
pgo-builds base vs. simplification
Results for 20 tests for each version:
Base Test Diff
Mean 1974119 1995444 -21325
StDev 8703 5717 4623
p-value: 1
speedup: 0,011
No functional change.
Don't allow pinned pieces to attack the exchange-square as long all
pinners (this includes also potential ones) are on their original
square.
As soon a pinner moves to the exchange-square or get captured on it, we
fall back to standard SEE behaviour.
This correctly handles the majority of cases with absolute pins.
bench: 6883133
In evaluate, we start by initializing the pos.psq_score
and adding the material imbalance. After that, we check
whether a specialized eval exists and if yes we return
that value and discard whatever we have computed until now.
It sounds more logical to first probe material entry and
return if we have a specialized eval, and only if it is
not the case initialize eval with some values. There is
no measurable speed-difference on my computer.
Non functional change.
In case we have installed a not complete set of 6-men tables and
there is 6 piece position on board, but no corresponding
tablebase engine is not using any syzygy at all.
Reported by Jouni Uski, fix by Peter Österlund,
confirmed as a bug by Ronald de Man.
bench: 7591630
If the opponent has a cramped position, opening a file often
helps him/her to exchange pieces, so it makes sense to reduce
the space bonus if there are open files.
Credits: Leonardo Ljubičić for the strategic idea, Alain Savard for the
implementation of the open files calculation, "CrunchyNYC" for the
compensation of the numerator.
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 49112 W: 9239 L: 8900 D: 30973
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 89415 W: 12014 L: 11601 D: 65800
Bench: 7591630
This greately simplifies usage because hides to the
search the implementation specific CheckInfo.
This is based on the work done by Marco in pull request #716,
implementing on top of it the ideas in the discussion: caching
the calls to slider_blockers() in the CheckInfo structure,
and simplifying the slider_blockers() function by removing its
first parameter.
Compared to master, bench is identical but the number of calls
to slider_blockers() during bench goes down from 22461515 to 18853422,
hopefully being a little bit faster overall.
archlinux, gcc-6
make profile-build ARCH=x86-64-bmi2
50 runs each
bench:
base = 2356320 +/- 981
test = 2403811 +/- 981
diff = 47490 +/- 1828
speedup = 0.0202
P(speedup > 0) = 1.0000
perft 6:
base = 175498484 +/- 429925
test = 183997959 +/- 429925
diff = 8499474 +/- 469401
speedup = 0.0484
P(speedup > 0) = 1.0000
perft 7 (but only 10 runs):
base = 185403228 +/- 468705
test = 188777591 +/- 468705
diff = 3374363 +/- 476687
speedup = 0.0182
P(speedup > 0) = 1.0000
$ ./pyshbench ../Stockfish/master ../Stockfish/test 20
run base test diff
...
base = 2501728 +/- 182034
test = 2532997 +/- 182034
diff = 31268 +/- 5116
speedup = 0.0125
P(speedup > 0) = 1.0000
No functional change.
This non-functional change patch is a deep work to allow SF to be independent
from the actual value of ONE_PLY (currently set to 1). I have verified SF is
now independent for ONE_PLY values 1, 2, 4, 8, 16, 32 and 256.
This patch gives consistency to search code and enables future work, opening
the door to safely tweaking the ONE_PLY value for any reason.
Verified for no speed regression at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 95643 W: 17728 L: 17737 D: 60178
No functional change.
Rewritten in a way to have explicit in the search
the bonus/penalty we apply: hopefully this will lead
to further simplification/fix of current rather messy
stats update code.
No functional change.
STC: (Yellow)
LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 69115 W: 12880 L: 12797 D: 43438
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 124163 W: 16923 L: 16442 D: 90798
Note: Note based off past experiments / patches... history pruning
is quite TC sensitive. I believe the reason for this TC dependency
is that the CMH/FMH is a very large table that takes time to fill
up with. In addition having more time for will increase the accuracy
of the stats' value.
Bench: 7351698
GCC uses SSE instructions to move data but in 32-bit gcc version used
by abrok the stack is not 16-byte aligned due to a bug.
This patch workaround teh bug by not using the stack
to store KingFlank[]
Fixes issue #721.
No functional change.
Checking for legality of a possible ponder move
must be done before we undo the first pv move,
of course. (spotted by mohammed li.)
This obviously only has any effect when playing in ponder mode.
No functional change.
Take advantage that VALUE_NONE = 32002 to remove
the condition.
Commented out and not removed becuase it is tricky
to rely on the hidden value of VALUE_NONE and code
can break in case we change VALUE_NONE in the future.
No functional change.
This code was added before the accurate pv patch, when
we retrieved PV directly from TT.
It's not required for correct (and long) PVs any more and
should be safe to remove it.
Also, allowing helper threads to repeatedly over-write
TT doesn't seem to make sense(that was probably an un-intended
side-effect of lazy smp). Before Lazy SMP only Main Thread used
to run ID loop and insert PV into TT.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74346 W: 13946 L: 13918 D: 46482
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 47265 W: 6531 L: 6447 D: 34287
bench: 8819179
Allow to specifiy the log file name, this comes
handy in case of self-matches so that each SF
instance writes into a different log file.
No functional change.
Currently root moves are copied to all teh threads
but are DTZ filtered only in main thread at the
beginning of teh search.
This patch moves the TB filtering before the
copy of root moves fixing issue #679https://github.com/official-stockfish/Stockfish/issues/679
No bench change.
There are two concepts with this patch:
Limit check extensions by using move count.
The idea is to limit search explosion.
Always extend check if the first move gives check.
The idea is to save expensive SEE calls, since the vast
majority of first move will have SEE value >= 0, also
first move may still be strong even if the SEE is negative.
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 16503 W: 3068 L: 2873 D: 10562
LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 37202 W: 5261 L: 5014 D: 26927
bench: 8543366
Moving a few lines from evaluate_threats to evaluate_pieces allows to
a) Remove a condition pos.count<QUEEN>(Them) == 1
b) use precalculated s instead of pos.square(Them)
c) do not check the condition at all in queenless endings
Passed STC
http://tests.stockfishchess.org/tests/view/5752e0410ebc59029919b1f4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 67175 W: 12194 L: 12152 D: 42829
and LTC
http://tests.stockfishchess.org/tests/view/57587b140ebc59029919b2f4
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 20276 W: 2774 L: 2653 D: 14849
bench: 7907962
In this position: 3K4/8/3k4/8/4p3/4B3/5P2/8 w - - 0 5
Current DTZ probe returns 1 instead of 15
What happens is that the double push f4 is erroneously detected as a win move.
After the push we have:
[D]3K4/8/3k4/8/4pP2/4B3/8/8 b - f3 0 5
And here the code misses the possible ep capture exf3.
The bug is in probe_dtz_no_ep() where is used probe_ab() that is
blind to ep captures so it returns v == 2 (win) for position
3K4/8/3k4/8/4pP2/4B3/8/8 b - f3 0 5
Note that at the caller site the original position did not have any
possible ep capture, so probe_dtz() returns immediately after calling
probe_dtz_no_ep().
The fix is to call the ep-aware probe_wdl() instead of probe_ab()
I have verified that DTZ is correct now and also there are no more
mistmatches compared to the new 'syzygy' branch. Tested on a set of
more than 600 endgame positions, included some tricky ones.
For people interested to redo the test or doing additional tests
please pull branch tb_dbg from https://github.com/mcostalba/Stockfish repo.
bench: 8450534 (bench unaffected because syzygy is not exercized during bench)
A middle ground patch of two successful tuning patches,
one at STC, the other at LTC, which now passed both.
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 67893 W: 12777 L: 12384 D: 42732
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 30165 W: 4189 L: 3960 D: 22016
bench: 9209507
Introducing a new multi-purpose penalty related to King safety, which
includes all kind of potential checks (from unsafe or unavailable
squares currently occupied by some other piece)
This will indirectly detect and reward some pins, discovered checks, and
motifs such as square vacation, or rook behind its pawn and aligned with
King (example Black Rg8, g7 against Kg1),
and penalize some pawn blockers (if they move, it allows a discovered
check by the pawn).
And since it looks also at protected squares, it detects some potential
defense overloading.
Finally, the rook contact checks had been removed some time ago. This
test will give a small bonus for them, as well as for bishop contact
checks.
Passed STC
http://tests.stockfishchess.org/tests/view/5729ec740ebc59301a354b36
LLR: 2.94 (-2.94,2.94) [0.00,5.00]
Total: 13306 W: 2477 L: 2296 D: 8533
and LTC
http://tests.stockfishchess.org/tests/view/572a5be00ebc59301a354b65
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 20369 W: 2750 L: 2565 D: 15054
bench: 9298175
Just use _mm_popcnt_u64() that is available
both for MSVC abd Intel compiler.
Verified on MSVC that the produced assembly
has the hardware 'popcnt' instruction.
No functional change.
Currently, helper threads will only search up to the
specified depth limit. Now let them search until the
main thread has finished the specified depth.
On the other hand, we don't want to pick a thread with
a higher search depth.
This may be considered cheating. ;-)
No functional change.
It seems that icc used our fallback version of popcount.
Now use intrinsics.
icc version 16.0.2 (gcc version 5.3.0 compatibility)
bmi2 compile
uname -r 4.5.1-1-ARCH
20xbench gives a nice speedup
./stockfish-icc-master 2161515 +- 34462
./stockfish-icc-sse42 2260857 +- 50349
I could not find anything documented that is necessary that prepending -mbmi to -mbmi2 gives some benefit.
Instead at
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
The following built-in functions are available when -mbmi is used. All of them generate the machine instruction that is part of the name.
unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
The following built-in functions are available when -mbmi2 is used. All of them generate the machine instruction that is part of the name.
unsigned int _bzhi_u32 (unsigned int, unsigned int)
unsigned int _pdep_u32 (unsigned int, unsigned int)
unsigned int _pext_u32 (unsigned int, unsigned int)
unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
unsigned long long _pext_u64 (unsigned long long, unsigned long long)
and at
https://gcc.gnu.org/ml/gcc/2014-02/msg00204.html
( "... The real optimization comes from being able to use pext
(parallel bit extract), which can implement several bextr expressions in
parallel.")
Apart from that we don't use all -msse -msse2 -msse3 -msse4.2 etc. but just -msse3 (or -msse4.2) only.
As regards to the speedup within noise level - this pull request is actually reversal of mcostalba#198 wherein prepending -mbmi to -mbmi2 was claimed to be 0.3% faster and here (removing -mbmi) gives 0.4% speed gain.
Seems to give around 1% speed-up for CPUs with popcnt support.
Seems to give a very minor speed-up for CPUs without popcnt.
No functional change
Resolves#646
In this position we should have draw for repetition:
position fen rnbqkbnr/2pppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves g1f3 g8f6 f3g1
go infinite
But latest patch broke it.
Actually we had two(!) very subtle bugs, the first is that Position::set()
clears the passed state and in particular 'previous' member, so
that on passing setupStates, 'previous' pointer was reset.
Second bug is even more subtle: SetupStates was based on std::vector
as container, but when vector grows, std::vector copies all its contents
to a new location invalidating all references to its entries. Because
all StateInfo records are linked by 'previous' pointer, this made pointers
go stale upon adding more element to setupStates. So revert to use a
std::deque that ensures references are preserved when pushing back new
elements.
No functional change.
And passed in do_move(), this ensures maximum efficiency and
speed and at the same time unlimited move numbers.
The draw back is that to handle Position init we need to
reserve a StateInfo inside Position itself and use at
init time and when copying from another Position.
After lazy SMP we don't need anymore this gimmick and we can
get rid of this special case and always pass an external
StateInfo to Position object.
Also rewritten and simplified Position constructors.
Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654
No functional change.
When starting search in a mate or stalemate position, Stockfish does not
even care to reinitialize and start worker threads. However after search
all threads are checked for the best move.
This can lead to bestmove and info beeing carried over from the last
search.
Example session:
setoption name threads value 7
go movetime 4000
position startpos moves f2f3 e7e5 g2g4 d8h4
go movetime 4000
Actual output is like (almost always):
[...]
bestmove e2e4
info depth 0 score mate 0
info depth 20 seldepth 29 multipv 1 score cp 28 [...] pv e2e4
bestmove e2e4
Expected output / output after fix:
[...]
bestmove e2e4 ponder e7e6
info depth 0 score mate 0
bestmove (none)
Resolves#623
Broken after "32-bit/64-bit Makefile fix" commit.
Ubuntu "Precise" 12.04.5 supports multilib only until
g++ 4.6 that is not enough to compile Stockfish.
So move to Ubuntu 14.04.4 LTS (Trusty Tahr)
No functional change.
There was already a penalty for squares only defended by King (undefended)
This test records a penalty for completely undefended squares in the so called extended king-ring
(so if we exclude squares defended by a Kg8 for example, we only look at h6 g6 and f6)
We also exclude squares occupied by opponent pieces in this computation,
based on the following results
Was yellow at STC
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 112499 W: 20649 L: 20293 D: 71557
and passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 36805 W: 5100 L: 4857 D: 26848
Bench: 8430233
Resolves: #619
On top of the usual conditions
a) some opponent in front (but no lever)
b) some neighbours (in front) (but no neighbour behind or same rank)
c) < rank_5
to find out if a pawn is backward we look at the squares in front of this pawn to reach the same rank as the next neighbour.
In current master, a pawn is backward if any of those squares is controlled by an enemy pawn on an adjacent file
In this version, a pawn is ALSO backward if any of those squares is occupied by an enemy pawn.
STC:
http://tests.stockfishchess.org/tests/view/56fe7efd0ebc59301a3541f1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 19051 W: 3557 L: 3433 D: 12061
LTC:
http://tests.stockfishchess.org/tests/view/56febc2d0ebc59301a354209
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40810 W: 5619 L: 5526 D: 29665
Bench: 7525245
Resolves#614
Also a speedup(about 1%) on 64-bit w/o hardware popcnt
Retire Max15 and Full template parameters
(Contributed by Marco Costalba)
Now that we have just SW and HW versions, use
template default parameter to get rid of explicit
template parameters.
Retire bitcount.h and move the only defined
function to bitboard.h
No functional change
Resolves#620
Counter intuitively, make build ARCH=x86-32 does NOT produce a 32-bit compile
when running a 64-bit OS. Nor would ARCH=x86-64 produce a 64-bit compile when
running a 32-bit OS (assuming it compiled w/o errors).
No functional change
Resolves#621
lsb(b) and msb(b) are undefined when b == 0. This can lead to subtle bugs, where
the resulting code behaves differently on different configurations:
- It can be the home grown software LSB/MSB
- It can be the compiler generated software LSB/MSB (when using compiler
intrinsics without the right compiler flags to allow compiler to use hardware
LSB/MSB). Which of course depends on the compiler.
- It can be hardware LSB/MSB generated by the compiler.
- Not to mention that hardware LSB/MSB can return different value on different
hardware when b == 0.
No functional change
Resolves#610
Use compiler intrinsics when possible to
avoid writing platform specific asm code.
Tested on Windows 7 with MSVC 2013 and mingw 4.8.3 (32 and 64 bit)
and on Linux Mint with g++ 4.8.4 and clang 3.4 (32 and 64 bit).
No functional change
Resolves#609
Give more weight to the pawns number and
the vertical king distance in evaluate_initiative()
Passed STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26729 W: 5067 L: 4825 D: 16837
and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 60480 W: 8338 L: 8016 D: 44126
Bench: 8295162
Resolves#594
PredictedDepth can be negative, causing the futility_margin to be negative.
It will be very difficult to tweak moveCount pruning and reduction formula, as they are tuned to prevent this behavior.
No functional change
Resolves#587
There is no reason to compile 3 different copies of search(). PV nodes are on
the cold path, and PvNode is a template parameter, so there is no cost in
computing:
const bool RootNode = PvNode && (ss-1)->ply == 0;
And this simplifies code a tiny bit as well.
Speed impact is negligible on my machine (i7-3770k, linux 4.2, gcc 5.2):
nps +/-
test 2378605 3118
master 2383128 2793
diff -4523 2746
Bench: 7751425
No functional change.
Resolves#568
In case of a very high material score, we can
overflow VALUE_INFINITE.
This patch fixes an assert with:
position fen 7k/QQQQR3/2B5/4KN1Q/3QQ3/8/8/4R3 b - - 0 1
go depth 1
No functional change.
Resolves#546
Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.
No functional change.
Resolves#537
Tuned the global mobility factor for each piece, as well as some +- delta,
The master mobility factor was {266,334} and tuning gave
{267, 362} +S(-2,-2) for the Knight
{249, 328} +S( 0,-2) for the Bishop
{298, 353} +S(1,1) for the Rook
{265, 358} +S(2,-1) for the Queen
Passed STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 49402 W: 9367 L: 9037 D: 30998
and LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 26831 W: 3871 L: 3658 D: 19302
Bench: 8355485
Resolves#536
Fix a bug where we could stop the search after only 10% of time used due to a matching easy move but later switch to a different move that was never pre-screened as easy due to SMP thread select.
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27227 W: 4910 L: 4800 D: 17517
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40368 W: 5826 L: 5733 D: 28809
Resolves#521
Simplify time management code by removing hard stops for unchanging first root moves.
Search is now stopped earlier at the end iteration if it did not have fail-lows at root.
This simplification also fixes pondering bug. Ponder flag was true by default
and cutechess-cli doesn't change it to false even though no pondering is possible.
Fix the issue by setting the default value of 'Ponder' flag to false.
10+0.1:
ELO: 3.51 +-3.0 (95%) LOS: 99.0%
Total: 20000 W: 3898 L: 3696 D: 12406
40+0.4:
ELO: 1.39 +-2.7 (95%) LOS: 84.7%
Total: 20000 W: 3104 L: 3024 D: 13872
60+0.06:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 37231 W: 5333 L: 5236 D: 26662
Stopped run at 100+1:
LLR: 1.09 (-2.94,2.94) [-3.00,1.00]
Total: 37253 W: 4862 L: 4856 D: 27535
Resolves#523Fixes#510
Also inline defintions of SpaceMask and CenterBindMask.
Verified from assembly that compiler computes the values
at compile time, so it is also theoretical faster.
While there factor out scale factor evaluation.
No functional change.
This is used by std::stable_sort() to sort moves from highest score to lowest score.
1) The comment is incorrect since highest to lowest means descending.
2) It's more natural to implement a less operator using another less operator rather than a greater operator.
No functional change.
Resolves#504
Comment is based on a misunderstanding of what unaligned memory access is. Here
is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the complier knows the alignment rules, and does
the necessary adjustments to make sure unaligned memory access does not occur.
The issue being adressed here has nothing to do with unaligned memory access. It
is about cache performance. In order to achieve best cache performance:
- we prefetch the cacheline as soon as possible.
- we ensure that TT clusters do not spread across two cachelines. If they did,
we would need to prefetch 2 cachelines, which could hurt cache performance.
Therefore the true conditions to achieve this are:
1/ start adress of TT is cache line aligned. void TranspositionTable::resize()
enforces this.
2/ TT cluster size should *divide* the cache line size. Currently, we pack 2
clusters per cache lines. It used to be 1 before "TT sardines". Does not matter
what the ratio is, all we want is to fit an integer number of clusters per cache
line.
No functional change.
Resolves#506
Instead of creating a running std::thread and
returning, wait in Thread c'tor that the native
thread of execution goes to sleep in idle_loop().
In this way we can simplify how search is started,
because when main thread is idle we are sure also
all other threads will be idle, in any case, even
at thread creation and startup.
After lazy smp went in, we can simpify and rewrite
a lot of logic that is now no more needed. This is
hopefully the final big cleanup.
Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965
No functional change.
Now that we don't have anymore TimerThread, there is
no need of this long class hierarchy.
Also assorted reformatting while there.
To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103
No functional change.
When we reach the maximum depth, we can finish the
search without a raise of Signals.stop. However, if
we are pondering or in an infinite search, the UCI
protocol states that we shouldn't print the best move
before the GUI sends a "stop" or "ponderhit" command.
It was broken by lazy smp. Fix it by moving the stopping
of the threads after waiting for GUI.
No functional change.
Indeed, if we use a depth >= DEPTH_MAX, we start having negative depth in the
TT (due to int8_t cast).
No functional change in single thread mode
Resolves#490
Unfortunately std::condition_variable::wait_for()
is not accurate in general case and the timer thread
can wake up also after tens or even hundreds of
millisecs after time has elapsded. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have effect upon it.
Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."
So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.
Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629
TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083
And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646
And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555
bench: 8639247
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).
No functional change.
Reduce the variation in Root Depth between different threads. This
prevents threads from searching at a depth much higher than Main Thread.
Performed well at STC 24 Threads:
ELO: 3.44 +-3.8 (95%) LOS: 96.1%
Total: 10000 W: 1627 L: 1528 D: 6845
And LTC 24 Threads
LLR: 1.43 (-2.94,2.94) [0.00,4.00]
Total: 3804 W: 500 L: 420 D: 2884
ELO : +7.31
p-value: 73.16%
Passed no regression at STC 3 Threads:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40457 W: 7148 L: 7060 D: 26249
And LTC 3 Threads:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 17704 W: 2489 L: 2364 D: 12851
Raising a pull request early as 24 Thread tests are very expensive and
this is clearly a positive gain at high thread counts and high time
controls. The change is a small parameter tweak with no additional
logic.
No functional change for single thread mode.
Resolves#481
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.
v1: Sequentially consistent atomics
No mesurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:
threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041
threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728
v2: Acquire/Release semantics
This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:
threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542
threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468
As suggested by Joona, another test at LTC:
threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931
v3: Final version: SeqCst/Relaxed
threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609
Resolves#474
Using less parameters and code to compute Threats
Includes also a few spacing edits.
Run as a simplification.
Passed STC 10+0.1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 18879 W: 3725 L: 3600 D: 11554
Passed LTC 60+0.4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74116 W: 11001 L: 10958 D: 52157
bench: 8004751
Fishtest is a key factor of SF success.
Thanks to Fishtest we have not only greately
improved ELO but, even more important, we
have enabled a kind of joint development and
testing that it is the herat of on open
source project like SF.
Open sourcing is not just about open code, it is
about commuity development. In case of a chess engine
this has never been possible before due to missing
a strong and strict testing environment that allows
many people to contribute in a safe and coordinate way.
Fishtest is a new way of developing chess engines,
something that has never exsisted before.
No functional change.
Collect and give a second try to some almost passed tuning attempts and
one-line tweaks from the last month.
Passed STC
LLR: 3.07 (-2.94,2.94) [0.00,4.00]
Total: 15124 W: 2974 L: 2756 D: 9394
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 21577 W: 3507 L: 3289 D: 14781
Bench: 8855226
Resolves#464
Start all threads searching on root position and
use only the shared TT table as synching scheme.
It seems this scheme scales better than YBWC for
high number of threads.
Verified for nor regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194
Verified for nor regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480
Verified for nor regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407
Verified for nor regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036
Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520
Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760
Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).
Bench: 8116244
Apply bonus for the prior CMH that caused a fail low.
Balance Stats: CMH and History bonuses are updated differently.
This eliminates the "fudge" factor weight when scoring moves. Also
eliminated discontinuity in the gravity history stat formula. (i.e. stat
scores will no longer inverse when depth exceeds 22)
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 21802 W: 4107 L: 3887 D: 13808
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 46036 W: 7046 L: 6756 D: 32234
Bench: 7677367
Add Travis CI support to GitHub repo.
After every push to master, Travis will build
the sources directly from GitHub repo according
to .travis.yml and verify everything is ok.
No functional change.
Fix issues after a run of PVS-STUDIO analyzer.
Mainly false positives but warnings are anyhow
useful to point out not very readable code.
Noteworthy is the memset() one, where PVS prefers ss-2
instead of stack. This is because memeset() could
be optimized away by the compiler when using 'stack',
due to stack being a local variable no more used after
memset. This should normally not happen, but when
it happens it leads to very sublte and difficult
to find bug, so better to be safe than sorry.
No functional change.
When changing 'search' and 'splitPointsSize' we have to
use thread locks, not split point ones, because can_join()
is called under the formers.
Verified succesfully with 24 hours toruture tests with 20
cores machine by Louis Zulli: it does not hangs.
Verifyed for no regressions with STC, 7 threads:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 52804 W: 8159 L: 8087 D: 36558
No functional change.
Only refresh TT entry when it's really necessary.
This should give a small speed boost for some machines.
And it's a risk-free change.
No functional change.
Resolves#429
Louis Zulli reported that Stockfish suffers from very occasional hangs with his 20 cores machine.
Careful SMP debugging revealed that this was caused by "a ghost split point slave", where thread
was marked as a split point slave, but wasn't actually working on it.
The only logical explanation for this was double booking, where due to SMP race, the same thread
is booked for two different split points simultaneously.
Due to very intermittent nature of the problem, we can't say exactly how this happens.
The current handling of Thread specific variables is risky though. Volatile variables are in some
cases changed without spinlock being hold. In this case standard doesn't give us any kind of
guarantees about how the updated values are propagated to other threads.
We resolve the situation by enforcing very strict locking rules:
- Values for key thread variables (splitPointsSize, activeSplitPoint, searching)
can only be changed when the thread specific spinlock is held.
- Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held.
- Thread booking decisions (per split point) can only be done when the thread specific spinlock is held.
With these changes hangs didn't occur anymore during 2 days torture testing on Zulli's machine.
We probably have a slight performance penalty in SMP mode due to more locking.
STC (7 threads):
ELO: -1.00 +-2.2 (95%) LOS: 18.4%
Total: 30000 W: 4538 L: 4624 D: 20838
However stability is worth more than 1-2 ELO points in this case.
No functional change
Resolves#422
v = value without ep capture being considered
v1 = value of the ep capture
The correct logic is:
if without e.p. capture we are losing, and the value of e.p is either draw, or win or "loss, but 50 move rule saves us", then we should use the value of ep capture.
Credit and thanks to syzygy1 and lantonov !
No functional change (except with syzygy bases)
Resolves#415Resolves#394
Instead of using hard coded Min and Max values for history,
always adjust the old value slightly downwards before adding a new value.
The adjustment acts like gravity that prevents the value escaping too
far from zero.
Bench: 8020484
Resolves#407
Apart from usual renaiming, take advantage of
C++11 function template default parmeter to
get rid of Eval trampoline functions.
Some triviality fixes while there.
No functional change.
Align the behaviour with reductions. Initially castling moves had to be
treated differently, because the SEE did not handle them correctly. But now it
does.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 83750 W: 15722 L: 15711 D: 52317
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 97183 W: 15120 L: 15115 D: 66948
bench 7759837
Resolves#403
PawnSafePush, with the value S(5,5) proved not "necessary"
possibly due to recent changes to MobilityArea and other changes to Connected bonus.
STC:
LLR: 3.22 (-2.94,2.94) [-3.00,1.00]
Total: 98528 W: 18757 L: 18759 D: 61012
LTC:
LLR: 5.30 (-2.94,2.94) [-3.00,1.00]
Total: 204194 W: 31698 L: 31734 D: 140762
Bench: 7620871
Resolves#396
Use Position::square and Position::squares instead.
This allow us to remove king_square(), simplify
endgames and to have more naming uniformity.
Moreover, this is a prerequisite step in case
in the future we decide to retire piece lists
altoghter and use pop_lsb() to loop across
pieces and serialize the moves. In this way
we just need to change definition of Position::square
to something like:
template<PieceType Pt> inline
Square Position::square(Color c) const {
return lsb(byColorBB[c]);
}
No functional change.
Based off of Pull request #383:
Include squares occupied by some pawns in the MobilityArea
a) not blocked
b) on rank 4 and above
c) or captures
Passed STC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8157 W: 1644 L: 1516 D: 4997
And LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 26086 W: 4274 L: 4051 D: 17761
-----------
Then, a simplification test failed, trying to remove b and c)
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 6048 W: 1117 L: 1288 D: 3643
Another simplification test, was run to remove just (c)
Passed STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28073 W: 5364 L: 5255 D: 17454
And LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 34652 W: 5448 L: 5348 D: 23856
A parameter tweak test showed that changing b) for "on rank 3 and above"
does not work
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 5233 W: 937 L: 1077 D: 3219
Finally, a small rewrite, and we have this version
Include squares occupied by some pawns in the MobilityArea which are
a) not blocked
b) on rank 4 and above
Bench: 8977899
Resolves#385
Under Windows with MSVC we use the IDE to compile,
in this case we infer some compiler flags usually
set by Makefile.
The condition to check this was wrong, namely when compiling
with mingw under Windows 64 bit we always set IS_64BIT and
USE_BSFQ even if compiled with ARCH=x86-32 (this is how I
found it).
Small code style touches while there.
No functional change.
Error is:
a value of type "int" cannot be assigned to an entity of type "Value"
Also retire the now unused squares_of_color() function.
No functional change.
In Chess960 we can have legal positions with
opponent rook in A or H file and with castling
available, for instance:
4k3/pppppppp/8/8/8/8/PPPPPPPP/rR2K3 w Q - 0 1
In those cases we pick up the wrong rook when
setting castling.
Fix it by checking the color of the rook.
Bug reported by Matthew Lai.
No functional change.
Instead of crafting a clever formula to calculate the array offset, simply use a
3 dimensional array. Remove the comment while at it, because now the code is
self-documenting.
No functional change.
Resolves#344
Introduce helper function Search::reset() which clears all kind of search
memory, in order to restore a deterministic search state.
Generalize TT.clear() into Search::reset() for the following use cases:
- bench: needed to guarantee deterministic bench (ie. if you call bench from
interactive command line twice in a row you get the same value).
- Clear Hash: restore clean search state, which is the purpose of this button.
- ucinewgame: ditto.
No functional change.
Resolves#346
Based on an idea and patch by VoyagerOne.
Small simplification, but was tedted for an ELO gain anyway.
STC:
LLR: 2.95 (-2.94,2.94) [-1.00,4.00]
Total: 5375 W: 1119 L: 977 D: 3279
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 17893 W: 2984 L: 2792 D: 12117
bench 8322847
Use symmetry along vertical middle axis of the board
to reduce the number of parameters.
For instance psqt value of SQ_A5 == SQ_A4 and value of
SQ_F8 == SQ_F1.
This is always true, at least until now nobody came in
with an asymmetric psqt table that worked.
Original patch by Lucas.
No functional change.
Easier for tuning psq tables:
TUNE(myParameters, PSQT::init);
Also move PSQT code in a new *.cpp file, and retire the
old and hacky psqtab.h that required to be included only
once to work correctly, this is not idiomatic for a header
file.
Give wide visibility to psq tables (previously visible only
in position.cpp), this will easy the use of psq tables outside
Position, for instance in move ordering.
Finally trivial code style fixes of the latest patches.
Original patch of Lucas Braesch.
No functional change.
In ei.attackedBy, Queen does not x-ray through Rook, but the Rook does
X-ray through the Queen.
So most of the rook contact checks supported by queen are, in fact,
Queen Contact Checks and they are already scored separately.
Bench: 7762189
Resolves#338
Currently Zobrist::castling[] are not properly zeroed
and rely on the compiler to do this at startup, but this
makes Position::init() to set different values every time
it is called!
This is a bit odd, and although not impacting normal usage,
can yield to subtle misbehaviour, very difficult to track
down, in case we happen to call it more than once for some
reason. I found this while developing tuning support and
it took me a while to track it down.
So properly init Zobrist::castling[]
No functional change.
Resolves#329
When running more games in parallel, or simply when running a game
with a background process, due to how OS scheduling works, there is no
guarantee that the CPU resources allocated evenly between the two
players. This introduces noise in the result that leads to unreliable
result and in the worst cases can even invalidate the result. For
instance in SF test framework we avoid running from clouds virtual
machines because are a known source of very unstable CPU speed.
To overcome this issue, without requiring changes to the GUI, the idea
is to use searched nodes instead of time, and to convert time to
available nodes upfront, at the beginning of the game.
When nodestime UCI option is set at a given nodes per milliseconds
(npmsec), at the beginning of the game (and only once), the engine
reads the available time to think, sent by the GUI with 'go wtime x'
UCI command. Then it translates time in available nodes (nodes =
npmsec * x), then feeds available nodes instead of time to the time
management logic and starts the search. During the search the engine
checks the searched nodes against the available ones in such a way
that all the time management logic still fully applies, and the game
mimics a real one played on real time. When the search finishes,
before returning best move, the total available nodes are updated,
subtracting the real searched nodes. After the first move, the time
information sent by the GUI is ignored, and the engine fully relies on
the updated total available nodes to feed time management.
To avoid time losses, the speed of the engine (npms) must be set to a
value lower than real speed so that if the real TC is for instance 30
secs, and npms is half of the real speed, the game will last on
average 15 secs, so much less than the TC limit, providing for a
safety 'time buffer'.
There are 2 main limitations with this mode.
1. Engine speed should be the same for both players, and this limits
the approach to mainly parameter tuning patches.
2. Because npms is fixed while, in real engines, the speed increases
toward endgame, this introduces an artifact that is equivalent to an
altered time management. Namely it is like the time management gives
less available time than what should be in standard case.
May be the second limitation could be mitigated in a future with a
smarter 'dynamic npms' approach.
Tests shows that the standard deviation of the results with 'nodestime'
is lower than in standard TC, as is expected because now all the introduced
noise due the random speed variability of the engines during the game is
fully removed.
Original NIT idea by Michael Hoffman that shows how to play in NIT mode
without requiring changes to the GUI. This implementation goes a bit
further, the key difference is that we read TC from GUI only once upfront
instead of re-reading after every move as in Michael's implementation.
No functional change.
And reformat a bit time manager code.
Note that now we set starting search time in think() and
no more in ThreadPool::start_thinking(), the added delay
is less than 1 msec, so below timer resolution (5msec) and
should not affect time lossses ratio.
No functional change.
Code like this is more a case of showing off one's C++ knowledge, rather than
using it adequately, IMHO.
**First loop (std::generate)**
Iterators are inadequate here, because they lose the key information which is
idx. As a result, we need to carry a redundant idx variable, and increment it
along the way. Very clumsy.
Usage of std::generate and a lambda function only obfuscate the code, which is
merely a simple and stupid loop over the elements of a vector.
**Second loop (std::accumulate)**
This code is thoroughlly incomprehensible. Restore the original, which was much
simpler to understand.
**Third loop (range based loop)**
Again, a range based loop is inadequate, because we lose idx! To resolve this
artificially created problem, the data model was made redundant (idx is a data
member of db[] elements!?), which is ugly and unjustified. A simple and stupid
for loop with idx does the job much better.
No functional change.
Resolves#313
Small speed-up in pawn.cpp
Results for 10 tests for each version:
Base Test Diff
Mean 1435636 1445238 -9602
StDev 22576 23189 1848
p-value: 1
speedup: 0.007
No functional change
Resolves#295
If the sum of CounterMoveHistory heuristic and History heuristic is below zero,
then reduce an extra ply in cut nodes
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 6479 W: 1099 L: 967 D: 4413
Bench: 7773299
Resolves#315
Currently if we call it more than once, we crash.
This is not a real problem, because this function is
indeed called just once. Nevertheless with this small fix,
that gets rid of a hidden 'static' variable, we cleanly
resolve the issue.
While there, fix also ThreadPool::exit to return in a
consistent state. Now all the init() functions but
UCI::init() are reentrant and can be called multiple
times.
No functional change.
Profiling shows that resetting attacks table after
a failed candidate magic attempt is the biggest
time consumer, so rewrite the logic avoiding the
memset()
Magics init for rook+bishop goes from 200msecs to
under 100msec.
No functional change.
To sync UI with main thread it is enough a single
condition variable because here we have a single
producer / single consumer design pattern.
Two condition variables are strictly needed just for
many producers / many consumers case.
Note that this is possible because now we don't send to
sleep idle threads anymore while searching, so that now
only UI can wake up the main thread and we can use the
same ConditionVariable for both threads.
The natural consequence is to retire wait_for_think_finished()
and move all the logic under MainThread class, yielding the
rename of teh function to join()
No functional change.
Now that we use spinlocks everywhere and don't put
threads to sleep while idle, we can use the slower
(but no more in hot path) std::condition_variable_any
instead of our homwgrown ConditionVariable struct.
Verified fo rno regression at STC with 7 threads:
ELO: -0.66 +-2.7 (95%) LOS: 31.8%
Total: 20000 W: 3210 L: 3248 D: 13542
No functional change
There's no reason to define it as a Bitboard, so for consistency, use bool.
This is even a speedup on my machine: i7-3770k, using gcc 4.9.1 (linux):
stat test master diff
mean 2,341,338 2,327,998 13,134
stdev 15,765 14,717 5,405
speedup 0.56%
P(speedup>0) 100.0%
No functional change.
Resolves#298
Avoid redundant 'while' conditions. It is enough to
check them in the outer loop.
Quick tested for no regression 10K games at 4 threads
ELO: -1.32 +-3.9 (95%) LOS: 25.6%
Total: 10000 W: 1653 L: 1691 D: 6656
No functional change.
Spinlock must be used instead.
Tested for no regression at 15+0.05 th 4:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25928 W: 4303 L: 4190 D: 17435
No functional change.
Resolves#297
Unify the quites moves loop for both cases,
the compiler optimizes away the
if (is_ok((ss-1)->currentMove))
inside loop, so that the result is same
speed as original.
No functional change.
During the search, do not block on condition variable, but instead use std::this_thread::yield().
Clear gain with 16 threads. Again results vary highly depending on hardware, but on average it's a clear gain.
ELO: 12.17 +-4.3 (95%) LOS: 100.0%
Total: 7998 W: 1407 L: 1127 D: 5464
There is no functional change in single thread mode
Resolves#294
Both are the result of a SPSA tuning session with a custom book, 50k iterations each.
After an additional tuning session of the mobility values, tuning the delta values, with following result.
40k games at 9+0.05:
ELO: 4.13 +-2.2 (95%) LOS: 100.0%
Total: 40000 W: 8581 L: 8106 D: 23313
and LTC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 36518 W: 6049 L: 5782 D: 24687
Bench: 8567402
Resolves#284
Fixes reported startup error about missing libwinpthread-1.dll
when the dll is not in the path.
The current -static-xxxx flags, introduced with:
https://github.com/official-stockfish/Stockfish/commit/373503f4a9a990054b5
Only take in account standard libraries, but not thread
library.
No functional change.
Resolves#289
Idea and original implementation by Stephane Nicolet
7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202
There is no functional change in single thread mode
Spend much less time in positions where one move is much better than all other alternatives.
We carry forward pv stability information from the previous search to identify such positions.
It's based on my old InstaMove idea but with two significant improvements.
1) Much better instability detection inside the search itself.
2) When it's time to make a FastMove we no longer make it instantly but still spend at least 10% of normal time verifying it.
Credit to Gull for the inspiration.
BIG thanks to Gary because this would not work without accurate PV!
20K
ELO: 8.22 +-3.0 (95%) LOS: 100.0%
Total: 20000 W: 4203 L: 3730 D: 12067
STC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 23266 W: 4662 L: 4492 D: 14112
LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 12470 W: 2091 L: 1931 D: 8448
Resolves#283
Introduce a counter move history table which additionally is indexed by the last move's piece and target square.
For quiet move ordering use now the sum of standard and counter move history table.
STC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 4747 W: 1005 L: 885 D: 2857
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 5726 W: 1001 L: 872 D: 3853
Because of reported low NPS on multi core test
STC (7 threads):
ELO: 7.26 +-3.3 (95%) LOS: 100.0%
Total: 14937 W: 2710 L: 2398 D: 9829
Bench: 7725341
Resolves#282
Use Mutex instead.
This is in preparaation for merging with master branch,
where we stilll don't have spinlocks.
Eventually spinlocks will be readded in some future
patch, once c++11 has been merged.
No functional change.
minKingPawnDistance is used only as local variable in one place so we don't need it to be part of "Pawns::Entry" structure.
No functional change.
Resolves#277
This change in the Makefile restores the possibility to compile
Stockfish on Mac OS X 10.9 and 10.10 after the C++11 has been merged.
To use the default (fastest) settings, compile with:
make build ARCH=x86-64-modern
To test the clang settings, compile with
make build ARCH=x86-64-modern COMP=clang
Beware that the clang settings may provide a slightly slower (6%)
executable.
Backported from master.
No functional change
Resolves#275
This change in the Makefile restores the possibility to compile
Stockfish on Mac OS X 10.9 and 10.10 after the C++11 has been merged.
To use the default (fastest) settings, compile with:
make build ARCH=x86-64-modern
To test the clang settings, compile with
make build ARCH=x86-64-modern COMP=clang
Beware that the clang settings may provide a slightly slower (6%)
executable.
No functional change
Resolves#275
Now that c++11 branch has been merged in master,
disable unconditionally the spinlocks and use mutex
instead. This will allow to run fishtest even on HT
machines withouth changes.
In the future we will reintorduce spinlocks, once
we will have took care of fishtest.
No functional change.
And use mutex instead. You may never want to do this.
It is a workaround to run c++11 on fishtest where many
machiens have HTenabled and this can be a problem when
number of cores set is higher than number of physical cores.
To disable spinlocks, just compile with -DNO_SPINLOCK flag
No functional change.
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalize other
running threads. So syncronize only only once at the beginning
with fetch_sub() and then loop on a simple load() that puts much
less pressure on the system.
Reported about 2-3% speed up on various systems.
Patch by Ronald de Man.
No functional change.
It is reported to be defenitly faster with increasing
number of threads, we go from a +3.5% with 4 threads
to a +15% with 16 threads.
The only drawback is that now when testing with more
threads than physical available cores, the speed slows
down to a crawl. This is expected and was similar at what
we had setting the old sleepingThreads to false.
No functional change.
Initialization is more complex than what I'd like due
to MSVC compatibility that for some reason does not like:
std::atomic_flag lock = ATOMIC_FLAG_INIT;
No functional change.
It should be used slavesMask.count() instead.
Verified 100% equivalent when sp->allSlavesSearching:
dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());
No functional change.
Document and clarify that we cannot rejoin on ourselves
and that we never late join if we are master and all
slaves have finished, inded in this case we exit idle_loop.
No functional change.
In case of Threads.size() == 2 we have that sp->allSlavesSearching
is always false (because we have finished our search), bestSp is
always NULL and we never late join, so there is no need to special
case here.
Tested with dbg_hit_on(sp && sp->allSlavesSearching) and
verified it never fires.
No functional change.
And retire a redundant field. This is important also
from a concept point of view becuase we want to keep
SMP structures as simple as possible with the only
strictly necessary data.
Verified with
dbg_hit_on(sp->spLevel != level)
that the values are 100% the same out of more 50K samples.
No functional change.
Balance threads between split points.
There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:
For Bravone (1000 games): 0 ELO
For Glinscott (1000 games): +20 ELO
For bKingUs (1000 games): +50 ELO
For fastGM (1500 games): +50 ELO
The change was regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861
Finally it was verified that there was no (significant) regression for
4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069
2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105
1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196
Resolves#258
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat test master diff
mean 2,403,118 2,390,904 12,214
stdev 12,043 10,620 3,677
speedup 0.51%
P(speedup>0) 100.0%
No functional change.
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat test master diff
mean 2,403,118 2,390,904 12,214
stdev 12,043 10,620 3,677
speedup 0.51%
P(speedup>0) 100.0%
No functional change.
Use integer arithmetic instead of floating point arithmetic.
Floating point arithmetic was causing different results for some 32-bit compiles
No functional change
Resolves#249Resolves#250
Also remove useless StateCopySize64 optimization:
compiler uses SSE movups instruction anyhow and
does not need this trick (verified with fishbench).
No functional change.
Funny enough, gcc __builtin_prefetch() expects
already a void*, instead Windows's _mm_prefetch()
requires a char*.
The patch allows to remove ugly casts from caller
sites.
No functional change.
Results for 10 tests for each version (gcc 4.8.3 on mingw):
Base Test Diff
Mean 1502447 1507917 -5470
StDev 3119 1364 4153
p-value: 0,906
speedup: 0,004
Results for 10 tests for each version (MSVC 2013):
Base Test Diff
Mean 1400899 1403713 -2814
StDev 1273 2804 2700
p-value: 0,851
speedup: 0,002
No functional change.
I went through all the individual compile options that differ between
-fprofile-generate/-fprofile-use and -fprofile-arcs/-fbranch-probabilities
and distilled the speed difference down to only turning off
-fno-peel-loops and -fno-tracer. Using this we still get the full speedup
(maybe a bit more because other optimizations stay on) and it's also much cleaner
because we can get rid of the "@rm -f ucioption.gc*" hack for all versions of gcc.
No functional change.
Resolves#237
Follow the usual approach to delay computation
as far as possible, in case an earlier killer
cut-offs we avoid to do useless work.
This also greatly simplifies the code.
No functional change.
Verified with perft there is no speed regression,
and code is simpler. It is also conceptually correct
becuase an extended move is just a move that happens
to have also a score.
No functional change.
This is an old patch from Jean-Francois Romang to send
UCI hashfull info to the GUI:
https://github.com/mcostalba/Stockfish/pull/60/files
It was wrongly judged as a slowdown, but it takes much
less than 1 ms to run, indeed on my core i5 2.6Ghz it
takes about 2 microsecs to run!
Regression test is good:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 7352 W: 1548 L: 1401 D: 4403
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 61432 W: 10307 L: 10251 D: 40874
I have set the name of the author to the original
one.
No functional change.
This patch has two positive effects:
- Retire a hackish formula and leave
just a natural, simple and plain one.
- Reduce strenght at very low level, but
don't impact medium/high levels.
Actually even at level 0, SF is still too
strong for many beginners (this was reported
many times for instance on Droidfish user
comments on Google Play).
Test on fishtest shows that ELO drop is around
170 ELO at level 0 (good!), 130 ELO at level 1
and smoothly reduces (as expected) until level
10 where the drop is just of 8 ELO.
No functional change.
- Fix a skill level problem: Don't allow move pruning at root node
- Revert "Fix profile build for gcc on Mac OSX". Results for a faster binary in x86-64.
- Fix a MSVC warning
Bench: 8918745
Seems to be a performance regression for standard build.
For SF6 people compiling on Mac OSX using profile-build option
just need to make necessary adjustments manually...
No functional change
Resolves#223
Instead of swapping sub-optimal move in Skill
d'tor, make it explicitly at the end of the search.
Also streamline and clarify relation with multiPV
and pass it directly instead of relying on the hacky
'candidates' member.
No functional change.
Warning is C4512 (assignment operator could not be generated)
Now, apart the foreign syzygy code, everything compiles
without warnings at warning level 4.
Backported from C++11 branch.
No functional change.
- Fix a compilation issue related to BMI2 PEXT instruction
- Retrieve a ponder move from TT if PV is only one move long
Bench: 8080602
No functional change
This intrinsic to call BMI2 PEXT instruction is
defined in immintrin.h. This header should be
included only when USE_PEXT is defined, otherwise
we define _pext_u64 as 0 forcing a nop.
But under some mingw platforms, even if we don't
include the header, immintrin.h gets included
anyhow through an include chain that starts with
STL <algorithm> header. So we end up both defining
_pext_u64 function and at the same time defining
_pext_u64 as 0 leading to a compile error.
The correct solution is of not using _pext_u64 directly.
This patch fixes a compile error with some mingw64
package when compiling with x86-64.
No functional change.
Resolves#222
In case we stop the search during a fail-high
it is possible we return to GUI without a ponder
move. This patch try harder to find a ponder move
retrieving it from TT. This is important in games
played with 'ponder on'.
bench: 8080602
Resolves#221
When not using Makefile, e.g. with MSVC, if hardware
supports BMI2 instructions, then USE_PEXT should be
added in project configuration to enable pext support.
No functional change.
This intrinsic to call BMI2 PEXT instruction is
defined in immintrin.h. This header should be
included only when USE_PEXT is defined, otherwise
we define _pext_u64 as 0 forcing a nop.
But under some mingw platforms, even if we don't
include the header, immintrin.h gets included
anyhow through an include chain that starts with
STL <algorithm> header. So we end up both defining
_pext_u64 function and at the same time defining
_pext_u64 as 0 leading to a compile error.
The correct solution is of not using _pext_u64 directly.
This patch fixes a compile error with some mingw64
package when compiling with x86-64.
No functional change.
Warning is C4512 (assignment operator could not be generated)
Now, apart the foreign syzygy code, everything compiles
without warnings at warning level 4.
No functional change.
Coverity scan warns about uninitialized 'sf' argument when
calling probe(). Actually it is a false positive because
argument is passed by reference and assigned inside
probe(). Nevertheless it is a hint that fucntion signature
is a bit tricky, so rewrite it in a more conventional way,
assigning 'sf' from probe() return value.
No functional change.
On platforms where size_t is 32 bit, we
can have an overflow in this expression:
(mbSize * 1024 * 1024)
Fix it setting max hash size of 2GB on platforms
where size_t is 32 bit.
A small rename while there: now struct Cluster
is definied inside class TranspositionTable so
we should drop the redundant TT prefix.
No functional change.
On Android-ARM current TB code crashes at
random times even in single thread mode.
Reported, debugged, fixed and verified
by Peter Osterlund.
No functional change.
Resolves#201
This optimization is aimed at old hardware only (withouth popcount), and even on
non popcount compile (ARCH=x86-64), it provides no mesurable speedup:
stat test master diff
mean 2,341,779 2,354,699 -12,920
stdev 12,910 14,770 18,150
speedup -0.55%
P(speedup>0) 23.8%
No functional change.
Resolves#187
- Change UCI::value() signature
This function should only return the value,
lowerbound and upperbound info is up to the
caller because it requires external knowledge,
out of the scope of this little helper.
- Retire 'key' command
It is not an UCI command and is absolutely
useless: never used.
- Comments fixing and other trivia
No functional change.
And reshuffle a bit the functions to place
them in a consistent order.
To be on the safe side, patch has been
validated for no regression/crashes with
a small 8K games test with 3 threads:
ELO: 3.98 +-4.4 (95%) LOS: 96.3%
Total: 8388 W: 1500 L: 1404 D: 5484
No functional change.
It is up to material (and pawn) table look up
code to know where the per-thread tables are,
so change API to reflect this.
Also some comment fixing while there
No functional change.
Move all in evaluation.
Simplify the code and concentrate in a single place
all the logic behind space evaluation, making it much
more clear.
Verified also at STC it does not regress due to a possible
slow down:
LLR: 3.91 (-2.94,2.94) [-3.00,1.00]
Total: 65744 W: 13285 L: 13194 D: 39265
No functional change.
Commenst are obsolete now, an updated description
would be quite obscure, so better let the code
to talk and remove them all together.
No functional change.
Use the same template of other pawns moves generation,
make the code more uniform, simplify generate_promotions
that has now been renamed.
No functional change (verified also with perft).
In some UNIX systems "rm" prompts user for confirmation.
However "rm -f" is always a guaranteed forced deletion.
Also move gcc profiling hack under the correct target
No Functional change
Resolves#168
ShelterWeakness and Stormdanger array are now indexed additionally by
file pair (a/h,b/g,c/f,d/e). The special case of king blocking a pawn
is incorporated in the StormDanger array. Finally the 93 parameters
are tuned by SPSA on LTC.
STC
ELO: 3.46 +-2.2 (95%) LOS: 99.9%
Total: 40000 W: 8275 L: 7877 D: 23848
LTC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 10311 W: 1876 L: 1721 D: 6714
Bench: 9498821
Resolves#163
The evaluation is already done by the specialized
function, don't need to add something elese later.
With this patch following positions are evaluated
correctly as draws:
8/6p1/1Pkp1p1p/2nNn2P/2P1K1P1/8/8/3B4 w - - 7
8/1k4p1/1P1p1p1p/3NnK1P/2P3P1/1n6/4B3/8 w - -
Verified it not regress with an STC test:
LLR: 3.15 (-2.94,2.94) [-3.00,1.00]
Total: 49812 W: 10095 L: 10016 D: 29701
Reported by Arjun Temurnikar.
bench: 8289983
In particular seems more natural to return
bool and TTEntry on the same line, actually
we should pass and return them as a pair,
but due to limitations of C++ and not wanting
to use std::pair this can be an acceptable
compromise.
No functional change.
Resolves#157
In some cases we want to go direcly to the moves loop
without checking for early return. The patch make this
logic more clear and consistent.
Tested for no regression, passed STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25282 W: 5136 L: 5022 D: 15124
and LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 72007 W: 12133 L: 12095 D: 47779
bench: 9316798
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.
Paper: http://xorshift.di.unimi.it/
Reference source code (public domain):
http://xorshift.di.unimi.it/xorshift64star.c
The patch also simplifies how init_magics() searches for magics:
- Old logic: seed the PRNG always with the same seed,
then use optimized bit rotations to tailor the RNG sequence per rank.
- New logic: seed the PRNG with an optimized seed per rank.
This has two advantages:
1. Less code and less computation to perform during magics search (not ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:
64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls
32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls
In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5).
Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.
The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).
The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:
1. All triplets of XORed keys are unique, implying that it
would take at least 7 keys to find a 64-bit collision
(test suggested by ceebo)
2. All pairs of XORed keys are unique modulo 2^32
3. The cardinality of { (key1 ^ key2) >> 48 } is as close
as possible to the maximum (65536)
Point 2 aims at ensuring a good distribution among the bits
that determine an TT entry's cluster, likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.
Details:
Bitset card(key1^key2)
------ ---------------
RKISS
key16 64894 = 99.020% of theoretical maximum
low18 180117 = 99.293%
low32 305362 = 99.997%
Xorshift64*, old seed
key16 64918 = 99.057%
low18 179994 = 99.225%
low32 305350 = 99.993%
Xorshift64*, new seed
key16 65027 = 99.223%
low18 181118 = 99.845%
low32 305371 = 100.000%
Bench: 9324905
Resolves#148
Currently Search::RootMoves is accessed and even
modified by TB probing functions in a hidden
and sneaky way.
This is bad practice and makes the code tricky.
Instead explicily pass the vector as function
argument so to clarify that the vector is modified
inside the functions.
No functional change.
Simplified also some logic while there.
TBLargest needs renaming too, but itis for
a future patch because touches also syzygy
directory stuff.
No functional change.
We really don't need to uglify in this way
our nice count() API with this ad-hoc hack.
So remove the hack and use the already
existing infrastructure.
No functional change.
Resolves#134
Updating the makefile so that the clean and gcc-profile-clean targets also
remove the profiling data files in the syzygy directory.
No functional change.
Resolves#136
Just like in Physics, the ratio of 2 things in the same unit, should be
without unit.
Example use case:
- Ratio of a Depth by a Depth (eg. ONE_PLY) should be an int.
- Ratio of a Value by a Value (eg. PawnValueEg) should be an int.
Remove a bunch of useless const while there.
No functional change.
Resolves#128
Adds support for Syzygy tablebases to Stockfish. See
the Readme for information on using the tablebases.
Tablebase support can be enabled/disabled at the Makefile
level as well, by setting syzygy=yes or syzygy=no.
Big/little endian are both supported.
No functional change (if Tablebases are not used).
Resolves#6
It is possible that we won't have a ponder move if our PV
is too short. In that case, just don't print a ponder move.
No functional change
Resolves#130
When evaluating double pawns we use always
lsb() to extract the frontmost square.
This breaks evaluation color symmetry as is
possible to verify with an instrumented evaluate()
Value evaluate(const Position& pos) {
Value v = do_evaluate<false>(pos);
Position p = pos;
p.flip();
assert(v == do_evaluate<false>(p));
return v;
}
Passed no regression test:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 21035 W: 4244 L: 4122 D: 12669
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 39839 W: 6662 L: 6572 D: 26605
bench: 8255966
It is a non functional change, but because
we now skip copying of pv[] in SpNode, patch
has been tested for regression with 3 threads:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 54668 W: 9873 L: 9809 D: 34986
No functional change.
This is a regression from 428962a
We have to cast to char here, otherwise the compiler
interprets it as an integer, and writes a number.
No functional change
Resolves#122
This is the first of a patch series to
rearrange and simplify accurate PV.
In this patch there is simple coding
style and reformatting stuff.
Verified with fishtest it does not crash
with MAX_PLY = 8
No functional change.
Previously token would keep its value from the previous line when an empty
line was input, leading to unexpected behaviour.
No functional change
Resolves#119
When comparing to a Depth it is more
consistent to use DEPTH_MAX instead
of a int.
This is a subtle difference because we use
ply and depth almost interchangably in SF,
but they are different. FOr counting plies
makes ense to continue using ints, while
for Depth we have our specific enum.
This cleanly fixes a new Clang 3.5 warning:
No functional change.
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).
One drawback is that fail-high/low only get a minimal 2 move PV.
It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.
A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.
STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926
STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643
STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699
LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740
Bench: 6984058
Resolves#102
Daniel Jose reported that it was an elo gain in his engine:
http://www.talkchess.com/forum/viewtopic.php?t=54290
STC: Hash=16
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33067 W: 6670 L: 6571 D: 19826
LTC: Hash=64
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 41181 W: 7008 L: 6920 D: 27253
And another one to verify no regression with hash pressure:
STC: Hash=4
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 25085 W: 5059 L: 4991 D: 15035
Verified that qsearch does not explode after this patch (recapture threshold).
Bench 7962287
Resolves#112
Real men jump/branch by hand...but
we prefer the humble way.
Moved also some uci info code where it
belongs, while there.
No functional change.
Resolves#110
16MB for 1" searches is more comensurate with the average use case.
And 16 is the default used by stockfish bench, so it makes sense to be
consistent, if only to have the same minimum memory requirement for using
SF and compiling it with PGO.
No functional change.
Resolves#109
Now we can directly replace it with
the definition resulting in simpler
and possibly faster code because
PvNode is evaluated at compile time.
No functional change.
* remove some erroneous comments, that were based on the ONE_PLY == 2.
* rename hd to d, because there's no more half-depth in SF.
* rescope variables into the for loops.
* reindent the for loops correctly.
* add a comment to explain the eval improving part (not so obvious to read
this code as array has 4 dimensions).
No functional change.
This should reduce search inconsistencies, and doesn't seem to have a measurable ELO Impact:
STC with Hash=16
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 49264 W: 10076 L: 10007 D: 29181
LTC with Hash=64
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 82149 W: 14044 L: 14023 D: 54082
Plus an extra test, to make sure it doesn't regress with strong hash pressure:
STC with Hash=4
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 21498 W: 4327 L: 4246 D: 12925
Bench: 7302735
Resolves#100
Idea is to apply king safety later in the endgame. Previously, we didn't
apply KS in a RR vs. Q ending for example, which causes poor play.
Now we calculate king attacks when the attacking side has a queen or more.
STC with 8moves_v3
LLR: 3.06 (-2.94,2.94) [0.00,4.00]
Total: 38481 W: 6228 L: 5952 D: 26301
LTC with 2moves_v1
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 51053 W: 8670 L: 8353 D: 34030
Bench: 7514010
Resolves#98
UCI specification is not clear on the size of
integers that are exchanged in the protocol, so
instead of a simple int, assume 'nodes' is a
int64_t because we need a bigger size to store
this value in many real cases, especialy with
very long searches.
No functional change.
Resolves#75
Clang 3.5 issues warning on constructs like: abs(f1 - f2). The thing is that
f1 and f2 are enum types, and the range given (all positive) allows the
compiler to choose an unsigned type (efficiency being one reason to prefer
unsigned arithmetic). If f1 < f2 are unsigned, then f1 - f2 wraps around zero
and the abs() becomes a no-op. It's the reinterpretation of the unsigned
result (large value) as a signed int that happens to give the correct result,
thanks to 2's complement. This is all tricky and dangerous!
In the spirit of the standard, we assume nothing on the signedness of enums,
and simply calculate the rank and file distances as:
- rank_dist(r1, r2) = r1 < r2 ? r2 - r1 : r1 - r2
- file_dist(f1, f2) = f1 < f2 ? f2 - f1 : f1 - f2
this logic can in fact be applied to any enum we may use, so for better
generality and to avoid code duplication, we use a template function diff()
here.
No functional change.
Resolves#95
This area has become obscure and tricky over the course of incremental
changes that did not respect the original's consistency and clarity. Now,
it's not clear why we use MAX_PLY = 120, or why we use MAX_PLY+6, among
other things.
This patch does the following:
* ID loop: depth ranges from 1 to MAX_PLY-1, and due to TT constraint (depth
must fit into an int8_t), MAX_PLY should be 128.
* stack[]: plies now range from 0 to MAX_PLY-1, hence stack[MAX_PLY+4],
because of the extra 2+2 padding elements (for ss-2 and ss+2). Document this
better, while we're at it.
* Enforce 0 <= ply < MAX_PLY:
- stop condition is now ss->ply >= MAX_PLY and not ss->ply > MAX_PLY.
- assert(ss->ply < MAX_PLY), before using ss+1 and ss+2.
- as a result, we don't need the artificial MAX_PLY+6 range. Instead we
can use MAX_PLY+4 and it's clear why (for ss-2 and ss+2).
* fix: extract_pv_from_tt() and insert_pv_in_tt() had no reason to use
MAX_PLY_PLUS_6, because the array is indexed by plies, so the range of
available plies applies (0..MAX_PLY before, and now 0..MAX_PLY-1).
Tested with debug compile, using MAX_PLY=16, and running bench at depth 17,
using 1 and 7 threads. No assert() fired. Feel free to submit to more severe
crash-tests, if you can think of any.
No functional change.
SSE4.2 has nothing to do with POPCNT. We must dispell this myth, because
Stockfish is a reference and many will copy this mistake if they see it in Stockfish:
* SSE is an SIMD instruction set, relative to vectorization (using special 128-bit registers).
* POPCNT/LZCNT work on normal registers (eg. AL, AX, EAX, RAX).
The confusion comes from the fact that, in the Intel product line, it just
so happens that SSE4.2 and POPCNT/LZCNT came out at the same time. But this
is not true for AMD. For example, all AMD Pheniom II have SSE3 but no
POPCNT/LZCNT, and that is why the modern compile uses -msse3 -popcnt and not -msse4.2.
No functional change.
Resolves#86
Objects that are only accessible at file-scope should be put in the anonymous namespace.
This is what the C++ standard recommends, rather than using static, which is really C-style and results in static linkage.
Stockfish already does this throughout the code. So let's weed out the few exceptions,
because... they have no reason to be exceptional.
No functional change.
Resolves#84
It is more idiomatic, we didn't used it
in the past because Position::pretty(Move)
had a calling argument, but now we can.
As an added benefit, we avoid a lot of string
copies in the process because now we avoid
std::ostringstream ss.
No functional change.
This commit fixes two issues:
1) Don't print PVs after the search has been interrupted
This solves the "mate 0 upperbound" scores that sometimes
creep up when a multi-PV analysis gets interrupted with
the `stop` command.
2) Print multipv before score
Shredder Classic fails to identify the main PV
(the one with multipv 1) if `score` comes first.
This leads to an eval graph that doesn't reflect
the scores actually reported by Stockfish when
doing a multiPV analysis.
No functional change
Closes#76
Helper function should just know how to find the
biggest piece type in a bitboard. All the threat
logic and data shoud be in evaluate_threats().
This nicely separates the scope of the two functions
in a more consistent way and simplifies the code.
No functional change.
Use the max_threat() helper function to estimate more precisely the
best hanging piece threat. Also retunes the Threat array using SPSA.
STC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 7598 W: 1596 L: 1468 D: 4534
LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 7896 W: 1495 L: 1350 D: 5051
Bench: 6816504
Resolves#73
Instead of hard-code the weights in a big table,
we prefer to calculate them out of few parameters
at startup. This allows to keep low the number of
independent parameters and hence is good for tuning
and for a better insight in the meaning of the numbers.
No functional change.
Make even more clear what are the terms that
contribute to evaluate connected pawns, and
completely separate them from the weights
that are now fully looked up in a table.
For future tuning makes sense to init the table with
a formula instead of hard-code it. This allows to
reduce problem space cardinality and makes tuning
easier.
And fix a MSVC warning while there:
warning C4804: '>>' : unsafe use of type 'bool' in operation
No functional change.
These two notions are very correlated. Since connected has the most
generality, it makes sense to generalize it to encompass what is
covered by candidate.
STC:
LLR: 4.03 (-2.94,2.94) [-3.00,1.00]
Total: 11970 W: 2577 L: 2379 D: 7014
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 13194 W: 2389 L: 2255 D: 8550
bench 7328585
Now that half plies have been removed from the engine, we can encode
TT depth into an int8_t.
Range is -128 to +127, so it goes still further than the previous
limit of 121 plies (with ONE_PLY == 2 where depth - DEPTH_NONE was
encoded as an uint8_t).
No functional change.
Resolved#60
Now that half-plies are no more used we can simplify
the code assuming that ONE_PLY is 1 and no more 2.
Verified with a SMP test:
LLR: 2.95 (-2.94,2.94) [-4.50,0.00]
Total: 8926 W: 1712 L: 1607 D: 5607
No functional change.
On top of previous patch, rename time variables to
reflect the simplification of UCI parameters.
It is more correct to use as varibales directly the
corresponding UCI option, without intorducing redundant
intermediate variables.
This allows also to simplify the code.
No functional change.
In case of a succesful late join we set again
'searching' flag, so we can restart search
immediately without an useless lock/unlock
cycle.
No functional change.
Now "Write Search Log" will pring moves in UCI format, consistent with all the rest. This functionality is
not aimed at end-users anyway. It's hardly useful at all, in fact. Also, pretty-printing SAN moves is
something that better belongs in the GUI than in the engine.
No functional change.
Where they better belong.
Also, this removes '#include <string>' from types.h, which reduces the amount of code to compile (every
translation unit includes types.h).
No functional change.
Unify various perft functions and move all the code
to search.cpp.
Avoid perft implementation to be splitted between
benchmark.cpp (where it has no reason to be) and
search.cpp
No functional and no speed change (tested).
So that one can redirect cout to /dev/null and only print print cerr in the terminal (for more accurate speed
tests).
Suggested by Marco.
No functional change.
The compiler tries to cast Options["Hash"] into a string, using:
Option::operator std::string() const {
assert(type == "string");
return currentValue;
}
And, as expected, the assert() fails.
std::to_string() would be the right solution, but it's C++11. And using a stringstream is too much code to
achieve so little. Let's keep it the way it was: hardcoded (ie. default hash defined in two places).
No functional change.
The eval already returns zero in KK, KBK, KNK (see material.cpp). The difference is:
- we lose the "TB pruning" benefit of the draw rule (ie. search goes on even if eval is zero)
- we gain some speed by removing a useless test from the hot path
STC:
LLR: 0.05 (-2.94,2.94) [-3.00,1.00]
Total: 128000 W: 21357 L: 21560 D: 85083
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 33023 W: 4613 L: 4509 D: 23901
bench 7461881
First, remove some dead code (function never called with a Move argument).
Then, remove printing of legal moves, which does not belong here. Let's keep commands orthogonal and minimal:
- the "d" command should display the board, nothing more, or less.
- "perft 1" will display the list of legal moves.
No functional change.
Stockfish allocates the default hash (32MB) in main(), before entering UCI::loop(). If there is not enough
memory, the program will crash even before UCI::loop() is entered and the GUI is given a change to specify a
lower Hash value.
This defective design could be resolved by doing a lazy allocation upon "isready" command, as the UCI protocol
guarantees that "isready" will be sent at least once before any search. But it's a bit cumbersome when using
Stockfish "manually" to have to remember to type "isready" everytime.
So leave the current design, but reduce the default hash to 16MB instread of 32MB. In order to perform such
quick searches (depth=13), there is no reason to use so much Hash anyway. Another benefit is to introduce a
bit of hash pressure in bench, which increases chances to detect rare bugs related to TT replacement, for
example.
This is not a functional change, although it obviously changes the bench.
bench 7461879
Instead of defining it both in ucioption.cpp and benchmark.cpp. Obviously changing the default Hash will
change the bench as a result.
No functional change.
The main purpose of perft is to help debugging. But without the breakdown in sum of perft(N-1), it is a
completely useless debugging tool.
So perft now displays the breakdown, and divide is therefore removed.
No functional change.
After commit 94b1bbb68b, in case available root moves are less than multiPV, we
could never reach condition:
PVIdx + 1 == multiPV
and as a consequence UCI output is not printed.
Fixed suggested by Joerg Oster.
No functional change.
Despite being neutral at STC, it turned out to be regressive at LTC:
40k games at LTC with Hash=8
ELO: -2.06 +-1.9 (95%) LOS: 1.4%
Total: 39720 W: 5740 L: 5976 D: 28004
40k games at LTC with Hash=128
ELO: -2.69 +-1.9 (95%) LOS: 0.2%
Total: 39149 W: 5702 L: 6005 D: 27442
bench 7477963
Also raise the admissible bounds to (-100,100), as there is no reason to prevent users from using high
values if they want to.
Does not regress in self play:
ELO: 0.10 +-2.0 (95%) LOS: 53.7%
Total: 40000 W: 7084 L: 7073 D: 25843
master vs SF 3
ELO: 182.86 +-2.7 (95%) LOS: 100.0%
Total: 40000 W: 21843 L: 2541 D: 15616
Contempt = 20 vs SF 3
ELO: 189.25 +-2.8 (95%) LOS: 100.0%
Total: 40000 W: 22721 L: 2859 D: 14420
Diff is therefore 6.4 +/- 3.9 elo against a 180-190 elo weaker engine, which is significantly positive,
as expected. This elo difference is likely understated, because of FishTest aggressive draw adjudication
though.
We could push Contempt further, but after 20cp, it would get in the way of FishTest draw adjudication
rule, and is likely to reduce the testing throughput as a result.
bench 8198667
Steamline a bit the implementation of
skill levels. As a side effect we can
retire MultiPV global and use a local
variable instead.
No functional change.
- Currently broken
- Never been really useful
- Does not work well with new splitting model
Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402
No functional change
Bitboard init code is already noteasy to follow,
so don't make it even harder using 'smart' code.
Also reindent a while loop in standard way.
No functional change.
With Eelco's patch "Don't special case for abs(beta) >= VALUE_MATE_IN_MAX_PLY" condition "abs(ttValue) < VALUE_KNOWN_WIN" has been removed from singular extension search, and condition "abs(beta) < VALUE_KNOWN_WIN" was added to the SingularExtensionNode definition.
This might lead to problems, especially in positions, where a mate is due.
For example, this position 5rk1/4K1pp/8/5PPP/8/8/8/1R6 w - - 12 1 triggers an assert.
stockfish: search.cpp:434: Value {anonymous}::search(Position&, Search::Stack*, Value, Value, Depth, bool) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)2u; bool SpNode = false]: Assertion `-VALUE_INFINITE <= alpha && alpha < beta && beta <= VALUE_INFINITE' failed.
So let's re-insert the removed condition.
First spotted by Uri Blass, fix by me.
Bench: 8759675
* /boot/common was removed from Haiku
* The equivalent path now that package management has been implemented is /boot/system/non-packaged
No functional change
Bench: 8759681
The assert:
assert(ttValue != VALUE_NONE);
Could fire for multiple reasons (although is very rare),
for instance after an IID we can have ttMove != MOVE_NONE
while ttValue is still set at VALUE_NONE.
But not only this, actually SMP is a source of corrupted
ttValue and anyhow we can detect the condition:
ttMove != MOVE_NONE && ttValue == VALUE_NONE
even north of IID.
Reported by Ronald de Man.
It is so rare that bench didn't change.
bench: 7710548
Remove from the search this special case and apply
null search and razoring also in mate positions.
Tested in no-regression mode and passed both
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 65431 W: 10860 L: 10810 D: 43761
and LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 34928 W: 4814 L: 4713 D: 25401
This patch kicks in only in mate positions and in
these cases it seems beneficial in finding mates
faster as Yery Spark measured on the Chest mate suite:
Total number of positions 6425
Fixed nodes 200K per position
master: 1049
new: 1154
And also the 5446 'hard' positions again with 2000K nodes
(those not found by both engines in 200K nodes):
master: 1069
new: 1395
bench: 7710548
It seems this flag is only for gcc and
yields a warning under OSX Mavericks:
clang: warning: argument unused during compilation: '-ansi'
No functional change.
Here MSVC is worried that
StepAttacksBB[PAWN][psq]
could overflow, so change psq initialization
to clarify psq is always less than 64.
No functional change.
Don't take the split lock if we don't have
available slaves (about 30-40% of times).
This new condition allows to retire the now
redundant one on number of threads.
No functional change.
Split previous patch in 2 steps: first remove
the MOVE_NULL hack, then retire nullChild.
The first step is a prerequisite
for second one and affects bench.
The second step (next patch) just removes nullChild
without affecting bench.
bench: 8205159
Are broken for big-endian case and
I have verified with MSVC 2013 Premium
bench is correct and there is no
miscompilation, so the main reason
to change the original code drops.
No functional change.
Another attempt at retiring current asymmetric
king evaluation and use a much simpler symmetric
one. As a good side effect we can avoid recalculating
eval after a null move.
Tested in no-regression mode and passed
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 21580 W: 3752 L: 3632 D: 14196
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 18253 W: 2593 L: 2469 D: 13191
And a LTC regression test against SF DD to
verify we don't have regression against
weaker engines due to some kind of 'contempt'
effect:
ELO: 54.69 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 11072 L: 4827 D: 24101
bench: 8205159
Before it was working by accident in case of
see_sign() and failing with see() due to how
castle moves are coded (king captures the rook).
Better to explicitly filter out castling moves
and use see() without any surprise/trick.
No functional case.
Book handling belongs to GUI, we kept this code
for historical reasons, but nowdays there is
really no need of this old, (mostly) unused
and especially incorrect designed functionality.
It is up to the GUI to choose the book (far easier for
the user) and to select the book parameters. In no
place, including fishtest, TCEC, rating lists, etc.
the "own book" is used, moreover currently SF is
released without any book and even if in the future we
bundle a book in the release package, it will be the GUI
that will take care of it.
This corrects a wrong design decision that Galurung
and later Stockfish inherited from what was common
practice many yeas ago.
No functional change.
There is really little that user can achieve (apart
from a weakened engine) tweaking these parameters
that are already tuned and have no immediate or visible
effect.
So better do not expose them to the user and avoid the
typical "What is the best setup for my machine?" kind of
question (by far the most common, by far the most useless).
No functional change.
To show perft numbers for each move. Just
use 'divide' instead of 'perft', for instance:
position startpos moves e2e4 e7e5
divide 4
Inspired by Ronald de Man.
No functional change.
Retire current asymmetric king evaluation
and use a much simpler symmetric one.
As a side effect retire the infamous
'Aggressiveness' and 'Cowardice' UCI
options.
Tested in no-regression mode,
Passed both STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33855 W: 5863 L: 5764 D: 22228
And LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40571 W: 5852 L: 5760 D: 28959
bench: 8321835
At root we start counting plies from 1,
instead pv[] array starts from 0. So
the variable 'ply' we use in extract_pv_from_tt
to index pv[] is misnamed, indeed it is
not the real ply, but ply-1.
The fix is to leave ply name in extract_pv_from_tt
but assign it the correct start value and
consequentely change all the references to pv[].
Instead in insert_pv_in_tt it's simpler to rename
the misnamed 'ply' in 'idx'.
The off-by-one bug was unhidden when trying to use
'ply' for what it should have been, for instance in
this position:
position fen 8/6R1/8/3k4/8/8/8/2K5 w - - 0 1
at depth 24 mate line is erroneusly truncated due
to value_from_tt() using the wrong ply.
Spotted by Ronald de Man.
bench: 8732553
If razoring conditions are satisfied and
depth is low, then directly drop in qsearch.
Passed both STC
LLR: 2.98 (-2.94,2.94) [-1.50,4.50]
Total: 12914 W: 2345 L: 2208 D: 8361
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 50600 W: 7548 L: 7230 D: 35822
bench: 8739659
When there aren't legal moves after
a search, instead of returning imediately,
save bestValue in TT as in the usual case.
There is really no reason to special case
this one.
With this patch is fully fixed (again) follwing
position:
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Also in SMP case.
bench: 8802105
After last Joona's patch there is no measurable
difference between the option set or unset.
Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.
After 5000 games at 15+0.05 the result is:
1 Stockfish_14050822_T16_on : 3003 5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off : 2997 5000 (+755,=3396,-849), 49.1 %
bench: 880215
Instead of waiting to be allocated, actively search
for another split point to join when finishes its
search. Also modify split conditions.
This patch has been tested with 7 threads SMP and
passed both STC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956
And a reduced-LTC at 25+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151
Was then retested against regression in 3 thread case
at standard LTC of 60+0.05:
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 40809 W: 5446 L: 5406 D: 29957
bench: 8802105
On a final fixed game number test it failed
to prove better than standard version.
STC 15+0.05
ELO: -0.86 +-1.7 (95%) LOS: 15.8%
Total: 57578 W: 10070 L: 10213 D: 37295
bench: 8802105
Unfortunatly we have a slow down that causes
a regression in STC with no-regression mode:
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 22454 W: 3836 L: 4029 D: 14589
bench: 8678654
The bug was found to be elsewhere. This version
is correct and also is able to detect as draw
positions like:
8/8/5b2/8/8/4k1p1/6P1/5K2 b - - 6 133
bench: 8678654
Position is win also if strong side has a bishop
and a knight (plus other material, otherwise
KBNK would be triggered instead of KXK).
This fixes a subtle bug where a search on position
k7/8/8/8/8/P7/PB6/K7 b - - 6 1
Instead of returning a draw score, suddendly returns
a big score. This happens because at one point in
search we reach this position:
8/Pk6/8/8/8/4B3/P7/K7 w - - 3 8
Where white can promote. In case of rook promotion (and also in case of
queen promotion) the resutling position gets a huge static eval that is
above VALUE_KNOWN_WIN (from the point of view of white). So for rook
promotion it is
&& futilityBase > -VALUE_KNOWN_WIN
that prevents futility pruning in qsearch. (Removing this condition indeed
lets the problem occur). Raising the static eval for K+B+N+X v K to a value
higher than VALUE_KNOWN_WIN fixes this particular problem without having to
introduce an extra futility pruning condition in qsearch.
I just checked and it seems K+R v K, K+2B v K and even K+B+N v K already get
a huge static eval. Why not K+B+N+P v K?
I think this fix corrects an oversight. There is special code for KBNK, but
KBNXK is handled by KXK, so the test for sufficient material should also test
for B+N.
bench: 8678654
In the (rare) cases when the two conditions
are true, then fully check again with a slow
but correct MoveList<LEGAL>(pos).size().
This is able to detect false positives like
this one:
8/8/8/Q7/5k1p/5P2/4KP2/8 b - - 0 17
When we have a possible simple pawn push that
is not stored in attacks[] array. Because the
third condition triggers very rarely, even if
it is slow, it does not alters in a measurable
way the average speed of the engine.
bench: 8678654
Currently a stealmate position is misevaluated
in a negative/positive score, this leads qsearch(),
that does not detects stealmates too, to return the
wrong score and this yields to some kind of endgames
to be completely misevaluated.
With this patch is fully fixed follwing position
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Also in SMP case.
Correct root cause analysys by Ronald de Man.
bench: 8678654
After reverting to the original Tord's
endgame, a search on position
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Reports, correctly, a draw score instead of
an advantage for white.
Issue reported by Uri Blass.
bench: 8678654
Remove the optimization for Intel, is not
standard and can break at any time, moreover
our release build is not done with Intel C++
anymore so we don't need to sqeeze the extra
speed out from this compiler.
No functional change.
If we return from split with a stale value
due to a stop or a cutoff upstream occurred,
then we exit moves loop and save a stale value
in TT before returning search().
This patch, from Joona, fixes this.
bench: 8678654
We can never have bestValue == -VALUE_INFINITE at
the end of move loop because if no legal move exists
we detect it with previous condition on !moveCount,
if a legal move exists we never prune it due to
futility pruning condition:
bestValue > VALUE_MATED_IN_MAX_PLY
So this code never executes, as I have also verified
directly.
Issue reported by Joona.
No functional change.
Intel compiler is very picky:
"error: this operation on an enumerated type requires an
applicable user-defined operator function"
Reported by Tony Gaor.
No functional change.
Put the division at the end to reduce
rounding errors. This alters the bench
due to different rounding errors, but
should not alter ELO in any way.
bench: 7615217
This apparentely silly tweak allows
to speed up the bench by almost 3%.
Not clear why, repeating with perft,
the speed up vanishes.
Suggested by Jonathan Calovski.
No functional change.
Believed to be a speed optimization as benched
on Windows with bench realtime affinity 0x1 deleting
highest and lowest runs:
Base Test
1549259 1608202
1538115 1583934
1543168 1556938
1536365 1554179
1533026 1582010
Signature remains unchanged and gives anywhere from 1-2% nps
boost in analysis depending on number of cores used.
No functional change.
This is a very discussed patch with many
argumentations pro and against. The fact is
it passed both STC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16305 W: 3001 L: 2855 D: 10449
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 34273 W: 5180 L: 4931 D: 24162
Although it is true that a correct test should
include foreign engines, we commit it anyhow so
people can test it out in the wild, under broader
conditions.
bench: 7384368
Idea from Lyudmil Tsvetkov.
The value seems to be raised a bit abruptly, but as
Gary said, a blocked pawn on the sixth rank has been
instrumental in limiting king mobility in multiple
losses that I've seen from SF. A blocked pawn on fifth
rank is much less serious on the king safety impact.
Passed both STC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 14551 W: 2750 L: 2607 D: 9194
and LTC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 43595 W: 6917 L: 6618 D: 30060
And even a retest at 60" fixed games 40K
ELO: 1.79 +-1.9 (95%) LOS: 97.0%
Total: 39889 W: 6018 L: 5813 D: 28058
bench: 7154916
Adding BMI1 allows the compiler to use _blsr_u64
automatically (the advertised 0.3% speed gain).
I verified that the compiler does not use this
instruction with the -mbmi2 flag only. Also, all
processors supporting BMI2 is also supporting BMI1.
No functional change
Prefer
file_of(s) < file_of(ksq)
to the inidrect
file_of(ksq) < FILE_E
To evaluate if semiopen side to check is the left side.
Also other small touches while there.
No functional change.
Right now the Makefile is cluttered with OS X equivalents
of all the x86 targets. We can get rid of all of them and
just check UNAME against "Darwin" for the few OS X-specific
things we need to do.
We also disable Clang LTO when using BMI2 instructions. For
some reason, LLVM cannot find the PEXT instruction when using
LTO. I don't know why, but disabling LTO for BMI2 fixes it.
No functional change.
Intel Haswell and newer CPUs can calculate sliders
attacks using special PEXT asm instructions instead
of magic bitboards. This gives a +3% speed up.
To enable it just compile with ARCH=x86-64-bmi2
No functional change.
Retire software pext and introduce hardware
call when USE_PEXT is defined during compilation.
This is a full complete implementation of sliding
attacks using PEXT.
No functional change.
Reshuffle functions to define them in reverse
calling order (C style).
This allow us to define templates before they are
used. Currently it is not like this, for instance
evaluate_pieces is defined after do_evaluate that
calls it. This happens to work for some strange
reason (two phase lookup?) but we want to avoid
code that works 'by magic'.
As a nice side-effect we can now remove the function
prototypes.
No functional change.
This is more consistent with what other engines are doing.
Often people thinks that SF's scores are overblown. In the
end, it just boils down to the arbitrary way of rescaling them.
No functional change.
I have noticed that increasing the bench depth produces
progressively smaller and slightly faster executables at
the cost of longer compile times. Also using bench "time"
instead of "depth" seems to produce slightly smaller/faster
executables given comparable compile times.
I have made a new Makefile that generates smaller and
about 1% to 2% faster profile executables at only a
little extra compile time. On my mobile 2GHz i7 a
full profile build time goes from 3'48" to 4'13" and
the exe goes down by 5% from 416,310 bytes to 395,567
bytes.
No functional change.
Small simplification.
Passed SPRT(-3,1) both at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17051 W: 3132 L: 3005 D: 10914
and LTC:
LLR: 4.55 (-2.94,2.94) [-3.00,1.00]
Total: 24890 W: 3842 L: 3646 D: 17402
The rationale behind this is that I've never managed to add a
Queen on 7th rank bonus in DiscoCheck, because it never showed
to be positive (evne slightly) in testing. The only thing that
worked is Rook on 7th rank.
In terms of SF code, it seemed natural to group it with QueenOnPawn
as well as those are done together. I know you're against groupping
in general, but when it comes to non regression test, you are being
more conservative by groupping. If the group passes SPRT(-3,1) it's
safer to commit, than test every component in SPRT(-3,1) and end up
with the risk of commiting several -1 elo regression instead of just
one -1 elo regression.
In chess terms, perhaps it's just easier to manouver a Queen (which
can more also diagonaly) than a Rook. Therefore you can let the search
do its job without needing eval ad-hoc terms to guide it. For the Rook
which takes more moves to manouver such eval terms can be (marginally)
useful.
bench: 7473314
We chose this instead of negamax sign convention
(ie. from the point of view of the side to move)
because it is more in line to how the eval
table is presented.
Also some tweak to formatting while there.
No functional change.
In some legal positions like this one:
R6R/3Q4/1Q4Q1/4Q3/2Q4Q/Q4Q2/Np1Q4/kB1N1KB1 b -- 0 1
We can have a very high score, in this case 30177 and 29267
for midgame and endgame respectively, and because
VALUE_INFINITE = 30001 we have an assert in interpolate()
Midgame and endgame scores are stored in 16 bit signed integers
so we can rise VALUE_INFINITE a little bit. This does not fix
the possibility of overflow in general case, just makes the
condition more difficult to trigger and anyhow better uses all
the score width.
Raising VALUE_INFINITE to 32000 seems to fix the problem for this
particular case.
No functional change.
Tested directly at LTC because previous long
test series on this topic shows it is TC dependant.
Tested with no-regression mode because gets rid of
an ugly and ad-hoc rule.
Test at LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 67918 W: 10590 L: 10541 D: 46787
bench: 7926803
Here the new idea is to link pinned pieces
with king safety.
Passed both STC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 10047 W: 1867 L: 1737 D: 6443
And LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 10419 W: 1692 L: 1543 D: 7184
bench: 8325087
We add a penalty for each pawn which is not protected by another pawn
of the same color.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 12107 W: 2411 L: 2272 D: 7424
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 9204 W: 1605 L: 1458 D: 6141
bench: 7682173
We want all the UCI options are printed in the order in which are
assigned, so we use an index that, depending on Options.size(),
increases after each option is added to the map. The problem is
that, for instance, in the first assignment:
o["Write Debug Log"] = Option(false, on_logger);
Options.size() can value 0 or 1 according if the l-value (that
increments the size) has been evaluated after or before the
r-value (that uses the size value).
The culprit is that assignment operator in C++ is not a
sequence point:
http://en.wikipedia.org/wiki/Sequence_point
(Note: to be nitpick here we actually use std::map::operator=()
that being a function can evaluate its arguments in any order)
So there is no guarantee on what term is evaluated first and
behavior is undefined by standard in this case. The net result
is that in case r-value is evaluated after l-value the last
idx is not size() - 1, but size() and in the printing loop
we miss the last option!
Bug was there since ages but only recently has been exposed by
the removal of UCI_Analyze option so that the last one becomes
UCI_Chess960 and when it is missing engine cannot play anymore
Chess960.
The fix is trivial (although a bit hacky): just increase the
last loop index.
Reported by Eric Mullins that found it on an ARM and MIPS
platforms with gcc 4.7
No functional change.
Thanks to std::bitset we can easily increase
the limit of active threads above 64.
Thanks to Lucas Braesch for pointing at the
correct solution of using std::bitset.
No functional change.
Using memset on a std::vector is undefined behavior,
so manually init all the data memebers of LimitsType.
Bug intorduced in 41641e3b1e
No functional change.
Because we test for available slaves before
entering split(), we almost always allocate a
slave, only in the rare case of a race (less
then 2% of cases) this is not true, but to
special case this occurrence is not worth
the added complexity.
bench: 7451319
Split delta value in aspiration window so that when
search depth is less than 24 a smaller delta value
is used. The idea is that the search is likely to
be more accurate at lower depths and so we can exclude
more possibilities, 25% to be exact.
Passed STC
LLR: 2.96 (-2.94, 2.94) [-1.50, 4.50]
Total: 20430 W: 3775 L: 3618 D: 13037
And LTC
LLR: 2.96 (-2.94, 2.94) [0.00, 6.00]
Total: 5032 W: 839 L: 715 D: 3478
Bench: 7451319
During endgame initialization we get the material
hash key of each endgame forging and ad-hoc position
that in same cases is illegal (leaves teh king under
capture). This is not a problem for the material key,
but rises an assert when SF is run in debug mode with
'testKingCapture' set in pos_is_ok().
So rewrite the code to always produce legal positions.
No functional change.
Big simplification of pawn move check.
Code has been tested with a brute force approach: for
every position reached during a bench search, the function
has been called for each combinations of Move(from, to)
and verified the result is the same of old code.
Actually this function is very critical becuase is the
one that ensures corrupted TT moves are discarded, so
to properly test it a simple bench is not enough.
Verified also speed is not changed.
No functional chnage.
It has been obsoleted out already some time ago
and currently there is no point in changing eval
score according to if we are in game or analyzing.
So retire the option.
No functional change.
Try to avoid repetition draws at early midgame,
this should give an edge against weaker opponents
and reduce draw rate.
Tested for regressions with SPRT[-3, 1] and
passed both short TC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 68498 W: 12928 L: 12891 D: 42679
And long TC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40212 W: 6386 L: 6295 D: 27531
bench: 7990513
When running the following position:
8/kPp5/2P3p1/p1P1p1P1/2PpPp2/3p1p2/3P1P2/5K2 w - - 0 1
An assert is raised at depth 92:
assert(-VALUE_INFINITE <= alpha && alpha < beta && beta <= VALUE_INFINITE);
This is because it happens that beta = 29832,
so rbeta = 30032 that is > VALUE_INFINITE
Bug spotted and analyzed by Uri, fix suggested by Joerg.
Other fixes where possible but this one is pointed
exactly at the source of the bug, so it is the best
from a code documentation point of view.
bench: 8430785
Under some very rare case 100 plies of search
could be not enough. Increasing more could lead
to crashes due to reached stack size limit on
some platforms.
Strongly requested by Uri.
bench: 8430785
Currently king has no material key associated because
it can never happen to find a legal position without
both kings, so there is no need to keep track of it.
The consequence is that a position with only the two
kings has material key set at zero and if the material
hash table is empty any entry will match and this is
wrong.
Normally bug is hidden becuase the checking for a draw
with pos.is_draw() is done earlier than evaluate() call,
so that we never check in gameplay the material key of a
position with two kings.
Nevertheless the bug is there and can be reproduced setting
at startup a position with only two kings and typing
'eval' from prompt.
The fix is very simple: add a random key also for the king.
Also fixed the condition in material.cpp to avoid asserting
when a 'just 2 kings' postion is evaluated.
No functional change.
Actually MultiCut is too different from current scheme.
Note that neither ProbCut is exactly what we do because
we try just a handful of captures instead of all moves,
nevertheless it seems more in line with what we do.
Suggested by Joona.
No functional change.
Makes more sense than returning a draw score. Tested
with reduced MAX_PLY = 30 and passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 17434 W: 3345 L: 3194 D: 10895
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 2610 W: 488 L: 373 D: 1749
With current limit of MAX_PLY = 100 the patch should not
introduce any measurable change, nevertheless is the correct
approach.
Idea of returning eval is from Michel Van den Bergh.
bench: 8430785
Fix small overflow error while converting
magic boosters from right rotate to left rotate,
in particular booster 38 was converted to 4122
instead of the corrcet value 26.
Formula used was:
s1 = original & 63, s2 = (original >> 6) & 63;
new = (64 - s1) | ((64 - s2) << 6);
Instead of:
s1 = original & 63, s2 = (original >> 6) & 63;
new = ((64 - s1) & 63) | (((64 - s2) & 63) << 6);
This has no impact in number of cycles needed, but
just in the resultig number that yields to a rotate
amount bigger than 63.
Spotted by Ehsan Rashid.
No functional change.
Latest master triggers a compiler warning due
to comparing int64_t to uint64_t.
notation.cpp: In Funktion »std::string pretty_pv(Position&, int, Value, int64_t, Move*)«:
notation.cpp:230:30: Warnung: Vergleich zwischen vorzeichenbehafteten und vorzeichenlosen Ganzzahlausdrücken [-Wsign-compare]
This patch should fix it.
No functional change.
When initializing the magic numbers used to
compute sliding attacks, we endless generate a
random and test it as a possible magic.
In the general case this takes a lot of iterations,
but here, insteaad of picking a casual random, we
rotate it a couple of times and generate a number that
we know has a good probability to be a magic candidate.
This is becuase the quantities by which we rotate the
number are known in advance to produce quickly a good
canidate.
The patch, inspired by DON, just moves the shuffle to RKISS
changing the boosters to take in account a left rotation
instead of a right rotation as in the original.
No functional change.
Although does not change ELO level, it seems
verification is useful in many zugzwang positions
as reported by many sources.
So revert this simplification.
bench: 8430785
Another SEE speed up that passed the SPRT short TC test!
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 81337 W: 15060 L: 14745 D: 51532
No functional change.
Actually race conditions do exist in an engine, just
think for a moment to TT concurrent access. Racy code
is not a problem per se, if the consequences are well
known and correctly handled.
In case of TT access we ensure that the TT move is validated
before to be tried, here we just retry the same move in less
that 1 case out of a million: this is totally harmless considering
that very probably the second time the move is tried we get
immediately a TT hit and search quickly returns.
So we simplify the code for no harm.
No fuctional change (in single thread case)
Tested with SPRT in simplification mode [-4.00,0.00],
this ensures that the patch is (very probably) not
a regression.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 27543 W: 4278 L: 4209 D: 19056
And long TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 39483 W: 7325 L: 7305 D: 24853
bench: 8347121
Hopefully this patch makes the code more:
* Self-documenting: Null search is always a zero window search,
because it is testing for a fail high. It should never be done
on a full window! The current code only works because we don't
do it at PV nodes, and therefore (alpha, beta) = (beta-1, beta):
that's the kind of "clever" trick we should avoid.
* Idiot-proof: If we want to enable null search at PV nodes, all we
need to do now is comment out the !PvNode condition. It's that simple!
In theory, null search should not be done at PV nodes, because PV nodes
should never fail high. But in practice, they DO fail high, because of
aspiration windows, and search inconsistencies, for example. So it makes
sense to keep that flexibility in the code.
No functional change.
Use ralpha instead of rbeta
* rbeta is confusing people. It took THREE attempts to code razoring
at PV nodes correctly in a recent test, because of the rbeta trick.
Unnecessary tricks should be avoided.
* The more correct and self-documenting way of doing this, is to say
that we use a zero window around alpha-margin, not beta-margin.
The fact that, because we only do it at PV nodes, alpha happens to be
beta-1 and that the current stuff with rbeta works, may be correct,
but is confusing.
Remove the misleading and partially erroneous comment about returning
v + margin:
* comments should explain what the code does, not what it could have done.
* this comment is partially wrong in saying that v+margin is "logical",
and that it is "surprising" that is doesn't work.
From a theoretical perspective, at least 3 ways of doing this are equally
defendable:
1/ fail hard: return alpha: The most conservative. We bet that the search
will fail low, but we don't know by how much and don't want to take risks.
2/ aggressive fail soft: return v (what the current code does). This
corresponds to normal fail soft, with the added assumption that we don't
care about the reduction effect (see below point 3/)
3/ conservative fail soft: return v + margin. If the reduced search (qsearch)
gives us a score <= v, we bet that the non reduced search will give us a
score <= v + margin.
* Saying that 2/ is "logical" implies that 1/ and 3/ are not, which is
arguably wrong. Besides, experimental results tell us that 2/ beats 3/,
and that's not something we can argue against: experimental results are
the only trusted metric.
* Also, with the benefit of hindsight, I don't think the fact that 2/ is
better than 3/ is surprising at all. The point is that it is YOUR turn to
move, and you are assuming that by NOT playing (and letting the opponent
capture your hanging pieces in QS) you cannot generally GAIN razor_margin(depth).
No functional change.
With some positions like
8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
The eval score is higher than VALUE_INFINITE because
is the sum of VALUE_KNOWN_WIN plus a big material
advantage. This leads to an assert. Here are the
steps to reproduce:
Compile SF with debug=yes then do
./stockfish
position fen 8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
go depth 1
This patch fixes the issue in this case, but do exsist
other positions for which the patch is not enough and
we will need to limit the eval score to be sure not
overflow the limit.
Note that is not possible to increase the value of
VALUE_INFINITE because should remain within int16_t
type to be stored in a TT entry.
bench: 7356053
Depth is already dependent on the actual value
of ONE_PLY, in particular can be expressed like:
Depth = n * ONE_PLY
And because formula is used to calculate R that is
also dependent on the value of ONE_PLY and can be
expressed like:
R = x * ONE_PLY
We don't want to divide depth by a 'ply' value but
directly by an integer number.
Spotted by sf-x
No functional change.
Instead of a fixed reduction of ONE_PLY, now
Null move dynamic reduction based on value can
grow larger in case we are above beta of a value
much higher then PawnValueMg.
Note that now an eval returning VALUE_KNOWN_WIN
makes null search to drop in qsearch.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 26141 W: 4871 L: 4699 D: 16571
And long TC:
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 33695 W: 5309 L: 5056 D: 23330
bench: 7356053
Verified there are no hidden bugs and is
actually a speed optimization:
Fixed games at 15+0.05 TC
ELO: 1.72 +-2.9 (95%) LOS: 87.5%
Total: 20000 W: 3741 L: 3642 D: 12617
No functional change
When time remaining is less than Emergency Move Time,
we won't even complete one iteration and engine reports
a stale +M0 score.
To reproduce run "go wtime 10"
info depth 1 seldepth 1 score mate 0 upperbound nodes 2 nps 500 time 4 multipv 1 pv a2a3
info nodes 2 time 4
bestmove a2a3 ponder (none)
This patch fixes the issue.
Tested by Binky at very short TC: 0.05+0.05
ELO: 5.96 +-12.9 (95%) LOS: 81.7%
Total: 1458 W: 394 L: 369 D: 695
And at a bit longer TC:
ELO: 1.56 +-3.7 (95%) LOS: 79.8%
Total: 16511 W: 3983 L: 3909 D: 8619
bench: 7804908
TCEC season 3, which is due to start in a few weeks, just
had its server upgraded to 64GB RAM and will therefore allow
16GB hash to be used per engine.
This is almost the upper limit without changing the
type of size and hashMask. After this we need to
move to uint64_t instead of uint32_t.
No functional change.
Retire KmmKm evaluation function. Instead give a very drawish
scale factor when the material advantage is small and not much
material remains.
Retire NoPawnsSF array. Pawnless endgames without a bishop will
now be scored higher. Pawnless endgames with a bishop pair will
be scored lower. The effect of this is hopefully small.
Consistent results both at short TC (fixed games):
ELO: -0.00 +-2.1 (95%) LOS: 50.0%
Total: 40000 W: 7405 L: 7405 D: 25190
And long TC (fixed games):
ELO: 0.77 +-1.9 (95%) LOS: 78.7%
Total: 39690 W: 6179 L: 6091 D: 27420
bench: 7213723
When we have a fail-high of a quiet move, store it in
a Followupmoves table indexed by the previous move of
the same color (instead of immediate previous move as
is in countermoves case).
Then use this table for quiet moves ordering in the same
way we are already doing with countermoves.
These followup moves will be tried just after countermoves
and before remaining quiet moves.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10350 W: 1998 L: 1866 D: 6486
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 14066 W: 2303 L: 2137 D: 9626
bench: 7205153
This hacky rule allows to get an about right eval out of this position:
r2qk2r/ppp2p2/2npbn2/2b1p3/2P1P1P1/2NB1PPp/PPNP3K/R1BQ1R2 b kq - 0 13
And, more importantly, passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 6239 W: 1249 L: 1127 D: 3863
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 38371 W: 6165 L: 5888 D: 26318
bench: 8183238
As pointed out by Joona, Lucas and otehr people in
the forum, this endgame is not a known, there are many
positions where it takes more than 50 moves to claim the
win and becasue exact rules is not possible better to
retire and allow the search to workout the endgame for us.
bench: 8502826
While editing original Uri's messy patch
I have incorrectly simplified the logic
condition. Here is the correct original
version, as it was tested.
bench: 8502826
Remove the easy move code and add the condition to
play instantly if only one legal move is available.
Verified there is no regression at 60+0.05
ELO: 0.17 +-1.9 (95%) LOS: 57.0%
Total: 40000 W: 6397 L: 6377 D: 27226
bench: 8502826
If we are still at first move, without a fail-low and
current iteration is taking too long to complete then
stop the search.
Passed short TC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 26030 W: 4959 L: 4785 D: 16286
Long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 18019 W: 2936 L: 2752 D: 12331
And performed well at 40/30
ELO: 4.33 +-2.8 (95%) LOS: 99.9%
Total: 20000 W: 3480 L: 3231 D: 13289
bench: 8502826
First tested with 50K games at very short TC of 5+0.05
ELO: 3.11 +-2.0 (95%) LOS: 99.9%
Total: 49665 W: 10941 L: 10497 D: 28227
Then retested with usual SPRT at short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16875 W: 3198 L: 3049 D: 10628
And at long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 5890 W: 985 L: 857 D: 4048
bench: 7800379
In case ply is very high, function will round
to zero (although mathematically it is always
bigger than zero). On my system this happens at
movenumber 6661.
Although 6661 moves in a game is, of course,
probably impossible, for safety and to be locally
consistent makes sense to ensure returned value
is positive.
Non functional change.
Function move_importance() is already always
positive, so we don't need to add a constant
term to ensure it.
Becuase move_importance() is used to calculate
ratios of a linear combination (as explained in
previous patch), result is not affected. I have
also verified it directly.
No functional change.
Drop a useless parameter. This works because ratio1 and ratio2
are ratios of linear combinations of thisMoveImportance and
otherMovesImportance and so the yscale cancels out.
Therefore the values of ratio1 and ratio2 are independent
of yscale and yscale can be retired.
The same applies to yshift, but here we want to ensure
move_importance() > 0, so directly hard-code this safety
guard in function definition.
Actually there are some small differences due to rounding errors
and usually are at most few millisecond, that's means below 1% of
returned time, apart from very short intervals in which a difference
of just 1 msec can raise to 2-3% of total available time.
No functional change.
The flag raises also in case of a pawn duo, i.e.
when we have two adjacent pawns on the same rank,
and not only in case of a chain, i.e. when the two
pawns are on a diagonal line.
See this for a reference:
http://en.wikipedia.org/wiki/Connected_pawns
Renaming suggested by Ralph.
No functional change.
Use a skew-logistic function to replace the
MoveImportance[] array.
Verified it does not regress at fixed number
of games both at short TC:
LLR: -2.91 (-2.94,2.94) [-1.50,4.50]
Total: 39457 W: 7539 L: 7538 D: 24380
And long TC:
ELO: -0.49 +-1.9 (95%) LOS: 31.0%
Total: 39358 W: 6135 L: 6190 D: 27033
bench: 7335588
Passed short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 18331 W: 3608 L: 3453 D: 11270
And scored above 50% on a very long test in long TC
LLR: -2.97 (-2.94,2.94) [0.00,6.00]
Total: 51533 W: 8181 L: 8047 D: 35305
bench: 7335588
In endgame it's better to have pawns on both wings.
So give a bonus according to file distance between left
and right outermost pawns.
Passed both short TC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 39073 W: 7749 L: 7536 D: 23788
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 6149 W: 1040 L: 910 D: 4199
bench: 7665034
Reduce eval discontinuity becuase now we kick in
king safety evaluation in many more cases.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8708 W: 1742 L: 1613 D: 5353
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 6743 W: 1122 L: 990 D: 4631
bench: 6835416
Tighter lower bound for pawn attacks so to
activate king safety in some cases like here:
6k1/2B3p1/2Pp1p2/2nPp3/2Q1P2K/P2n1qP1/R6P/1R6 w
Original patch by Chris, further simplified by
Jörg Oster.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 30171 W: 5887 L: 5700 D: 18584
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 20706 W: 3402 L: 3204 D: 14100
bench: 7607562
Add a bonus according if the attacking
pieces are minor or major.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 13142 W: 2625 L: 2483 D: 8034
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 18059 W: 3031 L: 2844 D: 12184
bench: 7425809
A great simplification that shows no regression
and it seems even a bit scalable.
Tested with fixed number of games:
Short TC
ELO: 0.60 +-2.1 (95%) LOS: 71.1%
Total: 39554 W: 7477 L: 7409 D: 24668
Long TC
ELO: 2.97 +-2.0 (95%) LOS: 99.8%
Total: 36424 W: 5894 L: 5583 D: 24947
bench: 8184352
Change updating rule after a TT hit to match
the same one at the end of the search.
Small change in functionality, but we want to
have uniform rules in the code.
bench: 7767864
We already update killers so it is natural to extend to
history and counter move too.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 52690 W: 9955 L: 9712 D: 33023
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 5555 W: 935 L: 808 D: 3812
bench: 7876473
After a fail high in LMR, if reduction is very high do
a research at lower depth before teh full depth one.
Chances are that the re-search will fail low and the
full depth one is skipped.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 11363 W: 2204 L: 2069 D: 7090
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 7292 W: 1195 L: 1061 D: 5036
bench: 7869223
Since all ENPASSANT moves are now considered dangerous, this
change of order should give a slight speedup.
Also simplify futilityValue formula.
No functional change.
We avoid to use an ad-hoc table at the cost of a
relative_rank() call in advanced_pawn_push().
On my 32 bit system it is even slightly faster (on 64bit
may be different). This is the speed in nps alternating
old and new bench runs:
new
368890
368825
369972
old
367798
367635
368026
No functional change.
Instead of a passed pawn now we just require the pawn to
be in the opponent camp to be considered a dangerous
move. Added some renaming to reflect the change.
Passed both short TC test
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10358 W: 2033 L: 1900 D: 6425
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 21459 W: 3486 L: 3286 D: 14687
bench: 8322172
To align to same named Position function and
avoid using std::cout directly.
Also remove some stale <iostream> include while
there.
No functional change.
Add a Mac SSE4.2 target. Also change the Mac OS X minimum version to
10.6. Rationale: 97% of Macs run at least 10.6, version 10.9 is now
free, and using 10.6 as the minimum version gives a small 5% boost in
benchmark speed over versions using 10.0 as the minimum version.
Finally, enable Clang’s Link Time Optimization when compiling for the
Mac.
No functional change.
An old idea retested at SPRT(0, 3) with 60+0.05 TC:
LLR: 2.95 (-2.94,2.94) [0.00,3.00]
Total: 98872 W: 15549 L: 15123 D: 68200
This is a very small elo increase patch so it really
stresses the limits of fishtest.
bench: 8596156
It seems to intorduce a regression when tested
with 3 threads at 15+0.05:
ELO: -2.26 +-2.2 (95%) LOS: 2.4%
Total: 30000 W: 4813 L: 5008 D: 20179
bench: 8331357
Tested setting FakeSplit to true and running
./stockfish bench 128 2
There is a different signature with and without
the patch so it affects functionality but
only in SMP case.
bench: 8331357
SMP case is very tricky and raises an assert in stage_moves():
assert(stage == KILLERS_S1 || stage == QUIETS_1_S1 || stage == QUIETS_2_S1)
So rewrite the code to just return moves[] when we are sure
we are in quiet moves stages.
Also rename stage_moves to quiet_moves to reflect that.
No functional change (but needs testing in SMP case)
Use MovePicker moves[] to access already tried
quiet moves. A bit of care shall be taken
to avoid calling stage_moves() when we are still
at ttMove stage, because moves are yet to be
generated. Actually our staging move generation
makes this code a bit more tricky than what I'd
like, but removing an ausiliary redundant
array like quietsSearched[] is a good thing.
Idea by DiscoCheck
bench: 9355734
Use the newly introduced LineBB[] to simplify this
super hot-path function.
Verified with perft we don't have any speed regression, although
the number of squares removed is less than before in case of
contact check.
Insipred by DiscoCheck implementation.
Perft numbers are the same, but we have an harmless functional
change due to reorder of moves, because now some illegal moves
are no more detected at generation time, but in the search.
bench: 8331357
This seems a die hard idea :-)
Passed both short TC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 17485 W: 3307 L: 3156 D: 11022
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 38181 W: 6002 L: 5729 D: 26450
bench: 8659830
Actually, it is not used, as both arrays have the
same values. Some local tests in either direction
showed no improvement.
Also some minor corrections in the comments.
No functional change.
Previously some squares could be "incorrectly" awarded
to a pinned piece.
e.g. in 3k4/1q6/3b4/3Q4/8/5K2/B7/8 b - - 0 1 the black
bishop get 4 squares too many and the white queen gets 6.
Passed both short TC.
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 4871 W: 934 L: 817 D: 3120
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 38968 W: 6113 L: 5837 D: 27018
bench: 9282549
But compensate by reducing rook and queen
value by 53 = (160 / 3)
Material imbalances are affected as follows:
Red. Major Rook Queen Total
QRR +160 -2*53 -53 +1
QR +160 -53 -53 +54
RR +160 -2*53 0 +54
R 0 -53 0 -53
Q 0 0 -53 -53
so that the imbalance changes by at most 54 + 53 = 107 units.
This corresponds to appromximately 3.5cp in the final evaluation.
Verified with fixed number 40000 games at both short
and long TC it does not regress.
Short TC 15+0.05
ELO: 1.93 +-2.1 (95%) LOS: 96.6%
Total: 40000 W: 7520 L: 7298 D: 25182
Long TC 60+0.05
ELO: -0.33 +-1.9 (95%) LOS: 36.5%
Total: 39663 W: 6067 L: 6105 D: 27491
bench: 6703846
As, Gary (that analyzed the bug) says:
SF does not print a PV when the original best move fails low,
we hit our time allowance, and stop the search. The output from
the SF search is below. It was failing low on Ne1 at depth 34.
Then, we get bestmove Qd3, but no PV change.
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info nodes 18235667001 time 969824
bestmove e2d3 ponder c8d7
Looking at the code, if we hit Signals.stop, we return from id_loop
before printing any PV. It is possible for us to have resorted the
RootMove list though, which will change the move that is actually
played.
No functional change.
1/ eval margin and gains removed:
16bit are now free on TT entries, due to the removal of eval margin. may be useful
in the future :) gains removed: use instead by Value(128). search() and qsearch()
are now consistent in this regard.
2/ futility_margin()
linear formula instead of complex (log(depth), movecount) formula.
3/ unify pre & post futility pruning
pre futility pruning used depth < 7 plies, while post futility pruning used
depth < 4 plies. Now it's always depth < 7.
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench 7243575
1/ eval margin and gains removed:
- gains removed by Value(128): search() and qsearch() now behave consistently!
2/ futility_margin()
- testing showed that there is no added value in this weird (log(depth), movecount)
formula, and a much simpler linear formula is just as good. In fact, it is most
likely better, as it is not yet optimally tuned.
- the new simplified formula also means we get rid of FutilityMargins[], its
initialization code, and more importantly ss->futilityMoveCount, and the hacky
code that updates it throughout the search().
- the current formula gives negative futility margins, and there is a hidden interaction
between the move coutn pruning formula and the futility margin one: what happens is
that MCP is supposed to be triggered before we use the non-sensical negative futility
margins.
3/ unify pre & post futility pruning
- pre futility pruning (what SF calls value based pruning) used depth < 7 plies,
while post futility pruning (what SF calls static null move pruning) used depth < 4 plies.
- also the condition depth < 7 in pre futility pruning was not obvious, and it seemd
to be depth < 16 (futility_margin() returns an infinite value when depth >= 7).
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 10206576
RedundantRook and RedundantQueen replaced by simple
variable RedundantMajor. Also the SameColor coefficient
for Queen<->Queen has been set by definition to 0.
The remaining 5 parameters:
LinearCoefficients[ROOK]
LinearCoefficients[QUEEN]
QuadraticCoefficientsSameColor[ROOK][ROOK]
QuadraticCoefficientsSameColor[QUEEN][ROOK]
RedundantMajor
are sufficient to equate the material imbalances for the
5 common material configurations of R, RR, Q, QR and QRR
to any desired values simultaneously.
With the chosen parameters there should be no functional
change unless one side has more than 2 rooks or more
than 1 queen. For example bench from the start position
using the commands:
./stockfish
go depth 16
produces identical output except for one extra node
in the last iteration.
bench: 8198094
Coefficients for Bishop<->BishopPair and Bishop<->Bishop
are also pretty much redundant. By altering the values
in LinearCoefficients[] these coefficients can be zeroed
without changing the imbalance calculations in any position
with less than 3 bishops for one side.
bench: 7995098
First coefficient in the SameColor array does an
equivalent job when folded into the LinearCoefficients
array.
All of the diagonal terms in the OppositeColor array
are redundant due to cancellation.
No functional change.
In case we find a very good move after a
troubled start, we don't return immediately
anymore.
Tested directly at long TC where it passed:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 13910 W: 2397 L: 2228 D: 9285
bench: 7995098
And remove a complex (and broken) formula.
Indeed previous code was broken in case of TC with big
time increments where available_time() was too similar
to total time yielding to many time losses, so for instance:
go wtime 2600 winc 2600
info nodes 4432770 time 2601 <-- time forfeit!
maximum search time = 2530 ms
available_time = 2300 ms
For a reference and further details see:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/dCPAvQDcm2E
Speed tested with bench disabling timer alltogheter vs timer set at
max resolution, showed we have no speed regressions both in single
core and when using all physical cores.
No functional change.
A combo of two patches that failed SPRT with score
higher than 50% but togheter they succeed:
SPRT at 60+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 7312 W: 1276 L: 1139 D: 4897
bench: 8029334
If the game got late enough that move_importance(currentPly) * slowMover / 100
rounds to 0, then we ended up dividing 0 by 0 when only looking 1 move ahead.
This apparently caused the search to almost immediately abort and Stockfish
would blunder in long games. So convert thisMoveImportance to a double.
No functional change.
This seems more a material imbalance topic,
anyhow test is good and so patch is applied
as is.
Passed both short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 17391 W: 3548 L: 3393 D: 10450
And long TC:
LLR: 3.00 (-2.94,2.94) [0.00,6.00]
Total: 34660 W: 5972 L: 5700 D: 22988
bench: 8291883
Dumb down a bit the code and trade some possible
speed (but this is far from hot path anyhow) for
some added readability for the layman.
No functional change.
The normalising transformation is computed all at
once by the helper function get_flip_sq and then
applied immediately to the relevant squares as soon
as they are loaded from the position class.
bench: 8350690
New formula mathces the old formula until d = 45
Test code:
int main() {
for(int d=1; d<=45; d++)
{
int a = int(log(double(d * d) / 2) / log(2.0) + 1.001);
int b = int(2.9 * log(double(d)));
if (a != b) std::cout << d << std::endl;
}
return 0;
}
bench: 8455956
This is the first chain bonus version
from Ralph that also passed both
Short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 23460 W: 4727 L: 4556 D: 14177
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 31858 W: 5497 L: 5240 D: 21121
And performed better against current
committed version, always at 60secs:
LLR: -2.94 (-2.94,2.94) [-3.00,3.00]
Total: 26301 W: 4477 L: 4580 D: 17244
This test was done by Leonid.
bench: 8455956
After 40K games at 60 secs, result is still
not clear, but not a regression against SF 4
After
ELO: 50.11 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10547 L: 4817 D: 24636
Before
ELO: 49.51 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10483 L: 4821 D: 24696
So re-apply the patch to avoid to
special-case this one.
bench: 7403882
It was never updated !
Currently it only affects evaluate_passed_pawns()
and in particularly the rule to increase the bonus
if we have more non-pawn pieces. We could simply use
popcount() instead and avoid the little slowdown
in put_piece() and remove_piece(), but this would
leave a very subtle and tricky hole where people
are forced to remember that pos.count<ALL_PIECES>()
does not work. This is not obvious and so dangerous.
Thanks to Ronald de Man for spotting this.
bench: 7931424
Due to a strange issue (bug?) the ternary
operator does not return a BitCountType for
icc, so revert to the expression.
The same patch was already applied in
9749f1f14c
Thanks to NssY Wanyonyi for pointing out
this.
No functional change.
This reverts commit 4bc2374450 for
two reasons.
First regression testing shows almost equal
score:
Before the patch:
ELO: 49.75 +-2.5 (95%) LOS: 100.0%
Total: 27205 W: 7113 L: 3244 D: 16848
After the patch:
ELO: 48.87 +-2.9 (95%) LOS: 100.0%
Total: 20860 W: 5478 L: 2563 D: 12819
Second, and more sensible to me, this patch
increases safe check bonuses to 4 times their
original value (!) and considering:
- Values were already well tuned
- Values are highly critical
- King safety is highly critical, very TC
dependent and very difficult to test
- Our testing coverage is partial (self-testing,
blitz times)
I think is better to be safe than sorry and so
I revert the patch.
bench: 8440524
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10466 W: 2087 L: 1953 D: 6426
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 26334 W: 4540 L: 4310 D: 17484
And also proved stronger than a slightly
different patch, also succesful against master:
https://github.com/mcostalba/Stockfish/commit/dc6830a3b4ed12
But losing against current one in a match
at 60secs with SPRT [-3, 3]:
LLR: -2.96 (-2.94,2.94) [-3.00,3.00]
Total: 44484 W: 7360 L: 7463 D: 29661
bench: 9160831
Also the drawing criteria has been slightly loosened.
It now detects a draw if the king is ahead of all the
pawns and on the same file or the adjacent file.
bench: 7700683
This lost position 8/8/3q4/8/5k2/2P1R3/2K2P2/8 w - - 0 1
was previously evaluated as a draw.
The king and rook need to be correctly placed with
respect to the _same_ pawn.
(Note also that the check for the pawn being on RANK_2
in the old version is redundant: it must be on RANK_2 if
it hopes to protect a rook on RANK_3)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Good both at short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 5448 W: 1133 L: 1012 D: 3303
And at long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 40509 W: 6836 L: 6541 D: 27132
bench: 7700683
Don't need a struct here. Speed test shows
result is teh same. Moreover RKISS is used
mainly at startup to compute magics, so
prefer to keep it simple...RKISS ;-)
Also some assorted triviality while there.
No functional change.
These two changes go in opposite directions and it
seems that the combination is stronger than original.
Here are the positive tests at various TC:
15+0.05
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 24561 W: 4946 L: 4772 D: 14843
60+0.05
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 15259 W: 2598 L: 2423 D: 10238
40/30
LLR: 2.96 (-2.94,2.94) [-3.00,3.00]
Total: 2570 W: 527 L: 422 D: 1621
Unfortunately there is also a bad result
with one sec time increment that needs
to be further investigated:
12+1
LLR: -2.97 (-2.94,2.94) [-3.00,3.00]
Total: 2694 W: 438 L: 543 D: 1713
bench: 8340585
Rationale:
- Speed of double and float is about the same (not on the hot path anyway)
- Double makes code prettier (no need to write 1.0f, just 1.0)
- Only practical advantage of float is to use less memory, but since we never
store large arrays of double, we don't care.
No functional change.
Increase bench default depth from 12 to 13 and
add 15 new endgame positions to have broader
coverage and also more reliable nps calulcation
used for fishtest framework.
Due to the new endgame positions, where nps is higher,
the total nps is increased of about 15%.
Thanks to Lucas and Jörg for the suggestions.
No functional change, but bench number is now:
bench: 8336338
For some users -stack_size,0x4000 does not work,
so revert for now.
osX 10.6.8
gcc version 4.7.3 (MacPorts gcc47 4.7.3_2)
g++: error: unrecognized command line option '-stack_size,0x4000'
make[2]: *** [stockfish] Error 1
make[1]: *** [gcc-profile-make] Error 2
make: *** [profile-build] Error 2
No functional change.
This reverts commit 800410eef1 and instead increases
stack size.
I went through the old emails with Daylen that reported the
crash issue on Mac OS X and was fixed by 0049d3f337.
It was reported default stack size for a thread in Mac OS X is 8
megabytes while the patch that we are reverting allows to reduce
stack size at max of about 217KB, so the reason for the crash was
only marginal in MAX_MOVES value. On those emails Daylen also
hinted how to increase stack size for Mac OS X to 16MB.
So prefer to increase stack size to 16MB instad of re-inventing
the wheel and do our home grown stack as we did with the patch
that we are now reverting (it will remain anyhow in git history
for documentation purposes).
No functional change.
Unify extensions between PV and not PV nodes
and remove all but check extensions.
This is a simplification so tested at fixed number
of games where proved to not regress.
About 45k games at 15+0.05
ELO: 1.23 +-2.0 (95%) LOS: 88.5%
Total: 45643 W: 9107 L: 8946 D: 27590
About 45k games at 60+0.05
ELO: 1.07 +-1.8 (95%) LOS: 87.8%
Total: 46786 W: 7728 L: 7584 D: 31474
bench: 3172206
If the uci option 'Best Book Move' is set to true the lookup still
returns a move at random instead of the move with the highest
weight.
No functional change.
This should be enough for any legal position, even
the handcrafted ones, like the one presented by Reuven:
1Q5R/4Q1K1/B1Q5/B4Q2/N2Q4/pQ4Q1/pn2Q3/krQ4R w - -
Where currently we crash. This reverts the patch
0049d3f337 of 8/4/2012 where stack
was shrinked due to crashes while in deep analysys.
No functional change.
This greately reduces stack usage and is a
prerequisite for next patch.
Verified with 40K games both in single and SMP
case that there are no regressions.
No functional change.
This is an even safer setup proposed and tested
by Alexandre Meirelles.
Regression testing of 40K games at 10+0.05 show
result is stable both against current master:
ELO: -0.29 +-2.2 (95%) LOS: 39.7%
Total: 40000 W: 8010 L: 8043 D: 23947
and again original master (the one with smallest
time parameters):
ELO: 1.71 +-2.2 (95%) LOS: 93.8%
Total: 40000 W: 8325 L: 8128 D: 23547
Alexandre verified with LittleBlitzer time losses are
greately reduced with this setup:
Games Completed = 2100 of 3000 (Avg game length = 35.745 sec)
Settings = RR/128MB/15000ms+50ms/M 1000cp for 12 moves, D 150 moves/
Time = 39200 sec elapsed, 16800 sec remaining
1. Stockfish 190913 1091.5/2100 803-720-577 (L: m=313 t=1 i=0 a=406) (D: r=278 i=91 f=136 s=8 a=64) (tpm=212.5 d=14.75 nps=925427)
2. Houdini 2.0 w32 1008.5/2100 720-803-577 (L: m=250 t=299 i=0 a=254) (D: r=278 i=91 f=136 s=8 a=64) (tpm=204.1 d=12.04 nps=1326351)
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.61 +-2.2 (95%) LOS: 99.1%
Total: 39869 W: 8258 L: 7959 D: 23652
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.41 +-2.3 (95%) LOS: 98.1%
Total: 37222 W: 7843 L: 7585 D: 21794
No functional change.
The ideal setting for super-blitz might be something like:
"Emergency Base Time" = 50
"Emergency Move Time" = 5
This would give a total emergency time buffer of:
50 + 40 * 5 = 250 ms
This setup replaces the previous half cooked hack
"Don't blunder under extreme time pressure".
Test results are very good at super blitz, but keep good even
at 60 secs.
At 5+0.05
ELO: 24.30 +-2.4 (95%) LOS: 100.0%
Total: 37802 W: 10060 L: 7420 D: 20322
At 15+0.05
ELO: 13.41 +-2.9 (95%) LOS: 100.0%
Total: 22271 W: 4853 L: 3994 D: 13424
At 60+0.05
ELO: 5.30 +-3.2 (95%) LOS: 99.9%
Total: 16000 W: 2897 L: 2653 D: 10450
No functional change.
Instead of current code, give a bonus according to the frontmost
square among candidate + passed pawns.
This is a big simplification that removes a lot of accurate code
substituting it with a statistically based one using the common
'bonus' scheme, leaving to the search to sort out the details.
Results are equivalent but code is much less and, as an added bonus,
we now store candidates bitboard in pawns hash and allow this
info to be used in evaluation. This paves the way to possible
candidate pawns evaluations together with all the other pieces,
as we do for passed.
Patch passed short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16927 W: 3462 L: 3308 D: 10157
Then failed (quite quickly) at long TC
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 8451 W: 1386 L: 1448 D: 5617
But when ran with a conclusive 40K fixed games at 60 secs it proved
almost equivalent to original one.
ELO: 1.08 +-2.0 (95%) LOS: 85.8%
Total: 40000 W: 6739 L: 6615 D: 26646
bench: 3884003
Now that we use pre-increment on enums, it
make sense, for code style uniformity, to
swith to pre-increment also for native types,
although there is no speed difference.
No functional change.
ENABLE_OPERATORS_ON has incorrect definitions of
post-increment and post-decrement operators.
In particularly the returned value is the variable
already incremented/decremented, while instead they
should return the variable _before_ inc/dec.
This has no real effect because are only used in loops
and where the returned value is never used, neverthless
it is wrong. The fix would be to copy the variable to a
dummy, then inc/dec the variable, then return the dummy.
So instead, rename to pre-increment that can be implemented
without the dummy, actually the current implementation
it is already the correct pre-increment, with the only change
to return a reference (an l-value) and not a copy, so
to properly mimic the pre-increment on native integers.
Spotted by Kojirion.
No functional change.
We always attempt to keep at least this emergencyBaseTime
at clock. But if available time is very low it means that
we will force ourself to play immediately to satisfy the
emergencyBaseTime constrain and so leading to blunders.
Patch is good at short and very short TC (15secs and 5secs respectively)
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 26590 W: 5426 L: 5245 D: 15919
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5767 W: 1397 L: 1268 D: 3102
Instead seems has no influence at longer TC (60 secs)
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 79862 W: 13623 L: 13339 D: 52900
So it is committed to have a broader testing but is
to be consider still EXPERIMENTAL and can be reverted
easily.
No functional change.
In case we have less then 10ms to think as soon as
we wake up the timer, it immediately fires and calls
check_time() where due to condition:
elapsed > TimeMgr.maximum_time() - 2 * TimerResolution
the stop flag is set and search returns immediately, without
actually search anything.
Here the somewhat hacky fix is to start the timer after
at least one iteration as been completed.
No functional change.
The possible maximum mobility cardinality (plus one in case of
zero squares available) is:
- Knights: max. 8 squares -> max. 9 entries
- Bishops: max. 13 squares -> max. 14 entries
- Rooks: max. 14 squares -> max. 15 entries
- Queen: max. 27 squares -> max. 28 entries
So remove the extra entries in the table.
Spotted by Dariusz Orzechowski.
No functional change.
Union of
- LMR >= 3 plies from Gary tests.stockfishchess.org/tests/view/522522960ebc595d328fcafd
- allows() tweak from Reuven tests.stockfishchess.org/tests/view/5225fa1c0ebc595d328fcb53
Both passed Step I and failed Step II.
Instead this union passed both short TC:
LLR: 2.95 (-2.94,2.94)
Total: 14525 W: 3063 L: 2874 D: 8588
And long TC
LLR: 2.94 (-2.94,2.94)
Total: 31075 W: 5566 L: 5308 D: 20201
bench: 4238160
Idea is sound but implementation is partial. Ryan and Joona noticed that
we leave an hole in material table. Also we got another report by an user
of an odd behaviour. Namely, if you start stockfish and from the prompt
give 'bench' you get 3453941, then if you run again bench you get 3453940.
The reason is that two different positions with the same number of pieces,
but one with a bishop pair and another without have the same material key.
But after Eelco patch also different material imbalance and this yields
to this issue.
Restesting at long TC shows the patch does not really contribute at
ELO improvement. Actually patch failed at long TC.
LLR: -2.97 (-2.94,2.94)
Total: 23109 W: 4104 L: 4092 D: 14913
So revert.
bench: 3453945
Prefer pos.bishop_pair() to pos.count<BISHOP>(WHITE) > 1
because the first checks that the two bishops are on
different color squares.
Although the change seems to kick in only in very rare cases,
quite surprisingly it was able to pass SPRT test at short TC.
LLR: 2.95 (-2.94,2.94)
Total: 39818 W: 8174 L: 7956 D: 23688
bench: 3453941
This was thought to be a draw but the bishops generally win. However,
it takes up to 66 moves. The position in the diagram was thought to be
a draw for over one hundred years, but tablebases show that White wins
in 45 moves. All of the long wins go through this type of semi-fortress
position. It takes several moves to force Black out of the temporary
fortress in the corner; then precise play with the bishops prevents Black
from forming the temporary fortress in another corner (Nunn 1995:265ff).
Before computer analysis, Speelman listed this position as unresolved,
but "probably a draw" (Speelman 1981:109).
bench: 3453945
STANDALONE-TOOLCHAIN.html in Android NDK says:
It is recommended to use the -mthumb compiler flag to force the generation
of 16-bit Thumb-1 instructions (the default being 32-bit ARM ones).
If you want to target the 'armeabi-v7a' ABI, you will need ensure that the
following two flags are being used:
CFLAGS='-march=armv7-a -mfloat-abi=softfp'
Note: The first flag enables Thumb-2 instructions, and the second one
enables H/W FPU instructions while ensuring that floating-point
parameters are passed in core registers, which is critical for
ABI compatibility. Do *not* use these flags separately!
Thanks to Peter Osterlund for pointout this doc and for showing me
an example Makefile to follow.
No functional change.
Becuase castle is coded as "king captures the rook"
the to_sq(move), A1/8 or H1/8 is empty after the move,
leading to assert assert(p != NO_PIECE) in color_of().
Teach allows() asserts about castle and fix the crash.
Bug reported by Ryan Takker and tracked down by Tom Vijlbrief.
No functional change.
If our rook is behind a passed pawn, all
squares are defended.
One of the longest tests to pass !
Passed both short TC
LLR: 2.97 (-2.94,2.94)
Total: 44560 W: 9518 L: 9281 D: 25761
And long TC
LLR: 2.96 (-2.94,2.94)
Total: 61348 W: 11618 L: 11192 D: 38538
bench: 3787694
With
position fen 7k/8/8/8/8/7P/6K1/7B w - - 0 1
go depth 25
The evaluation at depth 22 is not draw as it should be. The reason is that
when search reaches the position 8/6kP/8/8/8/3B4/6K1/8 w - - 0 1 if white plays
h8R or h8N then we get a position that is a "KNOWN_WIN" and is _not_ a check, so
futility pruning in qsearch kicks in and black may think that it is "futile"
to reply Kxh8 since, according to the logic of the code, it cannot raise the score
back towards a draw.
bench: 4728533
The case of two lone kings on the board is already considered
by the "No pawns" scaling factor rules in material.cpp as is
KBK and KNK.
Moreover we had a small leak in endgames map because for
KK endgame it happens white and black material keys are the
same (both equal to zero), so when adding the black endgame in
Endgames::add() we were overwriting the already exsisting
white one, leading to a memory leak found by Valgrind.
So remove the endgames althogheter and rely on scaling
to correctly set the endgames value to a draw.
No functional change.
- COMPILER=clang++ V='Apple LLVM 6.0'# Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
- COMP=clang
branches:
only:
- master
before_script:
- cd src
script:
- make clean && make build ARCH=x86-64 && ./stockfish bench 2>&1 >/dev/null | grep 'Nodes searched' | tee bench1
- make clean && make build ARCH=x86-32 && ./stockfish bench 2>&1 >/dev/null | grep 'Nodes searched' | tee bench2
- echo "Checking for same bench numbers..."
- diff bench1 bench2 > result
- test ! -s result
# if valgrind is available check the build is without error, reduce depth to speedup testing, but not too shallow to catch more cases.
- if [ -x "$(command -v valgrind )" ] ; then make clean && make ARCH=x86-64 debug=yes build && valgrind --error-exitcode=42 ./stockfish bench 128 1 10 default depth 1>/dev/null ; fi
# use g++-6 as a proxy for having sanitizers ... might need revision as they become available for more recent versions of clang/gcc than trusty provides
- if [[ "$COMPILER" == "g++-6" ]]; then make clean && make ARCH=x86-64 sanitize=yes build && ! ./stockfish bench 2>&1 | grep "runtime error:" ; fi
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.