In case of a very high material score, we can
overflow VALUE_INFINITE.
This patch fixes an assert with:
position fen 7k/QQQQR3/2B5/4KN1Q/3QQ3/8/8/4R3 b - - 0 1
go depth 1
No functional change.
Resolves#546
Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.
No functional change.
Resolves#537
Tuned the global mobility factor for each piece, as well as some +- delta,
The master mobility factor was {266,334} and tuning gave
{267, 362} +S(-2,-2) for the Knight
{249, 328} +S( 0,-2) for the Bishop
{298, 353} +S(1,1) for the Rook
{265, 358} +S(2,-1) for the Queen
Passed STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 49402 W: 9367 L: 9037 D: 30998
and LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 26831 W: 3871 L: 3658 D: 19302
Bench: 8355485
Resolves#536
Fix a bug where we could stop the search after only 10% of time used due to a matching easy move but later switch to a different move that was never pre-screened as easy due to SMP thread select.
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27227 W: 4910 L: 4800 D: 17517
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40368 W: 5826 L: 5733 D: 28809
Resolves#521
Simplify time management code by removing hard stops for unchanging first root moves.
Search is now stopped earlier at the end iteration if it did not have fail-lows at root.
This simplification also fixes pondering bug. Ponder flag was true by default
and cutechess-cli doesn't change it to false even though no pondering is possible.
Fix the issue by setting the default value of 'Ponder' flag to false.
10+0.1:
ELO: 3.51 +-3.0 (95%) LOS: 99.0%
Total: 20000 W: 3898 L: 3696 D: 12406
40+0.4:
ELO: 1.39 +-2.7 (95%) LOS: 84.7%
Total: 20000 W: 3104 L: 3024 D: 13872
60+0.06:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 37231 W: 5333 L: 5236 D: 26662
Stopped run at 100+1:
LLR: 1.09 (-2.94,2.94) [-3.00,1.00]
Total: 37253 W: 4862 L: 4856 D: 27535
Resolves#523Fixes#510
Also inline defintions of SpaceMask and CenterBindMask.
Verified from assembly that compiler computes the values
at compile time, so it is also theoretical faster.
While there factor out scale factor evaluation.
No functional change.
This is used by std::stable_sort() to sort moves from highest score to lowest score.
1) The comment is incorrect since highest to lowest means descending.
2) It's more natural to implement a less operator using another less operator rather than a greater operator.
No functional change.
Resolves#504
Comment is based on a misunderstanding of what unaligned memory access is. Here
is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the complier knows the alignment rules, and does
the necessary adjustments to make sure unaligned memory access does not occur.
The issue being adressed here has nothing to do with unaligned memory access. It
is about cache performance. In order to achieve best cache performance:
- we prefetch the cacheline as soon as possible.
- we ensure that TT clusters do not spread across two cachelines. If they did,
we would need to prefetch 2 cachelines, which could hurt cache performance.
Therefore the true conditions to achieve this are:
1/ start adress of TT is cache line aligned. void TranspositionTable::resize()
enforces this.
2/ TT cluster size should *divide* the cache line size. Currently, we pack 2
clusters per cache lines. It used to be 1 before "TT sardines". Does not matter
what the ratio is, all we want is to fit an integer number of clusters per cache
line.
No functional change.
Resolves#506
Instead of creating a running std::thread and
returning, wait in Thread c'tor that the native
thread of execution goes to sleep in idle_loop().
In this way we can simplify how search is started,
because when main thread is idle we are sure also
all other threads will be idle, in any case, even
at thread creation and startup.
After lazy smp went in, we can simpify and rewrite
a lot of logic that is now no more needed. This is
hopefully the final big cleanup.
Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965
No functional change.
Now that we don't have anymore TimerThread, there is
no need of this long class hierarchy.
Also assorted reformatting while there.
To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103
No functional change.
When we reach the maximum depth, we can finish the
search without a raise of Signals.stop. However, if
we are pondering or in an infinite search, the UCI
protocol states that we shouldn't print the best move
before the GUI sends a "stop" or "ponderhit" command.
It was broken by lazy smp. Fix it by moving the stopping
of the threads after waiting for GUI.
No functional change.
Indeed, if we use a depth >= DEPTH_MAX, we start having negative depth in the
TT (due to int8_t cast).
No functional change in single thread mode
Resolves#490
Unfortunately std::condition_variable::wait_for()
is not accurate in general case and the timer thread
can wake up also after tens or even hundreds of
millisecs after time has elapsded. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have effect upon it.
Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."
So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.
Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629
TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083
And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646
And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555
bench: 8639247
The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).
No functional change.
Reduce the variation in Root Depth between different threads. This
prevents threads from searching at a depth much higher than Main Thread.
Performed well at STC 24 Threads:
ELO: 3.44 +-3.8 (95%) LOS: 96.1%
Total: 10000 W: 1627 L: 1528 D: 6845
And LTC 24 Threads
LLR: 1.43 (-2.94,2.94) [0.00,4.00]
Total: 3804 W: 500 L: 420 D: 2884
ELO : +7.31
p-value: 73.16%
Passed no regression at STC 3 Threads:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40457 W: 7148 L: 7060 D: 26249
And LTC 3 Threads:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 17704 W: 2489 L: 2364 D: 12851
Raising a pull request early as 24 Thread tests are very expensive and
this is clearly a positive gain at high thread counts and high time
controls. The change is a small parameter tweak with no additional
logic.
No functional change for single thread mode.
Resolves#481
Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.
v1: Sequentially consistent atomics
No mesurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:
threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041
threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728
v2: Acquire/Release semantics
This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:
threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542
threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468
As suggested by Joona, another test at LTC:
threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931
v3: Final version: SeqCst/Relaxed
threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609
Resolves#474
Using less parameters and code to compute Threats
Includes also a few spacing edits.
Run as a simplification.
Passed STC 10+0.1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 18879 W: 3725 L: 3600 D: 11554
Passed LTC 60+0.4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74116 W: 11001 L: 10958 D: 52157
bench: 8004751
Fishtest is a key factor of SF success.
Thanks to Fishtest we have not only greately
improved ELO but, even more important, we
have enabled a kind of joint development and
testing that it is the herat of on open
source project like SF.
Open sourcing is not just about open code, it is
about commuity development. In case of a chess engine
this has never been possible before due to missing
a strong and strict testing environment that allows
many people to contribute in a safe and coordinate way.
Fishtest is a new way of developing chess engines,
something that has never exsisted before.
No functional change.
Collect and give a second try to some almost passed tuning attempts and
one-line tweaks from the last month.
Passed STC
LLR: 3.07 (-2.94,2.94) [0.00,4.00]
Total: 15124 W: 2974 L: 2756 D: 9394
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 21577 W: 3507 L: 3289 D: 14781
Bench: 8855226
Resolves#464
Start all threads searching on root position and
use only the shared TT table as synching scheme.
It seems this scheme scales better than YBWC for
high number of threads.
Verified for nor regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194
Verified for nor regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480
Verified for nor regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407
Verified for nor regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036
Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520
Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760
Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).
Bench: 8116244
Apply bonus for the prior CMH that caused a fail low.
Balance Stats: CMH and History bonuses are updated differently.
This eliminates the "fudge" factor weight when scoring moves. Also
eliminated discontinuity in the gravity history stat formula. (i.e. stat
scores will no longer inverse when depth exceeds 22)
STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 21802 W: 4107 L: 3887 D: 13808
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 46036 W: 7046 L: 6756 D: 32234
Bench: 7677367
Add Travis CI support to GitHub repo.
After every push to master, Travis will build
the sources directly from GitHub repo according
to .travis.yml and verify everything is ok.
No functional change.
Fix issues after a run of PVS-STUDIO analyzer.
Mainly false positives but warnings are anyhow
useful to point out not very readable code.
Noteworthy is the memset() one, where PVS prefers ss-2
instead of stack. This is because memeset() could
be optimized away by the compiler when using 'stack',
due to stack being a local variable no more used after
memset. This should normally not happen, but when
it happens it leads to very sublte and difficult
to find bug, so better to be safe than sorry.
No functional change.
When changing 'search' and 'splitPointsSize' we have to
use thread locks, not split point ones, because can_join()
is called under the formers.
Verified succesfully with 24 hours toruture tests with 20
cores machine by Louis Zulli: it does not hangs.
Verifyed for no regressions with STC, 7 threads:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 52804 W: 8159 L: 8087 D: 36558
No functional change.
Only refresh TT entry when it's really necessary.
This should give a small speed boost for some machines.
And it's a risk-free change.
No functional change.
Resolves#429
Louis Zulli reported that Stockfish suffers from very occasional hangs with his 20 cores machine.
Careful SMP debugging revealed that this was caused by "a ghost split point slave", where thread
was marked as a split point slave, but wasn't actually working on it.
The only logical explanation for this was double booking, where due to SMP race, the same thread
is booked for two different split points simultaneously.
Due to very intermittent nature of the problem, we can't say exactly how this happens.
The current handling of Thread specific variables is risky though. Volatile variables are in some
cases changed without spinlock being hold. In this case standard doesn't give us any kind of
guarantees about how the updated values are propagated to other threads.
We resolve the situation by enforcing very strict locking rules:
- Values for key thread variables (splitPointsSize, activeSplitPoint, searching)
can only be changed when the thread specific spinlock is held.
- Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held.
- Thread booking decisions (per split point) can only be done when the thread specific spinlock is held.
With these changes hangs didn't occur anymore during 2 days torture testing on Zulli's machine.
We probably have a slight performance penalty in SMP mode due to more locking.
STC (7 threads):
ELO: -1.00 +-2.2 (95%) LOS: 18.4%
Total: 30000 W: 4538 L: 4624 D: 20838
However stability is worth more than 1-2 ELO points in this case.
No functional change
Resolves#422
v = value without ep capture being considered
v1 = value of the ep capture
The correct logic is:
if without e.p. capture we are losing, and the value of e.p is either draw, or win or "loss, but 50 move rule saves us", then we should use the value of ep capture.
Credit and thanks to syzygy1 and lantonov !
No functional change (except with syzygy bases)
Resolves#415Resolves#394
Instead of using hard coded Min and Max values for history,
always adjust the old value slightly downwards before adding a new value.
The adjustment acts like gravity that prevents the value escaping too
far from zero.
Bench: 8020484
Resolves#407
Apart from usual renaiming, take advantage of
C++11 function template default parmeter to
get rid of Eval trampoline functions.
Some triviality fixes while there.
No functional change.
Align the behaviour with reductions. Initially castling moves had to be
treated differently, because the SEE did not handle them correctly. But now it
does.
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 83750 W: 15722 L: 15711 D: 52317
LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 97183 W: 15120 L: 15115 D: 66948
bench 7759837
Resolves#403
PawnSafePush, with the value S(5,5) proved not "necessary"
possibly due to recent changes to MobilityArea and other changes to Connected bonus.
STC:
LLR: 3.22 (-2.94,2.94) [-3.00,1.00]
Total: 98528 W: 18757 L: 18759 D: 61012
LTC:
LLR: 5.30 (-2.94,2.94) [-3.00,1.00]
Total: 204194 W: 31698 L: 31734 D: 140762
Bench: 7620871
Resolves#396
Use Position::square and Position::squares instead.
This allow us to remove king_square(), simplify
endgames and to have more naming uniformity.
Moreover, this is a prerequisite step in case
in the future we decide to retire piece lists
altoghter and use pop_lsb() to loop across
pieces and serialize the moves. In this way
we just need to change definition of Position::square
to something like:
template<PieceType Pt> inline
Square Position::square(Color c) const {
return lsb(byColorBB[c]);
}
No functional change.
Based off of Pull request #383:
Include squares occupied by some pawns in the MobilityArea
a) not blocked
b) on rank 4 and above
c) or captures
Passed STC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8157 W: 1644 L: 1516 D: 4997
And LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 26086 W: 4274 L: 4051 D: 17761
-----------
Then, a simplification test failed, trying to remove b and c)
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 6048 W: 1117 L: 1288 D: 3643
Another simplification test, was run to remove just (c)
Passed STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 28073 W: 5364 L: 5255 D: 17454
And LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 34652 W: 5448 L: 5348 D: 23856
A parameter tweak test showed that changing b) for "on rank 3 and above"
does not work
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 5233 W: 937 L: 1077 D: 3219
Finally, a small rewrite, and we have this version
Include squares occupied by some pawns in the MobilityArea which are
a) not blocked
b) on rank 4 and above
Bench: 8977899
Resolves#385
Under Windows with MSVC we use the IDE to compile,
in this case we infer some compiler flags usually
set by Makefile.
The condition to check this was wrong, namely when compiling
with mingw under Windows 64 bit we always set IS_64BIT and
USE_BSFQ even if compiled with ARCH=x86-32 (this is how I
found it).
Small code style touches while there.
No functional change.
Error is:
a value of type "int" cannot be assigned to an entity of type "Value"
Also retire the now unused squares_of_color() function.
No functional change.
In Chess960 we can have legal positions with
opponent rook in A or H file and with castling
available, for instance:
4k3/pppppppp/8/8/8/8/PPPPPPPP/rR2K3 w Q - 0 1
In those cases we pick up the wrong rook when
setting castling.
Fix it by checking the color of the rook.
Bug reported by Matthew Lai.
No functional change.
Instead of crafting a clever formula to calculate the array offset, simply use a
3 dimensional array. Remove the comment while at it, because now the code is
self-documenting.
No functional change.
Resolves#344
Introduce helper function Search::reset() which clears all kind of search
memory, in order to restore a deterministic search state.
Generalize TT.clear() into Search::reset() for the following use cases:
- bench: needed to guarantee deterministic bench (ie. if you call bench from
interactive command line twice in a row you get the same value).
- Clear Hash: restore clean search state, which is the purpose of this button.
- ucinewgame: ditto.
No functional change.
Resolves#346
Based on an idea and patch by VoyagerOne.
Small simplification, but was tedted for an ELO gain anyway.
STC:
LLR: 2.95 (-2.94,2.94) [-1.00,4.00]
Total: 5375 W: 1119 L: 977 D: 3279
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 17893 W: 2984 L: 2792 D: 12117
bench 8322847
Use symmetry along vertical middle axis of the board
to reduce the number of parameters.
For instance psqt value of SQ_A5 == SQ_A4 and value of
SQ_F8 == SQ_F1.
This is always true, at least until now nobody came in
with an asymmetric psqt table that worked.
Original patch by Lucas.
No functional change.
Easier for tuning psq tables:
TUNE(myParameters, PSQT::init);
Also move PSQT code in a new *.cpp file, and retire the
old and hacky psqtab.h that required to be included only
once to work correctly, this is not idiomatic for a header
file.
Give wide visibility to psq tables (previously visible only
in position.cpp), this will easy the use of psq tables outside
Position, for instance in move ordering.
Finally trivial code style fixes of the latest patches.
Original patch of Lucas Braesch.
No functional change.
In ei.attackedBy, Queen does not x-ray through Rook, but the Rook does
X-ray through the Queen.
So most of the rook contact checks supported by queen are, in fact,
Queen Contact Checks and they are already scored separately.
Bench: 7762189
Resolves#338
Currently Zobrist::castling[] are not properly zeroed
and rely on the compiler to do this at startup, but this
makes Position::init() to set different values every time
it is called!
This is a bit odd, and although not impacting normal usage,
can yield to subtle misbehaviour, very difficult to track
down, in case we happen to call it more than once for some
reason. I found this while developing tuning support and
it took me a while to track it down.
So properly init Zobrist::castling[]
No functional change.
Resolves#329
When running more games in parallel, or simply when running a game
with a background process, due to how OS scheduling works, there is no
guarantee that the CPU resources allocated evenly between the two
players. This introduces noise in the result that leads to unreliable
result and in the worst cases can even invalidate the result. For
instance in SF test framework we avoid running from clouds virtual
machines because are a known source of very unstable CPU speed.
To overcome this issue, without requiring changes to the GUI, the idea
is to use searched nodes instead of time, and to convert time to
available nodes upfront, at the beginning of the game.
When nodestime UCI option is set at a given nodes per milliseconds
(npmsec), at the beginning of the game (and only once), the engine
reads the available time to think, sent by the GUI with 'go wtime x'
UCI command. Then it translates time in available nodes (nodes =
npmsec * x), then feeds available nodes instead of time to the time
management logic and starts the search. During the search the engine
checks the searched nodes against the available ones in such a way
that all the time management logic still fully applies, and the game
mimics a real one played on real time. When the search finishes,
before returning best move, the total available nodes are updated,
subtracting the real searched nodes. After the first move, the time
information sent by the GUI is ignored, and the engine fully relies on
the updated total available nodes to feed time management.
To avoid time losses, the speed of the engine (npms) must be set to a
value lower than real speed so that if the real TC is for instance 30
secs, and npms is half of the real speed, the game will last on
average 15 secs, so much less than the TC limit, providing for a
safety 'time buffer'.
There are 2 main limitations with this mode.
1. Engine speed should be the same for both players, and this limits
the approach to mainly parameter tuning patches.
2. Because npms is fixed while, in real engines, the speed increases
toward endgame, this introduces an artifact that is equivalent to an
altered time management. Namely it is like the time management gives
less available time than what should be in standard case.
May be the second limitation could be mitigated in a future with a
smarter 'dynamic npms' approach.
Tests shows that the standard deviation of the results with 'nodestime'
is lower than in standard TC, as is expected because now all the introduced
noise due the random speed variability of the engines during the game is
fully removed.
Original NIT idea by Michael Hoffman that shows how to play in NIT mode
without requiring changes to the GUI. This implementation goes a bit
further, the key difference is that we read TC from GUI only once upfront
instead of re-reading after every move as in Michael's implementation.
No functional change.
And reformat a bit time manager code.
Note that now we set starting search time in think() and
no more in ThreadPool::start_thinking(), the added delay
is less than 1 msec, so below timer resolution (5msec) and
should not affect time lossses ratio.
No functional change.
Code like this is more a case of showing off one's C++ knowledge, rather than
using it adequately, IMHO.
**First loop (std::generate)**
Iterators are inadequate here, because they lose the key information which is
idx. As a result, we need to carry a redundant idx variable, and increment it
along the way. Very clumsy.
Usage of std::generate and a lambda function only obfuscate the code, which is
merely a simple and stupid loop over the elements of a vector.
**Second loop (std::accumulate)**
This code is thoroughlly incomprehensible. Restore the original, which was much
simpler to understand.
**Third loop (range based loop)**
Again, a range based loop is inadequate, because we lose idx! To resolve this
artificially created problem, the data model was made redundant (idx is a data
member of db[] elements!?), which is ugly and unjustified. A simple and stupid
for loop with idx does the job much better.
No functional change.
Resolves#313
Small speed-up in pawn.cpp
Results for 10 tests for each version:
Base Test Diff
Mean 1435636 1445238 -9602
StDev 22576 23189 1848
p-value: 1
speedup: 0.007
No functional change
Resolves#295
If the sum of CounterMoveHistory heuristic and History heuristic is below zero,
then reduce an extra ply in cut nodes
LTC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 6479 W: 1099 L: 967 D: 4413
Bench: 7773299
Resolves#315
Currently if we call it more than once, we crash.
This is not a real problem, because this function is
indeed called just once. Nevertheless with this small fix,
that gets rid of a hidden 'static' variable, we cleanly
resolve the issue.
While there, fix also ThreadPool::exit to return in a
consistent state. Now all the init() functions but
UCI::init() are reentrant and can be called multiple
times.
No functional change.
Profiling shows that resetting attacks table after
a failed candidate magic attempt is the biggest
time consumer, so rewrite the logic avoiding the
memset()
Magics init for rook+bishop goes from 200msecs to
under 100msec.
No functional change.
To sync UI with main thread it is enough a single
condition variable because here we have a single
producer / single consumer design pattern.
Two condition variables are strictly needed just for
many producers / many consumers case.
Note that this is possible because now we don't send to
sleep idle threads anymore while searching, so that now
only UI can wake up the main thread and we can use the
same ConditionVariable for both threads.
The natural consequence is to retire wait_for_think_finished()
and move all the logic under MainThread class, yielding the
rename of teh function to join()
No functional change.
Now that we use spinlocks everywhere and don't put
threads to sleep while idle, we can use the slower
(but no more in hot path) std::condition_variable_any
instead of our homwgrown ConditionVariable struct.
Verified fo rno regression at STC with 7 threads:
ELO: -0.66 +-2.7 (95%) LOS: 31.8%
Total: 20000 W: 3210 L: 3248 D: 13542
No functional change
There's no reason to define it as a Bitboard, so for consistency, use bool.
This is even a speedup on my machine: i7-3770k, using gcc 4.9.1 (linux):
stat test master diff
mean 2,341,338 2,327,998 13,134
stdev 15,765 14,717 5,405
speedup 0.56%
P(speedup>0) 100.0%
No functional change.
Resolves#298
Avoid redundant 'while' conditions. It is enough to
check them in the outer loop.
Quick tested for no regression 10K games at 4 threads
ELO: -1.32 +-3.9 (95%) LOS: 25.6%
Total: 10000 W: 1653 L: 1691 D: 6656
No functional change.
Spinlock must be used instead.
Tested for no regression at 15+0.05 th 4:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25928 W: 4303 L: 4190 D: 17435
No functional change.
Resolves#297
Unify the quites moves loop for both cases,
the compiler optimizes away the
if (is_ok((ss-1)->currentMove))
inside loop, so that the result is same
speed as original.
No functional change.
During the search, do not block on condition variable, but instead use std::this_thread::yield().
Clear gain with 16 threads. Again results vary highly depending on hardware, but on average it's a clear gain.
ELO: 12.17 +-4.3 (95%) LOS: 100.0%
Total: 7998 W: 1407 L: 1127 D: 5464
There is no functional change in single thread mode
Resolves#294
Both are the result of a SPSA tuning session with a custom book, 50k iterations each.
After an additional tuning session of the mobility values, tuning the delta values, with following result.
40k games at 9+0.05:
ELO: 4.13 +-2.2 (95%) LOS: 100.0%
Total: 40000 W: 8581 L: 8106 D: 23313
and LTC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 36518 W: 6049 L: 5782 D: 24687
Bench: 8567402
Resolves#284
Fixes reported startup error about missing libwinpthread-1.dll
when the dll is not in the path.
The current -static-xxxx flags, introduced with:
https://github.com/official-stockfish/Stockfish/commit/373503f4a9a990054b5
Only take in account standard libraries, but not thread
library.
No functional change.
Resolves#289
Idea and original implementation by Stephane Nicolet
7 threads 15+0.05
ELO: 3.54 +-2.9 (95%) LOS: 99.2%
Total: 17971 W: 2976 L: 2793 D: 12202
There is no functional change in single thread mode
Spend much less time in positions where one move is much better than all other alternatives.
We carry forward pv stability information from the previous search to identify such positions.
It's based on my old InstaMove idea but with two significant improvements.
1) Much better instability detection inside the search itself.
2) When it's time to make a FastMove we no longer make it instantly but still spend at least 10% of normal time verifying it.
Credit to Gull for the inspiration.
BIG thanks to Gary because this would not work without accurate PV!
20K
ELO: 8.22 +-3.0 (95%) LOS: 100.0%
Total: 20000 W: 4203 L: 3730 D: 12067
STC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 23266 W: 4662 L: 4492 D: 14112
LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 12470 W: 2091 L: 1931 D: 8448
Resolves#283
Introduce a counter move history table which additionally is indexed by the last move's piece and target square.
For quiet move ordering use now the sum of standard and counter move history table.
STC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 4747 W: 1005 L: 885 D: 2857
LTC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 5726 W: 1001 L: 872 D: 3853
Because of reported low NPS on multi core test
STC (7 threads):
ELO: 7.26 +-3.3 (95%) LOS: 100.0%
Total: 14937 W: 2710 L: 2398 D: 9829
Bench: 7725341
Resolves#282
Use Mutex instead.
This is in preparaation for merging with master branch,
where we stilll don't have spinlocks.
Eventually spinlocks will be readded in some future
patch, once c++11 has been merged.
No functional change.
minKingPawnDistance is used only as local variable in one place so we don't need it to be part of "Pawns::Entry" structure.
No functional change.
Resolves#277
This change in the Makefile restores the possibility to compile
Stockfish on Mac OS X 10.9 and 10.10 after the C++11 has been merged.
To use the default (fastest) settings, compile with:
make build ARCH=x86-64-modern
To test the clang settings, compile with
make build ARCH=x86-64-modern COMP=clang
Beware that the clang settings may provide a slightly slower (6%)
executable.
Backported from master.
No functional change
Resolves#275
This change in the Makefile restores the possibility to compile
Stockfish on Mac OS X 10.9 and 10.10 after the C++11 has been merged.
To use the default (fastest) settings, compile with:
make build ARCH=x86-64-modern
To test the clang settings, compile with
make build ARCH=x86-64-modern COMP=clang
Beware that the clang settings may provide a slightly slower (6%)
executable.
No functional change
Resolves#275
Now that c++11 branch has been merged in master,
disable unconditionally the spinlocks and use mutex
instead. This will allow to run fishtest even on HT
machines withouth changes.
In the future we will reintorduce spinlocks, once
we will have took care of fishtest.
No functional change.
And use mutex instead. You may never want to do this.
It is a workaround to run c++11 on fishtest where many
machiens have HTenabled and this can be a problem when
number of cores set is higher than number of physical cores.
To disable spinlocks, just compile with -DNO_SPINLOCK flag
No functional change.
Calling lock.test_and_set() in a tight loop creates expensive
memory synchronizations among processors and penalize other
running threads. So syncronize only only once at the beginning
with fetch_sub() and then loop on a simple load() that puts much
less pressure on the system.
Reported about 2-3% speed up on various systems.
Patch by Ronald de Man.
No functional change.
It is reported to be defenitly faster with increasing
number of threads, we go from a +3.5% with 4 threads
to a +15% with 16 threads.
The only drawback is that now when testing with more
threads than physical available cores, the speed slows
down to a crawl. This is expected and was similar at what
we had setting the old sleepingThreads to false.
No functional change.
Initialization is more complex than what I'd like due
to MSVC compatibility that for some reason does not like:
std::atomic_flag lock = ATOMIC_FLAG_INIT;
No functional change.
It should be used slavesMask.count() instead.
Verified 100% equivalent when sp->allSlavesSearching:
dbg_hit_on(sp->allSlavesSearching, sp->slavesCount != sp->slavesMask.count());
No functional change.
Document and clarify that we cannot rejoin on ourselves
and that we never late join if we are master and all
slaves have finished, inded in this case we exit idle_loop.
No functional change.
In case of Threads.size() == 2 we have that sp->allSlavesSearching
is always false (because we have finished our search), bestSp is
always NULL and we never late join, so there is no need to special
case here.
Tested with dbg_hit_on(sp && sp->allSlavesSearching) and
verified it never fires.
No functional change.
And retire a redundant field. This is important also
from a concept point of view becuase we want to keep
SMP structures as simple as possible with the only
strictly necessary data.
Verified with
dbg_hit_on(sp->spLevel != level)
that the values are 100% the same out of more 50K samples.
No functional change.
Balance threads between split points.
There are huge differences between different machines and autopurging makes it very difficult to measure the improvement in fishtest, but the following was recorded for 16 threads at 15+0.05:
For Bravone (1000 games): 0 ELO
For Glinscott (1000 games): +20 ELO
For bKingUs (1000 games): +50 ELO
For fastGM (1500 games): +50 ELO
The change was regression for no one, and a big improvement for some, so it should be fine to commit it.
Also for 8 threads at 15+0.05 we measured a statistically significant improvement:
ELO: 6.19 +-3.9 (95%) LOS: 99.9%
Total: 10325 W: 1824 L: 1640 D: 6861
Finally it was verified that there was no (significant) regression for
4 threads:
ELO: 0.09 +-2.8 (95%) LOS: 52.4%
Total: 19908 W: 3422 L: 3417 D: 13069
2 threads:
ELO: 0.38 +-3.0 (95%) LOS: 60.0%
Total: 19044 W: 3480 L: 3459 D: 12105
1 thread:
ELO: -1.27 +-2.1 (95%) LOS: 12.3%
Total: 40000 W: 7829 L: 7975 D: 24196
Resolves#258
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat test master diff
mean 2,403,118 2,390,904 12,214
stdev 12,043 10,620 3,677
speedup 0.51%
P(speedup>0) 100.0%
No functional change.
This micro-optimization only complicates the code and provides no benefit.
Removing it is even a speedup on my machine (i7-3770k, linux, gcc 4.9.1):
stat test master diff
mean 2,403,118 2,390,904 12,214
stdev 12,043 10,620 3,677
speedup 0.51%
P(speedup>0) 100.0%
No functional change.
Use integer arithmetic instead of floating point arithmetic.
Floating point arithmetic was causing different results for some 32-bit compiles
No functional change
Resolves#249Resolves#250
Also remove useless StateCopySize64 optimization:
compiler uses SSE movups instruction anyhow and
does not need this trick (verified with fishbench).
No functional change.
Funny enough, gcc __builtin_prefetch() expects
already a void*, instead Windows's _mm_prefetch()
requires a char*.
The patch allows to remove ugly casts from caller
sites.
No functional change.
Results for 10 tests for each version (gcc 4.8.3 on mingw):
Base Test Diff
Mean 1502447 1507917 -5470
StDev 3119 1364 4153
p-value: 0,906
speedup: 0,004
Results for 10 tests for each version (MSVC 2013):
Base Test Diff
Mean 1400899 1403713 -2814
StDev 1273 2804 2700
p-value: 0,851
speedup: 0,002
No functional change.
I went through all the individual compile options that differ between
-fprofile-generate/-fprofile-use and -fprofile-arcs/-fbranch-probabilities
and distilled the speed difference down to only turning off
-fno-peel-loops and -fno-tracer. Using this we still get the full speedup
(maybe a bit more because other optimizations stay on) and it's also much cleaner
because we can get rid of the "@rm -f ucioption.gc*" hack for all versions of gcc.
No functional change.
Resolves#237
Follow the usual approach to delay computation
as far as possible, in case an earlier killer
cut-offs we avoid to do useless work.
This also greatly simplifies the code.
No functional change.
Verified with perft there is no speed regression,
and code is simpler. It is also conceptually correct
becuase an extended move is just a move that happens
to have also a score.
No functional change.
This is an old patch from Jean-Francois Romang to send
UCI hashfull info to the GUI:
https://github.com/mcostalba/Stockfish/pull/60/files
It was wrongly judged as a slowdown, but it takes much
less than 1 ms to run, indeed on my core i5 2.6Ghz it
takes about 2 microsecs to run!
Regression test is good:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 7352 W: 1548 L: 1401 D: 4403
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 61432 W: 10307 L: 10251 D: 40874
I have set the name of the author to the original
one.
No functional change.
This patch has two positive effects:
- Retire a hackish formula and leave
just a natural, simple and plain one.
- Reduce strenght at very low level, but
don't impact medium/high levels.
Actually even at level 0, SF is still too
strong for many beginners (this was reported
many times for instance on Droidfish user
comments on Google Play).
Test on fishtest shows that ELO drop is around
170 ELO at level 0 (good!), 130 ELO at level 1
and smoothly reduces (as expected) until level
10 where the drop is just of 8 ELO.
No functional change.
- Fix a skill level problem: Don't allow move pruning at root node
- Revert "Fix profile build for gcc on Mac OSX". Results for a faster binary in x86-64.
- Fix a MSVC warning
Bench: 8918745
Seems to be a performance regression for standard build.
For SF6 people compiling on Mac OSX using profile-build option
just need to make necessary adjustments manually...
No functional change
Resolves#223
Instead of swapping sub-optimal move in Skill
d'tor, make it explicitly at the end of the search.
Also streamline and clarify relation with multiPV
and pass it directly instead of relying on the hacky
'candidates' member.
No functional change.
Warning is C4512 (assignment operator could not be generated)
Now, apart the foreign syzygy code, everything compiles
without warnings at warning level 4.
Backported from C++11 branch.
No functional change.
- Fix a compilation issue related to BMI2 PEXT instruction
- Retrieve a ponder move from TT if PV is only one move long
Bench: 8080602
No functional change
This intrinsic to call BMI2 PEXT instruction is
defined in immintrin.h. This header should be
included only when USE_PEXT is defined, otherwise
we define _pext_u64 as 0 forcing a nop.
But under some mingw platforms, even if we don't
include the header, immintrin.h gets included
anyhow through an include chain that starts with
STL <algorithm> header. So we end up both defining
_pext_u64 function and at the same time defining
_pext_u64 as 0 leading to a compile error.
The correct solution is of not using _pext_u64 directly.
This patch fixes a compile error with some mingw64
package when compiling with x86-64.
No functional change.
Resolves#222
In case we stop the search during a fail-high
it is possible we return to GUI without a ponder
move. This patch try harder to find a ponder move
retrieving it from TT. This is important in games
played with 'ponder on'.
bench: 8080602
Resolves#221
When not using Makefile, e.g. with MSVC, if hardware
supports BMI2 instructions, then USE_PEXT should be
added in project configuration to enable pext support.
No functional change.
This intrinsic to call BMI2 PEXT instruction is
defined in immintrin.h. This header should be
included only when USE_PEXT is defined, otherwise
we define _pext_u64 as 0 forcing a nop.
But under some mingw platforms, even if we don't
include the header, immintrin.h gets included
anyhow through an include chain that starts with
STL <algorithm> header. So we end up both defining
_pext_u64 function and at the same time defining
_pext_u64 as 0 leading to a compile error.
The correct solution is of not using _pext_u64 directly.
This patch fixes a compile error with some mingw64
package when compiling with x86-64.
No functional change.
Warning is C4512 (assignment operator could not be generated)
Now, apart the foreign syzygy code, everything compiles
without warnings at warning level 4.
No functional change.
Coverity scan warns about uninitialized 'sf' argument when
calling probe(). Actually it is a false positive because
argument is passed by reference and assigned inside
probe(). Nevertheless it is a hint that fucntion signature
is a bit tricky, so rewrite it in a more conventional way,
assigning 'sf' from probe() return value.
No functional change.
On platforms where size_t is 32 bit, we
can have an overflow in this expression:
(mbSize * 1024 * 1024)
Fix it setting max hash size of 2GB on platforms
where size_t is 32 bit.
A small rename while there: now struct Cluster
is definied inside class TranspositionTable so
we should drop the redundant TT prefix.
No functional change.
On Android-ARM current TB code crashes at
random times even in single thread mode.
Reported, debugged, fixed and verified
by Peter Osterlund.
No functional change.
Resolves#201
This optimization is aimed at old hardware only (withouth popcount), and even on
non popcount compile (ARCH=x86-64), it provides no mesurable speedup:
stat test master diff
mean 2,341,779 2,354,699 -12,920
stdev 12,910 14,770 18,150
speedup -0.55%
P(speedup>0) 23.8%
No functional change.
Resolves#187
- Change UCI::value() signature
This function should only return the value,
lowerbound and upperbound info is up to the
caller because it requires external knowledge,
out of the scope of this little helper.
- Retire 'key' command
It is not an UCI command and is absolutely
useless: never used.
- Comments fixing and other trivia
No functional change.
And reshuffle a bit the functions to place
them in a consistent order.
To be on the safe side, patch has been
validated for no regression/crashes with
a small 8K games test with 3 threads:
ELO: 3.98 +-4.4 (95%) LOS: 96.3%
Total: 8388 W: 1500 L: 1404 D: 5484
No functional change.
It is up to material (and pawn) table look up
code to know where the per-thread tables are,
so change API to reflect this.
Also some comment fixing while there
No functional change.
Move all in evaluation.
Simplify the code and concentrate in a single place
all the logic behind space evaluation, making it much
more clear.
Verified also at STC it does not regress due to a possible
slow down:
LLR: 3.91 (-2.94,2.94) [-3.00,1.00]
Total: 65744 W: 13285 L: 13194 D: 39265
No functional change.
Commenst are obsolete now, an updated description
would be quite obscure, so better let the code
to talk and remove them all together.
No functional change.
Use the same template of other pawns moves generation,
make the code more uniform, simplify generate_promotions
that has now been renamed.
No functional change (verified also with perft).
In some UNIX systems "rm" prompts user for confirmation.
However "rm -f" is always a guaranteed forced deletion.
Also move gcc profiling hack under the correct target
No Functional change
Resolves#168
ShelterWeakness and Stormdanger array are now indexed additionally by
file pair (a/h,b/g,c/f,d/e). The special case of king blocking a pawn
is incorporated in the StormDanger array. Finally the 93 parameters
are tuned by SPSA on LTC.
STC
ELO: 3.46 +-2.2 (95%) LOS: 99.9%
Total: 40000 W: 8275 L: 7877 D: 23848
LTC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 10311 W: 1876 L: 1721 D: 6714
Bench: 9498821
Resolves#163
The evaluation is already done by the specialized
function, don't need to add something elese later.
With this patch following positions are evaluated
correctly as draws:
8/6p1/1Pkp1p1p/2nNn2P/2P1K1P1/8/8/3B4 w - - 7
8/1k4p1/1P1p1p1p/3NnK1P/2P3P1/1n6/4B3/8 w - -
Verified it not regress with an STC test:
LLR: 3.15 (-2.94,2.94) [-3.00,1.00]
Total: 49812 W: 10095 L: 10016 D: 29701
Reported by Arjun Temurnikar.
bench: 8289983
In particular seems more natural to return
bool and TTEntry on the same line, actually
we should pass and return them as a pair,
but due to limitations of C++ and not wanting
to use std::pair this can be an acceptable
compromise.
No functional change.
Resolves#157
In some cases we want to go direcly to the moves loop
without checking for early return. The patch make this
logic more clear and consistent.
Tested for no regression, passed STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 25282 W: 5136 L: 5022 D: 15124
and LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 72007 W: 12133 L: 12095 D: 47779
bench: 9316798
This patch replaces RKISS by a simpler and faster PRNG, xorshift64* proposed
by S. Vigna (2014). It is extremely simple, has a large enough period for
Stockfish's needs (2^64), requires no warming-up (allowing such code to be
removed), and offers slightly better randomness than MT19937.
Paper: http://xorshift.di.unimi.it/
Reference source code (public domain):
http://xorshift.di.unimi.it/xorshift64star.c
The patch also simplifies how init_magics() searches for magics:
- Old logic: seed the PRNG always with the same seed,
then use optimized bit rotations to tailor the RNG sequence per rank.
- New logic: seed the PRNG with an optimized seed per rank.
This has two advantages:
1. Less code and less computation to perform during magics search (not ROTL).
2. More choices for random sequence tuning. The old logic only let us choose
from 4096 bit rotation pairs. With the new one, we can look for the best seeds
among 2^64 values. Indeed, the set of seeds[][] provided in the patch reduces
the effort needed to find the magics:
64-bit SF:
Old logic -> 5,783,789 rand64() calls needed to find the magics
New logic -> 4,420,086 calls
32-bit SF:
Old logic -> 2,175,518 calls
New logic -> 1,895,955 calls
In the 64-bit case, init_magics() take 25 ms less to complete (Intel Core i5).
Finally, when playing with strength handicap, non-determinism is achieved
by setting the seed of the static RNG only once. Afterwards, there is no need
to skip output values.
The bench only changes because the Zobrist keys are now different (since they
are random numbers straight out of the PRNG).
The RNG seed has been carefully chosen so that the
resulting Zobrist keys are particularly well-behaved:
1. All triplets of XORed keys are unique, implying that it
would take at least 7 keys to find a 64-bit collision
(test suggested by ceebo)
2. All pairs of XORed keys are unique modulo 2^32
3. The cardinality of { (key1 ^ key2) >> 48 } is as close
as possible to the maximum (65536)
Point 2 aims at ensuring a good distribution among the bits
that determine an TT entry's cluster, likewise point 3
among the bits that form the TT entry's key16 inside a
cluster.
Details:
Bitset card(key1^key2)
------ ---------------
RKISS
key16 64894 = 99.020% of theoretical maximum
low18 180117 = 99.293%
low32 305362 = 99.997%
Xorshift64*, old seed
key16 64918 = 99.057%
low18 179994 = 99.225%
low32 305350 = 99.993%
Xorshift64*, new seed
key16 65027 = 99.223%
low18 181118 = 99.845%
low32 305371 = 100.000%
Bench: 9324905
Resolves#148
Currently Search::RootMoves is accessed and even
modified by TB probing functions in a hidden
and sneaky way.
This is bad practice and makes the code tricky.
Instead explicily pass the vector as function
argument so to clarify that the vector is modified
inside the functions.
No functional change.
Simplified also some logic while there.
TBLargest needs renaming too, but itis for
a future patch because touches also syzygy
directory stuff.
No functional change.
We really don't need to uglify in this way
our nice count() API with this ad-hoc hack.
So remove the hack and use the already
existing infrastructure.
No functional change.
Resolves#134
Updating the makefile so that the clean and gcc-profile-clean targets also
remove the profiling data files in the syzygy directory.
No functional change.
Resolves#136
Just like in Physics, the ratio of 2 things in the same unit, should be
without unit.
Example use case:
- Ratio of a Depth by a Depth (eg. ONE_PLY) should be an int.
- Ratio of a Value by a Value (eg. PawnValueEg) should be an int.
Remove a bunch of useless const while there.
No functional change.
Resolves#128
Adds support for Syzygy tablebases to Stockfish. See
the Readme for information on using the tablebases.
Tablebase support can be enabled/disabled at the Makefile
level as well, by setting syzygy=yes or syzygy=no.
Big/little endian are both supported.
No functional change (if Tablebases are not used).
Resolves#6
It is possible that we won't have a ponder move if our PV
is too short. In that case, just don't print a ponder move.
No functional change
Resolves#130
When evaluating double pawns we use always
lsb() to extract the frontmost square.
This breaks evaluation color symmetry as is
possible to verify with an instrumented evaluate()
Value evaluate(const Position& pos) {
Value v = do_evaluate<false>(pos);
Position p = pos;
p.flip();
assert(v == do_evaluate<false>(p));
return v;
}
Passed no regression test:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 21035 W: 4244 L: 4122 D: 12669
LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 39839 W: 6662 L: 6572 D: 26605
bench: 8255966
It is a non functional change, but because
we now skip copying of pv[] in SpNode, patch
has been tested for regression with 3 threads:
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 54668 W: 9873 L: 9809 D: 34986
No functional change.
This is a regression from 428962a
We have to cast to char here, otherwise the compiler
interprets it as an integer, and writes a number.
No functional change
Resolves#122
This is the first of a patch series to
rearrange and simplify accurate PV.
In this patch there is simple coding
style and reformatting stuff.
Verified with fishtest it does not crash
with MAX_PLY = 8
No functional change.
Previously token would keep its value from the previous line when an empty
line was input, leading to unexpected behaviour.
No functional change
Resolves#119
When comparing to a Depth it is more
consistent to use DEPTH_MAX instead
of a int.
This is a subtle difference because we use
ply and depth almost interchangably in SF,
but they are different. FOr counting plies
makes ense to continue using ints, while
for Depth we have our specific enum.
This cleanly fixes a new Clang 3.5 warning:
No functional change.
This gives SF accurate PVs, such that the evaluation of the leaf node in
the PV matches the score backed up to the root (99% of the time.
q-search will use the value stored in the hash table instead of the eval
value sometimes).
One drawback is that fail-high/low only get a minimal 2 move PV.
It doesn't add any additional overhead to the non-PV codepath except an
extra eight bytes to the SearchStack structure in multi-threaded
searches.
A core part of this is not pruning based on TT score in PV nodes. This
was measured as not being a regression at multiple TCs, except for one
exception, fast TC with huge hash, which is not realistic for longer
searches.
STC - 1 thread, 128 mb hash
ELO: 1.42 +-3.1 (95%) LOS: 81.9%
Total: 20000 W: 4078 L: 3996 D: 11926
STC - 3 thread, 128 mb hash
ELO: -3.60 +-2.9 (95%) LOS: 0.8%
Total: 20000 W: 3575 L: 3782 D: 12643
STC - 3 thread, 8 mb hash
ELO: 0.12 +-2.9 (95%) LOS: 53.3%
Total: 20000 W: 3654 L: 3647 D: 12699
LTC - 3 thread, 32mb hash
ELO: 2.29 +-2.0 (95%) LOS: 98.8%
Total: 35740 W: 5618 L: 5382 D: 24740
Bench: 6984058
Resolves#102
Daniel Jose reported that it was an elo gain in his engine:
http://www.talkchess.com/forum/viewtopic.php?t=54290
STC: Hash=16
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33067 W: 6670 L: 6571 D: 19826
LTC: Hash=64
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 41181 W: 7008 L: 6920 D: 27253
And another one to verify no regression with hash pressure:
STC: Hash=4
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 25085 W: 5059 L: 4991 D: 15035
Verified that qsearch does not explode after this patch (recapture threshold).
Bench 7962287
Resolves#112
Real men jump/branch by hand...but
we prefer the humble way.
Moved also some uci info code where it
belongs, while there.
No functional change.
Resolves#110
16MB for 1" searches is more comensurate with the average use case.
And 16 is the default used by stockfish bench, so it makes sense to be
consistent, if only to have the same minimum memory requirement for using
SF and compiling it with PGO.
No functional change.
Resolves#109
Now we can directly replace it with
the definition resulting in simpler
and possibly faster code because
PvNode is evaluated at compile time.
No functional change.
* remove some erroneous comments, that were based on the ONE_PLY == 2.
* rename hd to d, because there's no more half-depth in SF.
* rescope variables into the for loops.
* reindent the for loops correctly.
* add a comment to explain the eval improving part (not so obvious to read
this code as array has 4 dimensions).
No functional change.
This should reduce search inconsistencies, and doesn't seem to have a measurable ELO Impact:
STC with Hash=16
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 49264 W: 10076 L: 10007 D: 29181
LTC with Hash=64
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 82149 W: 14044 L: 14023 D: 54082
Plus an extra test, to make sure it doesn't regress with strong hash pressure:
STC with Hash=4
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 21498 W: 4327 L: 4246 D: 12925
Bench: 7302735
Resolves#100
Idea is to apply king safety later in the endgame. Previously, we didn't
apply KS in a RR vs. Q ending for example, which causes poor play.
Now we calculate king attacks when the attacking side has a queen or more.
STC with 8moves_v3
LLR: 3.06 (-2.94,2.94) [0.00,4.00]
Total: 38481 W: 6228 L: 5952 D: 26301
LTC with 2moves_v1
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 51053 W: 8670 L: 8353 D: 34030
Bench: 7514010
Resolves#98
UCI specification is not clear on the size of
integers that are exchanged in the protocol, so
instead of a simple int, assume 'nodes' is a
int64_t because we need a bigger size to store
this value in many real cases, especialy with
very long searches.
No functional change.
Resolves#75
Clang 3.5 issues warning on constructs like: abs(f1 - f2). The thing is that
f1 and f2 are enum types, and the range given (all positive) allows the
compiler to choose an unsigned type (efficiency being one reason to prefer
unsigned arithmetic). If f1 < f2 are unsigned, then f1 - f2 wraps around zero
and the abs() becomes a no-op. It's the reinterpretation of the unsigned
result (large value) as a signed int that happens to give the correct result,
thanks to 2's complement. This is all tricky and dangerous!
In the spirit of the standard, we assume nothing on the signedness of enums,
and simply calculate the rank and file distances as:
- rank_dist(r1, r2) = r1 < r2 ? r2 - r1 : r1 - r2
- file_dist(f1, f2) = f1 < f2 ? f2 - f1 : f1 - f2
this logic can in fact be applied to any enum we may use, so for better
generality and to avoid code duplication, we use a template function diff()
here.
No functional change.
Resolves#95
This area has become obscure and tricky over the course of incremental
changes that did not respect the original's consistency and clarity. Now,
it's not clear why we use MAX_PLY = 120, or why we use MAX_PLY+6, among
other things.
This patch does the following:
* ID loop: depth ranges from 1 to MAX_PLY-1, and due to TT constraint (depth
must fit into an int8_t), MAX_PLY should be 128.
* stack[]: plies now range from 0 to MAX_PLY-1, hence stack[MAX_PLY+4],
because of the extra 2+2 padding elements (for ss-2 and ss+2). Document this
better, while we're at it.
* Enforce 0 <= ply < MAX_PLY:
- stop condition is now ss->ply >= MAX_PLY and not ss->ply > MAX_PLY.
- assert(ss->ply < MAX_PLY), before using ss+1 and ss+2.
- as a result, we don't need the artificial MAX_PLY+6 range. Instead we
can use MAX_PLY+4 and it's clear why (for ss-2 and ss+2).
* fix: extract_pv_from_tt() and insert_pv_in_tt() had no reason to use
MAX_PLY_PLUS_6, because the array is indexed by plies, so the range of
available plies applies (0..MAX_PLY before, and now 0..MAX_PLY-1).
Tested with debug compile, using MAX_PLY=16, and running bench at depth 17,
using 1 and 7 threads. No assert() fired. Feel free to submit to more severe
crash-tests, if you can think of any.
No functional change.
SSE4.2 has nothing to do with POPCNT. We must dispell this myth, because
Stockfish is a reference and many will copy this mistake if they see it in Stockfish:
* SSE is an SIMD instruction set, relative to vectorization (using special 128-bit registers).
* POPCNT/LZCNT work on normal registers (eg. AL, AX, EAX, RAX).
The confusion comes from the fact that, in the Intel product line, it just
so happens that SSE4.2 and POPCNT/LZCNT came out at the same time. But this
is not true for AMD. For example, all AMD Pheniom II have SSE3 but no
POPCNT/LZCNT, and that is why the modern compile uses -msse3 -popcnt and not -msse4.2.
No functional change.
Resolves#86
Objects that are only accessible at file-scope should be put in the anonymous namespace.
This is what the C++ standard recommends, rather than using static, which is really C-style and results in static linkage.
Stockfish already does this throughout the code. So let's weed out the few exceptions,
because... they have no reason to be exceptional.
No functional change.
Resolves#84
It is more idiomatic, we didn't used it
in the past because Position::pretty(Move)
had a calling argument, but now we can.
As an added benefit, we avoid a lot of string
copies in the process because now we avoid
std::ostringstream ss.
No functional change.
This commit fixes two issues:
1) Don't print PVs after the search has been interrupted
This solves the "mate 0 upperbound" scores that sometimes
creep up when a multi-PV analysis gets interrupted with
the `stop` command.
2) Print multipv before score
Shredder Classic fails to identify the main PV
(the one with multipv 1) if `score` comes first.
This leads to an eval graph that doesn't reflect
the scores actually reported by Stockfish when
doing a multiPV analysis.
No functional change
Closes#76
Helper function should just know how to find the
biggest piece type in a bitboard. All the threat
logic and data shoud be in evaluate_threats().
This nicely separates the scope of the two functions
in a more consistent way and simplifies the code.
No functional change.
Use the max_threat() helper function to estimate more precisely the
best hanging piece threat. Also retunes the Threat array using SPSA.
STC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 7598 W: 1596 L: 1468 D: 4534
LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 7896 W: 1495 L: 1350 D: 5051
Bench: 6816504
Resolves#73
Instead of hard-code the weights in a big table,
we prefer to calculate them out of few parameters
at startup. This allows to keep low the number of
independent parameters and hence is good for tuning
and for a better insight in the meaning of the numbers.
No functional change.
Make even more clear what are the terms that
contribute to evaluate connected pawns, and
completely separate them from the weights
that are now fully looked up in a table.
For future tuning makes sense to init the table with
a formula instead of hard-code it. This allows to
reduce problem space cardinality and makes tuning
easier.
And fix a MSVC warning while there:
warning C4804: '>>' : unsafe use of type 'bool' in operation
No functional change.
These two notions are very correlated. Since connected has the most
generality, it makes sense to generalize it to encompass what is
covered by candidate.
STC:
LLR: 4.03 (-2.94,2.94) [-3.00,1.00]
Total: 11970 W: 2577 L: 2379 D: 7014
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 13194 W: 2389 L: 2255 D: 8550
bench 7328585
Now that half plies have been removed from the engine, we can encode
TT depth into an int8_t.
Range is -128 to +127, so it goes still further than the previous
limit of 121 plies (with ONE_PLY == 2 where depth - DEPTH_NONE was
encoded as an uint8_t).
No functional change.
Resolved#60
Now that half-plies are no more used we can simplify
the code assuming that ONE_PLY is 1 and no more 2.
Verified with a SMP test:
LLR: 2.95 (-2.94,2.94) [-4.50,0.00]
Total: 8926 W: 1712 L: 1607 D: 5607
No functional change.
On top of previous patch, rename time variables to
reflect the simplification of UCI parameters.
It is more correct to use as varibales directly the
corresponding UCI option, without intorducing redundant
intermediate variables.
This allows also to simplify the code.
No functional change.
In case of a succesful late join we set again
'searching' flag, so we can restart search
immediately without an useless lock/unlock
cycle.
No functional change.
Now "Write Search Log" will pring moves in UCI format, consistent with all the rest. This functionality is
not aimed at end-users anyway. It's hardly useful at all, in fact. Also, pretty-printing SAN moves is
something that better belongs in the GUI than in the engine.
No functional change.
Where they better belong.
Also, this removes '#include <string>' from types.h, which reduces the amount of code to compile (every
translation unit includes types.h).
No functional change.
Unify various perft functions and move all the code
to search.cpp.
Avoid perft implementation to be splitted between
benchmark.cpp (where it has no reason to be) and
search.cpp
No functional and no speed change (tested).
So that one can redirect cout to /dev/null and only print print cerr in the terminal (for more accurate speed
tests).
Suggested by Marco.
No functional change.
The compiler tries to cast Options["Hash"] into a string, using:
Option::operator std::string() const {
assert(type == "string");
return currentValue;
}
And, as expected, the assert() fails.
std::to_string() would be the right solution, but it's C++11. And using a stringstream is too much code to
achieve so little. Let's keep it the way it was: hardcoded (ie. default hash defined in two places).
No functional change.
The eval already returns zero in KK, KBK, KNK (see material.cpp). The difference is:
- we lose the "TB pruning" benefit of the draw rule (ie. search goes on even if eval is zero)
- we gain some speed by removing a useless test from the hot path
STC:
LLR: 0.05 (-2.94,2.94) [-3.00,1.00]
Total: 128000 W: 21357 L: 21560 D: 85083
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 33023 W: 4613 L: 4509 D: 23901
bench 7461881
First, remove some dead code (function never called with a Move argument).
Then, remove printing of legal moves, which does not belong here. Let's keep commands orthogonal and minimal:
- the "d" command should display the board, nothing more, or less.
- "perft 1" will display the list of legal moves.
No functional change.
Stockfish allocates the default hash (32MB) in main(), before entering UCI::loop(). If there is not enough
memory, the program will crash even before UCI::loop() is entered and the GUI is given a change to specify a
lower Hash value.
This defective design could be resolved by doing a lazy allocation upon "isready" command, as the UCI protocol
guarantees that "isready" will be sent at least once before any search. But it's a bit cumbersome when using
Stockfish "manually" to have to remember to type "isready" everytime.
So leave the current design, but reduce the default hash to 16MB instread of 32MB. In order to perform such
quick searches (depth=13), there is no reason to use so much Hash anyway. Another benefit is to introduce a
bit of hash pressure in bench, which increases chances to detect rare bugs related to TT replacement, for
example.
This is not a functional change, although it obviously changes the bench.
bench 7461879
Instead of defining it both in ucioption.cpp and benchmark.cpp. Obviously changing the default Hash will
change the bench as a result.
No functional change.
The main purpose of perft is to help debugging. But without the breakdown in sum of perft(N-1), it is a
completely useless debugging tool.
So perft now displays the breakdown, and divide is therefore removed.
No functional change.
After commit 94b1bbb68b, in case available root moves are less than multiPV, we
could never reach condition:
PVIdx + 1 == multiPV
and as a consequence UCI output is not printed.
Fixed suggested by Joerg Oster.
No functional change.
Despite being neutral at STC, it turned out to be regressive at LTC:
40k games at LTC with Hash=8
ELO: -2.06 +-1.9 (95%) LOS: 1.4%
Total: 39720 W: 5740 L: 5976 D: 28004
40k games at LTC with Hash=128
ELO: -2.69 +-1.9 (95%) LOS: 0.2%
Total: 39149 W: 5702 L: 6005 D: 27442
bench 7477963
Also raise the admissible bounds to (-100,100), as there is no reason to prevent users from using high
values if they want to.
Does not regress in self play:
ELO: 0.10 +-2.0 (95%) LOS: 53.7%
Total: 40000 W: 7084 L: 7073 D: 25843
master vs SF 3
ELO: 182.86 +-2.7 (95%) LOS: 100.0%
Total: 40000 W: 21843 L: 2541 D: 15616
Contempt = 20 vs SF 3
ELO: 189.25 +-2.8 (95%) LOS: 100.0%
Total: 40000 W: 22721 L: 2859 D: 14420
Diff is therefore 6.4 +/- 3.9 elo against a 180-190 elo weaker engine, which is significantly positive,
as expected. This elo difference is likely understated, because of FishTest aggressive draw adjudication
though.
We could push Contempt further, but after 20cp, it would get in the way of FishTest draw adjudication
rule, and is likely to reduce the testing throughput as a result.
bench 8198667
Steamline a bit the implementation of
skill levels. As a side effect we can
retire MultiPV global and use a local
variable instead.
No functional change.
- Currently broken
- Never been really useful
- Does not work well with new splitting model
Verified for no regression at STC with 3 threads:
LLR: 2.96 (-2.94,2.94) [-6.00,0.00]
Total: 81905 W: 12122 L: 12381 D: 57402
No functional change
Bitboard init code is already noteasy to follow,
so don't make it even harder using 'smart' code.
Also reindent a while loop in standard way.
No functional change.
With Eelco's patch "Don't special case for abs(beta) >= VALUE_MATE_IN_MAX_PLY" condition "abs(ttValue) < VALUE_KNOWN_WIN" has been removed from singular extension search, and condition "abs(beta) < VALUE_KNOWN_WIN" was added to the SingularExtensionNode definition.
This might lead to problems, especially in positions, where a mate is due.
For example, this position 5rk1/4K1pp/8/5PPP/8/8/8/1R6 w - - 12 1 triggers an assert.
stockfish: search.cpp:434: Value {anonymous}::search(Position&, Search::Stack*, Value, Value, Depth, bool) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)2u; bool SpNode = false]: Assertion `-VALUE_INFINITE <= alpha && alpha < beta && beta <= VALUE_INFINITE' failed.
So let's re-insert the removed condition.
First spotted by Uri Blass, fix by me.
Bench: 8759675
* /boot/common was removed from Haiku
* The equivalent path now that package management has been implemented is /boot/system/non-packaged
No functional change
Bench: 8759681
The assert:
assert(ttValue != VALUE_NONE);
Could fire for multiple reasons (although is very rare),
for instance after an IID we can have ttMove != MOVE_NONE
while ttValue is still set at VALUE_NONE.
But not only this, actually SMP is a source of corrupted
ttValue and anyhow we can detect the condition:
ttMove != MOVE_NONE && ttValue == VALUE_NONE
even north of IID.
Reported by Ronald de Man.
It is so rare that bench didn't change.
bench: 7710548
Remove from the search this special case and apply
null search and razoring also in mate positions.
Tested in no-regression mode and passed both
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 65431 W: 10860 L: 10810 D: 43761
and LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 34928 W: 4814 L: 4713 D: 25401
This patch kicks in only in mate positions and in
these cases it seems beneficial in finding mates
faster as Yery Spark measured on the Chest mate suite:
Total number of positions 6425
Fixed nodes 200K per position
master: 1049
new: 1154
And also the 5446 'hard' positions again with 2000K nodes
(those not found by both engines in 200K nodes):
master: 1069
new: 1395
bench: 7710548
It seems this flag is only for gcc and
yields a warning under OSX Mavericks:
clang: warning: argument unused during compilation: '-ansi'
No functional change.
Here MSVC is worried that
StepAttacksBB[PAWN][psq]
could overflow, so change psq initialization
to clarify psq is always less than 64.
No functional change.
Don't take the split lock if we don't have
available slaves (about 30-40% of times).
This new condition allows to retire the now
redundant one on number of threads.
No functional change.
Split previous patch in 2 steps: first remove
the MOVE_NULL hack, then retire nullChild.
The first step is a prerequisite
for second one and affects bench.
The second step (next patch) just removes nullChild
without affecting bench.
bench: 8205159
Are broken for big-endian case and
I have verified with MSVC 2013 Premium
bench is correct and there is no
miscompilation, so the main reason
to change the original code drops.
No functional change.
Another attempt at retiring current asymmetric
king evaluation and use a much simpler symmetric
one. As a good side effect we can avoid recalculating
eval after a null move.
Tested in no-regression mode and passed
STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 21580 W: 3752 L: 3632 D: 14196
LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 18253 W: 2593 L: 2469 D: 13191
And a LTC regression test against SF DD to
verify we don't have regression against
weaker engines due to some kind of 'contempt'
effect:
ELO: 54.69 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 11072 L: 4827 D: 24101
bench: 8205159
Before it was working by accident in case of
see_sign() and failing with see() due to how
castle moves are coded (king captures the rook).
Better to explicitly filter out castling moves
and use see() without any surprise/trick.
No functional case.
Book handling belongs to GUI, we kept this code
for historical reasons, but nowdays there is
really no need of this old, (mostly) unused
and especially incorrect designed functionality.
It is up to the GUI to choose the book (far easier for
the user) and to select the book parameters. In no
place, including fishtest, TCEC, rating lists, etc.
the "own book" is used, moreover currently SF is
released without any book and even if in the future we
bundle a book in the release package, it will be the GUI
that will take care of it.
This corrects a wrong design decision that Galurung
and later Stockfish inherited from what was common
practice many yeas ago.
No functional change.
There is really little that user can achieve (apart
from a weakened engine) tweaking these parameters
that are already tuned and have no immediate or visible
effect.
So better do not expose them to the user and avoid the
typical "What is the best setup for my machine?" kind of
question (by far the most common, by far the most useless).
No functional change.
To show perft numbers for each move. Just
use 'divide' instead of 'perft', for instance:
position startpos moves e2e4 e7e5
divide 4
Inspired by Ronald de Man.
No functional change.
Retire current asymmetric king evaluation
and use a much simpler symmetric one.
As a side effect retire the infamous
'Aggressiveness' and 'Cowardice' UCI
options.
Tested in no-regression mode,
Passed both STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33855 W: 5863 L: 5764 D: 22228
And LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40571 W: 5852 L: 5760 D: 28959
bench: 8321835
At root we start counting plies from 1,
instead pv[] array starts from 0. So
the variable 'ply' we use in extract_pv_from_tt
to index pv[] is misnamed, indeed it is
not the real ply, but ply-1.
The fix is to leave ply name in extract_pv_from_tt
but assign it the correct start value and
consequentely change all the references to pv[].
Instead in insert_pv_in_tt it's simpler to rename
the misnamed 'ply' in 'idx'.
The off-by-one bug was unhidden when trying to use
'ply' for what it should have been, for instance in
this position:
position fen 8/6R1/8/3k4/8/8/8/2K5 w - - 0 1
at depth 24 mate line is erroneusly truncated due
to value_from_tt() using the wrong ply.
Spotted by Ronald de Man.
bench: 8732553
If razoring conditions are satisfied and
depth is low, then directly drop in qsearch.
Passed both STC
LLR: 2.98 (-2.94,2.94) [-1.50,4.50]
Total: 12914 W: 2345 L: 2208 D: 8361
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 50600 W: 7548 L: 7230 D: 35822
bench: 8739659
When there aren't legal moves after
a search, instead of returning imediately,
save bestValue in TT as in the usual case.
There is really no reason to special case
this one.
With this patch is fully fixed (again) follwing
position:
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Also in SMP case.
bench: 8802105
After last Joona's patch there is no measurable
difference between the option set or unset.
Tested by Andreas Strangmüller with 16 threads
on his Dual Opteron 6376.
After 5000 games at 15+0.05 the result is:
1 Stockfish_14050822_T16_on : 3003 5000 (+849,=3396,-755), 50.9 %
2 Stockfish_14050822_T16_off : 2997 5000 (+755,=3396,-849), 49.1 %
bench: 880215
Instead of waiting to be allocated, actively search
for another split point to join when finishes its
search. Also modify split conditions.
This patch has been tested with 7 threads SMP and
passed both STC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956
And a reduced-LTC at 25+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151
Was then retested against regression in 3 thread case
at standard LTC of 60+0.05:
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 40809 W: 5446 L: 5406 D: 29957
bench: 8802105
On a final fixed game number test it failed
to prove better than standard version.
STC 15+0.05
ELO: -0.86 +-1.7 (95%) LOS: 15.8%
Total: 57578 W: 10070 L: 10213 D: 37295
bench: 8802105
Unfortunatly we have a slow down that causes
a regression in STC with no-regression mode:
LLR: -2.96 (-2.94,2.94) [-3.00,1.00]
Total: 22454 W: 3836 L: 4029 D: 14589
bench: 8678654
The bug was found to be elsewhere. This version
is correct and also is able to detect as draw
positions like:
8/8/5b2/8/8/4k1p1/6P1/5K2 b - - 6 133
bench: 8678654
Position is win also if strong side has a bishop
and a knight (plus other material, otherwise
KBNK would be triggered instead of KXK).
This fixes a subtle bug where a search on position
k7/8/8/8/8/P7/PB6/K7 b - - 6 1
Instead of returning a draw score, suddendly returns
a big score. This happens because at one point in
search we reach this position:
8/Pk6/8/8/8/4B3/P7/K7 w - - 3 8
Where white can promote. In case of rook promotion (and also in case of
queen promotion) the resutling position gets a huge static eval that is
above VALUE_KNOWN_WIN (from the point of view of white). So for rook
promotion it is
&& futilityBase > -VALUE_KNOWN_WIN
that prevents futility pruning in qsearch. (Removing this condition indeed
lets the problem occur). Raising the static eval for K+B+N+X v K to a value
higher than VALUE_KNOWN_WIN fixes this particular problem without having to
introduce an extra futility pruning condition in qsearch.
I just checked and it seems K+R v K, K+2B v K and even K+B+N v K already get
a huge static eval. Why not K+B+N+P v K?
I think this fix corrects an oversight. There is special code for KBNK, but
KBNXK is handled by KXK, so the test for sufficient material should also test
for B+N.
bench: 8678654
In the (rare) cases when the two conditions
are true, then fully check again with a slow
but correct MoveList<LEGAL>(pos).size().
This is able to detect false positives like
this one:
8/8/8/Q7/5k1p/5P2/4KP2/8 b - - 0 17
When we have a possible simple pawn push that
is not stored in attacks[] array. Because the
third condition triggers very rarely, even if
it is slow, it does not alters in a measurable
way the average speed of the engine.
bench: 8678654
Currently a stealmate position is misevaluated
in a negative/positive score, this leads qsearch(),
that does not detects stealmates too, to return the
wrong score and this yields to some kind of endgames
to be completely misevaluated.
With this patch is fully fixed follwing position
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Also in SMP case.
Correct root cause analysys by Ronald de Man.
bench: 8678654
After reverting to the original Tord's
endgame, a search on position
7k/6p1/6B1/5K1P/8/8/8/8 w - - 0 1
Reports, correctly, a draw score instead of
an advantage for white.
Issue reported by Uri Blass.
bench: 8678654
Remove the optimization for Intel, is not
standard and can break at any time, moreover
our release build is not done with Intel C++
anymore so we don't need to sqeeze the extra
speed out from this compiler.
No functional change.
If we return from split with a stale value
due to a stop or a cutoff upstream occurred,
then we exit moves loop and save a stale value
in TT before returning search().
This patch, from Joona, fixes this.
bench: 8678654
We can never have bestValue == -VALUE_INFINITE at
the end of move loop because if no legal move exists
we detect it with previous condition on !moveCount,
if a legal move exists we never prune it due to
futility pruning condition:
bestValue > VALUE_MATED_IN_MAX_PLY
So this code never executes, as I have also verified
directly.
Issue reported by Joona.
No functional change.
Intel compiler is very picky:
"error: this operation on an enumerated type requires an
applicable user-defined operator function"
Reported by Tony Gaor.
No functional change.
Put the division at the end to reduce
rounding errors. This alters the bench
due to different rounding errors, but
should not alter ELO in any way.
bench: 7615217
This apparentely silly tweak allows
to speed up the bench by almost 3%.
Not clear why, repeating with perft,
the speed up vanishes.
Suggested by Jonathan Calovski.
No functional change.
Believed to be a speed optimization as benched
on Windows with bench realtime affinity 0x1 deleting
highest and lowest runs:
Base Test
1549259 1608202
1538115 1583934
1543168 1556938
1536365 1554179
1533026 1582010
Signature remains unchanged and gives anywhere from 1-2% nps
boost in analysis depending on number of cores used.
No functional change.
This is a very discussed patch with many
argumentations pro and against. The fact is
it passed both STC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16305 W: 3001 L: 2855 D: 10449
And LTC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 34273 W: 5180 L: 4931 D: 24162
Although it is true that a correct test should
include foreign engines, we commit it anyhow so
people can test it out in the wild, under broader
conditions.
bench: 7384368
Idea from Lyudmil Tsvetkov.
The value seems to be raised a bit abruptly, but as
Gary said, a blocked pawn on the sixth rank has been
instrumental in limiting king mobility in multiple
losses that I've seen from SF. A blocked pawn on fifth
rank is much less serious on the king safety impact.
Passed both STC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 14551 W: 2750 L: 2607 D: 9194
and LTC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 43595 W: 6917 L: 6618 D: 30060
And even a retest at 60" fixed games 40K
ELO: 1.79 +-1.9 (95%) LOS: 97.0%
Total: 39889 W: 6018 L: 5813 D: 28058
bench: 7154916
Adding BMI1 allows the compiler to use _blsr_u64
automatically (the advertised 0.3% speed gain).
I verified that the compiler does not use this
instruction with the -mbmi2 flag only. Also, all
processors supporting BMI2 is also supporting BMI1.
No functional change
Prefer
file_of(s) < file_of(ksq)
to the inidrect
file_of(ksq) < FILE_E
To evaluate if semiopen side to check is the left side.
Also other small touches while there.
No functional change.
Right now the Makefile is cluttered with OS X equivalents
of all the x86 targets. We can get rid of all of them and
just check UNAME against "Darwin" for the few OS X-specific
things we need to do.
We also disable Clang LTO when using BMI2 instructions. For
some reason, LLVM cannot find the PEXT instruction when using
LTO. I don't know why, but disabling LTO for BMI2 fixes it.
No functional change.
Intel Haswell and newer CPUs can calculate sliders
attacks using special PEXT asm instructions instead
of magic bitboards. This gives a +3% speed up.
To enable it just compile with ARCH=x86-64-bmi2
No functional change.
Retire software pext and introduce hardware
call when USE_PEXT is defined during compilation.
This is a full complete implementation of sliding
attacks using PEXT.
No functional change.
Reshuffle functions to define them in reverse
calling order (C style).
This allow us to define templates before they are
used. Currently it is not like this, for instance
evaluate_pieces is defined after do_evaluate that
calls it. This happens to work for some strange
reason (two phase lookup?) but we want to avoid
code that works 'by magic'.
As a nice side-effect we can now remove the function
prototypes.
No functional change.
This is more consistent with what other engines are doing.
Often people thinks that SF's scores are overblown. In the
end, it just boils down to the arbitrary way of rescaling them.
No functional change.
I have noticed that increasing the bench depth produces
progressively smaller and slightly faster executables at
the cost of longer compile times. Also using bench "time"
instead of "depth" seems to produce slightly smaller/faster
executables given comparable compile times.
I have made a new Makefile that generates smaller and
about 1% to 2% faster profile executables at only a
little extra compile time. On my mobile 2GHz i7 a
full profile build time goes from 3'48" to 4'13" and
the exe goes down by 5% from 416,310 bytes to 395,567
bytes.
No functional change.
Small simplification.
Passed SPRT(-3,1) both at STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 17051 W: 3132 L: 3005 D: 10914
and LTC:
LLR: 4.55 (-2.94,2.94) [-3.00,1.00]
Total: 24890 W: 3842 L: 3646 D: 17402
The rationale behind this is that I've never managed to add a
Queen on 7th rank bonus in DiscoCheck, because it never showed
to be positive (evne slightly) in testing. The only thing that
worked is Rook on 7th rank.
In terms of SF code, it seemed natural to group it with QueenOnPawn
as well as those are done together. I know you're against groupping
in general, but when it comes to non regression test, you are being
more conservative by groupping. If the group passes SPRT(-3,1) it's
safer to commit, than test every component in SPRT(-3,1) and end up
with the risk of commiting several -1 elo regression instead of just
one -1 elo regression.
In chess terms, perhaps it's just easier to manouver a Queen (which
can more also diagonaly) than a Rook. Therefore you can let the search
do its job without needing eval ad-hoc terms to guide it. For the Rook
which takes more moves to manouver such eval terms can be (marginally)
useful.
bench: 7473314
We chose this instead of negamax sign convention
(ie. from the point of view of the side to move)
because it is more in line to how the eval
table is presented.
Also some tweak to formatting while there.
No functional change.
In some legal positions like this one:
R6R/3Q4/1Q4Q1/4Q3/2Q4Q/Q4Q2/Np1Q4/kB1N1KB1 b -- 0 1
We can have a very high score, in this case 30177 and 29267
for midgame and endgame respectively, and because
VALUE_INFINITE = 30001 we have an assert in interpolate()
Midgame and endgame scores are stored in 16 bit signed integers
so we can rise VALUE_INFINITE a little bit. This does not fix
the possibility of overflow in general case, just makes the
condition more difficult to trigger and anyhow better uses all
the score width.
Raising VALUE_INFINITE to 32000 seems to fix the problem for this
particular case.
No functional change.
Tested directly at LTC because previous long
test series on this topic shows it is TC dependant.
Tested with no-regression mode because gets rid of
an ugly and ad-hoc rule.
Test at LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 67918 W: 10590 L: 10541 D: 46787
bench: 7926803
Here the new idea is to link pinned pieces
with king safety.
Passed both STC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 10047 W: 1867 L: 1737 D: 6443
And LTC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 10419 W: 1692 L: 1543 D: 7184
bench: 8325087
We add a penalty for each pawn which is not protected by another pawn
of the same color.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 12107 W: 2411 L: 2272 D: 7424
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 9204 W: 1605 L: 1458 D: 6141
bench: 7682173
We want all the UCI options are printed in the order in which are
assigned, so we use an index that, depending on Options.size(),
increases after each option is added to the map. The problem is
that, for instance, in the first assignment:
o["Write Debug Log"] = Option(false, on_logger);
Options.size() can value 0 or 1 according if the l-value (that
increments the size) has been evaluated after or before the
r-value (that uses the size value).
The culprit is that assignment operator in C++ is not a
sequence point:
http://en.wikipedia.org/wiki/Sequence_point
(Note: to be nitpick here we actually use std::map::operator=()
that being a function can evaluate its arguments in any order)
So there is no guarantee on what term is evaluated first and
behavior is undefined by standard in this case. The net result
is that in case r-value is evaluated after l-value the last
idx is not size() - 1, but size() and in the printing loop
we miss the last option!
Bug was there since ages but only recently has been exposed by
the removal of UCI_Analyze option so that the last one becomes
UCI_Chess960 and when it is missing engine cannot play anymore
Chess960.
The fix is trivial (although a bit hacky): just increase the
last loop index.
Reported by Eric Mullins that found it on an ARM and MIPS
platforms with gcc 4.7
No functional change.
Thanks to std::bitset we can easily increase
the limit of active threads above 64.
Thanks to Lucas Braesch for pointing at the
correct solution of using std::bitset.
No functional change.
Using memset on a std::vector is undefined behavior,
so manually init all the data memebers of LimitsType.
Bug intorduced in 41641e3b1e
No functional change.
Because we test for available slaves before
entering split(), we almost always allocate a
slave, only in the rare case of a race (less
then 2% of cases) this is not true, but to
special case this occurrence is not worth
the added complexity.
bench: 7451319
Split delta value in aspiration window so that when
search depth is less than 24 a smaller delta value
is used. The idea is that the search is likely to
be more accurate at lower depths and so we can exclude
more possibilities, 25% to be exact.
Passed STC
LLR: 2.96 (-2.94, 2.94) [-1.50, 4.50]
Total: 20430 W: 3775 L: 3618 D: 13037
And LTC
LLR: 2.96 (-2.94, 2.94) [0.00, 6.00]
Total: 5032 W: 839 L: 715 D: 3478
Bench: 7451319
During endgame initialization we get the material
hash key of each endgame forging and ad-hoc position
that in same cases is illegal (leaves teh king under
capture). This is not a problem for the material key,
but rises an assert when SF is run in debug mode with
'testKingCapture' set in pos_is_ok().
So rewrite the code to always produce legal positions.
No functional change.
Big simplification of pawn move check.
Code has been tested with a brute force approach: for
every position reached during a bench search, the function
has been called for each combinations of Move(from, to)
and verified the result is the same of old code.
Actually this function is very critical becuase is the
one that ensures corrupted TT moves are discarded, so
to properly test it a simple bench is not enough.
Verified also speed is not changed.
No functional chnage.
It has been obsoleted out already some time ago
and currently there is no point in changing eval
score according to if we are in game or analyzing.
So retire the option.
No functional change.
Try to avoid repetition draws at early midgame,
this should give an edge against weaker opponents
and reduce draw rate.
Tested for regressions with SPRT[-3, 1] and
passed both short TC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 68498 W: 12928 L: 12891 D: 42679
And long TC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40212 W: 6386 L: 6295 D: 27531
bench: 7990513
When running the following position:
8/kPp5/2P3p1/p1P1p1P1/2PpPp2/3p1p2/3P1P2/5K2 w - - 0 1
An assert is raised at depth 92:
assert(-VALUE_INFINITE <= alpha && alpha < beta && beta <= VALUE_INFINITE);
This is because it happens that beta = 29832,
so rbeta = 30032 that is > VALUE_INFINITE
Bug spotted and analyzed by Uri, fix suggested by Joerg.
Other fixes where possible but this one is pointed
exactly at the source of the bug, so it is the best
from a code documentation point of view.
bench: 8430785
Under some very rare case 100 plies of search
could be not enough. Increasing more could lead
to crashes due to reached stack size limit on
some platforms.
Strongly requested by Uri.
bench: 8430785
Currently king has no material key associated because
it can never happen to find a legal position without
both kings, so there is no need to keep track of it.
The consequence is that a position with only the two
kings has material key set at zero and if the material
hash table is empty any entry will match and this is
wrong.
Normally bug is hidden becuase the checking for a draw
with pos.is_draw() is done earlier than evaluate() call,
so that we never check in gameplay the material key of a
position with two kings.
Nevertheless the bug is there and can be reproduced setting
at startup a position with only two kings and typing
'eval' from prompt.
The fix is very simple: add a random key also for the king.
Also fixed the condition in material.cpp to avoid asserting
when a 'just 2 kings' postion is evaluated.
No functional change.
Actually MultiCut is too different from current scheme.
Note that neither ProbCut is exactly what we do because
we try just a handful of captures instead of all moves,
nevertheless it seems more in line with what we do.
Suggested by Joona.
No functional change.
Makes more sense than returning a draw score. Tested
with reduced MAX_PLY = 30 and passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 17434 W: 3345 L: 3194 D: 10895
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 2610 W: 488 L: 373 D: 1749
With current limit of MAX_PLY = 100 the patch should not
introduce any measurable change, nevertheless is the correct
approach.
Idea of returning eval is from Michel Van den Bergh.
bench: 8430785
Fix small overflow error while converting
magic boosters from right rotate to left rotate,
in particular booster 38 was converted to 4122
instead of the corrcet value 26.
Formula used was:
s1 = original & 63, s2 = (original >> 6) & 63;
new = (64 - s1) | ((64 - s2) << 6);
Instead of:
s1 = original & 63, s2 = (original >> 6) & 63;
new = ((64 - s1) & 63) | (((64 - s2) & 63) << 6);
This has no impact in number of cycles needed, but
just in the resultig number that yields to a rotate
amount bigger than 63.
Spotted by Ehsan Rashid.
No functional change.
Latest master triggers a compiler warning due
to comparing int64_t to uint64_t.
notation.cpp: In Funktion »std::string pretty_pv(Position&, int, Value, int64_t, Move*)«:
notation.cpp:230:30: Warnung: Vergleich zwischen vorzeichenbehafteten und vorzeichenlosen Ganzzahlausdrücken [-Wsign-compare]
This patch should fix it.
No functional change.
When initializing the magic numbers used to
compute sliding attacks, we endless generate a
random and test it as a possible magic.
In the general case this takes a lot of iterations,
but here, insteaad of picking a casual random, we
rotate it a couple of times and generate a number that
we know has a good probability to be a magic candidate.
This is becuase the quantities by which we rotate the
number are known in advance to produce quickly a good
canidate.
The patch, inspired by DON, just moves the shuffle to RKISS
changing the boosters to take in account a left rotation
instead of a right rotation as in the original.
No functional change.
Although does not change ELO level, it seems
verification is useful in many zugzwang positions
as reported by many sources.
So revert this simplification.
bench: 8430785
Another SEE speed up that passed the SPRT short TC test!
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 81337 W: 15060 L: 14745 D: 51532
No functional change.
Actually race conditions do exist in an engine, just
think for a moment to TT concurrent access. Racy code
is not a problem per se, if the consequences are well
known and correctly handled.
In case of TT access we ensure that the TT move is validated
before to be tried, here we just retry the same move in less
that 1 case out of a million: this is totally harmless considering
that very probably the second time the move is tried we get
immediately a TT hit and search quickly returns.
So we simplify the code for no harm.
No fuctional change (in single thread case)
Tested with SPRT in simplification mode [-4.00,0.00],
this ensures that the patch is (very probably) not
a regression.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 27543 W: 4278 L: 4209 D: 19056
And long TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 39483 W: 7325 L: 7305 D: 24853
bench: 8347121
Hopefully this patch makes the code more:
* Self-documenting: Null search is always a zero window search,
because it is testing for a fail high. It should never be done
on a full window! The current code only works because we don't
do it at PV nodes, and therefore (alpha, beta) = (beta-1, beta):
that's the kind of "clever" trick we should avoid.
* Idiot-proof: If we want to enable null search at PV nodes, all we
need to do now is comment out the !PvNode condition. It's that simple!
In theory, null search should not be done at PV nodes, because PV nodes
should never fail high. But in practice, they DO fail high, because of
aspiration windows, and search inconsistencies, for example. So it makes
sense to keep that flexibility in the code.
No functional change.
Use ralpha instead of rbeta
* rbeta is confusing people. It took THREE attempts to code razoring
at PV nodes correctly in a recent test, because of the rbeta trick.
Unnecessary tricks should be avoided.
* The more correct and self-documenting way of doing this, is to say
that we use a zero window around alpha-margin, not beta-margin.
The fact that, because we only do it at PV nodes, alpha happens to be
beta-1 and that the current stuff with rbeta works, may be correct,
but is confusing.
Remove the misleading and partially erroneous comment about returning
v + margin:
* comments should explain what the code does, not what it could have done.
* this comment is partially wrong in saying that v+margin is "logical",
and that it is "surprising" that is doesn't work.
From a theoretical perspective, at least 3 ways of doing this are equally
defendable:
1/ fail hard: return alpha: The most conservative. We bet that the search
will fail low, but we don't know by how much and don't want to take risks.
2/ aggressive fail soft: return v (what the current code does). This
corresponds to normal fail soft, with the added assumption that we don't
care about the reduction effect (see below point 3/)
3/ conservative fail soft: return v + margin. If the reduced search (qsearch)
gives us a score <= v, we bet that the non reduced search will give us a
score <= v + margin.
* Saying that 2/ is "logical" implies that 1/ and 3/ are not, which is
arguably wrong. Besides, experimental results tell us that 2/ beats 3/,
and that's not something we can argue against: experimental results are
the only trusted metric.
* Also, with the benefit of hindsight, I don't think the fact that 2/ is
better than 3/ is surprising at all. The point is that it is YOUR turn to
move, and you are assuming that by NOT playing (and letting the opponent
capture your hanging pieces in QS) you cannot generally GAIN razor_margin(depth).
No functional change.
With some positions like
8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
The eval score is higher than VALUE_INFINITE because
is the sum of VALUE_KNOWN_WIN plus a big material
advantage. This leads to an assert. Here are the
steps to reproduce:
Compile SF with debug=yes then do
./stockfish
position fen 8/8/8/2p2K2/1pp5/br1p1b2/2p2r2/qqkqq3 w - -
go depth 1
This patch fixes the issue in this case, but do exsist
other positions for which the patch is not enough and
we will need to limit the eval score to be sure not
overflow the limit.
Note that is not possible to increase the value of
VALUE_INFINITE because should remain within int16_t
type to be stored in a TT entry.
bench: 7356053
Depth is already dependent on the actual value
of ONE_PLY, in particular can be expressed like:
Depth = n * ONE_PLY
And because formula is used to calculate R that is
also dependent on the value of ONE_PLY and can be
expressed like:
R = x * ONE_PLY
We don't want to divide depth by a 'ply' value but
directly by an integer number.
Spotted by sf-x
No functional change.
Instead of a fixed reduction of ONE_PLY, now
Null move dynamic reduction based on value can
grow larger in case we are above beta of a value
much higher then PawnValueMg.
Note that now an eval returning VALUE_KNOWN_WIN
makes null search to drop in qsearch.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 26141 W: 4871 L: 4699 D: 16571
And long TC:
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 33695 W: 5309 L: 5056 D: 23330
bench: 7356053
Verified there are no hidden bugs and is
actually a speed optimization:
Fixed games at 15+0.05 TC
ELO: 1.72 +-2.9 (95%) LOS: 87.5%
Total: 20000 W: 3741 L: 3642 D: 12617
No functional change
When time remaining is less than Emergency Move Time,
we won't even complete one iteration and engine reports
a stale +M0 score.
To reproduce run "go wtime 10"
info depth 1 seldepth 1 score mate 0 upperbound nodes 2 nps 500 time 4 multipv 1 pv a2a3
info nodes 2 time 4
bestmove a2a3 ponder (none)
This patch fixes the issue.
Tested by Binky at very short TC: 0.05+0.05
ELO: 5.96 +-12.9 (95%) LOS: 81.7%
Total: 1458 W: 394 L: 369 D: 695
And at a bit longer TC:
ELO: 1.56 +-3.7 (95%) LOS: 79.8%
Total: 16511 W: 3983 L: 3909 D: 8619
bench: 7804908
TCEC season 3, which is due to start in a few weeks, just
had its server upgraded to 64GB RAM and will therefore allow
16GB hash to be used per engine.
This is almost the upper limit without changing the
type of size and hashMask. After this we need to
move to uint64_t instead of uint32_t.
No functional change.
Retire KmmKm evaluation function. Instead give a very drawish
scale factor when the material advantage is small and not much
material remains.
Retire NoPawnsSF array. Pawnless endgames without a bishop will
now be scored higher. Pawnless endgames with a bishop pair will
be scored lower. The effect of this is hopefully small.
Consistent results both at short TC (fixed games):
ELO: -0.00 +-2.1 (95%) LOS: 50.0%
Total: 40000 W: 7405 L: 7405 D: 25190
And long TC (fixed games):
ELO: 0.77 +-1.9 (95%) LOS: 78.7%
Total: 39690 W: 6179 L: 6091 D: 27420
bench: 7213723
When we have a fail-high of a quiet move, store it in
a Followupmoves table indexed by the previous move of
the same color (instead of immediate previous move as
is in countermoves case).
Then use this table for quiet moves ordering in the same
way we are already doing with countermoves.
These followup moves will be tried just after countermoves
and before remaining quiet moves.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10350 W: 1998 L: 1866 D: 6486
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 14066 W: 2303 L: 2137 D: 9626
bench: 7205153
This hacky rule allows to get an about right eval out of this position:
r2qk2r/ppp2p2/2npbn2/2b1p3/2P1P1P1/2NB1PPp/PPNP3K/R1BQ1R2 b kq - 0 13
And, more importantly, passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 6239 W: 1249 L: 1127 D: 3863
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 38371 W: 6165 L: 5888 D: 26318
bench: 8183238
As pointed out by Joona, Lucas and otehr people in
the forum, this endgame is not a known, there are many
positions where it takes more than 50 moves to claim the
win and becasue exact rules is not possible better to
retire and allow the search to workout the endgame for us.
bench: 8502826
While editing original Uri's messy patch
I have incorrectly simplified the logic
condition. Here is the correct original
version, as it was tested.
bench: 8502826
Remove the easy move code and add the condition to
play instantly if only one legal move is available.
Verified there is no regression at 60+0.05
ELO: 0.17 +-1.9 (95%) LOS: 57.0%
Total: 40000 W: 6397 L: 6377 D: 27226
bench: 8502826
If we are still at first move, without a fail-low and
current iteration is taking too long to complete then
stop the search.
Passed short TC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 26030 W: 4959 L: 4785 D: 16286
Long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 18019 W: 2936 L: 2752 D: 12331
And performed well at 40/30
ELO: 4.33 +-2.8 (95%) LOS: 99.9%
Total: 20000 W: 3480 L: 3231 D: 13289
bench: 8502826
First tested with 50K games at very short TC of 5+0.05
ELO: 3.11 +-2.0 (95%) LOS: 99.9%
Total: 49665 W: 10941 L: 10497 D: 28227
Then retested with usual SPRT at short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16875 W: 3198 L: 3049 D: 10628
And at long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 5890 W: 985 L: 857 D: 4048
bench: 7800379
In case ply is very high, function will round
to zero (although mathematically it is always
bigger than zero). On my system this happens at
movenumber 6661.
Although 6661 moves in a game is, of course,
probably impossible, for safety and to be locally
consistent makes sense to ensure returned value
is positive.
Non functional change.
Function move_importance() is already always
positive, so we don't need to add a constant
term to ensure it.
Becuase move_importance() is used to calculate
ratios of a linear combination (as explained in
previous patch), result is not affected. I have
also verified it directly.
No functional change.
Drop a useless parameter. This works because ratio1 and ratio2
are ratios of linear combinations of thisMoveImportance and
otherMovesImportance and so the yscale cancels out.
Therefore the values of ratio1 and ratio2 are independent
of yscale and yscale can be retired.
The same applies to yshift, but here we want to ensure
move_importance() > 0, so directly hard-code this safety
guard in function definition.
Actually there are some small differences due to rounding errors
and usually are at most few millisecond, that's means below 1% of
returned time, apart from very short intervals in which a difference
of just 1 msec can raise to 2-3% of total available time.
No functional change.
The flag raises also in case of a pawn duo, i.e.
when we have two adjacent pawns on the same rank,
and not only in case of a chain, i.e. when the two
pawns are on a diagonal line.
See this for a reference:
http://en.wikipedia.org/wiki/Connected_pawns
Renaming suggested by Ralph.
No functional change.
Use a skew-logistic function to replace the
MoveImportance[] array.
Verified it does not regress at fixed number
of games both at short TC:
LLR: -2.91 (-2.94,2.94) [-1.50,4.50]
Total: 39457 W: 7539 L: 7538 D: 24380
And long TC:
ELO: -0.49 +-1.9 (95%) LOS: 31.0%
Total: 39358 W: 6135 L: 6190 D: 27033
bench: 7335588
Passed short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 18331 W: 3608 L: 3453 D: 11270
And scored above 50% on a very long test in long TC
LLR: -2.97 (-2.94,2.94) [0.00,6.00]
Total: 51533 W: 8181 L: 8047 D: 35305
bench: 7335588
In endgame it's better to have pawns on both wings.
So give a bonus according to file distance between left
and right outermost pawns.
Passed both short TC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 39073 W: 7749 L: 7536 D: 23788
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 6149 W: 1040 L: 910 D: 4199
bench: 7665034
Reduce eval discontinuity becuase now we kick in
king safety evaluation in many more cases.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 8708 W: 1742 L: 1613 D: 5353
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 6743 W: 1122 L: 990 D: 4631
bench: 6835416
Tighter lower bound for pawn attacks so to
activate king safety in some cases like here:
6k1/2B3p1/2Pp1p2/2nPp3/2Q1P2K/P2n1qP1/R6P/1R6 w
Original patch by Chris, further simplified by
Jörg Oster.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 30171 W: 5887 L: 5700 D: 18584
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 20706 W: 3402 L: 3204 D: 14100
bench: 7607562
Add a bonus according if the attacking
pieces are minor or major.
Passed both short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 13142 W: 2625 L: 2483 D: 8034
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 18059 W: 3031 L: 2844 D: 12184
bench: 7425809
A great simplification that shows no regression
and it seems even a bit scalable.
Tested with fixed number of games:
Short TC
ELO: 0.60 +-2.1 (95%) LOS: 71.1%
Total: 39554 W: 7477 L: 7409 D: 24668
Long TC
ELO: 2.97 +-2.0 (95%) LOS: 99.8%
Total: 36424 W: 5894 L: 5583 D: 24947
bench: 8184352
Change updating rule after a TT hit to match
the same one at the end of the search.
Small change in functionality, but we want to
have uniform rules in the code.
bench: 7767864
We already update killers so it is natural to extend to
history and counter move too.
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 52690 W: 9955 L: 9712 D: 33023
And long TC
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 5555 W: 935 L: 808 D: 3812
bench: 7876473
After a fail high in LMR, if reduction is very high do
a research at lower depth before teh full depth one.
Chances are that the re-search will fail low and the
full depth one is skipped.
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 11363 W: 2204 L: 2069 D: 7090
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 7292 W: 1195 L: 1061 D: 5036
bench: 7869223
Since all ENPASSANT moves are now considered dangerous, this
change of order should give a slight speedup.
Also simplify futilityValue formula.
No functional change.
We avoid to use an ad-hoc table at the cost of a
relative_rank() call in advanced_pawn_push().
On my 32 bit system it is even slightly faster (on 64bit
may be different). This is the speed in nps alternating
old and new bench runs:
new
368890
368825
369972
old
367798
367635
368026
No functional change.
Instead of a passed pawn now we just require the pawn to
be in the opponent camp to be considered a dangerous
move. Added some renaming to reflect the change.
Passed both short TC test
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10358 W: 2033 L: 1900 D: 6425
And long TC
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 21459 W: 3486 L: 3286 D: 14687
bench: 8322172
To align to same named Position function and
avoid using std::cout directly.
Also remove some stale <iostream> include while
there.
No functional change.
Add a Mac SSE4.2 target. Also change the Mac OS X minimum version to
10.6. Rationale: 97% of Macs run at least 10.6, version 10.9 is now
free, and using 10.6 as the minimum version gives a small 5% boost in
benchmark speed over versions using 10.0 as the minimum version.
Finally, enable Clang’s Link Time Optimization when compiling for the
Mac.
No functional change.
An old idea retested at SPRT(0, 3) with 60+0.05 TC:
LLR: 2.95 (-2.94,2.94) [0.00,3.00]
Total: 98872 W: 15549 L: 15123 D: 68200
This is a very small elo increase patch so it really
stresses the limits of fishtest.
bench: 8596156
It seems to intorduce a regression when tested
with 3 threads at 15+0.05:
ELO: -2.26 +-2.2 (95%) LOS: 2.4%
Total: 30000 W: 4813 L: 5008 D: 20179
bench: 8331357
Tested setting FakeSplit to true and running
./stockfish bench 128 2
There is a different signature with and without
the patch so it affects functionality but
only in SMP case.
bench: 8331357
SMP case is very tricky and raises an assert in stage_moves():
assert(stage == KILLERS_S1 || stage == QUIETS_1_S1 || stage == QUIETS_2_S1)
So rewrite the code to just return moves[] when we are sure
we are in quiet moves stages.
Also rename stage_moves to quiet_moves to reflect that.
No functional change (but needs testing in SMP case)
Use MovePicker moves[] to access already tried
quiet moves. A bit of care shall be taken
to avoid calling stage_moves() when we are still
at ttMove stage, because moves are yet to be
generated. Actually our staging move generation
makes this code a bit more tricky than what I'd
like, but removing an ausiliary redundant
array like quietsSearched[] is a good thing.
Idea by DiscoCheck
bench: 9355734
Use the newly introduced LineBB[] to simplify this
super hot-path function.
Verified with perft we don't have any speed regression, although
the number of squares removed is less than before in case of
contact check.
Insipred by DiscoCheck implementation.
Perft numbers are the same, but we have an harmless functional
change due to reorder of moves, because now some illegal moves
are no more detected at generation time, but in the search.
bench: 8331357
This seems a die hard idea :-)
Passed both short TC
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 17485 W: 3307 L: 3156 D: 11022
And long TC
LLR: 2.97 (-2.94,2.94) [0.00,6.00]
Total: 38181 W: 6002 L: 5729 D: 26450
bench: 8659830
Actually, it is not used, as both arrays have the
same values. Some local tests in either direction
showed no improvement.
Also some minor corrections in the comments.
No functional change.
Previously some squares could be "incorrectly" awarded
to a pinned piece.
e.g. in 3k4/1q6/3b4/3Q4/8/5K2/B7/8 b - - 0 1 the black
bishop get 4 squares too many and the white queen gets 6.
Passed both short TC.
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 4871 W: 934 L: 817 D: 3120
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 38968 W: 6113 L: 5837 D: 27018
bench: 9282549
But compensate by reducing rook and queen
value by 53 = (160 / 3)
Material imbalances are affected as follows:
Red. Major Rook Queen Total
QRR +160 -2*53 -53 +1
QR +160 -53 -53 +54
RR +160 -2*53 0 +54
R 0 -53 0 -53
Q 0 0 -53 -53
so that the imbalance changes by at most 54 + 53 = 107 units.
This corresponds to appromximately 3.5cp in the final evaluation.
Verified with fixed number 40000 games at both short
and long TC it does not regress.
Short TC 15+0.05
ELO: 1.93 +-2.1 (95%) LOS: 96.6%
Total: 40000 W: 7520 L: 7298 D: 25182
Long TC 60+0.05
ELO: -0.33 +-1.9 (95%) LOS: 36.5%
Total: 39663 W: 6067 L: 6105 D: 27491
bench: 6703846
As, Gary (that analyzed the bug) says:
SF does not print a PV when the original best move fails low,
we hit our time allowance, and stop the search. The output from
the SF search is below. It was failing low on Ne1 at depth 34.
Then, we get bestmove Qd3, but no PV change.
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 45 score cp 38 upperbound nodes 483484489 nps 15464575 time 31264 multipv 1 pv f3e1 h5h4 e1d3 h4g3 f2g3 a6f6 f1f6 e7f6 d1a4 f6e7 a1f1 d8f8 a4b3 b7b6 b3c2 f7f6 c2a4 h3g5 b2b3 g5f7 a4c6 f7d6 h1g2 f6f5 e4f5 d6f5
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info depth 34 seldepth 47 score cp 30 upperbound nodes 2112334132 nps 17255517 time 122415 multipv 1 pv f3e1 h5h4 d1a4 a6f6 e1d3 d8f8 a4c2 h4g3 f2g3 f6f1 a1f1 h7g8 b2b3 f7f6 a3a4 b7b6
info nodes 18235667001 time 969824
bestmove e2d3 ponder c8d7
Looking at the code, if we hit Signals.stop, we return from id_loop
before printing any PV. It is possible for us to have resorted the
RootMove list though, which will change the move that is actually
played.
No functional change.
1/ eval margin and gains removed:
16bit are now free on TT entries, due to the removal of eval margin. may be useful
in the future :) gains removed: use instead by Value(128). search() and qsearch()
are now consistent in this regard.
2/ futility_margin()
linear formula instead of complex (log(depth), movecount) formula.
3/ unify pre & post futility pruning
pre futility pruning used depth < 7 plies, while post futility pruning used
depth < 4 plies. Now it's always depth < 7.
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench 7243575
1/ eval margin and gains removed:
- gains removed by Value(128): search() and qsearch() now behave consistently!
2/ futility_margin()
- testing showed that there is no added value in this weird (log(depth), movecount)
formula, and a much simpler linear formula is just as good. In fact, it is most
likely better, as it is not yet optimally tuned.
- the new simplified formula also means we get rid of FutilityMargins[], its
initialization code, and more importantly ss->futilityMoveCount, and the hacky
code that updates it throughout the search().
- the current formula gives negative futility margins, and there is a hidden interaction
between the move coutn pruning formula and the futility margin one: what happens is
that MCP is supposed to be triggered before we use the non-sensical negative futility
margins.
3/ unify pre & post futility pruning
- pre futility pruning (what SF calls value based pruning) used depth < 7 plies,
while post futility pruning (what SF calls static null move pruning) used depth < 4 plies.
- also the condition depth < 7 in pre futility pruning was not obvious, and it seemd
to be depth < 16 (futility_margin() returns an infinite value when depth >= 7).
Tested with fixed number of games both at short TC:
ELO: 0.82 +-2.1 (95%) LOS: 77.3%
Total: 40000 W: 7939 L: 7845 D: 24216
And long TC
ELO: 0.59 +-2.0 (95%) LOS: 71.9%
Total: 40000 W: 6876 L: 6808 D: 26316
bench: 10206576
RedundantRook and RedundantQueen replaced by simple
variable RedundantMajor. Also the SameColor coefficient
for Queen<->Queen has been set by definition to 0.
The remaining 5 parameters:
LinearCoefficients[ROOK]
LinearCoefficients[QUEEN]
QuadraticCoefficientsSameColor[ROOK][ROOK]
QuadraticCoefficientsSameColor[QUEEN][ROOK]
RedundantMajor
are sufficient to equate the material imbalances for the
5 common material configurations of R, RR, Q, QR and QRR
to any desired values simultaneously.
With the chosen parameters there should be no functional
change unless one side has more than 2 rooks or more
than 1 queen. For example bench from the start position
using the commands:
./stockfish
go depth 16
produces identical output except for one extra node
in the last iteration.
bench: 8198094
Coefficients for Bishop<->BishopPair and Bishop<->Bishop
are also pretty much redundant. By altering the values
in LinearCoefficients[] these coefficients can be zeroed
without changing the imbalance calculations in any position
with less than 3 bishops for one side.
bench: 7995098
First coefficient in the SameColor array does an
equivalent job when folded into the LinearCoefficients
array.
All of the diagonal terms in the OppositeColor array
are redundant due to cancellation.
No functional change.
In case we find a very good move after a
troubled start, we don't return immediately
anymore.
Tested directly at long TC where it passed:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 13910 W: 2397 L: 2228 D: 9285
bench: 7995098
And remove a complex (and broken) formula.
Indeed previous code was broken in case of TC with big
time increments where available_time() was too similar
to total time yielding to many time losses, so for instance:
go wtime 2600 winc 2600
info nodes 4432770 time 2601 <-- time forfeit!
maximum search time = 2530 ms
available_time = 2300 ms
For a reference and further details see:
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/dCPAvQDcm2E
Speed tested with bench disabling timer alltogheter vs timer set at
max resolution, showed we have no speed regressions both in single
core and when using all physical cores.
No functional change.
A combo of two patches that failed SPRT with score
higher than 50% but togheter they succeed:
SPRT at 60+0.05
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 7312 W: 1276 L: 1139 D: 4897
bench: 8029334
If the game got late enough that move_importance(currentPly) * slowMover / 100
rounds to 0, then we ended up dividing 0 by 0 when only looking 1 move ahead.
This apparently caused the search to almost immediately abort and Stockfish
would blunder in long games. So convert thisMoveImportance to a double.
No functional change.
This seems more a material imbalance topic,
anyhow test is good and so patch is applied
as is.
Passed both short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 17391 W: 3548 L: 3393 D: 10450
And long TC:
LLR: 3.00 (-2.94,2.94) [0.00,6.00]
Total: 34660 W: 5972 L: 5700 D: 22988
bench: 8291883
Dumb down a bit the code and trade some possible
speed (but this is far from hot path anyhow) for
some added readability for the layman.
No functional change.
The normalising transformation is computed all at
once by the helper function get_flip_sq and then
applied immediately to the relevant squares as soon
as they are loaded from the position class.
bench: 8350690
New formula mathces the old formula until d = 45
Test code:
int main() {
for(int d=1; d<=45; d++)
{
int a = int(log(double(d * d) / 2) / log(2.0) + 1.001);
int b = int(2.9 * log(double(d)));
if (a != b) std::cout << d << std::endl;
}
return 0;
}
bench: 8455956
This is the first chain bonus version
from Ralph that also passed both
Short TC:
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 23460 W: 4727 L: 4556 D: 14177
And long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 31858 W: 5497 L: 5240 D: 21121
And performed better against current
committed version, always at 60secs:
LLR: -2.94 (-2.94,2.94) [-3.00,3.00]
Total: 26301 W: 4477 L: 4580 D: 17244
This test was done by Leonid.
bench: 8455956
After 40K games at 60 secs, result is still
not clear, but not a regression against SF 4
After
ELO: 50.11 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10547 L: 4817 D: 24636
Before
ELO: 49.51 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 10483 L: 4821 D: 24696
So re-apply the patch to avoid to
special-case this one.
bench: 7403882
It was never updated !
Currently it only affects evaluate_passed_pawns()
and in particularly the rule to increase the bonus
if we have more non-pawn pieces. We could simply use
popcount() instead and avoid the little slowdown
in put_piece() and remove_piece(), but this would
leave a very subtle and tricky hole where people
are forced to remember that pos.count<ALL_PIECES>()
does not work. This is not obvious and so dangerous.
Thanks to Ronald de Man for spotting this.
bench: 7931424
Due to a strange issue (bug?) the ternary
operator does not return a BitCountType for
icc, so revert to the expression.
The same patch was already applied in
9749f1f14c
Thanks to NssY Wanyonyi for pointing out
this.
No functional change.
This reverts commit 4bc2374450 for
two reasons.
First regression testing shows almost equal
score:
Before the patch:
ELO: 49.75 +-2.5 (95%) LOS: 100.0%
Total: 27205 W: 7113 L: 3244 D: 16848
After the patch:
ELO: 48.87 +-2.9 (95%) LOS: 100.0%
Total: 20860 W: 5478 L: 2563 D: 12819
Second, and more sensible to me, this patch
increases safe check bonuses to 4 times their
original value (!) and considering:
- Values were already well tuned
- Values are highly critical
- King safety is highly critical, very TC
dependent and very difficult to test
- Our testing coverage is partial (self-testing,
blitz times)
I think is better to be safe than sorry and so
I revert the patch.
bench: 8440524
Passed both short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 10466 W: 2087 L: 1953 D: 6426
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 26334 W: 4540 L: 4310 D: 17484
And also proved stronger than a slightly
different patch, also succesful against master:
https://github.com/mcostalba/Stockfish/commit/dc6830a3b4ed12
But losing against current one in a match
at 60secs with SPRT [-3, 3]:
LLR: -2.96 (-2.94,2.94) [-3.00,3.00]
Total: 44484 W: 7360 L: 7463 D: 29661
bench: 9160831
Also the drawing criteria has been slightly loosened.
It now detects a draw if the king is ahead of all the
pawns and on the same file or the adjacent file.
bench: 7700683
This lost position 8/8/3q4/8/5k2/2P1R3/2K2P2/8 w - - 0 1
was previously evaluated as a draw.
The king and rook need to be correctly placed with
respect to the _same_ pawn.
(Note also that the check for the pawn being on RANK_2
in the old version is redundant: it must be on RANK_2 if
it hopes to protect a rook on RANK_3)
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Good both at short TC:
LLR: 2.95 (-2.94,2.94) [-1.50,4.50]
Total: 5448 W: 1133 L: 1012 D: 3303
And at long TC:
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 40509 W: 6836 L: 6541 D: 27132
bench: 7700683
Don't need a struct here. Speed test shows
result is teh same. Moreover RKISS is used
mainly at startup to compute magics, so
prefer to keep it simple...RKISS ;-)
Also some assorted triviality while there.
No functional change.
These two changes go in opposite directions and it
seems that the combination is stronger than original.
Here are the positive tests at various TC:
15+0.05
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 24561 W: 4946 L: 4772 D: 14843
60+0.05
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 15259 W: 2598 L: 2423 D: 10238
40/30
LLR: 2.96 (-2.94,2.94) [-3.00,3.00]
Total: 2570 W: 527 L: 422 D: 1621
Unfortunately there is also a bad result
with one sec time increment that needs
to be further investigated:
12+1
LLR: -2.97 (-2.94,2.94) [-3.00,3.00]
Total: 2694 W: 438 L: 543 D: 1713
bench: 8340585
Rationale:
- Speed of double and float is about the same (not on the hot path anyway)
- Double makes code prettier (no need to write 1.0f, just 1.0)
- Only practical advantage of float is to use less memory, but since we never
store large arrays of double, we don't care.
No functional change.
Increase bench default depth from 12 to 13 and
add 15 new endgame positions to have broader
coverage and also more reliable nps calulcation
used for fishtest framework.
Due to the new endgame positions, where nps is higher,
the total nps is increased of about 15%.
Thanks to Lucas and Jörg for the suggestions.
No functional change, but bench number is now:
bench: 8336338
For some users -stack_size,0x4000 does not work,
so revert for now.
osX 10.6.8
gcc version 4.7.3 (MacPorts gcc47 4.7.3_2)
g++: error: unrecognized command line option '-stack_size,0x4000'
make[2]: *** [stockfish] Error 1
make[1]: *** [gcc-profile-make] Error 2
make: *** [profile-build] Error 2
No functional change.
This reverts commit 800410eef1 and instead increases
stack size.
I went through the old emails with Daylen that reported the
crash issue on Mac OS X and was fixed by 0049d3f337.
It was reported default stack size for a thread in Mac OS X is 8
megabytes while the patch that we are reverting allows to reduce
stack size at max of about 217KB, so the reason for the crash was
only marginal in MAX_MOVES value. On those emails Daylen also
hinted how to increase stack size for Mac OS X to 16MB.
So prefer to increase stack size to 16MB instad of re-inventing
the wheel and do our home grown stack as we did with the patch
that we are now reverting (it will remain anyhow in git history
for documentation purposes).
No functional change.
Unify extensions between PV and not PV nodes
and remove all but check extensions.
This is a simplification so tested at fixed number
of games where proved to not regress.
About 45k games at 15+0.05
ELO: 1.23 +-2.0 (95%) LOS: 88.5%
Total: 45643 W: 9107 L: 8946 D: 27590
About 45k games at 60+0.05
ELO: 1.07 +-1.8 (95%) LOS: 87.8%
Total: 46786 W: 7728 L: 7584 D: 31474
bench: 3172206
If the uci option 'Best Book Move' is set to true the lookup still
returns a move at random instead of the move with the highest
weight.
No functional change.
This should be enough for any legal position, even
the handcrafted ones, like the one presented by Reuven:
1Q5R/4Q1K1/B1Q5/B4Q2/N2Q4/pQ4Q1/pn2Q3/krQ4R w - -
Where currently we crash. This reverts the patch
0049d3f337 of 8/4/2012 where stack
was shrinked due to crashes while in deep analysys.
No functional change.
This greately reduces stack usage and is a
prerequisite for next patch.
Verified with 40K games both in single and SMP
case that there are no regressions.
No functional change.
This is an even safer setup proposed and tested
by Alexandre Meirelles.
Regression testing of 40K games at 10+0.05 show
result is stable both against current master:
ELO: -0.29 +-2.2 (95%) LOS: 39.7%
Total: 40000 W: 8010 L: 8043 D: 23947
and again original master (the one with smallest
time parameters):
ELO: 1.71 +-2.2 (95%) LOS: 93.8%
Total: 40000 W: 8325 L: 8128 D: 23547
Alexandre verified with LittleBlitzer time losses are
greately reduced with this setup:
Games Completed = 2100 of 3000 (Avg game length = 35.745 sec)
Settings = RR/128MB/15000ms+50ms/M 1000cp for 12 moves, D 150 moves/
Time = 39200 sec elapsed, 16800 sec remaining
1. Stockfish 190913 1091.5/2100 803-720-577 (L: m=313 t=1 i=0 a=406) (D: r=278 i=91 f=136 s=8 a=64) (tpm=212.5 d=14.75 nps=925427)
2. Houdini 2.0 w32 1008.5/2100 720-803-577 (L: m=250 t=299 i=0 a=254) (D: r=278 i=91 f=136 s=8 a=64) (tpm=204.1 d=12.04 nps=1326351)
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.61 +-2.2 (95%) LOS: 99.1%
Total: 39869 W: 8258 L: 7959 D: 23652
No functional change.
Goes in the direction of avoiding time losses and seems
equivalent after almost 40K games at super fast TC of 10+0.05
ELO: 2.41 +-2.3 (95%) LOS: 98.1%
Total: 37222 W: 7843 L: 7585 D: 21794
No functional change.
The ideal setting for super-blitz might be something like:
"Emergency Base Time" = 50
"Emergency Move Time" = 5
This would give a total emergency time buffer of:
50 + 40 * 5 = 250 ms
This setup replaces the previous half cooked hack
"Don't blunder under extreme time pressure".
Test results are very good at super blitz, but keep good even
at 60 secs.
At 5+0.05
ELO: 24.30 +-2.4 (95%) LOS: 100.0%
Total: 37802 W: 10060 L: 7420 D: 20322
At 15+0.05
ELO: 13.41 +-2.9 (95%) LOS: 100.0%
Total: 22271 W: 4853 L: 3994 D: 13424
At 60+0.05
ELO: 5.30 +-3.2 (95%) LOS: 99.9%
Total: 16000 W: 2897 L: 2653 D: 10450
No functional change.
Instead of current code, give a bonus according to the frontmost
square among candidate + passed pawns.
This is a big simplification that removes a lot of accurate code
substituting it with a statistically based one using the common
'bonus' scheme, leaving to the search to sort out the details.
Results are equivalent but code is much less and, as an added bonus,
we now store candidates bitboard in pawns hash and allow this
info to be used in evaluation. This paves the way to possible
candidate pawns evaluations together with all the other pieces,
as we do for passed.
Patch passed short TC
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 16927 W: 3462 L: 3308 D: 10157
Then failed (quite quickly) at long TC
LLR: -2.95 (-2.94,2.94) [0.00,6.00]
Total: 8451 W: 1386 L: 1448 D: 5617
But when ran with a conclusive 40K fixed games at 60 secs it proved
almost equivalent to original one.
ELO: 1.08 +-2.0 (95%) LOS: 85.8%
Total: 40000 W: 6739 L: 6615 D: 26646
bench: 3884003
Now that we use pre-increment on enums, it
make sense, for code style uniformity, to
swith to pre-increment also for native types,
although there is no speed difference.
No functional change.
ENABLE_OPERATORS_ON has incorrect definitions of
post-increment and post-decrement operators.
In particularly the returned value is the variable
already incremented/decremented, while instead they
should return the variable _before_ inc/dec.
This has no real effect because are only used in loops
and where the returned value is never used, neverthless
it is wrong. The fix would be to copy the variable to a
dummy, then inc/dec the variable, then return the dummy.
So instead, rename to pre-increment that can be implemented
without the dummy, actually the current implementation
it is already the correct pre-increment, with the only change
to return a reference (an l-value) and not a copy, so
to properly mimic the pre-increment on native integers.
Spotted by Kojirion.
No functional change.
We always attempt to keep at least this emergencyBaseTime
at clock. But if available time is very low it means that
we will force ourself to play immediately to satisfy the
emergencyBaseTime constrain and so leading to blunders.
Patch is good at short and very short TC (15secs and 5secs respectively)
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 26590 W: 5426 L: 5245 D: 15919
LLR: 2.96 (-2.94,2.94) [-1.50,4.50]
Total: 5767 W: 1397 L: 1268 D: 3102
Instead seems has no influence at longer TC (60 secs)
LLR: -2.96 (-2.94,2.94) [0.00,6.00]
Total: 79862 W: 13623 L: 13339 D: 52900
So it is committed to have a broader testing but is
to be consider still EXPERIMENTAL and can be reverted
easily.
No functional change.
In case we have less then 10ms to think as soon as
we wake up the timer, it immediately fires and calls
check_time() where due to condition:
elapsed > TimeMgr.maximum_time() - 2 * TimerResolution
the stop flag is set and search returns immediately, without
actually search anything.
Here the somewhat hacky fix is to start the timer after
at least one iteration as been completed.
No functional change.
The possible maximum mobility cardinality (plus one in case of
zero squares available) is:
- Knights: max. 8 squares -> max. 9 entries
- Bishops: max. 13 squares -> max. 14 entries
- Rooks: max. 14 squares -> max. 15 entries
- Queen: max. 27 squares -> max. 28 entries
So remove the extra entries in the table.
Spotted by Dariusz Orzechowski.
No functional change.
Union of
- LMR >= 3 plies from Gary tests.stockfishchess.org/tests/view/522522960ebc595d328fcafd
- allows() tweak from Reuven tests.stockfishchess.org/tests/view/5225fa1c0ebc595d328fcb53
Both passed Step I and failed Step II.
Instead this union passed both short TC:
LLR: 2.95 (-2.94,2.94)
Total: 14525 W: 3063 L: 2874 D: 8588
And long TC
LLR: 2.94 (-2.94,2.94)
Total: 31075 W: 5566 L: 5308 D: 20201
bench: 4238160
Idea is sound but implementation is partial. Ryan and Joona noticed that
we leave an hole in material table. Also we got another report by an user
of an odd behaviour. Namely, if you start stockfish and from the prompt
give 'bench' you get 3453941, then if you run again bench you get 3453940.
The reason is that two different positions with the same number of pieces,
but one with a bishop pair and another without have the same material key.
But after Eelco patch also different material imbalance and this yields
to this issue.
Restesting at long TC shows the patch does not really contribute at
ELO improvement. Actually patch failed at long TC.
LLR: -2.97 (-2.94,2.94)
Total: 23109 W: 4104 L: 4092 D: 14913
So revert.
bench: 3453945
Prefer pos.bishop_pair() to pos.count<BISHOP>(WHITE) > 1
because the first checks that the two bishops are on
different color squares.
Although the change seems to kick in only in very rare cases,
quite surprisingly it was able to pass SPRT test at short TC.
LLR: 2.95 (-2.94,2.94)
Total: 39818 W: 8174 L: 7956 D: 23688
bench: 3453941
This was thought to be a draw but the bishops generally win. However,
it takes up to 66 moves. The position in the diagram was thought to be
a draw for over one hundred years, but tablebases show that White wins
in 45 moves. All of the long wins go through this type of semi-fortress
position. It takes several moves to force Black out of the temporary
fortress in the corner; then precise play with the bishops prevents Black
from forming the temporary fortress in another corner (Nunn 1995:265ff).
Before computer analysis, Speelman listed this position as unresolved,
but "probably a draw" (Speelman 1981:109).
bench: 3453945
STANDALONE-TOOLCHAIN.html in Android NDK says:
It is recommended to use the -mthumb compiler flag to force the generation
of 16-bit Thumb-1 instructions (the default being 32-bit ARM ones).
If you want to target the 'armeabi-v7a' ABI, you will need ensure that the
following two flags are being used:
CFLAGS='-march=armv7-a -mfloat-abi=softfp'
Note: The first flag enables Thumb-2 instructions, and the second one
enables H/W FPU instructions while ensuring that floating-point
parameters are passed in core registers, which is critical for
ABI compatibility. Do *not* use these flags separately!
Thanks to Peter Osterlund for pointout this doc and for showing me
an example Makefile to follow.
No functional change.
Becuase castle is coded as "king captures the rook"
the to_sq(move), A1/8 or H1/8 is empty after the move,
leading to assert assert(p != NO_PIECE) in color_of().
Teach allows() asserts about castle and fix the crash.
Bug reported by Ryan Takker and tracked down by Tom Vijlbrief.
No functional change.
If our rook is behind a passed pawn, all
squares are defended.
One of the longest tests to pass !
Passed both short TC
LLR: 2.97 (-2.94,2.94)
Total: 44560 W: 9518 L: 9281 D: 25761
And long TC
LLR: 2.96 (-2.94,2.94)
Total: 61348 W: 11618 L: 11192 D: 38538
bench: 3787694
With
position fen 7k/8/8/8/8/7P/6K1/7B w - - 0 1
go depth 25
The evaluation at depth 22 is not draw as it should be. The reason is that
when search reaches the position 8/6kP/8/8/8/3B4/6K1/8 w - - 0 1 if white plays
h8R or h8N then we get a position that is a "KNOWN_WIN" and is _not_ a check, so
futility pruning in qsearch kicks in and black may think that it is "futile"
to reply Kxh8 since, according to the logic of the code, it cannot raise the score
back towards a draw.
bench: 4728533
The case of two lone kings on the board is already considered
by the "No pawns" scaling factor rules in material.cpp as is
KBK and KNK.
Moreover we had a small leak in endgames map because for
KK endgame it happens white and black material keys are the
same (both equal to zero), so when adding the black endgame in
Endgames::add() we were overwriting the already exsisting
white one, leading to a memory leak found by Valgrind.
So remove the endgames althogheter and rely on scaling
to correctly set the endgames value to a draw.
No functional change.
Instead of classical flags, throw an
exception when we want to immediately halt
the search. Currently only one type
is used for both UCI stop and threads
cut off.
No functional change.
Idea originated from a post of Don Dailey
on talkchess and reported by Eelco.
This is the last succesful attempt of a long
series of trials (as usually happens, the
'idea' alone is not enough).
Passed both short 15secs TC
LLR: 2.97 (-2.94,2.94)
Total: 7629 W: 1645 L: 1515 D: 4469
And long 60secs TC
LLR: 2.96 (-2.94,2.94)
Total: 10218 W: 1932 L: 1775 D: 6511
bench: 4944581
With the new automatic setting of split depth
instead of a default, the user no longer needs
guidance on setting the split point.
Also threads now defaults to one.
No functional change.
Search::RootColor is a global parameter set
before to start a search, it is not something
trace() should change.
This patch allows to add trace() calls, for
debugging, inside search itself without altering
the bench, and also ensures that the values
returned by trace() and evaluate() are fully
equivalent.
No functional change.
The rounding formula is different between
positive and negative scores due to the
GrainSize/2 term that is asymmetric.
So use truncation instead of rounding. This
guarantees that evaluation is rounded to zero
in the same way for both positive and negative
scores.
Found with position's flip
bench: 4634244
Because VALUE_NONE is 30002, it happens that
after a check the next move is never an improving
one.
After this patch bench signature is independent from
VALUE_NONE actual value.
bench: 4303194
Apply to LMR the same Eelco's idea
applied to move count pruning.
This is the result of a series of
attempts started by Thomas Kolarik.
Passed both short TC
LLR: 2.95 (-2.94, 2.94)
Total: 5675 W: 1241 L: 1117 D: 3317
And long TC:
LLR: 2.95 (-2.94, 2.94)
Total: 8748 W: 1689 L: 1539 D: 5520
bench: 4356801
Not a real functional change, but bench changed due to different piecelist
reordering. To verify it a temporary my canonicalize_rooks function was
written as follows. It just ensures that the rook on the "smaller" square
is listed first.
void Position::canonicalize_rooks(Color c)
{
if (pieceCount[c][ROOK] == 2)
{
Square s0 = pieceList[c][ROOK][0];
Square s1 = pieceList[c][ROOK][1];
if (s0 > s1)
{
pieceList[c][ROOK][0] = s1;
pieceList[c][ROOK][1] = s0;
index[s0] = 1;
index[s1] = 0;
}
}
}
With this both bench and the test on Chess960 positions
./stockfish bench 128 1 8 Chess960.epd file > /dev/null
Gives same result.
bench: 4424151
Set threads number always to 1 at startup and let the
user explicitly to chose the number of threads.
Also preserve the useful behavior of automatically set
"Min Split Depth" according to the requested threads,
indeed this parameter is too technical for a casual user,
so, when left to zero, we set it on a sensible value.
No functional change
The new Position methods add_piece, move_piece, and remove_piece
now manage the member variables pieceList, pieceCount, and index,
and 9 blocks of code in Position that used to manipulate those
data structures by hand now call the new methods.
There is a slightly slowdown (< 1%) on Clang and on perft,
but the cleanup compensates the little speed loss.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Introduce ThreadBase struct that is search
agnostic and just handles low level stuff,
and derive all the other specialized classes
form here.
In particular TimerThread does not hinerits
anymore all the search related stuff from Thread.
Also some renaming while there.
Suggested by Steven Edwards
No functional change.
At thread creation start_routine() is called
and from there the virtual function idle_loop()
because we do this inside Thread c'tor, where the
virtual mechanism is disabled, it could happen that
the base class idle_loop() is called instead.
The issue happens with TimerThread and MainThread
where, at launch, start_routine calls
Thread::idle_loop instead of the derived ones.
Normally this bug is hidden because c'tor finishes
before start_routine() is actually called in the
just created execution thread, but on some platforms
and in some cases this is not guaranteed and the
engine hangs.
Reported by Ted Wong on talkchess
No functional change.
The endgame king + minor vs king is erroneusly
detected as king + minor vs king + minor
Here the fix is to detect king + minor earlier,
in particular to add these trivial cases to
endgame evaluation functions.
Spotted by Reuven Peleg
bench: 4727133
And #ifdef instead of #if defined
This is more standard form (see for example iostream file).
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
A big simplification and removing of useless code.
Finished at 50% both at short TC (with SPRT) than
at long TC at fixed number of games:
ELO: -0.14 +-3.4 (95%) LOS: 46.8%
Total: 15206 W: 2836 L: 2842 D: 9528
bench: 5059948
This reverts commit 4b3a0fdab0.
As Gary says: " It failed when I tried it at long TC previously, and only
barely passed this time. Some anecdotal evidence is that it hurts vs other
engines as well (the Lightspeed rating list showed a 16 elo drop from previous
best version - still +- 5 error bars on both, but that's still significant)"
I also agree that if we have some doubts (like in this case) it is better to
be safe than sorry.
bench: 4615572
Reduces the influence of PSQT for entries such as
the extended center and the h-file.
Passed both short TC test:
LLR: 2.95 (-2.94,2.94)
Total: 23919 W: 5207 L: 5029 D: 13683
And long TC one:
LLR: 2.96 (-2.94,2.94)
Total: 5762 W: 1108 L: 974 D: 3680
Bench: 4617880
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This very speed critical code was full of clever (!)
tricks and subtle details.
So I have rewritten it in a more straithforward way
and, as very often happens, result is even faster
than original.
No functional change.
This reverts commit 3e95800814
For some reason it fails the short TC test:
LLR: -2.96 (-2.94,2.94)
Total: 20033 W: 4214 L: 4265 D: 11554
bench: 4769737
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since Retire seeValues[] and move PieceValue[] out of Position of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4699504
It is somewhat unbilievable but our SEE is broken !
If the first SEE move is a king capture and square is
defended then SEE continues instead of breaking.
The bug shows only on normal SEE, not see_sign() so
probing with a:
dbg_hit_on_c(slIndex==1, captured == KING);
reports just a tiny:
Total 3465656 Hits 6646 hit rate (%) 0
Bug was there since 351ef5c85b of 26/6/2011 (!)
although for some reason didn't show immediately, indeed the
bougous patch was a "No functional change" (!!)
bench: 4793754
Without patch we have 333198 nps, with patch 334249.
A very small +0.3%, not a lot manily becuase this is a
side path that is taken very few times.
Anyhow idea is correct becuase first 'quick' condition
has an hit rate of about 95%.
No functional change.
But still keep the same original
margin for score.
Passed both short TC test
LR: 2.95 (-2.94,2.94)
Total: 3710 W: 845 L: 726 D: 2139
And long TC
LLR: 2.95 (-2.94,2.94)
Total: 57859 W: 10939 L: 10532 D: 36388
bench: 4769737
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Partially revert previous patch and use
unlikey() just as code annotation.
Actually it is better to rely on a profiler for branch prediction:
http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
"In fact, even when only one in ten thousand values is nonzero,
we're still at only roughly the break-even point"
No functional change,
It is somewhat redundant and could make SF
name too long, so use just Version, in case
of a signature build Version will be set to
'sig-xxx' otherwise, if left empty, we fall
back on usual date stamp.
No functional change.
When compiling with:
make signature-build ARCH=xxx COMP=xxx
After binary has been roduced, it will be run to
get the signature 'stockfish bench' and this
number will be used as Version, so that it
will be easy to track the original sources
from a binary.
No functinal change.
Due to a strange issue (bug?) the ternary
operator does not return a BitCountType for
icc, so revert to the expression used
before bcbc9bfd1f
No functional change.
Here speed up is the name of the game.
Speed up is gained:
- Removing the useless enoughMaterial code
- Limiting trapped rook evaluation to where it counts
Tested at long TC:
LLR: 2.97 (-2.94,2.94)
Total: 10061 W: 1948 L: 1790 D: 6323
bench: 4558173
Thanks to Don, Miguel, Louis and the other people
of talkchess forum for the suggestion:
http://www.talkchess.com/forum/viewtopic.php?t=48612
Also sync polyglot.ini with current UCI options
No functional change.
Unfortunatly a reverse test at long TC failed:
master^ vs master
LLR: 1.37 (-2.94,2.94)
Total: 33682 W: 6294 L: 6071 D: 21317
So becuase short TC score is 50% there is a good
possibility patch is not scalable.
So revert it.
bench: 4507288
Do not use threat move to detect the condition. This
let us to retire the big allows() function.
Test at short TC was within 50% score:
LLR: -2.95 (-2.94,2.94)
Total: 38272 W: 7941 L: 7940 D: 22391
To be verified with reverse long TC
bench: 4191565
Here the main difference is that now we center
aspiration window on last returned score. This allows
to simplify handling of mate scores.
We have done a reversed SPRT tests, where we wanted to
verify if master is stronger than this patch.
Long TC: master vs this patch (reverse test)
LLR: -2.95 (-2.94,2.94)
Total: 37992 W: 7012 L: 6920 D: 24060
bench: 4507288
Temporary revert aspiration window patch
so to be visible to everybody: it will be
re-applied with next patch
No functional change (together with next one)
Here the main difference is that now we center
aspiration window on last returned score. This allows
to simplify handling of mate scores.
We have done a reversed SPRT tests, where we wanted to
verify if master is stronger than this patch.
Long TC: master vs this patch (reverse test)
LLR: -2.95 (-2.94,2.94)
Total: 37992 W: 7012 L: 6920 D: 24060
bench: 4507288
A simplification of the 'dangerous' definition.
Seems neutral at reverse test at long TC
master vs patch
LLR: -2.96 (-2.94,2.94)
Total: 16974 W: 3122 L: 3139 D: 10713
bench: 4689029
Function calloc() already initializes memory to
zero, so avoid calling clear() afterwards.
Also some renaming while there (inspired by DiscoCheck).
No functional change.
Here we skip the call to pos.attacks_from<ROOK>(s) in the 98%
of cases, testing the first 2 members first. Unfortunatly
code is a bit triky and not clear. So we give up to the
speed optimization in exchange of more code clarity.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
So when we are doing a LMR search at the parent ALL node.
This patch didn't prove stronger at 60" TC
LLR: -2.97 (-2.94,2.94)
Total: 22398 W: 4070 L: 4060 D: 14268
But, first, it scores at 50%, second (and most important for me) the opposite,
i.e. normal reduction when parent node is not reduced, seems very bad:
LLR: -2.95 (-2.94,2.94)
Total: 7036 W: 1446 L: 1534 D: 4056
According to Don, this idea of increased reduction of CUT nodes
works because if parent node is reduced, missing a cut-off due to
reduced depth search (meaning position is somehow tricky) forces
a full depth research at parent node, giving due insight in this
set of sensible positions.
IOW if we expect a node to fail-high at depth n, then we assume it
should fail-high also at depth n-1, if this doesn't happen it means
position is tricky enough to deserve a research at depth n+1.
bench: 4687419
We got a good result from this tweak, in line with
what was already found by Don Dailey.
At short TC:
LLR: 2.95 (-2.94,2.94)
Total: 13097 W: 2742 L: 2598 D: 7757
At long TC:
LLR: 2.97 (-2.94,2.94)
Total: 7281 W: 1408 L: 1265 D: 4608
bench: 5108393
Follow Don Dailey definition of cut/all node:
"If the previous node was a cut node, we consider this an ALL node.
The only exception is for PV nodes which are a special case of ALL nodes.
In the PVS framework, the first zero width window searched from a PV
node is by our definition a CUT node and if you have to do a re-search
then it is suddenly promoted to a PV nodes (as per PVS search) and only
then can the cut and all nodes swap positions. In other words, these
internal search failures can force the status of every node in the subtree
to swap if it propagates back to the last PV nodes."
http://talkchess.com/forum/viewtopic.php?topic_view=threads&p=519741&t=47577
With this definition we have an hit rate higher than 90% on:
if (!PvNode && depth > 4 * ONE_PLY)
dbg_hit_on_c(cutNode, (bestValue >= beta));
And an hit rate of just 28% on:
if (!PvNode && depth > 4 * ONE_PLY)
dbg_hit_on_c(!cutNode, (bestValue >= beta));
No functional change.
Fix was wrong becuase search starts from ss+1,
code is a bit tricky here, so rewrite in a way
to be more easy to read and understand.
Spotted by Eelco.
No functional change.
In case of we pick a sub-optimal move be
sure to print this, and not the best one
on seach log file.
Bug spotted by Guenther Demetz.
No functional change.
Search is started after setting a position and
issuing UCI 'go' command. Then if we stop the search
and call 'go' again without setting a new position it
is assumed that the previous setup is preserved, but
this is not the case because what happens is that
SetupStates is reset to NULL, leading to a crash as
soon as RootPos.is_draw() is called because st->previous
is now stale.
UCI protocol is not very clear about requiring that a
position is setup always before launching a search,
so here we easy the life of GUI developers assuming
that the current state is preserved after returning
from a 'stop' command.
Bug reported by Gregor Cramer.
No functional change.
A small number of tests with simulated
annealing at 15s indicated these values
may be better
And this is verified at long 60+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 40658 W: 7821 L: 7501 D: 25336
bench: 4931544
This patch is the sum of:
- Grainsize of 4 instead of 8
- Removing "depth < DEPTH_ZERO"
- Change DEPTH_QS_RECAPTURES = -5 to -7
All the patches individually failed to pass SPRT but scored
around 50%.
Together they pass easily short TC:
LLR: 2.96 (-2.94,2.94)
Total: 4429 W: 964 L: 844 D: 2621
And with some difficult long TC of 60+0.05:
LLR: 2.95 (-2.94,2.94)
Total: 64133 W: 11968 L: 11532 D: 40633
bench: 4821467
Add MOVE_NONE at the tail, this allows to loop
across MoveList checking for *it != MOVE_NONE,
and because *it is used imediately after compiler
is able to reuse it.
With this small patch perft speed increased of 3%
And it is also a semplification !
No functional change.
Most of the time we cut-off earlier, at captures, so this
results in useless work.
There is a small functionality change becuase 'ss' can change
from MovePicker c'tor to when killers are tried due, for
instance, to singular search.
bench: 4603795
Very good at long 60"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 5954 W: 1151 L: 1016 D: 3787
[edit: slightly changed form original patch to avoid useless loop
across killers when killer is MOVE_NONE]
bench: 4327405
Performed more or less well at short TC
LLR: 2.95 (-2.94,2.94)
Total: 50517 W: 9815 L: 9574 D: 31128
And a bit better at long TC
LLR: 2.96 (-2.94,2.94)
Total: 15564 W: 2805 L: 2624 D: 10135
bench: 4375253
It seems that do not limiting checking the
trapped rook only on rank 1 improves the
score.
At long TC
LLR: 2.97 (-2.94,2.94)
Total: 6581 W: 1346 L: 1204 D: 4031
bench: 4985012
Don't update refutation table in case of
previous move is MOVE_NULL or MOVE_NONE
and don't try refutation if is already
a killer move.
Pass both short TC
LLR: 2.96 (-2.94,2.94)
Total: 4310 W: 953 L: 869 D: 2488
And long one
LLR: 2.95 (-2.94,2.94)
Total: 6707 W: 1254 L: 1184 D: 4269
bench: 4785954
Very good result both at short TC 15+0.05
LLR: 2.95 (-2.94,2.94)
Total: 2803 W: 596 L: 483 D: 1724
And at long TC 60+0.05
LLR: 2.95 (-2.94,2.94)
Total: 2862 W: 548 L: 431 D: 1883
bench: 4329221
Unortunatly we have no guarantee that the call to
operator~(Color c) is resolved at compile time.
Perhaps the solution would be to use C++11 const_expr,
but for now simply use the good old-style ternary operator
that works as expected.
No functional change.
A rook is trapped if on rank 1 as is the king.
Currently the condition aloows for the rook
to be also in front of the pawns as long
as king is on first rank.
Verified with short TC test:
LLR: -1.71 (-2.94,2.94)
Total: 23234 W: 4317 L: 4317 D: 14600
Here what it counts is that after 23K games
result is equal.
bench: 4696542
When it is already defined(_WIN32).
According to Microsoft documentation:
http://msdn.microsoft.com/en-us/library/b0084kay.aspx
_WIN32 Defined for applications for Win32 and Win64. Always defined.
_WIN64 Defined for applications for Win64.
Patch suggested by Joona.
No functional change.
This info is normally printed together with
PV info in uci_pv() but when search is stopped,
for instance when max search time is reached,
uci_pv is not called and we miss this bits.
Suggested by gravy_train
No functional change.
But this time do not play with pointers, in
particular do not assume that size_t is an
unsigned type of the same width as pointers.
This code should be fully portable.
No functional change.
This fixes an assert while testing with debug on.
Assert was due to static null pruning returning value
eval - futility_margin(depth, (ss-1)->futilityMoveCount)
That was sometimes higher than VALUE_INFINITE triggering
an assert at the caller site.
Because eval con be equal to ttValue and anyhow is read from
TT that can be corrupted in SMP case, we need to sanity
check it before to use.
bench: 4176431
Let TT clusters (16*4=64 bytes) to hold on a singe cache line.
This avoids the need for the double prefetch.
Original patches by Lucas and Jean-Francois that has also tested
on his AMD FX:
BIG HASHTABLE
./stockfish bench 1024 1 18 > /dev/null
Before:
1437642 nps
1426519 nps
1438493 nps
After:
1474482 nps
1476375 nps
1475877 nps
SMALL HASHTABLE
./stockfish bench 128 1 18 > /dev/null
Before:
1435207 nps
1435586 nps
1433741 nps
After:
1479143 nps
1471042 nps
1472286 nps
No functional change.
Crash is due to uninitialized ss->futilityMoveCount that
when happens to be negative, yields to an out of range
access in futility_margin().
Bug is subtle because it shows itself only in SMP case.
Indeed in single thread mode we only use the
Stack ss[MAX_PLY_PLUS_2];
Allocated at the begin of id_loop() and due to pure
(bad) luck, it happens that for all the MAX_PLY_PLUS_2
elements, ss[i].futilityMoveCount >= 0
Note that the patch does not prevent futilityMoveCount
to be overwritten after, for instance singular search
or null verification, but to keep things readable and
because the effect is almost unmeasurable, we here
prefer a slightly incorrect but simpler patch.
bench: 4311634
Instead of a pointer. This should fix the issue of
remaining with a stale pointer when for instance calling
IID, but also null search verification, singular search
and razoring where we call search with the same ss
pointer. In this case ss->ei is overwritten in the
search() call and upon returning remains stale.
This patch could have a performance hit because Eval::Info
is big (176 bytes) and during splitting we copy 4 ss entries.
On the good side, this patch is a clean solution.
Proposed by Gary.
No functional change.
Allow to use EvalInfo struct, populated by
evaluation(), in search.
In particular we allocate Eval::Info on the stack
and pass a pointer to this to evaluate().
Also add to Search::Stack a pointer to Eval::Info,
this allows to reference eval info of previous/next
nodes.
WARNING: Eval::Info is NOT initialized and is populated
by evaluate(), only if the latter is called, and this
does not happen in all the code paths, so care should be
taken when accessing this struct.
No functional change.
The idea is to fail high more easily in static
null test if in the parent node we are already
very deep in the move list, so the propability
to fail high there is very low.
[edit: I have slightly changed the functionality
moving
ss->futilityMoveCount = moveCount;
At the end of the pruning code, this should not affect
ELO in anyway, but makes code more natural and logic]
Test with SPRT is good at 15+0.05
LLR: 2.96 (-2.94,2.94)
Total: 50653 W: 10024 L: 9780 D: 30849
And at 60+0.05
LLR: 2.97 (-2.94,2.94)
Total: 40799 W: 7227 L: 6921 D: 26651
bench: 4530093
When we use sysconf(_SC_NPROCESSORS_ONLN) to get number of
cores, we have to include sysconf library that is unistd.h
Sometimes it happens to work just becuase unistd.h indirectly
included by some other libraries, but not always.
Reported and fixed by Eyal BD
No functional change.
The idea is to penalize a bishop in case of
its pawns are on the same colored squares.
Good at short 15"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 4252 W: 925 L: 806 D: 2521
And at longer 60"+0.05 TC
LLR: 2.95 (-2.94,2.94)
Total: 15006 W: 2743 L: 2564 D: 9699
bench: 5274705
It seems stronger both at fast 15+0.05 TC with fixed game number test:
ELO: 2.74 +-2.7 (95%) LOS: 97.6%
Total: 24000 W: 4698 L: 4509 D: 14793
And also at long 60+0.05 TC with SPRT
LLR: 3.05 (-2.94,2.94)
Total: 38986 W: 6845 L: 6547 D: 25594
bench: 5157061
Use an optinal argument instead of a template
parameter. Interestingly, not only is simpler,
but also faster, perhaps due to less L1 instruction
cache pressure because we don't duplicate the very
used SEE code path.
No functional change.
Rationale:
- Current settings seem to make engine *significantly* weaker in analysis mode.
- In practice this setting only has effect when king safety scores are high.
- Even in analysis mode its far more important to know if one side is getting mated,
rather than get evaluation correct with 1cp accuracy.
No functional change
According to Jean-Paul this setup should be stronger
than default.
And SPRT test seems to confirm it:
At fast TC 15"+0.05
ELO: 3.33 +-2.7 (95%) LOS: 99.2%
Total: 25866 W: 5461 L: 5213 D: 15192
At longer TC 60"+0.05
ELO: 7.27 +-5.0 (95%) LOS: 99.8%
Total: 6544 W: 1212 L: 1075 D: 4257
bench: 5473339
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Increasing depth limit to 10 plies seems stronger
after 16K games at 15"+0.05 (ELO: +3.56) and also
repeating the test at 60"+0.05 TC:
ELO: 2.08 +-3.1 (95%) LOS: 90.9%
Total: 16000 W: 2641 L: 2545 D: 10814
Moreover setting the limit to 12 is proved stronger
then limit set to 10 by direct SPRT test at 15"+0.05:
ELO: 2.56 +-2.0 (95%) LOS: 99.5%
Total: 46568 W: 9240 L: 8897 D: 28431
So we directly set the limit to 12, the strongest setup.
bench: 4361224
Master IID formula is depth / 2
Previous patch is depth - 4 * ONE_PLY
This one is the middle way:
(dept/2 + depth-4*ONE_PLY)/2 -> depth-2*ONE_PLY-depth/4
After 16000 games at 60+0.05 th 1
ELO: 4.08 +-3.1 (95%) LOS: 99.5%
Total: 16000 W: 2742 L: 2554 D: 10704
bench: 4781239
Same idea of 5af8179647
in qsearch() but applied to search()
After 15500 games at 15+0.05
ELO: 4.48 +-3.4 (95%) LOS: 99.5%
Total: 15500 W: 3061 L: 2861 D: 9578
bench: 4985829
Always before pruning the move, it's important to check that:
bestValue > VALUE_MATED_IN_MAX_PLY
See example position:
8/2p1p3/P1NpP3/3k4/1P1BN3/2P1P3/2Q5/6K1 w - - 0 1
http://support.stockfishchess.org/discussions/problems/268-wrong-declaring-a-forced-mate-in-3-moves
This problem was present in 2.3.1, then it was fixed by my patch.
After 24000 games at 15+0.05
ELO: 2.40 +-4.4 (95%) LOS: 95.7%
Total: 24000 W: 4774 L: 4608 D: 14618
bench: 4465997
This reverts commit a24da071f0
Seems a regression when tested against 2.3.1
With this patch, have after 20000 games at 60+0.05, we have
ELO: 13.42 +-4.8 (95%) LOS: 100.0%
Total: 20000 W: 3746 L: 2974 D: 13280
Instead with the patch reverted:
ELO: 16.62 +-4.8 (95%) LOS: 100.0%
Total: 20000 W: 3816 L: 2860 D: 13324
Although we are within error bounds here we take the conservative
approach of not introducing changes that are not proved stronger
It doesn't mean that the change shall be weaker, simply that we
don't want to take any risk.
No functional change.
Here the rational seems to be that if after one try easy
move detection fails then the easy move is not so easy :-)
After 15563 games at 60+0.05
ELO: 3.04 +-5.5 (95%) LOS: 97.0%
Total: 15563 W: 2664 L: 2528 D: 10371
No functional change.
Increase MaxRatio to use more time when in trouble.
After 16000 games at 60+0.05
ELO: 4.89 +-5.4 (95%) LOS: 99.9%
Total: 16000 W: 2700 L: 2475 D: 10825
No functional change.
This seems good at short TC controls.
After 10000 games at 20+0.05
ELO: 9.56 +-6.8 (95%) LOS: 100.0%
Total: 10000 W: 1949 L: 1674 D: 6377
Testing at long TC and regression testing is still
ongoing. So this is a bit speculative commit and
could be reverted in the future.
Also re-testing at long TC the SEE pruning in PV nodes
seems less effective (perhaps even a regression, but
still ongoing) so disabled for now.
bench: 4968764
This reverts commit 0d68b523a3.
After easy move semplification this machinery is not
needed anymore (because of we don't need to know if a
root move is a recapture)
No functional change.
Detect a move as easy only if it is the only one ;-)
or if is much better than remaining ones after we
have spent 20% of search time.
Tests are ongoing, but it seems this semplification
stands. Anyhow it is experimental for now and could
be reverted/improved with further work Gary is
testing right now.
No functional change.
After previous patch if split point master is
waiting for job and "Use Sleeping Threads" is
false (our condition for official releases) then
it will lock/unlock splitPoint mutex in a super
tight loop badly affecting performance.
Rewrite the code to lock only when we are about
to finish. Note that race condition on slavesMask
is anyhow fixed.
No functional change.
There is no point searching a move that is forced.
It wastes time while allowing computer opponents to
fill hash with 100% accuracy.
[edit: Condition moved together with "easy move" ones]
Bench identical: 4922272
We detect an easy move as a recapture with an
high margin on the second best move.
Unfortunatly the recapture detection is broken
becuase we identify as a recapture any move that
follows an opponent's previous capture !
This patch fix the logic to correctly detect a
real re-capture.
No functional change.
Note that we read shared data without lock
protection, so code is theoretically prone to
torn reads. But, first splitPoint pointer
never changes, and alpha is of integer type so
it is read in a single DWORD access.
No functional change.
This patch is actually the sum of two contributions that
have been tested independently:
1) Pruning of negative SEE moves in PV
After 10000 games at 20+0.05
ELO: 5.18 +-7 (95%) LOS: 99.2%
Total: 10000 W: 1952 L: 1803 D: 6245
2) Remove of bestValue > VALUE_MATED_IN_MAX_PLY condition
After 23000 games at 20+0.05
ELO: 1.63 +-4 (95%) LOS: 88.1%
Total: 23000 W: 4232 L: 4124 D: 14644
The whole patch as been re-tested at long TC with positive results:
After 10000 games at 60+0.05
ELO: 4.31 +-7 (95%) LOS: 98.3%
Total: 10000 W: 1765 L: 1641 D: 6594
kf = (kf == FILE_A) ? kf++ : ....
is tricky becuase kf is updated twice and it happens
to do the right thing just by accident.
Rewrite in a better way.
Spotted by pdimov
No functional change.
Give a bonus if a bishop can pin a piece or can
give a discovered check through an x-ray attack.
Seems good after 24000 games at 15"+0.05 (single thread):
ELO: 12.30 +- 99%: 5.79 95%: 4.40 LOS: 100.00%
Total: 24000 W: 4931 L: 4082 D: 14987
bench: 4917064
Rename startPosPly to gamePly and increment/decrement
the variable after each do/undo move. This adds a little
overhead in do_move() but we will need to have the
game ply during the search for the next patches now
under test.
Currently we don't increment gamePly in do_null_move()
becuase it is not needed at the moment. Could change
in the future.
As a nice side effect we can now remove an hack in
startpos_ply_counter().
No functional change.
Another trick, along the same lines of previous
patch. This time we first check positions with
white side to move that, becuase we start with
pawn on rank 7, are easily classified as wins,
then black ones.
Number of cycles reduced to 15 !
Becuase now it is faster we can remove a lot of
code to detect theoretical draws. We will calculate
them anyhow, although a bit slower, but the speed
up trick more than compensates it.
Verified that generated bitbases match original ones.
No functional change.
Change the way the index is coded so that
now looping from 0 to IndexMax generates
the pawns from RANK_7 down to RANK2.
Becuase positions with pawns at RANK_7
are easily classified as wins/draws, this
small trick allows to reduce the number
of needed iterations from 30 down to 26!
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Rename to insertion_sort so to avoid confusion
with std::sort, also move it to movepicker.cpp
and use the bit slower std::stable_sort in
search.cpp where it is used in not performance
critical paths.
No functional change.
We can encode the ClusterSize directly in the
hashMask, this allows to skip the left shift.
There is no real change, but bench number is now
different because instead of using the lowest order
bits of the key to index the start of the cluster,
now we don't use the last two lsb bits that are
always set to zero (cluster size is 4). So for
instance, if 10 bits are used to index the cluster,
instead of bits [9..0] now we use bits [11..2].
This changes the positions that end up in the same
cluster affecting TT hits and so bench is different.
Also some renaming while there.
bench: 5383795
Do a 32bit bitwise 'and' instead of a 64bit
subtract and bitwise 'and'.
This is possible because even in the biggest
hash table case (8GB) the number of entries
is 2^29 so storable in an unsigned int.
No functional change.
Save the current active position in each Thread
instead of keeping a centralized array in struct
SplitPoint.
This allow to skip a memset() call at each split.
No functional change.
Obsolete renmant of when position was directly
passed to the search instead of being copied
for the main thread as is now.
From Jundery.
No functional change.
The syntax splitPoints() should force the compiler to
value-initialize the array and because there is no
user defined c'tor it falls back on zero-initialization.
Unfortunatly this is broken in MSVC compilers, because
value initialization for non-POD types is not supported,
so left splitPoints un-initialized and add in split()
initialization of slavesPositions, that is the only
member not already set at split time.
This fixes an assert under MSVC when running with
more than one thread.
Spotted and reported by Jundery.
No functional change.
It is a draw if pawns are on G or B files, weaker pawn is
on rank 7 and bishop can't attack the pawn.
No functional change (because it is very rare and does not appear in bench)
To return a pointer to the available
thread instead of a bool. This allows
to simplify the core loop in split().
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
This function "returns" two values: bestValue and bestMove
Instead of returning one and passing as pointer the other
be consistent and pass as pointers both.
No functional change.
Currently a ttMove is reduced with ss->reduction = DEPTH_ZERO,
so it is actually not reduced (as it should be), but the
trick works just becuase it happens that ttMove is the first
to be tried and
reduction(depth, 1)
Always returns zero. So explicitly forbid reduction of ttMove
in the LMR condition. This is much clear and self-documented.
No functional change.
Surprisingly this rare case was not considered
when scoring a capture.
Also take in account that in the promotion case
we gain a new piece (typically a queen) but we
lose the promoting pawn.
These small issues were present since Glaurung times!
Found while browsing DiscoCheck sources
bench: 5400063
Handling of History and Gains is almost the same, with
the exception of the update logic, so unify both
classes under a single Stats struct.
No functional change.
And handle the castle directly in do/undo_move().
This allow to greatly simplify the code.
Here the beast is the nasty Chess960 that is
really tricky to get it right because could be
that 'from' and 'to' squares are the same or
that king's 'to' square is rook's 'from' square.
Anyhow should work: verified on all Chess960
starting positions.
No functional and no speed change also in Chess960.
Use a more traditional approach, along the same lines
of do_move().
It is true that we copy more in do_null_move(), but we
save the work in undo_null_move(). Speed test shows the
new code to be even a bit faster.
No functional change.
We have only one call place so inline its content.
BTW, function is already declared as FORCE_INLINE.
Also some small refactoring while there.
No functional change.
When a thread is allocated a bit is set in slavesMask.
This bit corresponds to the thread's index field that,
because it happens to be the position in the threads
array, eventually it is equal to the loop index 'i'.
But instead of relying on this 'coincidence', explicitly
use the 'idx' field so to clarify slavesMask usage.
Backported from c++11 branch.
No functional change.
Intel Compiler has 'invented' this pearl:
warning #1476: field uses tail padding of a base class
Just becuase we have subclassed MainThread and added
the field 'bool thinking'.
Pure nosense. Silence the warning.
No functional change.
Fix again TimerThread::idle_loop() to prevent a
theoretical race with 'exit' flag in ~Thread().
Indeed in Thread d'tor we raise 'exit' and then
call notify() that is lock protected, so we
have to check again for 'exit' before going to
sleep in idle_loop().
Also same change in Thread::idle_loop() where we
now check for 'exit' before to go to sleep.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Great code simplification: - instead do not futility
prune threat refutations. allows_move() is therefore removed.
4000 games at 50,000 nodes/move:
1085-989-1926 [51.2%] LOS=98.3%
4000 games in 10"+0.1"
756-751-2493 [50.1%] LOS=55.1%
EDIT: I have retested the patch of Lucas in a slightly different form
(without pruning in PvNode) and test mre or less confirms that
60 lines of code are totally unuseful:
After 6195 games at 15"+0.05"
1333 - 1325 - 3537 ELO 0
bench 5140990
Silly logic bug introduced in dda7de17e7
Timer thread, when msec = 0, instead of going
to sleep, calls check_time() in an endless loop.
Spotted and reported by snino64 due to abnormally
high CPU usage.
No functional change.
Subclass MainThread and TimerThread and declare
idle_loop() virtual. This allow us to cleanly
remove a good bunch of hacks, relying on C++
polymorphism to do the job.
No functional change.
Rename it is_finished and use it only in main
thread to signal search is finished. This allows
us to simplify the complex SMP logic.
Ultra tricky patch: deep test is required under
wide conditions like pondering on and option
"Use Sleeping Threads" set to false.
No functional change.
This reverts commit 869c924410
I misunderstood here. Actually it can happen that
thread is created but still not entered idle_loop
and at the same time start_searching() is called.
Becuase 'do_sleep' is set start_searching() will
set it to false and start the search, but when,
at last, the thread enters idle_loop(), resets
the flag and goes to sleep: not what we want.
Revert the hack waiting for a better solution
in the next patches.
No functional change.
Finally we can now merge the 'ponderhit' case with
'stop' and 'quit'.
The patches have been done step by step to help debugging
becuase this is really tricky code.
No functional change.
Reset Limits.ponder only if search continue, but if
we are going to stop the search there is no need
(and is also confusing) to clear the 'ponder' flag.
This mimics the behaviour upon rceiving 'stop' when
pondering.
No functional change.
In SAN notation when two pieces of the same type can
move to a given destination square, a disambiguation
additional info (like starting file) shall be added
to the SAN move.
If one of the two pieces is pinned, the corresponding
move _could_ be illegal and in this case disambiguation
is not needed. But to be pinned alone it is not enough
to deduce that the move is illegal, for instance in this
position:
R3rk2/2r6/8/8/8/8/8/K7 b - - 0 1
The move Rc8 is ambiguous although the rook in e8 is pinned
and the correct SAN notation should be Rcc8.
No functional change.
Don't wait for the search to finish after a 'stop'
command, but keep processing the GUI input if any.
Also explicitly wake up the main thread (that could be
sleeping) after a 'stop' or 'quit' command and do not
rely on wait_for_search_finished() doing it for us.
This patch cleans up the code and functions's definitions,
but it is risky and needs a good test under different
conditions to be sure it does not introduces hungs up.
No functional change.
Revert patch c581b7ea36
Seems a regression after testing from Gary:
ELO: 7.24 +- 99%: 17.03 95%: 12.93
LOS: 97.86%
Wins: 439 Losses: 381 Draws: 1962
And mine:
After 5410 games at 15"+0.05
Wins: 936 Losses: 1141 Draws: 3333 ELO -13
Moreover we know that there is a regression in the range
of patches which include the fromNull patch.
Probably this is not the only regression since 2.3.1 and
perhaps the idea under fromNull is good, but at the moment,
while in deep regression hunting, better to be on the safe
side and revert it entirely.
My guess on why this is a regression is that using the
negated evaluation of previous ply in case of null search
fails to take in account the king safety asymmetry between
the two colors. This is of course just a guess.
bench 5503830
They are not self-describing and create a lot of user
requests about them.
Given that the values are already well tuned there
is no need to expose them as UCI options.
No functional change.
Sometimes is faster, but not always and on very long mates
produces strange scores probably due to truncation of PV
artifacts.
So simply perform normal search also in case of UCI 'mate x'
command, with the only difference that when a mate in x is
found search returns immediately.
No functional change.
Now that we can call 'bench' command also from
interactive terminal it makes no more sense to
exit the application if the user types a wrong
file name.
No functional change.
Since &-ing with SpaceMask restricts the set to the home
half of the board, it is possible to use just one popcount
instead of 2 by shifting "safe" to the other half of the
board. This gives a small speedup especially on systems
where hardware popcount is not available.
Patch kindly sent by Richard Vida.
No functional change.
It is now possible to run SF on a 'mate in x' testsuite.
For instance in case of a file with fen strings of
positions with mate in 10 we can now 'bench' on it:
stockfish bench 128 1 10 mate_in_10.epd mate
No functional change.
Following a user request I added the handling of UCI:
go mate x
Currently we just return from a PV node if x moves have been
done. Probably not the best approach. I have looked at Fruit/Toga
sources and there is even simpler: engine falls back on a fixed
depth search.
No functional change.
And return on using TT as backing store for position
evaluations.
Tests (even on single thread) show eval cache was a regression.
In multi thread result should be even worst because eval cache
is a per-thread struct, while TT is shared.
After 4957 games at 15"+0.05 (single thread)
eval cache vs master 969 - 1093 - 2895 -9 ELO
So previous reported result of +18 ELO was probably due to an
issue in the testing framework (a bug in cutechess-cli) that
has been fixed in the meanwhile.
bench: 5386711
In case of null search at low depths returns a fail low
due to a threat then, rather than return beta-1 (to cause
a re-search at full depth in the parent node), we set a flag
threatExtension = true (false by default) that will cause
moves that prevent the threat to be extended of one ply
in the following search.
Idea and patch is by Lucas Braesch.
Lucas also did the tests:
1500 games in 5"+0.05":
SF_threatExtension vs SF_20121222: 366 - 331 - 803 [51.2%] LOS=90.8%
3000 games in 10"+0.1":
SF_threatExtension vs SF_20121222: 610 - 559 - 1831 [50.8%] LOS=93.2%
Tests confirmed by Gary after 10570 games,
ELO: 2.79 +- 99%: 8.72 95%: 6.63
LOS: 94.08%
Wins: 1523 Losses: 1438 Draws: 7607
And finally by me at 15"+0.05, single thread, 3824 games
threatExtension vs master 768 - 692 - 2364 +7 ELO
bench 4918443
This is difficult code becuase a bug here could lead
to very subtle crashes in case of SMP games where we
have TT move corruption due to concurrent access.
Anyhow I have fully verified te code throwing at it
random moves. It shoudl work.
No functional change.
Test by Joona prooves the new feature don't value 70 added lines of code.
Grand totals after 10040 games (crashes: 0) for tt_both
master_9edc7 - 6a93488_6a934: 1756 - 1688 - 6596 ELO +2 (+- 2.7)
Confirmed by test of Gary:
After 8680 games:
ELO: 0.80 +- 99%: 9.62 95%: 7.31
LOS: 65.38%
Wins: 1288 Losses: 1268 Draws: 6130
Thanks a lot to both for testing it !!!
bench 5149248
This is more complex than what I'd like but I
was unable to split in small chunks.
Here we add 2 slots to TTEntry (valueUpper and depthUpper)
so that sizeof(TTEntry) returns to the original 16 bytes
and we can pack exactly 4 entries in a 64 bytes cache line.
Now we save an upper bound score alongside a lower (exact)
score. The idea is to increase TT cut-offs rates becuase
there is now an higher probability for a node to use TT info.
This patch is highly experimental and probably needs further
steps as is hinted by an unrealistic bench number:
bench: 2022385
In almost all cases we already know in advance that
color_of() argument is different from NO_PIECE.
So avoid the check for NO_PIECE in color_of() and
test at caller site in the very few places where
this case could occur.
As a nice side effect, this patch fixes a (bogus)
warning under some versions of gcc.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Use an eval cache instead of TT to store node
position evaluations.
It is already an improvment and, because it frees
two TT entry slots, paves the way to extend TT to
store both upper and lower bounds.
After 4855 games, single thread, 15"+0.05
Mod vs Orig 1165 -920 - 2770 ELO +18
bench: 5149248
And document why this is an hard limit. It
seems for some (lucky) people 32 threads
are not enough.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
In search(), after we evalute the position, in case there
isn't any TT entry we create one with just the evaluation
score.
This patches removes that code. The reason becuase the patch
deserves a single commit it is becuase introduces a (very small)
functional change due to the fact that the total number of
TT stores is less now and this slightly alters the TT hits
of our benchmark.
bench: 4983262
Rely fully on eval cache. Note that we still save eval
info to TT, this is not needed at this moment and will be
removed in future patches. We keep it so to have a "non
functional change" patch.
No functional change.
With this patch series we want to introduce a per-thread
evaluation cache to store node evaluation and do not
rely anymore on the TT table for this.
This patch just introduces the infrastructure.
No functional change.
In case of a RootNode or a SpNode move has
been already checked for legality so we can
skip a redundant check.
Spotted by Frank Genot.
No functional change.
In qsearch we should update the bestValue as we do
in case of futilityValue < beta, also when pruning
moves with non-positive see.
Spotted by Lucas Braesch
Bench: 5695710
Send the PV lines to GUI only once at the end of the
PV search loop or just in case of long searches.
We need to sync also sending of "currmove" info to
avoid sending info on current move without first
informing the GUI on the PV line we are searching on.
No functional change.
At this point we have already verified (value > alpha)
and this implies, in case of a non-PV node, where search
window size is zero, that value >= beta.
This is not so self-evident, so document the code with
an assert condition.
No functional change.
Let the caller to decide where to redirect (cout or cerr) the
ASCII representation of the position. Rename the function to
reflect this.
Renamed also from_fen() and to_fen() to set() and fen() respectively.
No functional change.
In case a PvNode node has a static evaluation above alpha but
no available moves we want to flag the node as BOUND_EXACT,
not as BOUND_UPPER as is currently.
The behaviour was recently introduced with patch d471c49700
of 3/10/2012
Spotted by Hongzhi Cheng.
bench: 5558464
Both Lucas re-test and Jean-Francois confirrm it
is a regression.
Here Jean-Francois's results after 3600 games :
Score of 96d3b1c92b vs 3b87314: 690 - 729 - 2181 [0.495] 3600
ELO: -3.86 +- 99%: 14.94 95%: 11.35
LOS: 15.03%
Wins: 690 Losses: 729 Draws: 2181 Total: 3600
Bench: 5404066
Don't prune and eventually extend check moves of type
DISCO_CHECK (pun intended, Lucas will understand :-) ).
Patch from Lucas Braesch that has also tested it:
Result: 879-661-2137, score=52.96%, LOS=100.00% (time 10"+0.1")
I have started a verification test right now.
bench: 6004966
From Jean-Francois's
Final result after 5000 games :
Score of c581b7e vs a878312: 1163 - 970 - 2867 [0.519] 5000
ELO: 13.35 +- 99%: 12.71 95%: 9.65
LOS: 100.00%
Wins: 1163 Losses: 970 Draws: 2867 Total: 5000
From me
After 3266 games at 20"+0,05
Score of c581b7e vs a878312: 612 - 607 - 2047
So no regression at longer TC and perhaps a little gain at
fast TC.
In this case we try a rather drastic
approach: we simply don't futility prune
in qsearch when arriving from a null move.
So we save evaluating and also save to mess
with eval margins at all because margin is used
only in futility.
Also accuracy should not be affected, actually it
improves because we don't prune anything anymore.
bench: 5404066
Performs well at very short TC of 40/4+0.05 (courtesy of Jean-Francois):
Wins: 2503 Losses: 2146 Draws: 5581 Total: 10230 +12 ELO
But is poor at longer TC of 20"+0.05
Wins: 321 Losses: 373 Draws: 1141 Total: 1808 -10 ELO
The patch was clearly a tradoff between speed and accuracy and
the most interesting part of it are test results that can be
commented as follows:
- A short TC is very sensible to any speed increase
- A longer TC is more sensible to accuracy and less to speed
So a patch that does not change speed is suitable to be tested at
short TC, while a speed/accuracy compromise patch is IMO better to
be tested at longer TC to verify loss of accuracy can be tolerated.
In this case the revert is only temporary. We will come back again
once we will be able to preserve the evaluation margin.
bench: 5809010
Reuse the evaluation of the parent with inverted sign and
set margin to zero (this is an hack!).
This is done only in qsearch where almost 15% of calls are
from a null move. In normal search the number of nodes where
(ss-1)->currentMove == MOVE_NULL
is almost zero and so there is no need of using this trick.
The big advantage of this patch is a speed-up due to skipped
evaluate() calls, that are very costly.
Functionality is of course affected and we will need to proper
test it later. For now we just register a 3-4% speed up.
Suggested by Hongzhi Cheng.
bench: 5051328
When testing if a move blocks the threat path there is no
reason to require the threat to be a slider. Indeed threat
can be a double pawn push like in this example:
r1bq1rk1/ppp1np1p/4n1p1/3p4/3P2Q1/2P1B3/PPBN2PP/R4RK1 w - - 0 16
Where white's move Rf6 blocks the threat f5.
As a nice side effect we can retire the now useless helper
piece_is_slider().
This patch kicks in only very rare cases, indeed the bench is
still the same!
bench: 5809010
There is no guarantee that split() consumes all the node's
moves. Indeed split() can return without performing any job
for instance because MAX_SPLITPOINTS_PER_THREAD is reached
or becuase no available threads are found (this latter case
is much more common).
So search must continue in those cases and we cannot force
exiting from move's loop.
Bug introduced by 1ac417edb8 of 5/10/2012
Spotted by Frank Genot.
No functional change.
If a previous move attacks the king (with the piece
of the threat move removed) then must be a discovered
check, otherwise it means that first move gave check
and we were not able to do a null move.
Also renamed stuff to better document the function's
context.
No functional change.
When testing if a piece is moving through the squares
vacated by a previous move there is no reason to require
the piece to be a slider, indeed we can have a double
pawn push like in this example:
r1q2rk1/2p1bppp/2Pp4/pN5b/Q1P1p3/4B2P/PP1R1PP1/1K5R w - - 3 18
Where black's move f5 is connected to previous move Be7 that
frees the path.
Or we can have a castle move:
r1bqkb1r/pppp1ppp/2n1pn2/1B6/4P3/2N2N2/PPPP1PPP/R1BQK2R b KQkq - 5 1
Where a previous move Bb5 allows the white to castle king side.
This time patch is mine ;-)
new bench: 5809010
We send to GUI multi-pv info after each cycle,
not just once at the end of the PV loop. This is
because at high depths a single root search can
be very slow and we want to update the gui as
soon as we have a new PV score.
Idea is good but implementation is broken because
sort() takes as arguments a pointer to the first
element and one past the last element.
So fix the bug and rename sort arguments to better
reflect their meaning.
Another hit by Hongzhi Cheng. Impressive!
No functional change.
When checking if the moving piece p1 in a previous
move m1 defends the destination square of a move m2
we have to use the occupancy with the from square of
m2 removed so to take in account the case in which
f2 will block an x-ray attack from p1.
For instance in this position:
r2k3r/p1pp1pb1/qn3np1/1N2P3/1p3P2/2B5/PPP3QP/R3K2R b KQ - 1 9
The move eXf6 is connected to the previous move Bc3 that
defends the destination square f6.
With this patch we have about 10% more moves detected as
'connected'. Anyhow the absolute number is very low, about
4000 more moves out of 6M nodes searched.
Another issue spotted by Hongzhi "Hawk Eye" Cheng ;-)
new bench: 5757373
On Intel, perhaps due to 'lea' instruction this way of
zeroing the lsb of *b seems faster than a shift+negate.
On perft (where any speed difference is magnified) I
got a 6% speed up on my Intel i5 64bit.
Suggested by Hongzhi Cheng.
No functional change.
Compiler complies that 'cnt' is initialized but
unused (in !CheckThreeFold case). Moving the
definition of 'cnt'out of the loop seems to do
the trick.
No functional change.
Instead of use a variable so to resolve many conditions
already at compile time. In quiesce is also where we
have most of the InCheck nodes and is one of the most
performance critical code paths.
Speed up of 1.5% with Clang and 1% with gcc
Suggested by Hongzhi Cheng.
No functional change.
When checking if a move defends the threatened piece
we correctly remove from the occupancy bitboard the
moved piece. This patch removes from the occupancy also
the threatening piece so to consider the cases of moves
that defend the threatened piece x-raying through the
threat move.
As example in this position:
r3k2r/p1ppqp2/Bn4p1/3p1n2/4P1N1/5Q1P/PPP2P1P/R3K2R w KQkq - 1 10
The threat black move is dxe4. With this patch we include
(and so don't prune) white's Bb7 that would be pruned otherwise.
The number of affected position is very low, around 1% of cases,
so we don't expect ELO changes, neverthless this is the logical
and natural thing to do.
Patch suggested by Hongzhicheng.
new bench: 5323798
ReducedStateInfo is a redundant struct that is also
prone to errors, indeed must be updated any time is
updated StateInfo. It is a trick to partial copy a
StateInfo object in do_move().
This patch takes advantage of builtin macro offsetof()
to directly calculate the number of quad words to copy.
Note that we still use memcpy to do the actual job of
copying the (48 bytes) of data.
Idea by Richard Vida.
No functional and no performance change.
Only in not performance critical code like pretty_pv(),
otherwise continue to use the good old C-style arrays
like in extract/insert PV where I have done some code
refactoring anyhow.
No functional change.
Silly typo (introduced in e304db9d1e) completely
messed up move notation in case of promotions causing
"Illegal move" warning in cutechess-cli.
Reported by Jörg Oster.
No functional change.
In multi-threads runs with debug on we experience some
asserts due to the fact that TT access is intrinsecally
racy and its contents cannot be always trusted so must
be validated before to be used and this is what the
patch does.
No functional case.
And restore old behaviour of not returning from a RootNode
without updating RootMoves[].
Also renamed is_draw() template parameters to reflect a
'positive' logic (Check instead of Skip) that is easier
to follow.
New bench: 5312693
When signal 'stop' is raised we return bestValue
that could be still set at -VALUE_INFINITE and
this triggers an assert. Fix it by returning
a value we know for sure is not +-VALUE_INFINITE.
Reported by 平岡拓也 Hiraoka.
No functional change.
Note that the actual pickup is done in the class
d'tor so to be sure it is always triggered, even
in case of a sudden exit due to a 'stop' signal.
No functional change.
Function move_to_san() requires the Position to be
passed by referenced because a do/undo move is done
inside the function to detect a possible mate and to
add to the san string the corresponding '#' suffix.
Instead of passing a copy of current position pass
directly the original position object after const
casting it. This has the advantage to avoid a costly
Position copy, on the down side a bench test could
report different searched nodes if print(move) is
used, due to the additionals do_move() calls.
No functional change.
Existing Makefile is buggy for PowerPC, it has no
SSE, yet it is given it if Prefetch is enabled,
because it isn't ARMv7.
Patch from Matthew Brades.
No functional change.
We can have ttValue == VALUE_NONE when we use a TT
slot to just save a position static evaluation, but
in this case we also save DEPTH_NONE so to avoid
using the ttValue in search. This happens to work,
but due to a number of lucky and tricky cases that
we now documnet through a bunch of asserts and a
little change to value_from_tt() that has no real
effect but clarifing the code.
No functional change.
On some platforms 128 MB of RAM for TT is too much,
so run 'bench' with the default 32 MB size.
No functional change although of course now 'bench'
reports a different number: 5545018
Use popcount() instead in the only calling place.
It is used only at initialization so there is no
speed regression and anyhow even initialization
itself is not slowed down: magic bitboard setup
stays around 175 msec on my slow 32bit Core Duo.
No functional change.
Simplify the code and doing this introduce a couple
of (very small) functional changes:
- Always compare to depth even in "mate value" condition
- TT cut-off in qsearch also in case of PvNode, as in search
Verified against regression with 2500 games at 30"+0.05
on 2 threads: 451 - 444 - 1602
Functional changed: new bench is 5544977
As Joona says: "The problem is that when doing full
window search (-VALUE_INFINITE, VALUE_INFINITE), and
pruning all the moves will return fail low which is
mate score, because only clause touching alpha is
"mate distance pruning". So we are returning mate score
although we are just pruning all the moves. In reality
there probably is no mate in sight.
Bug spotted and fixed by Joona.
Implement lsb/msb using armv7 assembly instructions.
msb is the easiest one, using a gcc intrinsic that generates
code using the ARM's clz instruction. lsb is also using this
clz instruction, but with the help of ARM's 'rbit' (bit
reversing) instruction. This leads to a >2% speed gain.
I also renamed 'arm-32' to the more meaningfull 'armv7' in the Makefile
No functional change.
Port to qsearch() the same changes we recently
added to search().
Overall this search refactoring series shows
almost 2% speed up on gcc compile.
No functional change.
When using the Makefile (as for the mingw case),
IS_64BIT and USE_BSFQ are already set with
ARCH=x86-64 and do not need to be redefined.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
First disable Contempt Factor during analysis, then
calculate the modified draw score from the point of
view of the player, so from the point of view of
RootPosition color.
Thanks to Ryan Taker for suggesting the fixes.
No functional change.
These kind of arch specific code is really nasty
to make it right becuase you need to verify on
all the platforms.
Now should compile properly also on ARM
Reported by Jean-Francois.
No functional change.
In particular on ARM processors. Original patch by
Jean-Francois, sligtly modified by me to preserve
the meaning of NO_PREFETCH flag.
Verified with gcc, clang and icc that prefetch instruction
is correctly created.
No functional change.
A simplification and also a small speed-up of
about 1% mainly due to reducing calls to
thisThread->cutoff_occurred().
Worst case split point recovering time after a
cut-off occurred is limited to 3 msec on my slow
PC, and usually is below 1 msec, so it seems safe
to remove the cutoff_occurred() check.
No functional change.
This is very crude and very basic: simply in case
of a draw for repetition or 50 moves rule return
a negative score instead of zero according to the
contempt factor (in centipawns). If contempt is
positive engine will try to avoid draws (to use
with weaker opponents), if negative engine will
try to draw. If zero (default) there are no changes.
No functional change.
Use a value related to PawnValue instead.
This is a different patch from previous one because
could affect game play and skill levels, although
in a mostly unmeasurable way. Indeed thresold has
been raised so easy move is a bit harder to trigger
and skill level is a bit more prone to blunders.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Show the real value in the code, not hide it
behind a variable name, especially when there
is only once occurence.
No functional change.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Extend for an extra half-ply in case the node is (probably)
going to fail high. In this case the added overhead is limited.
A novelity is the way this patch has been tested: Always in
self-play but with a much longer TC to allow the singular
extension to fully kick in and also (my impression) to have
less noisy results.
Ater 1015 games on my QUAD at 60"+0.05
Mod vs Orig 173 - 150 - 692 ELO +8
Handle also the SMP case. This has been quite tricky, not
trivial to enforce the node limit in SMP case becuase
with "helpful master" concept we can have recursive split
points and we cannot lock them all at once so there is the
risk of counting the same nodes more than once.
Anyhow this patch should be race free and counted nodes are
correct.
No functional change.
As long as isPvMove (renamed to pvMove) is set after
legality check, we can postpone legality even in PV case.
Patch aligns the PV case with the common non-pv one.
No functional change.
Patch and tuning by Gary Linscott from an idea of Ryan Taker.
Double tested by Gary:
Wins: 3390 Losses: 2972 Draws: 11323
LOS: 99.999992%
ELO: 8.213465 +- 99%: 6.746506 95%: 5.124415
Win%: 51.181792 +- 99%: 0.969791 95%: 0.736740
And by me:
After 5612 games 1255 1085 3271 +11 ELO
We have a crash with this position:
rkqbnnbr/pppppppp/8/8/8/8/PPPPPPPP/RKQBNNBR w HAha -
What happens is that even if we are castling QUEEN_SIDE,
in this case we have kfrom (B8) < kto (C8) so the loop
that checks for attackers runs forever leading to a crash.
The fix is to check for (kto > kfrom) instead of
Side == KING_SIDE, but this is slower in the normal case of
ortodhox chess, so rewrite generate_castle() to handle the
chess960 case as a template parameter and allow the compiler
to optimize out the comparison in case of normal chess.
Reported by Ray Banks.
Broken by commit a44c5cf4f7 of 3 /12 / 2011 that
was labeled "No functional change" because our 'bench'
test didn't triggered that particular endgame. Indeed
we need to run a specific bench on a set of endgames
position when touching endgame.cpp because normal bench
does not cover endgames properly.
Found by MSVC 2012 code analyzer.
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.