Since the era of mechanical machines that played rook and king endings and electrical machines that played other games like
hex in the early years of the 20th century, scientists and theoreticians have sought to develop a procedural representation of how humans learn, remember, think and apply knowledge, and the game of chess, because of its daunting complexity, became the "
Drosophila of artificial intelligence (AI)". The procedural resolution of complexity became synonymous with thinking, and early computers, even before the chess automaton era, were popularly referred to as "electronic brains". Several different schema were devised starting in the latter half of the 20th century to represent knowledge and thinking, as applied to playing the game of chess (and other games like checkers): • Search based (brute force vs selective search) • Search in search based schema (
minimax/
alpha-beta,
Monte Carlo tree search) • Evaluations in search based schema (
machine learning,
neural networks,
texel tuning,
genetic algorithms,
gradient descent,
reinforcement learning) •
Knowledge based (PARADISE,
endgame tablebases) Using "ends-and-means" heuristics a human chess player can intuitively determine optimal outcomes and how to achieve them regardless of the number of moves necessary, but a computer must be systematic in its analysis. Most players agree that
looking at least five moves ahead (ten
plies) when necessary is required to play well. Normal tournament rules give each player an average of three minutes per move. On average there are more than 30 legal moves per chess position, so a computer must examine a quadrillion possibilities to look ahead ten plies (five full moves); one that could examine a million positions a second would require more than 30 years. The earliest attempts at procedural representations of playing chess predated the digital electronic age, but it was the stored program digital computer that gave scope to calculating such complexity. Claude Shannon, in 1949, laid out the principles of algorithmic solution of chess. In that paper, the game is represented by a "tree", or digital data structure of choices (branches) corresponding to moves. The nodes of the tree were positions on the board resulting from the choices of move. The impossibility of representing an entire game of chess by constructing a tree from first move to last was immediately apparent: there are an average of 36 moves per position in chess and an average game lasts about 35 moves to resignation (60-80 moves if played to checkmate, stalemate, or other draw). There are 400 positions possible after the first move by each player, about 200,000 after two moves each, and nearly 120 million after just 3 moves each. So a limited lookahead (search) to some depth, followed by using domain-specific knowledge to evaluate the resulting terminal positions was proposed. A kind of middle-ground position, given good moves by both sides, would result, and its evaluation would inform the player about the goodness or badness of the moves chosen. Searching and comparing operations on the tree were well suited to computer calculation; the representation of subtle chess knowledge in the evaluation function was not. The early chess programs suffered in both areas: searching the vast tree required computational resources far beyond those available, and what chess knowledge was useful and how it was to be encoded would take decades to discover. The developers of a chess-playing computer system must decide on a number of fundamental implementation issues. These include: •
Graphical user interface (GUI) – how moves are entered and communicated to the user, how the game is recorded, how the time controls are set, and other interface considerations •
Board representation – how a single position is represented in data structures; • Search techniques – how to identify the possible moves and select the most promising ones for further examination; •
Leaf evaluation – how to evaluate the value of a board position, if no further search will be done from that position.
Adriaan de Groot interviewed a number of chess players of varying strengths, and concluded that both
masters and beginners look at around forty to fifty positions before deciding which move to play. What makes the former much better players is that they use
pattern recognition skills built from experience. This enables them to examine some lines in much greater depth than others by simply not considering moves they can assume to be poor. More evidence for this being the case is the way that good human players find it much easier to recall positions from genuine chess games, breaking them down into a small number of recognizable sub-positions, rather than completely random arrangements of the same pieces. In contrast, poor players have the same level of recall for both. The equivalent of this in computer chess are
evaluation functions for leaf evaluation, which correspond to the human players' pattern recognition skills, and the use of machine learning techniques in training them, such as Texel tuning,
stochastic gradient descent, and
reinforcement learning, which corresponds to building experience in human players. This allows modern programs to examine some lines in much greater depth than others by using forwards pruning and other selective heuristics to simply not consider moves the program assume to be poor through their evaluation function, in the same way that human players do. The only fundamental difference between a computer program and a human in this sense is that a computer program can search much deeper than a human player could, allowing it to search more nodes and bypass the
horizon effect to a much greater extent than is possible with human players.
Graphical user interface Computer chess programs usually support a number of common
de facto standards. Nearly all of today's programs can read and write game moves as
Portable Game Notation (PGN), and can read and write individual positions as
Forsyth–Edwards Notation (FEN). Older chess programs often only understood
long algebraic notation, but today users expect chess programs to understand standard
algebraic chess notation. Starting in the late 1990s, programmers began to develop separately
engines (with a
command-line interface which calculates which moves are strongest in a position) or a
graphical user interface (GUI) which provides the player with a chessboard they can see, and pieces that can be moved. Engines communicate their moves to the GUI using a protocol such as the Chess Engine Communication Protocol (CECP) or
Universal Chess Interface (UCI). By dividing chess programs into these two pieces, developers can write only the user interface, or only the engine, without needing to write both parts of the program. (See also
chess engine.) Developers have to decide whether to connect the engine to an opening book and/or endgame
tablebases or leave this to the GUI.
Board representations The
data structure used to represent each chess position is key to the performance of move generation and
position evaluation. Methods include pieces stored in an array ("mailbox" and "0x88"), piece positions stored in a list ("piece list"), collections of bit-sets for piece locations ("
bitboards"), and
huffman coded positions for compact long-term storage.
Search techniques Computer chess programs consider chess moves as a
game tree. In theory, they examine all moves, then all counter-moves to those moves, then all moves countering them, and so on, where each individual move by one player is called a "
ply". This evaluation continues until a certain maximum search depth or the program determines that a final "leaf" position has been reached (e.g. checkmate).
Minimax search One particular type of search algorithm used in computer chess are
minimax search algorithms, where at each ply the "best" move by the player is selected; one player is trying to maximize the score, the other to minimize it. By this alternating process, one particular terminal node whose evaluation represents the searched value of the position will be arrived at. Its value is backed up to the root, and that evaluation becomes the valuation of the position on the board. This search process is called minimax. A naive implementation of the minimax algorithm can only search to a small depth in a practical amount of time, so various methods have been devised to greatly speed the search for good moves.
Alpha–beta pruning, a system of defining upper and lower bounds on possible search results and searching until the bounds coincided, is typically used to reduce the search space of the program. In addition, various selective search heuristics, such as
quiescence search, forward pruning, search extensions and search reductions, are also used as well. These heuristics are triggered based on certain conditions in an attempt to weed out obviously bad moves (history moves) or to investigate interesting nodes (e.g. check extensions,
passed pawns on seventh
rank, etc.). These selective search heuristics have to be used very carefully however. If the program overextends, it wastes too much time looking at uninteresting positions. If too much is pruned or reduced, there is a risk of cutting out interesting nodes.
Monte Carlo tree search Monte Carlo tree search (MCTS) is a heuristic search algorithm which expands the search tree based on random sampling of the search space. A version of Monte Carlo tree search commonly used in computer chess is PUCT, Predictor and Upper Confidence bounds applied to Trees. DeepMind's
AlphaZero and
Leela Chess Zero uses MCTS instead of minimax. Such engines use
batching on
graphics processing units in order to calculate their
evaluation functions and policy (move selection), and therefore require a
parallel search algorithm as calculations on the GPU are inherently parallel. The minimax and alpha-beta pruning algorithms used in computer chess are inherently serial algorithms, so would not work well with batching on the GPU. On the other hand, MCTS is a good alternative, because the random sampling used in Monte Carlo tree search lends itself well to parallel computing, and is why nearly all engines which support calculations on the GPU use MCTS instead of alpha-beta.
Other optimizations Many other optimizations can be used to make chess-playing programs stronger. For example,
transposition tables are used to record positions that have been previously evaluated, to save recalculation of them.
Refutation tables record key moves that "refute" what appears to be a good move; these are typically tried first in variant positions (since a move that refutes one position is likely to refute another). The drawback is that transposition tables at deep ply depths can get quite large – tens to hundreds of millions of entries. IBM's Deep Blue transposition table in 1996, for example was 500 million entries. Transposition tables that are too small can result in spending more time searching for non-existent entries due to threshing than the time saved by entries found. Many chess engines use
pondering, searching to deeper levels on the opponent's time, similar to human beings, to increase their playing strength. Of course, faster hardware and additional memory can improve chess program playing strength. Hyperthreaded architectures can improve performance modestly if the program is running on a single core or a small number of cores. Most modern programs are designed to take advantage of multiple cores to do parallel search. Other programs are designed to run on a general purpose computer and allocate move generation, parallel search, or evaluation to dedicated processors or specialized co-processors.
History The
first paper on chess search was by
Claude Shannon in 1950. He predicted the two main possible search strategies which would be used, which he labeled "Type A" and "Type B", before anyone had programmed a computer to play chess. Type A programs would use a "
brute force" approach, examining every possible position for a fixed number of moves using a pure naive
minimax algorithm. Shannon believed this would be impractical for two reasons. First, with approximately thirty moves possible in a typical real-life position, he expected that searching the approximately 109 positions involved in looking three moves ahead for both sides (six
plies) would take about sixteen minutes, even in the "very optimistic" case that the chess computer evaluated a million positions every second. (It took about forty years to achieve this speed.) A later search algorithm called
alpha–beta pruning, a system of defining upper and lower bounds on possible search results and searching until the bounds coincided, reduced the branching factor of the game tree logarithmically, but it still was not feasible for chess programs at the time to exploit the exponential explosion of the tree. Second, it ignored the problem of quiescence, trying to only evaluate a position that is at the end of an
exchange of pieces or other important sequence of moves ('lines'). He expected that adapting minimax to cope with this would greatly increase the number of positions needing to be looked at and slow the program down still further. He expected that adapting type A to cope with this would greatly increase the number of positions needing to be looked at and slow the program down still further. This led naturally to what is referred to as "selective search" or "type B search", using chess knowledge (heuristics) to select a few presumably good moves from each position to search, and prune away the others without searching. Instead of wasting processing power examining bad or trivial moves, Shannon suggested that type B programs would use two improvements: • Employ a
quiescence search. • Employ forward pruning; i.e. only look at a few good moves for each position. This would enable them to look further ahead ('deeper') at the most significant lines in a reasonable time. However, early attempts at selective search often resulted in the best move or moves being pruned away. As a result, little or no progress was made for the next 25 years dominated by this first iteration of the selective search paradigm. The best program produced in this early period was Mac Hack VI in 1967; it played at the about the same level as the average amateur (C class on the United States Chess Federation rating scale). Meanwhile, hardware continued to improve, and in 1974, brute force searching was implemented for the first time in the Northwestern University Chess 4.0 program. In this approach, all alternative moves at a node are searched, and none are pruned away. They discovered that the time required to simply search all the moves was much less than the time required to apply knowledge-intensive heuristics to select just a few of them, and the benefit of not prematurely or inadvertently pruning away good moves resulted in substantially stronger performance. In the 1980s and 1990s, progress was finally made in the selective search paradigm, with the development of
quiescence search, null move pruning, and other modern selective search heuristics. These heuristics had far fewer mistakes than earlier heuristics did, and was found to be worth the extra time it saved because it could search deeper and widely adopted by many engines. While many modern programs do use
alpha-beta search as a substrate for their search algorithm, these additional selective search heuristics used in modern programs means that the program no longer does a "brute force" search. Instead they heavily rely on these selective search heuristics to extend lines the program considers good and prune and reduce lines the program considers bad, to the point where most of the nodes on the search tree are pruned away, enabling modern programs to search very deep. In 2006,
Rémi Coulom created
Monte Carlo tree search, another kind of type B selective search. In 2007, an adaption of Monte Carlo tree search called Upper Confidence bounds applied to Trees or UCT for short was created by Levente Kocsis and Csaba Szepesvári. In 2011, Chris Rosin developed a variation of UCT called Predictor + Upper Confidence bounds applied to Trees, or PUCT for short. PUCT was then used in
AlphaZero in 2017, and later in
Leela Chess Zero in 2018.
Knowledge versus search (processor speed) In the 1970s, most chess programs ran on super computers like Control Data Cyber 176s or Cray-1s, indicative that during that developmental period for computer chess, processing power was the limiting factor in performance. Most chess programs struggled to search to a depth greater than 3 ply. It was not until the hardware chess machines of the 1980s, that a relationship between processor speed and knowledge encoded in the evaluation function became apparent. It has been estimated that doubling the computer speed gains approximately fifty to seventy
Elo points in playing strength .
Leaf evaluation For most chess positions, computers cannot look ahead to all possible final positions. Instead, they must look ahead a few
plies and compare the possible positions, known as leaves. The algorithm that evaluates leaves is termed the "evaluation function", and these algorithms are often vastly different between different chess programs. Evaluation functions typically evaluate positions in hundredths of a pawn (called a centipawn), where by convention, a positive evaluation favors White, and a negative evaluation favors Black. However, some evaluation function output win/draw/loss percentages instead of centipawns. Historically, handcrafted evaluation functions consider material value along with other factors affecting the strength of each side. When counting up the material for each side, typical values for pieces are 1 point for a
pawn, 3 points for a
knight or
bishop, 5 points for a
rook, and 9 points for a
queen. (See
Chess piece relative value.) The
king is sometimes given an arbitrarily high value such as 200 points (
Shannon's paper) to ensure that a checkmate outweighs all other factors . In addition to points for pieces, most handcrafted evaluation functions take many factors into account, such as pawn structure, the fact that a pair of bishops are usually worth more, centralized pieces are worth more, and so on. The protection of kings is usually considered, as well as the phase of the game (opening, middle or endgame).
Machine learning techniques such as Texel turning,
stochastic gradient descent, or
reinforcement learning are usually used to optimise handcrafted evaluation functions. Most modern evaluation functions make use of
neural networks. The most common evaluation function in use today is the
efficiently updatable neural network, which is a shallow neural network whose inputs are
piece-square tables. Piece-square tables are a set of 64 values corresponding to the squares of the chessboard, and there typically exists a piece-square table for every piece and colour, resulting in 12 piece-square tables and thus 768 inputs into the neural network. In addition, some engines use
deep neural networks in their evaluation function. Neural networks are usually trained using some
reinforcement learning algorithm, in conjunction with
supervised learning or
unsupervised learning. The output of the evaluation function is a single scalar, quantized in centipawns or other units, which is, in the case of handcrafted evaluation functions, a weighted summation of the various factors described, or in the case of neural network based evaluation functions, the output of the head of the neural network. The evaluation putatively represents or approximates the value of the subtree below the evaluated node as if it had been searched to termination, i.e. the end of the game. During the search, an evaluation is compared against evaluations of other leaves, eliminating nodes that represent bad or poor moves for either side, to yield a node which by convergence, represents the value of the position with best play by both sides. == Endgame tablebases ==