The Monte Carlo method, which uses random sampling for deterministic problems that are difficult or impossible to solve by other approaches, dates back to the 1940s. In his 1987 PhD thesis, Bruce Abramson combined
minimax search with an
expected-outcome model based on random game playouts to the end, instead of the usual
static evaluation function. Abramson said the expected-outcome model "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent." He experimented in depth with
tic-tac-toe and then with machine-generated evaluation functions for
Othello and
chess. Such methods were then explored and successfully applied to heuristic search in the field of
automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989, improving on the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search or
iterative deepening. In 1992, B. Brügmann employed it for the first time in a
Go-playing program. In 2006, inspired by these predecessors, Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search; L. Kocsis and Cs. Szepesvári developed the UCT (Upper Confidence bounds applied to Trees) algorithm; and the Fuego program began to win against strong amateur players in 9×9 Go. In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board with an
amateur 2 dan player.
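The UCT algorithm named above balances exploiting moves that have scored well so far against exploring rarely tried ones, by applying the UCB1 bandit formula at each tree node. A minimal sketch of the selection step is below; the function name, the `(wins, visits)` tuple representation, and the example statistics are illustrative, not taken from any particular implementation:

```python
import math

def uct_select(children, c=math.sqrt(2)):
    """Return the index of the child maximizing the UCT score.

    children: list of (wins, visits) pairs; each child is assumed to
    have been visited at least once, so visits > 0.
    c: exploration constant (sqrt(2) is the classic UCB1 choice).
    """
    total = sum(n for _, n in children)  # total visits through the parent

    def score(child):
        w, n = child
        # exploitation term (average reward) + exploration bonus,
        # which grows for children visited rarely relative to the parent
        return w / n + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

With these statistics the exploration bonus dominates for the barely visited third child, so `uct_select([(10, 20), (5, 8), (0, 1)])` picks index 2 even though its average reward is zero; with a large exploration deficit absent, the best average wins.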
Google DeepMind developed the program
AlphaGo, which in October 2015 became the first Computer Go program to beat a professional human Go player without
handicap on a full-sized 19×19 board. In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating
Lee Sedol in
a five-game match with a final score of four games to one. AlphaGo represents a significant improvement over previous Go programs as well as a milestone in
machine learning as it uses Monte Carlo tree search with
artificial neural networks (a
deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs. The MCTS algorithm has also been used in programs that play other
board games (for example
Hex,
Havannah,
Game of the Amazons, and
Arimaa), real-time video games (for instance
Ms. Pac-Man and
Fable Legends), and nondeterministic games (such as
skat,
poker,
Magic: The Gathering, or
Settlers of Catan).

== Principle of operation ==