AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference.
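A cluster of this shape could be expressed in TensorFlow along roughly the following lines; the host names, ports, and task index are placeholders for illustration, not DeepMind's published configuration.

<syntaxhighlight lang="python">
import tensorflow as tf

# Hypothetical layout mirroring the reported scale: 64 GPU workers and
# 19 CPU parameter servers. Host names, ports, and the task index are
# invented for illustration.
cluster = tf.train.ClusterSpec({
    "worker": ["worker%02d.example:2222" % i for i in range(64)],
    "ps":     ["ps%02d.example:2222" % i for i in range(19)],
})

# In a real deployment, each machine would run one server for its own
# task: variables live on the "ps" (parameter server) tasks, while the
# "worker" tasks compute gradients against the shared parameters.
server = tf.distribute.Server(cluster, job_name="worker", task_index=0)
</syntaxhighlight>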
The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero perceived only the stones on the board, rather than relying on hand-coded rules to handle rare or unusual board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.
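This pairing of move prediction and outcome prediction can be sketched as a single two-headed network; the following hypothetical TensorFlow/Keras snippet is illustrative only, with placeholder layer sizes and hyperparameters rather than DeepMind's published architecture.

<syntaxhighlight lang="python">
import tensorflow as tf

# One network, two heads: a policy head predicting the network's own
# next move and a value head predicting the eventual game outcome.
# Self-play games supply the training targets: search-derived move
# probabilities for the policy head and the final result (+1/-1) for
# the value head. Sizes below are placeholders.
reg = tf.keras.regularizers.l2(1e-4)

board = tf.keras.Input(shape=(19, 19, 17))  # raw stone/history planes only
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu",
                           kernel_regularizer=reg)(board)
x = tf.keras.layers.Flatten()(x)
policy = tf.keras.layers.Dense(19 * 19 + 1, activation="softmax",
                               name="policy", kernel_regularizer=reg)(x)
value = tf.keras.layers.Dense(1, activation="tanh",
                              name="value", kernel_regularizer=reg)(x)
model = tf.keras.Model(board, [policy, value])

# Joint objective: squared error on the predicted outcome plus
# cross-entropy against the self-play move probabilities, with L2
# regularization contributed by the layers above.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss={"policy": tf.keras.losses.CategoricalCrossentropy(),
          "value": tf.keras.losses.MeanSquaredError()},
)
</syntaxhighlight>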
In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession. It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to reach the same level. According to Epoch.ai, training required about 3×10<sup>23</sup> floating-point operations (FLOP). For comparison, the researchers also trained a version of AlphaGo Zero on human games, AlphaGo Master, and found that it learned more quickly but performed more poorly in the long run. DeepMind submitted its initial findings in a paper to
Nature in April 2017, which was then published in October 2017.

==Hardware cost==