AlphaZero masters chess, shogi and Go

In December 2017, DeepMind published the paper “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” introducing AlphaZero. According to the abstract, AlphaZero started from random play with no domain knowledge beyond the rules of each game, reached a superhuman level of play in chess, shogi and Go within 24 hours, and convincingly defeated a world-champion program in each case.

What made AlphaZero striking was its generality and its method. A single algorithm, combining a deep neural network with a tree search, learned all three games purely by playing against itself millions of times, with no human game data and no hand-crafted evaluation rules. DeepMind’s own account describes how AlphaZero defeated the top chess engine Stockfish, the top shogi engine Elmo, and, in Go, its predecessor AlphaGo Zero, and how it developed an unconventional, dynamic playing style.
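
To make the combination of network and tree search more concrete, the sketch below shows one common form of the move-selection rule used in AlphaZero-style search: each candidate move is scored by its average search value plus an exploration bonus that grows with the network's prior probability and shrinks as the move is visited more often. This is an illustrative sketch, not DeepMind's implementation. The names Edge, select_move and backup are hypothetical, the exploration constant c_puct = 1.5 is an assumed value, and the priors are hard-coded where a real system would query the neural network.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Search statistics for one move from a position (illustrative names)."""
    prior: float            # P(s, a): move probability suggested by the network
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # W(s, a): sum of evaluations backed up through it

    def mean_value(self) -> float:
        # Q(s, a): average backed-up value; 0 for a move not yet visited
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximizing Q + U (c_puct is an assumed constant)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        # U(s, a): bonus for moves with a high prior and few visits
        bonus = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.mean_value() + bonus

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: Edge, value: float) -> None:
    """Record the result of one simulation that passed through this move."""
    edge.visits += 1
    edge.value_sum += value


if __name__ == "__main__":
    # Hard-coded priors and statistics stand in for network output and search history.
    edges = {
        "e4": Edge(prior=0.6, visits=10, value_sum=2.0),
        "d4": Edge(prior=0.3, visits=1, value_sum=0.5),
        "a3": Edge(prior=0.1),
    }
    print(select_move(edges))
```

In self-play, a rule of this kind is applied repeatedly to grow the search tree, and the resulting visit counts and game outcomes become training targets for the network, so search and learning improve each other over many games.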

AlphaZero generalized the self-play approach pioneered by AlphaGo into a single template for mastering perfect-information games. Its results were confirmed and described in more detail in a paper published in the journal Science in December 2018, and the recipe of self-play combined with search and learning went on to influence reinforcement learning research broadly.