Deep Learning Machine Teaches Itself Chess in 72 Hours, Plays at International Master Level

MIT Technology Review:

In the last few years, neural networks have become hugely powerful thanks to two advances. The first is a better understanding of how to fine-tune these networks as they learn, thanks in part to much faster computers. The second is the availability of massive annotated datasets to train the networks.

That has allowed computer scientists to train much bigger networks organized into many layers. These so-called deep neural networks have become hugely powerful and now routinely outperform humans in pattern recognition tasks such as face recognition and handwriting recognition.

So it’s no surprise that deep neural networks ought to be able to spot patterns in chess and that’s exactly the approach Lai has taken. His network consists of four layers that together examine each position on the board in three different ways.

The first looks at the global state of the game, such as the number and type of pieces on each side, which side is to move, castling rights and so on. The second looks at piece-centric features such as the location of each piece on each side, while the final aspect is to map the squares that each piece attacks and defends.

Lai trains his network with a carefully generated set of data taken from real chess games. This data set must have the correct distribution of positions. “For example, it doesn’t make sense to train the system on positions with three queens per side, because those positions virtually never come up in actual games,” he says.

It must also have plenty of variety of unequal positions beyond those that usually occur in top level chess games.  That’s because although unequal positions rarely arise in real chess games, they crop up all the time in the searches that the computer performs internally.

And this data set must be huge. The massive number of connections inside a neural network have to be fine-tuned during training and this can only be done with a vast dataset. Use a dataset that is too small and the network can settle into a state that fails to recognize the wide variety of patterns that occur in the real world.