Complex information arising from random processes: the chess program AlphaZero

By Joe Felsenstein

January 28, 2020 12:00 MST

(This is a guest post put up by Joe for commenter MaryKaye, who is its author)

A sticking point among a lot of ID/Creationists seems to be whether information can be generated by a non-intelligent process.

Computer chess provides a very dramatic example. Previously, chess-playing algorithms were designed by software developers in concert with strong human players. In 2017, however, the AlphaZero program was created by initializing a neural net with the rules of chess, and then having it play millions of games against itself, updating its neural net with each game. After 24 hours of such self-play, AlphaZero won a 100-game match against Stockfish, the currently strongest human-designed algorithm, with the score of 28 wins, 72 draws, no losses. (It was not allowed to update its net based on its games with Stockfish, so did not “learn” from playing the champion; all it had was its experience playing against itself.)

You can read about this at:

https://arxiv.org/abs/1712.01815 (preprint of paper subsequently published in Science)

This approach works because, while the question of whether a chess move is good or bad is extremely difficult, the question of whether a terminal position in a game is a win, loss or draw is easy to determine: there is a clear “fitness criterion”. The algorithm played like a blithering idiot at first, but versions of its net that had more successful weightings rapidly outcompeted the others, and its play became steadily more “purposeful.” Interestingly, there was some apparent recapitulation of human improvement in chess: AlphaZero had an early infatuation with opening variations which had been abandoned by the top human players, but like the humans, eventually moved away from them. (I got some teasing over this, as my favorite opening variation is one of those that AlphaZero discarded as it “matured.”)
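For readers who like to see the idea in miniature, here is a toy sketch of learning from self-play when the only feedback is the result of the finished game. It is emphatically not AlphaZero's method (which combines a deep neural net with tree search); it is a simple lookup table learning a much smaller game that I have swapped in for illustration: a single pile of stones, each player removes 1 to 3 per turn, and whoever takes the last stone wins. The names `train`, `legal_moves`, `MAX_TAKE` and `MAX_PILE`, and the settings for the number of games and the exploration rate, are all assumptions of this sketch.

```python
import random
from collections import defaultdict

MAX_TAKE = 3    # a player may remove 1, 2 or 3 stones per turn
MAX_PILE = 10   # training games start from a random pile of 1..MAX_PILE


def legal_moves(pile):
    """The rules of the game: the only knowledge supplied in advance."""
    return range(1, min(MAX_TAKE, pile) + 1)


def train(games=50_000, epsilon=0.1, seed=0):
    """Learn a move table purely from the outcomes of self-play games."""
    rng = random.Random(seed)
    total = defaultdict(float)  # summed outcomes for each (pile, move)
    count = defaultdict(int)    # how often each (pile, move) was played

    def value(pile, move):
        # Average result seen so far after playing `move` at `pile`
        # (0.0 for a move that has never been tried).
        n = count[(pile, move)]
        return total[(pile, move)] / n if n else 0.0

    for _ in range(games):
        pile = rng.randint(1, MAX_PILE)
        history = []  # (pile, move) for every turn; the players alternate
        while pile > 0:
            moves = list(legal_moves(pile))
            if rng.random() < epsilon:
                move = rng.choice(moves)                         # try something random
            else:
                move = max(moves, key=lambda m: value(pile, m))  # best known so far
            history.append((pile, move))
            pile -= move

        # The "fitness criterion": whoever took the last stone won.
        # Walk backwards through the game, crediting +1 to the winner's
        # moves and -1 to the loser's moves.
        outcome = 1.0
        for key in reversed(history):
            total[key] += outcome
            count[key] += 1
            outcome = -outcome

    # The learned policy: the best-scoring move for each pile size.
    return {p: max(legal_moves(p), key=lambda m: value(p, m))
            for p in range(1, MAX_PILE + 1)}


if __name__ == "__main__":
    for pile, move in train().items():
        print(f"pile of {pile}: take {move}")
```

Nothing in the code says which moves are good. The table starts out empty and the early games are close to random, yet under these settings I would expect the learned policy to settle on the known winning strategy for this game, which is to always leave the opponent a multiple of four stones.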

People have asked whether the programmers “cheated” and snuck some chess knowledge, beyond the rules, into AlphaZero. But AlphaZero calculates substantially fewer positions per second than Stockfish (a major handicap) and yet outperforms it. If we knew how to make an algorithm that much better than Stockfish by human design, we’d have done so – there is a substantial prize involved in winning the World Computer Chess Championship, as well as prestige.

The human World Champion, among others, studied AlphaZero’s games and noted that its style is different from both previous AIs and top human players. It is amazingly willing to give up its pieces for what a human would consider to be a transient initiative. Playing over the games, one gets the feeling that “no one ever told it” the relative values of the pieces. The World Champion hopes to incorporate lessons learned from AlphaZero’s games into his own play. (It would take a better player than myself to say whether he is succeeding.)

So, AlphaZero began with the basic rules of chess and a randomized neural net, and ended with a highly sophisticated algorithm adapted to winning chess games. (Its sibling AlphaGo Zero did the same thing in go, which is felt to be the more difficult game.) Hasn’t there been a gain of information here?

The information, as Joe Felsenstein points out, did not come “from nothing.” A lot of energy had to be input to play all those training games. But it didn’t come from pre-existing information. Pouring energy into a system with a low information content has generated a system with a higher information content, at least for an intuitive understanding of “information.” Certainly the functionality of the system has increased. The version of AlphaZero that had only 3 hours of play time to develop its neural net loses, every time, to the one that had 9 hours. That’s a pretty clear “fitness” difference.

Key to AlphaZero’s success was the ability to run it in parallel on a large computer cluster, enabling the huge number of games needed to be accomplished quickly. A large biological population has the same quality: it consists of a whole lot of interactions being evaluated “in parallel.”

To me this pretty dramatically answers the question of whether information content can increase. Clearly it did: the terminal state of AlphaZero knows how to play chess with superhuman ability, whereas the starting state knew nothing but how to make a legal move, how to determine whether that move ended the game and, if so, with what outcome (i.e. it knew a checkmate when it stumbled into one).