AlphaZero

AlphaZero vs Stockfish 

On December 5, 2017, Google DeepMind announced that it had developed a self-learning program -- AlphaZero (hereafter AZ) -- which trained itself to play chess, Shogi, and Go in only a few hours and then decisively beat the world's best programs at each:

    • beat Elmo at Shogi: +90 =2 -8
    • beat AlphaGo Zero at Go: +60 =0 -40 (there are no draws in Go)
    • beat Stockfish at Chess: +28 =72 -0.

This result follows DeepMind's previous high-profile successes in mastering Go, decades before many experts thought that would be possible:

    1. March 2016: AlphaGo defeated 18-time Go World Champion Lee Sedol: 4-1.
    2. May 2017: AlphaGo defeated current world #1 Go player Ke Jie: 3-0.
    3. October 2017: AlphaGo Zero defeated AlphaGo by a score of 100-0.

The paper describing the experiment (which has not been peer-reviewed), along with 10 games between AlphaZero and Stockfish, can be downloaded here.

We'll have more on this in a few days, including:

Q: Did AZ really take only 4 hours to learn chess?
A: Yes. Not counting the years of development that went into the self-learning algorithms and the super-fast hardware to run them. (By DeepMind's account, AZ surpassed Stockfish's level after about four hours of self-play training.)
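
For a feel of what "trained itself" means, here is a heavily simplified, self-contained sketch of the self-play idea in Python -- using the toy game Nim instead of chess, and a value table instead of AZ's deep network and Monte-Carlo tree search. Everything in it is illustrative, not DeepMind's code:

    import random
    from collections import defaultdict

    # Toy game: Nim. A pile of stones; players alternately take 1-3 stones,
    # and whoever takes the last stone wins. A simple value table stands in
    # for AZ's deep network, and outcome-driven updates stand in for its
    # training. Illustrative only -- not DeepMind's code.

    values = defaultdict(float)  # pile size -> value for the player to move

    def legal_moves(stones):
        return [m for m in (1, 2, 3) if m <= stones]

    def choose_move(stones, explore=0.1):
        if random.random() < explore:  # occasional random move, to keep exploring
            return random.choice(legal_moves(stones))
        # Otherwise move to the position that is worst for the opponent.
        return min(legal_moves(stones), key=lambda m: values[stones - m])

    def self_play_game():
        stones = random.randint(4, 21)
        player, visited = 0, []
        while stones > 0:
            visited.append((player, stones))
            stones -= choose_move(stones)
            player = 1 - player
        winner = 1 - player  # the player who took the last stone
        # Nudge every visited position's value toward the game's actual result.
        for p, s in visited:
            target = 1.0 if p == winner else -1.0
            values[s] += 0.1 * (target - values[s])

    for _ in range(20000):
        self_play_game()

    # The learned values approximate Nim theory: piles divisible by 4 are
    # lost for the player to move (value near -1), the rest are won.
    print({s: round(values[s], 2) for s in range(1, 13)})

Scale that loop up -- a deep neural network instead of a table, Monte-Carlo tree search instead of a one-move lookahead, thousands of specialized processors -- and a few hours of it amounts to an enormous quantity of learning.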

Q: Are these 10 games a fair representation of AZ's chess ability?
A: Obviously not: AZ scored 64% in the match, but 100% in the published sample.

Q: Was this a fair test for Stockfish?
A: No. There are several objections: no opening book, no endgame tablebases, the unusual fixed time control of one minute per move, the exceptionally small hash-table memory allowed to Stockfish relative to the 64 threads it ran on, the fact that they used a year-old version of a constantly improving program...

Q: So, is AlphaZero not stronger than Stockfish?
A: It is clearly much stronger, though probably by less than the 64-36 score indicates.

Q: But their paper says SF was analyzing far more positions per second than AZ -- doesn't that mean SF had an advantage?
A: No. It means SF and AZ spend their calculating time differently: SF runs a fast handcrafted evaluation over an enormous number of positions (roughly 70 million per second), while AZ applies a much slower but more informative neural-network evaluation to far fewer positions (roughly 80 thousand per second).
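
To see why the raw comparison misleads, here is a toy sketch (plain Python; cheap_eval and expensive_eval are invented stand-ins, nothing to do with either engine's real code): give two evaluators the same clock time and count how many positions each gets through.

    import time

    def cheap_eval(position):
        # Fast handcrafted heuristic, SF-style: almost free per call.
        return sum(position) % 3 - 1

    def expensive_eval(position):
        # Slow "learned" evaluation, AZ-style: heavy computation per call,
        # simulated here with a busy loop.
        total = 0.0
        for i in range(5000):
            total += (sum(position) * i) % 7
        return total % 3 - 1

    def evals_per_budget(eval_fn, budget_seconds=0.5):
        position, count = [1, 2, 3], 0
        deadline = time.perf_counter() + budget_seconds
        while time.perf_counter() < deadline:
            eval_fn(position)
            count += 1
        return count

    print("cheap evaluations:    ", evals_per_budget(cheap_eval))
    print("expensive evaluations:", evals_per_budget(expensive_eval))

Both searchers use their whole budget; they simply make opposite trades between quantity of positions and quality of judgement per position.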

Q: When will I be able to buy AlphaZero for chess?
A: Don't hold your breath. Google DeepMind did not sell the programs it made for Go, and (even if it did) they all run on specialized hardware roughly 1000 times faster than even a high-end desktop CPU.

Q: Will AlphaZero StockPicker revolutionize investing?
A: Not for long, if at all. Stock markets, unlike board games, are "level two" chaotic systems: the choices and predictions made by agents feed back into the system in unpredictable ways...

Q: Was it just a coincidence that the first round of the 2017 London Chess Classic was played at the home of Google DeepMind the same week as their paper about AlphaZero was published?
A: :-) 


Chess Games First

There are many interesting things about this story, some of which we may return to here. But, as chess-lovers, the most interesting thing for us is that there is now a chess-playing entity far stronger than the best publicly available programs -- which are themselves far stronger than the best human players -- and that (some of) its games are available. So what does it play like?

All 10 games, with notes, can be replayed in the viewer below. If you need some encouragement, how about this from 7-time Russian Champion Peter Svidler:

"The games were absolutely fantastic, phenomenal.... I'm not amazed with the fact that it learned chess, but I was stunned by the games' quality."

I think all 10 games are worth playing through, but if you're short of time, then play at least the following three:

  1. game 3: a Queen's Indian Defence (QID) where AZ sacrifices a pawn and an exchange to trap Black's queen on h8 and win by zugzwang.
  2. game 10: another QID, where AZ allows a N to be captured at move 19 just to gain a development lead which doesn't turn into a (humanly) clear win until after move 33.
  3. game 7: Karpov 2.0: no fireworks, but a Karpov-like win where AZ didn't seem to do anything... until its opponent was lost.

[Game viewer]