AlphaGo Wins Final Game In Match Against Champion Go Player

AlphaGo, a largely self-taught Go-playing AI, last night won the fifth and final game in a match held in Seoul, South Korea, against that country’s Lee Sedol. Sedol is one of the greatest modern players of the ancient Chinese game. The final score was 4 games to 1.

Thus falls the last and computationally hardest game that programmers have taken as a test of machine intelligence. Chess, AI’s original touchstone, fell to the machines 19 years ago, but Go had been expected to last for many years to come.

The sweeping victory means far more than the US $1 million prize, which Google’s London-based acquisition, DeepMind, says it will give to charity. That’s because AlphaGo, for all its processing power, mainly owes its victory to a radical new way of using that power: via deep neural networks. These networks can train themselves with only a little intervention from human beings, and DeepMind’s researchers had already demonstrated that they can master a wide range of computer video games. The researchers hope that this generalizability can be carried over to the mastering of practical tasks in many other domains, including medicine and robotics.

Game programming began with chess, using methods first sketched out by Claude Shannon and Alan Turing in the 1940s. A machine calculates every possible continuation for each side, working its way as many moves ahead as it can and so generating a tree of analysis with millions of game positions. It then grades the positions by applying rules of thumb that even beginning chess players know, such as the differing values of the various pieces and the importance of controlling the center of the board. Finally, the algorithm traces its way from those end positions back to the current position to find the move that leads to the best outcome, assuming perfect play on both sides.
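The procedure described above is generally known as minimax. What follows is a minimal sketch of the idea, not any historical program: the tree is a hand-built toy (`TOY_TREE`, an assumption for illustration), and its integer leaves stand in for the rule-of-thumb grades a chess program would compute.

```python
def minimax(node, maximizing):
    """Back up leaf grades to the root, assuming best play by both sides.

    A node is either an int (a graded end position) or a list of child nodes.
    """
    if isinstance(node, int):
        return node  # end position: return its heuristic grade
    values = [minimax(child, not maximizing) for child in node]
    # The side to move picks the child that is best from its own point of view.
    return max(values) if maximizing else min(values)


def best_move(root):
    """Return the index of the root move with the best backed-up grade."""
    scores = [minimax(child, False) for child in root]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-move-deep tree: three moves for us, two replies each.
TOY_TREE = [[3, 5], [2, 9], [0, 7]]

print(minimax(TOY_TREE, True))  # 3
print(best_move(TOY_TREE))      # 0
```

The second move contains the most attractive leaf (9), but the opponent would never allow it, so the first move, which guarantees at least 3, is chosen.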

With modern hardware, this “brute-force” method can produce a strong chess-playing program. Add a grab-bag of tricks to “prune” the analysis tree, throwing out bad lines so the program can explore promising lines more deeply, and you get world-champion-level play. That came in 1997, when IBM’s Deep Blue supercomputer defeated then-World Chess Champion Garry Kasparov. Today you can download a US $100 program that plays even better—on a laptop.
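One classic pruning trick is alpha-beta pruning, which stops examining a line as soon as it is provably worse than one already found. The sketch below is an illustration on an assumed toy tree (`TOY_TREE`), with integer leaves standing in for graded positions; the `visited` list is a hypothetical addition that records which leaves were actually examined.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha-beta cutoffs.

    A node is an int (graded end position) or a list of child nodes.
    If `visited` is a list, every leaf that gets examined is appended to it.
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # we already have a better option elsewhere
    return best


TOY_TREE = [[3, 5], [2, 9], [0, 7]]
seen = []
value = alphabeta(TOY_TREE, True, visited=seen)
print(value)  # 3
print(seen)   # [3, 5, 2, 0]
```

Only four of the six end positions are examined here, yet the result is the same as a full search. On real game trees the savings compound with depth, which is what lets a program look further down the promising lines.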

Though some researchers have long argued that brute-force searching can in principle conquer Go, the game has resisted such efforts. Compared to chess, the Chinese game offers far more moves in a given position and far more moves in a typical game, creating an intractably huge tree of analysis. It also lacks reliable rules of thumb for the grading of positions.

In recent years, many programmers have tried to get around this problem with Monte Carlo simulation, a statistical means of finding the best first move by sampling many of the games that might begin from a given position. AlphaGo also uses that method to some extent, together with the tree-generating methods of yore. But the key improvement is AlphaGo’s use of deep neural networks to recognize patterns.
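The Monte Carlo idea can be sketched on a game far simpler than Go. The example below is an illustration under stated assumptions, not the method of any Go program: it uses a toy pile game (players alternately take one to three stones, and whoever takes the last stone wins), and it scores each first move by the fraction of purely random playouts that end in a win. Real Go programs refine this into a tree search; all names here (`random_playout`, `estimate_win_rate`, `monte_carlo_move`) are hypothetical.

```python
import random

MAX_TAKE = 3  # assumed rule of the toy game: take 1 to 3 stones per turn


def random_playout(stones, my_turn, rng):
    """Finish the game with random moves; return True if we take the last stone."""
    if stones == 0:
        # The pile is already empty, so whoever moved last has won.
        return not my_turn
    while True:
        stones -= rng.randint(1, min(MAX_TAKE, stones))
        if stones == 0:
            return my_turn
        my_turn = not my_turn


def estimate_win_rate(stones, first_move, playouts, rng):
    """Fraction of random games won after we open with `first_move`."""
    wins = sum(
        random_playout(stones - first_move, False, rng) for _ in range(playouts)
    )
    return wins / playouts


def monte_carlo_move(stones, playouts=2000, seed=0):
    """Pick the first move whose random playouts win most often."""
    rng = random.Random(seed)
    moves = range(1, min(MAX_TAKE, stones) + 1)
    rates = {m: estimate_win_rate(stones, m, playouts, rng) for m in moves}
    return max(rates, key=rates.get), rates


move, rates = monte_carlo_move(5)
print(move, rates)
```

No rule of thumb for grading positions is needed: the statistics of the finished games do the grading. That is what made the approach attractive for Go, where such rules are hard to come by.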

At a quiet moment, 42 minutes into the streaming of the match’s second game, on 10 March, one of the online commentators, Google’s Thore Graepel, described his first over-the-board encounter with an early form of AlphaGo a year ago, on his first day of work at DeepMind’s London office. “I thought, neural network, how difficult can it be? It cannot even do reading of positions, it just does pattern recognitions,” Graepel said. “I sat down in front of the board, a small crowd gathered round, and in a small time, my position started to deteriorate… I ended up losing, I tried again and lost again. At least at that point the office knew me, I had a good introduction!”

AlphaGo uses two neural networks: a policy network, which was trained on millions of positions from games between strong human players with the goal of imitating their play, and a value network, which tries to assign a winning probability to each given position. That way, the machine can focus its efforts on the most promising continuations. Then comes the tree-searching part, which tries to look many moves ahead.

“One way to think of it is that the policy network provides a guide, suggesting to AlphaGo moves to consider; but AlphaGo can then go on beyond that and come up with a new conclusion that overwhelms the suggestion by the policy network,” explained David Silver, the leader of the AlphaGo team, in online commentary last night, just before the final game. “At every part of the search tree, it’s using the policy network to suggest moves and the value network to evaluate moves. The policy network alone was enough to beat Graepel, an accomplished amateur player, on his first day in the office.”
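The division of labor Silver describes can be sketched on a toy game. This is not AlphaGo’s actual search, which the article says also draws on Monte Carlo simulation; it is a plain depth-limited search on a pile game (take one to three stones, last stone wins), and both “networks” are hand-written stand-ins. `policy_prior` is a deliberately naive guide that favors taking more stones, and `value_estimate` is an assumed evaluator returning a win probability for the player to move.

```python
MAX_TAKE = 3  # assumed rule of the toy game: take 1 to 3 stones per turn


def policy_prior(stones):
    """Stand-in for a policy network: prior probability of each legal move."""
    moves = range(1, min(MAX_TAKE, stones) + 1)
    total = sum(moves)
    return {m: m / total for m in moves}  # naive: bigger takes look better


def value_estimate(stones):
    """Stand-in for a value network: win probability for the player to move."""
    return 0.1 if stones % 4 == 0 else 0.9


def guided_search(stones, depth, top_k=2):
    """Return (win probability for the player to move, best move or None)."""
    if stones == 0:
        return 0.0, None  # the previous player took the last stone
    if depth == 0:
        return value_estimate(stones), None  # the value stand-in grades the leaf
    priors = policy_prior(stones)
    # The policy stand-in narrows the search to its top_k suggestions.
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    best_value, best_move = -1.0, None
    for move in candidates:
        opponent_value, _ = guided_search(stones - move, depth - 1, top_k)
        value = 1.0 - opponent_value
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move


priors = policy_prior(6)
print(max(priors, key=priors.get))      # 3, the policy's first suggestion
print(guided_search(6, depth=2)[1])     # 2, the move the search settles on
```

With six stones, the policy stand-in’s favorite is to take three, which loses at once to a reply of three. The search looks at only the policy’s top two suggestions, yet that is enough to overrule the favorite and take two instead.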

A strange consequence of AlphaGo’s division of labor is the way it plays once it thinks it has a clearly winning game. A human player would normally try to win by the largest possible margin, by capturing not just one extra point on the board, but 10 or 20 points, if possible. That way, the human would be likely to win even if he later makes a small mistake. But AlphaGo prefers to win by one point, at what it considers a high probability, over winning, say, by 20 points, at a rather lower probability.
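The difference between the two styles comes down to which quantity is being maximized. A small sketch with made-up numbers: the `Candidate` fields and both candidate moves below are hypothetical, chosen only to mirror the one-point-versus-20-point contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_margin: float   # average winning margin in points (hypothetical)
    win_probability: float   # estimated chance of winning the game (hypothetical)


def pick_by_win_probability(candidates):
    """The criterion the article attributes to AlphaGo."""
    return max(candidates, key=lambda c: c.win_probability)


def pick_by_margin(candidates):
    """The criterion the article attributes to a typical human player."""
    return max(candidates, key=lambda c: c.expected_margin)


CANDIDATES = [
    Candidate("safe", expected_margin=1.0, win_probability=0.95),
    Candidate("ambitious", expected_margin=20.0, win_probability=0.80),
]

print(pick_by_win_probability(CANDIDATES).name)  # safe
print(pick_by_margin(CANDIDATES).name)           # ambitious
```

Since a Go game is won or lost regardless of the margin, a player that trusts its probability estimates has no reason to take on extra risk for extra points.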

You might think that this tendency to go for the safe-but-slack move is what enabled Lee Sedol to win the fourth game, on Sunday. And indeed, commentators at the time noted that the machine seemed to have the upper hand when Sedol pounced with an unexpected move, after which the machine played some weak moves. Sedol had used up a lot of time on his clock and so had to scramble to make his following moves, but in the end he was able to sustain his advantage and finally win.

However, it wasn’t slackness but sheer surprise that caused the problem, members of the DeepMind team said last night, in commentary before the final game. “That crucial move that Lee found, move 78, was considered very unlikely to be played by a human; [the program] estimated a one in 10,000 chance,” said Silver. “It’s fair to say that AlphaGo was surprised by this move; it had to start replanning at that point.”

A human player faced with a strange-looking move would study it deeply, if there was enough time to do so—and AlphaGo had plenty of time. “But AlphaGo has a simple time-control strategy,” Silver noted. “Maybe that’s something we can work on in future.”

So, it seems, efforts to improve AlphaGo will continue.

“An exciting direction for future research is to consider whether a machine can learn completely by itself, without any human examples, to achieve this level of performance,” Silver said.

The final goal, of course, is to create an all-around learning machine, one that can learn to do a lot of things that people now get paid to do. Like, say, reporting and writing blog posts like this one.
