Can Machine Learning Teach Us Anything?

Games, Computers, and Humans


The breathless headline caught my eye: “Computer Shows Human Intuition—AI Breakthrough!” (or words to that effect). I was intrigued but skeptical. Reading further, I learned that a computer program, AlphaZero, developed by a team at DeepMind, in London, had beaten other champion chess-playing programs, as well as (of course) humans. That wasn’t the interesting news, as we take that kind of dominance for granted these days. What fascinated me was how the program had been constructed. Instead of being tuned by expert players, AlphaZero initially knew nothing more than the rules of chess. It learned how to play, and to win, by playing against itself. Soon it got so good it could beat everyone and everything.

But, I wondered, isn’t this what humans have been doing for centuries, learning by playing chess against ourselves? What, if anything, has the computer learned so quickly that we haven’t in all those years? Unfortunately, the neural network isn’t telling us. All we can do is watch it play: It appears, for instance, that it sacrifices pieces to gain position more often than humans would usually attempt. Either way, I am still intrigued, but also still skeptical of headlines about the software showing human intuition, where intuition is defined as “immediate apprehension or cognition without reasoning.” Is AlphaZero exhibiting human intuition, or is this superhuman intuition? Or should it not be called intuition at all?

Similar claims are being made about a new poker-playing program from a team at the University of Alberta, in Canada, called DeepStack, which has trounced human opponents in Texas hold ’em. The researchers write that it “plays using ‘intuition’ honed through deep learning to reassess its strategy with each decision.” To me, having little poker experience, this was a revelation: Contrary to my naive belief that winning at poker was based on psyching out opponents, poker is really a game of strategy. And the computer has learned a better strategy than we have discovered on our own.

A couple of years ago, another poker program, Libratus, from Carnegie Mellon University, in Pittsburgh, bested human champions. Libratus uses a technique with the unhelpful name of Monte Carlo Counterfactual Regret Minimization, which is a clever way to prune an enormous decision tree and to choose among the multitude of possible pathways. This technique has been shown to lead to a Nash equilibrium strategy, in which neither player can gain by changing strategy (assuming fixed strategies by the others). In other words, among equally proficient and knowledgeable participants, this results, on average, in a tie.
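To make the regret-minimization idea a little more concrete, here is a minimal sketch of regret matching, the simple update rule that counterfactual regret minimization builds on. It is emphatically not Libratus's code: it assumes a toy game (rock-paper-scissors rather than poker), it has no decision tree and no Monte Carlo sampling, and every name in it (`PAYOFF`, `strategy_from_regrets`, `self_play`, `exploitability`) is hypothetical, chosen only for illustration. Each player keeps a running total of how much better each action would have done than what it actually played, and then plays actions in proportion to that accumulated regret.

```python
# Regret matching in self-play on rock-paper-scissors (illustrative toy only).
# PAYOFF[a][b] is the reward for playing action a against action b,
# with 0 = rock, 1 = paper, 2 = scissors. The game is symmetric, so both
# players can read their own payoff from the same table.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N))
        for a in range(N)
    ]


def self_play(iterations=20_000):
    """Run two regret-matching players against each other.

    Returns the time-averaged strategy of each player; it is this average,
    not the final strategy, that approaches the equilibrium.
    """
    # Assumption for illustration: player 0 starts out biased toward rock,
    # so the run does not begin at the answer.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(strategies[player][a] * values[a] for a in range(N))
            for a in range(N):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best payoff an opponent can get against this strategy (0 at equilibrium)."""
    return max(action_values(strategy))


if __name__ == "__main__":
    for player, average in enumerate(self_play()):
        print(player, [round(p, 3) for p in average], round(exploitability(average), 4))
```

In this toy game the averaged strategies should drift toward playing each action about a third of the time, which is the equilibrium in which neither player can gain by deviating. The sketch also suggests why such a strategy looks like playing not to lose: it is built to leave an opponent nothing to exploit, not to punish a particular opponent's mistakes.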

This strategy interests and puzzles me. It seems to me that the computer is playing not to lose. I would assume that the human players are playing to win, and in so doing, lose. This seems counterintuitive. I recall too well all the times I have yelled at my TV when my sports teams have been in the lead and start playing not to lose, and then do lose. I suppose those teams’ don’t-lose strategies are not so optimal as those of the poker-playing program.

A deeper understanding of all this eludes me. However, I’m left with a residue of envy of those engineers working on these game-playing programs. What a great privilege! I know that it isn’t all fun and games (literally), and that there is a great deal of nitty-gritty as well as creative thought involved. Nonetheless, I am indeed intrigued. This is great stuff.