DIY

Data Mining Scrabble

A study of 10 million game simulations finds the true worth of Scrabble tiles

Illustration: Emily Cooper; Source: Andrew C. Thomas

Chess had its Deep Blue. "Jeopardy" had its Watson. Baseball has its sabermetrics, as chronicled in the hit book and film Moneyball. In each game, data mining has upended the field of play. And now the same big-league technologies are about to hit Scrabble.

Using an open-source artificial-intelligence crossword game program called Quackle, Andrew C. Thomas, a visiting assistant professor of statistics at Carnegie Mellon University, in Pittsburgh, ran nearly 10 million simulated games to discover which Scrabble letter tiles confer the most value to a player.

As you might have suspected, Q is bad news. Yes, its tile value is 10 points, but Thomas's stats show that having a Q in your rack brings your game's final score down by 4 on average.
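To make the idea of a tile's "worth" concrete, here is a minimal Python sketch of one way such a number could be estimated from simulated games: compare the average final score in games where a player drew a given letter with the average in games where the player did not. This is an illustration only, not Thomas's actual analysis and not Quackle's interface. The function name `tile_effect`, the record layout, and the toy scores are all hypothetical.

```python
from statistics import mean


def tile_effect(games, letter):
    """Mean final score with `letter` drawn minus mean final score without it.

    `games` is a list of (tiles_drawn, final_score) pairs. This record
    layout is an assumption made for this sketch.
    """
    with_letter = [score for tiles, score in games if letter in tiles]
    without_letter = [score for tiles, score in games if letter not in tiles]
    if not with_letter or not without_letter:
        raise ValueError(f"need games both with and without {letter!r}")
    return mean(with_letter) - mean(without_letter)


# Made-up records for illustration only; these are NOT results from the study.
toy_games = [
    ("QAEINRT", 352),
    ("SAEINRT", 371),
    ("QUIETLY", 340),
    ("ZAEBRAS", 365),
]

print("Q effect:", tile_effect(toy_games, "Q"))
print("S effect:", tile_effect(toy_games, "S"))
```

A negative result means that games in which the letter was drawn ended, on average, with lower scores. A simple difference in means like this only becomes meaningful across a very large number of games, which is why the simulation count matters.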

In the Facebook game Words with Friends, a sort of Scrabble derivative [see "Not Your Parents' Scrabble" in this issue], tile benefits, as well as values, differ dramatically. WwF's maker, Zynga, has laid out a board that encourages explosive plays and upset victories, in part by upping the point value of 12 letters and downgrading 2 others.

Comparing the average merits and demerits of each letter in simulated matches of Scrabble and Words with Friends will confirm some player intuitions and call others into question. Is getting a G generally a bad thing? Is M better than F? And will the letter Q ever recover from Thomas's findings?

IEEE Spectrum