With the two major U.S. political party conventions in the rearview mirror, Americans are now taking a close look at their quadrennial presidential election. Cable news networks, whose revenues rely on keeping audiences tuned in, have a vested interest in playing up the contest’s capriciousness and unpredictability.
But an increasing number of statistically literate poll-aggregation websites, such as FiveThirtyEight, RealClearPolitics and Electoral-Vote.com, reveal how the electorate’s choice is anything but. All three got the 2008 result right and predicted the final electoral tally to within 5 percent. And perhaps the most starkly analytical picture of the presidential race appeared on a Princeton neuroscientist’s website. A physicist by training who applies his statistical acumen to opinion poll results, Sam Wang produced a forecast of the 2008 election that was off by a single electoral vote. And his 2004 prediction was perfect.
Wang, an associate professor of neuroscience at Princeton, says an undergraduate-level knowledge of statistics and some basic coding skills are the chief prerequisites for being the most accurate presidential-race pundit on the planet. His Princeton Election Consortium, a collaboration with Princeton alum Andrew Ferguson, crunches the numbers collected by the Huffington Post-owned organization Pollster.com, which aggregates a number of poll results.
Wang says the key to predicting elections is recognizing the worthlessness of any single poll result. And, he says, given the workings of the U.S. Electoral College—in the United States, candidates win or lose individual state contests, which vary in size based on population—polls yield the most trustworthy results when aggregated state by state.
“I’m unsentimental about it,” Wang says. “What I’ve done, using statistical tools, is taken a bunch of state polls and turned them into a thermometer, where I can just tell you it’s 326 electoral votes in the shade today.”
His unsentimentality was learned the hard way. In 2004, Wang’s unadorned calculations predicted the exact outcome: George W. Bush with 286 electoral votes, John Kerry with 252. But Wang overruled his algorithm using additional ad hoc factors that put a thumb on the scale for his preferred candidate and predicted a wider margin for Kerry.
That’s one error Wang says he’ll never make again—a vow that points to another essential ingredient in PEC’s secret sauce.
Call it an Occam’s razor of poll collation: Use the cleanest aggregation algorithm with the fewest assumptions, and treat poll results as the only relevant data. Wang’s models pointedly ignore additional factors that are popular at bigger aggregators like FiveThirtyEight and RealClearPolitics, such as the state of the economy, the polarization of the electorate, or estimates of a pollster’s reliability. (Some pollsters, such as Rasmussen Reports, have been accused of a house bias; New York Times poll aggregator Nate Silver recently examined Rasmussen for its alleged GOP-leaning results.)
If a poll is accurate, Wang says, then these factors will already be baked into the result. If the poll isn’t accurate, then adding new complications will likely just make the numbers noisier, like trying to get theater-quality sound out of an old videotape by using Dolby Surround. Better just to aggregate as many polls as possible and use well-studied statistical methods to wring the most meaning from the data.
Instead of weighting polls, PEC uses a simple statistical trick to reduce the influence of outliers: It relies on medians rather than means. Say that three polls find a candidate up by one, two, and three percentage points. The mean and median of these polls are both two points. But if one of the pollsters has a bias of five points, the poll results might instead be –4, 2, and 3. That knocks the mean back to 0.3 points—transforming a lead into a dead heat—while the median stays unmoved at 2. “The idea of the median,” says Wang, “is there has to be a way to account for the fact that different pollsters use slightly different methods.”
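For readers who want to check that arithmetic, here is a minimal Python sketch that reruns the three-poll example. It is an illustration only, not PEC's actual code; the function name `summarize` and the list layout are assumptions made for this example.

```python
from statistics import mean, median


def summarize(margins):
    """Return (mean, median) of a list of poll margins in percentage points.

    Hypothetical helper for illustration; not taken from PEC's code.
    """
    return mean(margins), median(margins)


# The three-poll example from the text: a candidate up by 1, 2, and 3 points.
unbiased = [1, 2, 3]

# The same polls when the first pollster carries a five-point bias (1 - 5 = -4).
one_biased = [-4, 2, 3]

for label, margins in (("no bias", unbiased), ("one biased pollster", one_biased)):
    avg, mid = summarize(margins)
    print(f"{label}: mean = {avg:.1f}, median = {mid}")
```

The single outlier drags the mean from 2 points down to about 0.3, while the median stays at 2 in both cases, which is the robustness the paragraph describes.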
Andrew Gelman, professor of statistics and political science at Columbia University, thinks Wang may be wrong in writing off such factors as ideology and party affiliation. However, he does agree that individual polls yield far less information than political reporters sometimes claim. “There is no magic signal,” he says. “People are chasing the noise.”
PEC’s bottom-line projection as this story went to press is an 87 percent chance of Obama getting reelected on 6 November. Wang won’t yet forecast the specific Electoral College outcome. Being a good number cruncher, he says, also means knowing when it’s best just to stick to your error bars.
About the Author
Mark Anderson is a regular contributor to IEEE Spectrum and the author of The Day the World Discovered the Sun (Perseus Books, 2012).