Microsoft Researcher Predicts Obama to Win
Says Economy-Driven Prediction Models Miss Key Data
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
Last March, we had on the show David Rothschild, an economist then working at Yahoo Labs. He and a colleague, Patrick Hummel, had created a forecasting model for the 2012 U.S. presidential election similar to the forecasts that economists use to make predictions about the gross domestic product or unemployment rate.
Back in March, the model predicted an Obama win in the Electoral College by 303 votes to 235, in large part because the first-quarter economic numbers for 2012 were pretty good. The economic news in the second quarter was pretty dismal, though, so when Rothschild and Hummel subsequently applied the same model through June, they found Romney winning the election, 290 to 248.
Here in mid-September, though, the model has Obama back on top. That's despite the dismal economy and a raft of competing models that are still predicting a Romney victory, in some cases by a landslide. To see where we're at with the presidential race as it enters the home stretch, and to talk more about predictive modeling in general, we asked David Rothschild to join us once again.
Even though it’s only been six months, his CV has changed: He’s still working in New York City but is now at Microsoft Research, in its new lab here. His title is still simply “economist.” He joins us by phone.
David, welcome to the podcast.
David Rothschild: Thank you for having me again.
Steven Cherry: David, even though you came up with the same result as in March, you got to it by a very different path. Back then, you used a bunch of data which for some reason you guys in the prediction business call "fundamental data." Now that we're close to the election, the model gives most of the weight to a bunch of different factors that weren't even in the model earlier. Tell us about the models then and now.
David Rothschild: Excellent. Definitely. So, that far away from the election, the most information you're going to pull is going to be in fundamental data: past election results, economic indicators, incumbency, and similar factors. So in February we were able to say, based off how the economy was trending at that time, and a certain prediction of how we felt the economy would be trending through the second quarter, to make a prediction on the state-by-state level. And I'll clarify one thing: By the time we roll around toward the summer, there's a lot more data out there; particularly, liquidity starts to rise in prediction markets, and polling data. Prediction markets—I talked a little bit about [them] the last time I was here—are markets in which people can buy and sell contracts on the likelihood of a single candidate or an event outcome. A contract pays either a 1 or a 0, depending on what happens, so these contract prices are very indicative of the probability of an event happening. And then polling, of course: state-by-state polling starts to come out. So by the time we actually ran the fundamental model in the spring, in the second quarter, we never really publicized it by itself, because by that point we were already including a lot of prediction market and polling data. And the way that my model has always worked is that you start with the fundamentals very early in the cycle; you start adding in polling data as it becomes available; and then, as the summer turns into the early fall, all of this fundamental data is incorporated into a much more informed voter and much more informed people in the prediction markets, and the fundamental model by itself kind of drifts away, so that at this point in the election cycle we're mainly focusing on a mixture of polling data and prediction market data.
And that will continue towards Election Day, where essentially the fundamental data drops out and it’s really only incorporated in these other two data sources, and that drives the predictions today.
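The weighting scheme Rothschild describes—fundamentals dominating early, then fading out as polls and prediction markets take over—can be sketched roughly as follows. The weights and the 300-day horizon here are illustrative assumptions, not Rothschild and Hummel's actual published model:

```python
def blended_forecast(days_to_election, fundamentals, polls, markets):
    """Blend three probability estimates that a candidate wins a state.

    Illustrative only: fundamentals get all the weight early in the
    cycle and none on Election Day; the remainder is split evenly
    between polls and prediction markets.
    """
    # Fraction of an assumed ~300-day cycle still remaining.
    horizon = min(days_to_election, 300) / 300.0
    w_fund = horizon ** 2            # fades to zero by Election Day
    w_polls = (1 - w_fund) * 0.5     # remainder split between polls...
    w_markets = (1 - w_fund) * 0.5   # ...and prediction markets
    return w_fund * fundamentals + w_polls * polls + w_markets * markets

# Very early in the cycle, fundamentals carry all the weight:
print(blended_forecast(300, 0.60, 0.50, 0.50))  # 0.6
# On Election Day, only polls and markets matter (about 0.54 here):
print(blended_forecast(0, 0.60, 0.52, 0.56))
```

The key property is the one he describes: the fundamental term does not disappear abruptly but "drifts away," its information having already been absorbed by better-informed polls and market prices.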
Steven Cherry: David, there are a lot of predictions out there that go the other way, including a University of Colorado model that gives Romney a pretty crushing 320-to-218 victory.
David Rothschild: Right.
Steven Cherry: That prediction is based on the poor economy, and in fact—and based on what you told us in March—your own model should be predicting a Romney win. What’s going on?
David Rothschild: Right. Well, there are two things I'll talk about with this. The first is the idea of forecasting and what it's for. It's really two different things. One is making the best forecast you can at any given point in time. Upwards of $10 billion are going to be spent on the 2012 election cycle, and just like in any major business, understanding what's going to happen is extremely important and meaningful. So forecasts that are updated regularly provide information to the people involved in that business, whether they're investors, i.e., people that are funding campaigns, or practitioners, people that are using the money efficiently. So that's the idea behind getting an accurate forecast in a timely manner. The other thing is research: thinking about how the major things we care about—whether they're debates or conventions or leaked videos—really impact the outcome. That kind of research is another angle to it. To me, fundamental models serve a very strong forecasting purpose very early in the cycle, when there isn't very much in the form of polling and prediction markets. But later in the cycle, fundamental models by themselves just don't have that kind of information. They're not that useful as a pure forecast anymore, but they do continue to provide a lot of information on the research angle. And so for that Colorado paper and other papers coming out now with pure fundamental models, I wouldn't think too much about what they're saying with their forecast; I would think more about what they're saying about how they correlate economic indicators with outcomes. There's an extreme diminishing return to economic indicators this late in the cycle; the story line is pretty well set at this point.
Unless you're going to see a major shift in the economic indicators in the last month or two, it's going to be hard to make a dramatic shift from the idea that there's a tepid but somewhat meaningful recovery. Number two is, we focus a lot on the state-by-state numbers, whereas a lot of other models focus on national indicators. And we think this has been pretty clear from our data: if you look at states like Ohio and Virginia, especially, where the recovery has been stronger than the national average, it's not surprising to see that reflected in the polls—maybe a little different from what other models are showing, which may have those states in a different category.
Steven Cherry: These prediction markets capture the wisdom of crowds, because the market is sort of the sum total of what people who know and care about the outcome think the outcome will be. But that's just one difference between past elections and this one; another is the rise of social media.
David Rothschild: Right. So let me address those separately, and I'll address the second part first. We're really excited here about thinking about social media and how it can correlate to different things we care about. This cycle, it's tough; you're seeing a lot of raw data out there that's not properly adjusted to any sort of outcome. You're seeing a lot of raw numbers of tweets and raw numbers of searches, and on the flip side some kind of opaque ranking systems. But really, people are having trouble putting it into context, and part of it is because there's not a huge amount of historical data to attach it to—to see what a given level of tweets has meant historically for different outcomes. And second, I'd say they're conflating three very separate things that social media can be used for. One is predictions; two is sentiment, how positively or negatively people are thinking; and three is interest, how much people are actually thinking and talking about different candidates and different issues. So what we're trying to do here is make this both transparent and meaningful, and we're working on releasing, before too long, some infographics and even some ongoing data visualizations, which will show how social media really can tell us about interest in different issues, interest in different candidates, and sentiment.
Steven Cherry: David, there's one rather iconoclastic model out there that maybe we could discuss for a minute; it's profiled in an online feature that Spectrum is running this week. It's the Princeton Election Consortium, and it's run by a neuroscientist, of all things, named Sam Wang. Wang ignores the fundamental data—the unemployment data, the GDP data—and he ignores prediction markets. All he cares about is the polling data, which he aggregates state by state. I think you're familiar with it. Maybe you could tell us how it's different from what we've discussed so far and what you think of it.
David Rothschild: Sure. So definitely my understanding is that they run a very objective, poll-based forecast, and what we run here is a combination of polls, fundamental data, and prediction markets. And I don't want to compare and contrast too much, but I think both of those take a very solid approach. You're going to see a lot of good models out there and a lot of not-as-clean models. This is a very clean model that focuses on the question that people care about, which is forecasting the outcome very directly, and takes a very objective approach. But I think that adding in prediction market data, which is one of the keys that separates us from other models, is very important, because it gives us granularity and continuous updating. These help us in two ways. Number one is the granularity: we have models that move in real time, and so when we look at the research angle of thinking about what we're learning during elections, this granularity lets us really isolate incidents and really helps us understand their impacts. Similarly, because the model is updated in real time, it provides something that's continuously new and updated, and so it's always providing the best forecast at any given moment for people looking at it. Polls do lag behind by a couple of days when it comes to updating after major events. So prediction markets are able to incorporate a lot of information about what we can assume will actually happen that won't show up in the polls for the next few days, and I think that's really informative and helps us understand what's really going on at any given time.
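A poll-only state forecast of the kind Wang runs can be sketched very simply: take the median margin across recent state polls and map it to a win probability, assuming the true margin is normally distributed around it. The polls and the 3-point standard deviation below are hypothetical assumptions, not values from the Princeton Election Consortium's actual model:

```python
import math
import statistics

def state_win_probability(poll_margins, sigma=3.0):
    """Rough poll-only state forecast in the spirit of a median-based
    aggregator. poll_margins are candidate-minus-opponent margins in
    percentage points; sigma is an assumed standard deviation for the
    true margin (3.0 points is an illustrative choice, not published).
    """
    margin = statistics.median(poll_margins)
    # P(true margin > 0) under Normal(margin, sigma^2),
    # using the identity Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1 + math.erf(margin / (sigma * math.sqrt(2))))

# Three hypothetical Ohio polls: +2, +4, +3 for the incumbent.
# Median margin is +3, one sigma above zero, so roughly 0.841.
print(round(state_win_probability([2, 4, 3]), 3))
```

The median makes the aggregate robust to a single outlier poll; repeating this per state and combining the state probabilities yields an Electoral College forecast. The contrast Rothschild draws is that such a model, however clean, can only update as fast as new polls arrive, while market prices move within minutes of news.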
Steven Cherry: David, a friend of mine—his name is Karl—he’s a rock climber in Yosemite, and he likes to play a game. You go rock climbing for the day with him, and maybe the sun is way past being directly overhead, and he says, “Guess the time.” And here are the rules. You give three numbers: the time, a margin of error, and how sure you are. So you might say, “4:15 plus or minus 15 minutes, 80 percent” or “4:20 plus or minus 5 minutes, 95 percent.” It’s a fun game—and by the way, there’s no system for scoring a winner, which I think is also true in economics. But how do you balance between the prediction itself, the margin of error, and the confidence level in election predictions or economic predictions?
David Rothschild: Right. Well, that's a great question, and it's actually work that we're doing beyond election forecasting and the sentiment and interest work: we're thinking about how to apply this research in a general sense to economic indicators, or even sports or finance or other questions like that. A lot of the forecasts that you see, especially in politics, are still focused on point estimates, like "the estimated vote share is going to be x" or "the economic indicator is going to be y," and that doesn't really paint a full picture for a lot of people. And so the first stage of a lot of the work we do here, a lot of the work I do, is thinking about how we take all this information—the passive social media information we're talking about, whether it's Twitter or Facebook or some internal stuff like search and page views, or the more actively gathered stuff, from polls to prediction markets—and get it all together. How do we make the most accurate and meaningful outcome predictions, as well as sentiment and interest? But also, how do you get people to understand it in the most meaningful and salient way? And that involves thinking about: How do people understand probabilities? How do people understand estimated outcomes? Can we get laypeople to understand probability distributions? Can we get them to understand all these things that will provide more meaning to the prediction? And we're working on ways in which we can get people to understand a full probability distribution for economic indicators, because we think that's one place where it will be extremely valuable to understand where we think the probabilities lie.
And the cool thing about the research we’re doing there—and which we’ll unveil in some sort of games that we’re working on in the next month—is that by getting people to understand how these probability distributions impact them or what they look like or what they mean, we can also get people to reveal more information, which they already know, in new polling, in new prediction markets that we’re working on, in which people can actually estimate just as you were describing: not just point estimates but a distribution over where they think the outcome could lie.
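The game Steven describes does, in fact, have a standard scoring rule: the interval (or Winkler) score, which rewards tight intervals but penalizes misses in proportion to the stated confidence. This is a generic scoring rule from the forecasting literature, not anything Rothschild says his group uses:

```python
def interval_score(lower, upper, actual, confidence):
    """Interval (Winkler) score for a guess like '4:15 +/- 15 min, 80%'.

    lower/upper bound the guess; confidence is the stated coverage
    (e.g. 0.8 for 80%). Lower scores are better: the base cost is the
    interval's width, and misses add a penalty that grows with the
    guesser's stated confidence.
    """
    alpha = 1 - confidence
    score = upper - lower
    if actual < lower:
        score += (2 / alpha) * (lower - actual)
    elif actual > upper:
        score += (2 / alpha) * (actual - upper)
    return score

# Karl's game, in minutes past 4:00; suppose the true time is 4:25.
print(interval_score(0, 30, 25, 0.80))   # wide but right: 30
print(interval_score(15, 25, 25, 0.95))  # tight and right: 10
print(interval_score(15, 25, 35, 0.95))  # tight, 95% sure, wrong: 410
```

Averaged over many rounds, the score is minimized by reporting honest intervals, which is exactly the property needed to get people to "reveal more information, which they already know."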
Steven Cherry: Very good. And maybe if you could clue in Weather.com also, and they could stop predicting an 80 percent chance of rain when it’s, you know, raining.
David Rothschild: Right. And that's the type of thing that people have always talked about: weather people have to really put their prediction in a tight hole—rain, shine, things like that. Now, people do look a lot at the hourly forecasts and things like that, but there is a lot of room for understanding the actual probabilities at given times, which would let people make more informed decisions. No different from economic indicators, where knowing more about the distribution will allow people to make more informed decisions.
Steven Cherry: Well, David, it’s pretty interesting research, so thanks for doing it and thanks for coming back to talk with us again.
David Rothschild: Thank you very much for having me. I really enjoyed it.
Steven Cherry: We’ve been speaking with David Rothschild, an economist with Microsoft Research, about his prediction that Obama will win the 2012 presidential election and about prediction science in general.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
Announcer: “Techwise Conversations” is sponsored by National Instruments.
This interview was recorded 19 September 2012.
Audio engineer: Francesco Ferorelli
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.
Editor’s note: The original headline was changed to more accurately reflect the content of this podcast.