Where Will You Be December 18th at 6 P.M.?
Long-term prediction is hard, but researchers are on the case
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
Where will you be tomorrow at 6 p.m.? You probably know, right? Short-term predictions are easy. Where will you be five months from now at 6 p.m.? Well, if it’s Christmas Day, you might know, but what if it’s the week before? You probably don’t, and probably neither do your friends and family and coworkers.
Long-term prediction is hard.
In fact, it turns out to be very hard. The usual prediction models break down entirely, and new mathematical techniques are needed.
And needed they are, because there’s a lot of utility to being able to make those predictions, whether you’re a credit card company trying to detect fraud in an individual account or a highway-traffic engineer deciding when to schedule a road repair.
My guest today is Adam Sadilek. He’s a postdoctoral fellow with the University of Rochester’s Department of Computer Science, and he recently did a stint with Microsoft Research, where he looked into long-range prediction and came out of it with a paper entitled “Far Out: Predicting Long-Term Human Mobility.” It’s coauthored with John Krumm, who’s a researcher in Microsoft [Research]’s Adaptive Systems & Interaction Group, and it’s being presented this month [July] at the annual conference of the Association for the Advancement of Artificial Intelligence, or AAAI, in Toronto.
Adam, welcome to the podcast.
Adam Sadilek: Thank you for having me.
Steven Cherry: First of all, what is the use of long-term prediction, and were those good examples that I gave?
Adam Sadilek: Yeah, you alluded to a lot of those examples already. One is infrastructure planning. If we have a good idea where everybody is going to be two years from now, we can plan road building better, for example. Another application is a peer-to-peer package delivery system, where instead of having dedicated mail carriers, you can actually use the general population to deliver packages, and then you can use a system like Far Out to see where these people are most likely to meet so they can exchange these packages and then route them to the final destination. Another application that my colleague at Microsoft has already explored is to use long-term prediction to have an intelligent thermostat in your home. So if the thermostat understands your location, it can either preheat or cool down the house based on how likely you are to get home.
Steven Cherry: Very good. So how good are short-term predictions? And why can’t those same methods be used for long-term predictions?
Adam Sadilek: Short-term predictions are myopic, so they look at a very specific localized context. They usually take your past few locations plus some real-time aspect of your current context, and they evolve that one or two steps into the future, and that gives you a prediction for the next hour or the next day. But as you force these systems to evolve further and further into the future, they become less and less precise. They diverge and eventually become worse than even a random predictor, because they’re not designed for that purpose. With Far Out, we take a more global view of people’s location data, and for each person we learn a library of prototypical days, which we call eigendays. These eigendays capture the dependable, repeating components of people’s location signal, and they filter out the transient and undependable aspects of human mobility.
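The eigenday idea described above can be illustrated with a small sketch. It is not the Far Out implementation: it assumes, purely for illustration, that each day is a vector of 24 hourly "distance from home" values, that the data is synthetic, and that eigendays are the top principal components of the day matrix. The function names `learn_eigendays` and `reconstruct` are hypothetical.

```python
import numpy as np

def learn_eigendays(days, k):
    """Return (mean_day, eigendays) for a (n_days, n_features) array.

    Assumption: eigendays are taken to be the top-k principal components.
    """
    mean_day = days.mean(axis=0)
    centered = days - mean_day
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_day, vt[:k]

def reconstruct(day, mean_day, eigendays):
    """Project one day onto the eigendays, keeping only the repeating part."""
    weights = eigendays @ (day - mean_day)
    return mean_day + weights @ eigendays

# Synthetic, hypothetical data: 5 km from home during office hours on
# weekdays, at home all day on weekends, plus random noise.
rng = np.random.default_rng(0)
hours = np.arange(24)
at_work = ((hours >= 9) & (hours < 17)).astype(float)
weekday = 5.0 * at_work
weekend = np.zeros(24)
days = np.array([weekday if d % 7 < 5 else weekend for d in range(70)])
days = days + rng.normal(scale=0.3, size=days.shape)

mean_day, eigendays = learn_eigendays(days, k=1)

# A new noisy weekday is pulled back toward the dependable pattern.
noisy = weekday + rng.normal(scale=0.3, size=24)
smooth = reconstruct(noisy, mean_day, eigendays)
```

In this toy setup a single eigenday is enough to separate the weekday pattern from the weekend pattern, and projecting a noisy day onto it discards most of the hour-to-hour noise.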
Steven Cherry: So where does your data come from?
Adam Sadilek: We use two types of subjects. We have regular individuals who have been hired by Microsoft to carry GPS units that log their location every few seconds. They just carry these devices in their pockets as they go about their daily lives. And another class of subjects is vehicles, where the vehicles have the same unit installed on their dashboard, and again we record their location every few seconds. And we have a number of years (about 90 person-years) of location data based on these subjects.
Steven Cherry: Very good. So compare the accuracy to…I mean, how accurate are these long-term predictions now, and how does that compare to the short-term predictions?
Adam Sadilek: Oh, we have two metrics for measuring accuracy. One is the regular distance on a map, in meters, and on average, when you do predictions two or three years from now, the average error for us is about 1 kilometer. That’s basically one street block. The alternative methods achieve an error of 2.5 kilometers per prediction, so that’s about 2.5 times worse than what we can do. So that’s the absolute error in terms of meters, but we also measure the error in terms of accuracy, where we divide the map into these triangular cells, and then we measure the accuracy in terms of what percentage of the cells our model predicted correctly. We can then induce a probability distribution over these triangular cells that tells us where a person is most likely to be at any given time, and the accuracy there wobbles between 80 and 90 percent depending on the subject.
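The two metrics described above (distance on a map, and the share of cells predicted correctly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code from the paper: distance is computed with the haversine formula on a spherical Earth, cells are represented by arbitrary integer IDs rather than actual triangles, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def mean_distance_error_m(predicted, actual):
    """Average distance in meters between predicted and actual (lat, lon) pairs."""
    errors = [haversine_m(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

def cell_accuracy(predicted_cells, actual_cells):
    """Fraction of time slots where the predicted cell ID matches the actual one."""
    hits = sum(1 for p, a in zip(predicted_cells, actual_cells) if p == a)
    return hits / len(actual_cells)

# Hypothetical example: four time slots, one of them predicted wrongly.
acc = cell_accuracy([1, 2, 3, 4], [1, 2, 0, 4])
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)  # one degree of longitude at the equator
```

The distance metric rewards predictions that are close even when they land in the wrong cell, while the cell metric only counts exact matches, which is why the two are reported side by side.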
Steven Cherry: And so you’re just predicting the physical location; you’re not predicting, for example, what the person is doing. And so if somebody goes to the mall, you don’t know if they’re shopping for a dress or going to the movies or eating lunch.
Adam Sadilek: Right. That’s something that would have to be another layer that’s on top of the system. Far Out is just for location prediction.
Steven Cherry: Very good. And you adjust for some things like weekdays versus weekends. I don’t suppose you adjust for things like, say, what season it is.
Adam Sadilek: You could add that to the model. We didn’t run experiments based on seasonal changes. We just take into account national holidays and the day of the week; that’s the context that we work with. You can imagine having a longer eigenday, which in that case would become an eigenmonth or an eigen half year, and that would be able to capture patterns like winter, spring, raining, sunny, things like that. It can easily be added to the model, but we didn’t do experiments with that.
Steven Cherry: And it doesn’t really pick up things like family vacations or things like that?
Adam Sadilek: Only to the extent that they’re correlated with the national holidays or days of week, for example. The system doesn’t have access to your calendar, and it’s limited to the context of one day. So there’s room for improvement in terms of how these things can be leveraged to have even better predictions over long term. But the purpose of this work was to actually figure out and see what is the limit of just GPS and just a simple context. To what extent can we predict your location a couple of years from now just looking at your past data?
Steven Cherry: So what’s next? Are you going to continue this work and try and make it more accurate?
Adam Sadilek: Well, one thing we are exploring right now is how you can deal with incomplete and irregularly sampled data. The data set we work with is pretty clean, where we have, every few seconds, we have a sample of GPS, your location. But now, when you start applying techniques like this, for example, on social network data, on tweets, these tweets come at random intervals, basically, and the location is more noisy and more incomplete. So we need to leverage additional machine-learning muscle to deal with this inconsistency. So that’s one of the next steps.
Steven Cherry: So what do you think will be the first actual application for this research to show up?
Adam Sadilek: Well, the first application is actually deployed already by my colleagues at Microsoft. That’s the intelligent thermostat that’s already working for [unintelligible]. Another application you can imagine is to have an intelligent calendar that understands where everybody is and where everybody is going to be, and then when you want to schedule a group meeting, say, the calendar can figure out what is the most likely place that the people meet anyway and can help you select the right location and time.
Steven Cherry: So we might, for example, see this information show up in, I don’t know, the Microsoft Outlook or in the calendar and appointment parts of Outlook and Exchange?
Adam Sadilek: You can imagine this is possible. We haven’t experimented with this, but you can imagine techniques like this could be applied in other products as well.
Steven Cherry: So it might be like an intelligent assistant in scheduling a meeting.
Adam Sadilek: That’s right, and there’s a large body of work at Microsoft Research already looking into this problem.
Steven Cherry: I guess one issue would be to what extent an organization has maybe additional data that they could throw into this problem.
Adam Sadilek: Right, so you can access employees’ calendars to have a better idea of who is in, who is out, [unintelligible] so things can get arbitrarily complex here, and certainly oftentimes these additional data points help, but they can also confuse the model, and you have to be careful.
Steven Cherry: I guess there’s other sources of information as well that directly or indirectly impact location, so, I mean, Bing has, for example, a lot of location data, especially if people use it for mobile search. Facebook has a lot of data. Somebody on the show just told us recently that people’s wall postings spike on their birthdays, for example, so I guess this sort of research could show up in other places depending on what data is available.
Adam Sadilek: It’s good, yeah. Again, one has to be careful because of privacy reasons and also because the models become more complex because you have more parameters to work with. Well, the more parameters, the more data you have to have. I’m not sure I have any more to say about this.
Steven Cherry: Well, thanks, Adam. It’s very interesting research and good luck with it.
Adam Sadilek: Thank you.
Steven Cherry: We’ve been speaking with computer scientist Adam Sadilek about some work he did at Microsoft Research on making predictions about where people will be months, and even years, from now.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.