Cellphone metadata, at the center of the recent Edward Snowden leaks in the United States, isn’t just helpful to spooks and secret spy programs with Orwellian names. According to one new study, for instance, small samples of cellphone metadata can reveal emergency situations in real time in a way that could benefit first responders and protect the public.
The new finding reveals the double-edged sword that metadata has become. All data—including the metadata Snowden’s leak showed the National Security Agency to be collecting—is value-neutral. It is useful for purposes good or bad or somewhere in between. So while the Snowden-NSA debate today is sometimes framed as though the metadata were the problem, the public safety study joins a raft of recent research highlighting other uncontroversial and productive uses of metadata.
The research was inspired by a massive forest fire in Israel in 2010, one that claimed 44 lives and took 82 hours to extinguish. Had officials been notified of the blaze soon after it began, much of the ensuing tragedy might have been avoided, says Erez Shmueli, a postdoctoral researcher at MIT’s Media Lab.
“The fire was actually identified and described by various observers in the region...before it caught the attention of the relevant fire department,” Shmueli says. And it was a matter of hours before authorities really understood the magnitude of the problem, a problem that was only getting worse with each passing minute. “That’s one of the reasons why the fire went crazy,” he says.
In other words, he says, the Israeli fire underscored the need for a network-level emergency alert system—one that supplements but does not supplant the existing emergency phone service.
For the study, researchers from MIT and Northeastern University, in the United States, and the Technion and Ben-Gurion University, in Israel, had access to three years of anonymized cellphone metadata for an unidentified “west European country.” A total of 12 billion cellphone calls over that three-year span constituted the raw data on which the group would be testing their theories.
The study began with the observation that no cellular provider could possibly monitor and analyze metadata over its entire network in real time. But real-time analysis is key to discovering emergency events. So sampling tiny subsets of metadata from the network—without ever analyzing the content of the calls and text messages—was really the only way to proceed.
The question was, which subsets? Random sampling, in which a smattering of data points providing time, duration, and the number called were essentially pulled out of a virtual hat, revealed little about the real-world events taking place at the time.
However, random sampling of “densely connected nodes”—phones that had placed and received many calls over the previous 30 days—did reveal tiny spikes that could be amplified into a meaningful signal. The researchers applied an algorithm they call the Social Amplifier to this nonrandom sample to tease out where and when emergency events could be found. Those uncovered included eight real emergencies—three storms, a bombing, an earthquake, a blackout, and two airplane-related events—as well as eight other big local happenings, such as concerts and festivals, that were not emergencies.
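To make the sampling idea concrete, here is a minimal Python sketch of how "densely connected" phones might be picked out of a call log. It is an illustration only, not the researchers' code: the record layout (caller, callee, timestamp), the function name `densely_connected`, and the `top_k` cutoff are all assumptions made for this example. The 30-day window comes from the study as described above.

```python
from collections import Counter
from datetime import datetime, timedelta


def densely_connected(calls, now, window_days=30, top_k=2):
    """Return the top_k phones by calls placed plus received in the window.

    calls: iterable of (caller, callee, timestamp) tuples (assumed layout).
    Only metadata is used; call content never enters the picture.
    """
    cutoff = now - timedelta(days=window_days)
    activity = Counter()
    for caller, callee, timestamp in calls:
        if cutoff <= timestamp <= now:
            activity[caller] += 1  # call placed
            activity[callee] += 1  # call received
    return [phone for phone, _ in activity.most_common(top_k)]


# Hypothetical, anonymized records for illustration.
example_calls = [
    ("A", "B", datetime(2013, 6, 25)),
    ("A", "C", datetime(2013, 6, 26)),
    ("A", "D", datetime(2013, 6, 27)),
    ("B", "C", datetime(2013, 6, 28)),
    ("E", "A", datetime(2013, 4, 1)),  # outside the 30-day window
    ("E", "D", datetime(2013, 4, 2)),  # outside the 30-day window
]
hubs = densely_connected(example_calls, now=datetime(2013, 6, 30), top_k=1)
```

In this toy log, phone "A" is the busiest node inside the window, while "E" drops out entirely because its calls are too old. A real system would then sample from such hubs rather than from the network at large.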
To detect such high-profile events, the Social Amplifier examines 21 features of each point in the data set. One of the more revealing features is something called “betweenness centrality,” the number of shortest paths connecting every cellphone to every other cellphone that pass through the cellphone in question. As an example, say there are 100 people who live in a town, and over the past month Jeremy has called 80 of them. On the other hand, Jim has called only five. In this town’s network, Jeremy is a one-hop link connecting each of those 80 call recipients with one another. And his high connectedness means he’s probably going to be one of the connecting points for many of the other 20 town residents as well. By contrast, the number of shortest connections that go through Jim—who has no direct link to 95 percent of the town—is going to be much smaller. Jim’s betweenness centrality will be low. Jeremy’s will be high.
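The Jeremy-and-Jim comparison can be worked through in a few lines of Python. The sketch below uses Brandes' algorithm, a standard way to compute betweenness centrality on an unweighted network; the article does not say how the Social Amplifier computes the measure, so treat this as one plausible approach. The town is scaled down to 10 people, and the function names and resident labels are invented for the example.

```python
from collections import deque


def build_call_graph(call_pairs):
    """Turn (person, person) call pairs into an undirected adjacency map."""
    graph = {}
    for a, b in call_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    When several shortest paths tie, each counts as an equal fraction.
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths
        dist = {v: -1 for v in graph}   # -1 means not yet visited
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: c / 2 for v, c in centrality.items()}


# Scaled-down town: Jeremy has called eight residents, Jim only two.
residents = [f"r{i}" for i in range(8)]
pairs = [("jeremy", r) for r in residents] + [("jim", "r0"), ("jim", "r1")]
scores = betweenness_centrality(build_call_graph(pairs))
```

In this miniature town, nearly every shortest route between two residents runs through Jeremy, so his score dwarfs Jim's. Jim sits on only one route, the one between the two people he called, and even there he shares the credit with Jeremy.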
And so the algorithm monitored, among other indicators, any sudden change in a node’s centrality. This might translate to someone making 10 calls and sending a bunch of text messages in rapid succession. Of course, such activity could just be a blip. Or it could mean that something drastic has happened, and the caller is contacting friends and family, perhaps to tell them she’s okay. The more densely connected each node is, and the more that the densely connected nodes change their behavior, the greater the chance that something is happening.
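One simple way to picture this kind of monitoring is to compare each hub's current activity against its own recent baseline and then ask how many hubs are spiking at once. The Python sketch below does that with a plain z-score. This is a stand-in chosen for illustration, not the Social Amplifier itself, which weighs 21 features; the function names, the threshold of 3, and the sample numbers are all assumptions.

```python
from statistics import mean, pstdev


def is_spiking(history, current, z_threshold=3.0):
    """True when current activity sits far above the node's own baseline."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as a change.
        return current > baseline
    return (current - baseline) / spread > z_threshold


def fraction_spiking(histories, current_counts, z_threshold=3.0):
    """Return (share of monitored nodes spiking, list of those nodes)."""
    spiking = [
        node
        for node, history in histories.items()
        if is_spiking(history, current_counts.get(node, 0), z_threshold)
    ]
    return len(spiking) / len(histories), spiking


# Hypothetical calls-per-interval histories for three densely connected phones.
histories = {
    "a": [2, 3, 2, 3, 2, 3],
    "b": [5, 6, 5, 6, 5, 6],
    "c": [1, 1, 2, 1, 2, 1],
}
current = {"a": 12, "b": 6, "c": 9}
share, spiking_nodes = fraction_spiking(histories, current)
```

Here phones "a" and "c" jump well past their usual levels while "b" stays within its normal range. One spiking hub could be a blip; two out of three doing so at the same moment is the kind of pattern that would merit a closer look.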
Crucially, Shmueli notes, the algorithm has not used any geolocation information in its analysis up to this point. But when there are enough spikes in things like centrality over enough densely connected nodes, the algorithm notifies the network administrators, and they can then easily check if the activity is coming from just one part of a city.
If so, there’s a very good chance that something just happened there. And it might not be good. This way, Shmueli says, first responders can be alerted to the problem and its location. A big event like the deadly 2010 Israeli fire could have been flagged and checked out long before the first emergency phone calls came in to the authorities who could handle it, he says.
Vincent Blondel, professor of applied mathematics at the Université Catholique de Louvain, in Belgium, says algorithms like the Social Amplifier might also be useful for detecting more mundane events like traffic patterns and gridlock.
The challenge for real-time analysis of any cellphone network, Blondel says, is not the sleekness of the algorithm—though he says he’s impressed by the Amplifier’s efficient use of metadata and the relatively low CPU footprint it would require.
“The difficulty,” Blondel says, “is not the computation part, but to have this data [in] real time.” And real-time data is something only cellphone providers themselves have access to, so the providers would have to be running the algorithm.
Details of the research were presented in March at a conference on the scientific study of metadata, NetMob 2013, and will be published in a forthcoming issue of the Journal of Statistical Physics.
Shmueli says the researchers are now investigating ways to use the Social Amplifier to detect big events—such as panics—in online financial and currency markets.