Should Data Sharing Be More Like Gambling?

Photo-illustration: Randi Klett; Images: Getty Images

When you install a new app on your phone, you might find yourself facing a laundry list of things the software says it needs to access: your photos folder, for example, along with your camera, address book, phone log, and GPS location.

In many cases, it’s an all-or-nothing deal.

Eric Horvitz of Microsoft Research says companies could do better. Rather than asking users to provide wholesale access to their data, they could ask users to accept a certain level of risk that any given piece of data might be taken and used to, say, improve a product or better target ads.

“Certainly user data is the hot commodity of our time,” Horvitz said earlier this week at the American Association for the Advancement of Science, or AAAS, meeting in San Jose. But there is no reason, he says, that services “should be sucking up data willy-nilly.”

Instead, he says, companies could borrow a page from the medical realm and look for a minimally invasive option. Horvitz and his colleagues call their approach “stochastic privacy.” Rather than choosing to share or not to share certain information, a user would sign on to accept a certain amount of privacy risk: a 1 in 30,000 chance, for example, that their GPS data might be fed into real-time traffic analysis on any given day. Or a 1 in 100,000 chance that any given Internet search query might be logged and used.
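To get a feel for what those odds mean over time, here is a minimal Python sketch. It is not taken from the Horvitz paper. It assumes each day's draw is independent and made with a fixed probability, and the function names `is_sampled` and `cumulative_risk` are hypothetical, chosen only for illustration.

```python
import random


def is_sampled(p, rng):
    """Return True with probability p (a single independent draw).

    Assumption for illustration: each draw is independent of the others.
    """
    return rng.random() < p


def cumulative_risk(p, n):
    """Probability of being sampled at least once in n independent draws."""
    return 1.0 - (1.0 - p) ** n


# The 1 in 30,000 daily GPS example from the article, carried over a year.
daily_gps_risk = 1 / 30_000
yearly_gps_risk = cumulative_risk(daily_gps_risk, 365)
print(f"Chance of being sampled at least once in a year: {yearly_gps_risk:.2%}")

# Simulate one user's year with a seeded generator so the run is repeatable.
rng = random.Random(2015)
days_sampled = sum(is_sampled(daily_gps_risk, rng) for _ in range(365))
print(f"Days sampled in one simulated year: {days_sampled}")
```

Under these assumptions, a 1 in 30,000 daily chance adds up to roughly a 1.2 percent chance of being sampled at least once over a year, which hints at why the per-day and per-query numbers would need careful explanation.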

Horvitz and colleagues outlined the approach in a paper for an Association for the Advancement of Artificial Intelligence (AAAI) conference last year.

If companies were to implement stochastic privacy, they’d likely need to engage in some cost-benefit calculations. What are the benefits of knowing certain information? And how willing would a user be to share that information? 

This sort of exercise can turn up surprising results. In an earlier study, Horvitz and Andreas Krause (then at Caltech, but now at ETH Zurich) surveyed Internet search users to gauge their sensitivity to sharing different kinds of information. More sensitive than marital status, occupation, or whether you have children? Whether the search was conducted during work hours.

Of course, even if a company works out what seem to be reasonable risks for sharing different kinds of data, what that might look like on the user end is still an open question. How do you communicate the difference between a 1 in 30,000 and a 1 in 100,000 probability?

Horvitz said that would be a good problem to have. “Would you want to live in a world where the challenge is to explain these things better,” he asked, “or where companies scarf up everything?”


Risk Factor

IEEE Spectrum's risk analysis blog, featuring daily news, updates and analysis on computing and IT projects, software and systems failures, successes and innovations, security threats, and more.

Willie D. Jones