Identifying Credit Card Users With a Few Bits of Data

Anonymizing data doesn't protect privacy as well as you might think


Scientists now find that anonymized credit card data can easily be used to identify credit card users. This adds to the evidence that anonymizing data does not protect privacy as well as is often thought.

Personal information is often anonymized by stripping out names, home addresses, phone numbers and other obvious identifying details. Such data are frequently shared and underlie popular services such as Google’s real-time traffic monitoring, which shows conditions on major thoroughfares in more than 50 countries.

However, anonymized data can still reveal a great deal about individuals. For example, computational social scientist Yves-Alexandre de Montjoye at MIT and his colleagues recently found that anonymized cell phone data could be better at identifying users than fingerprints. At most, 11 randomly chosen interactions with cell phone networks were needed to identify a person by the routes he or she regularly traveled, while identifying someone by a fingerprint requires at least 12 reference points.

To see how well anonymized credit card data protected privacy, de Montjoye and his colleagues at MIT and Aarhus University in Denmark analyzed three months of credit card records from 1.1 million people living in an unnamed developed country belonging to the Organization for Economic Cooperation and Development (OECD). They detailed their findings in the Jan. 30 issue of the journal Science.

The researchers found that knowing when and where four credit card transactions occurred was enough to identify 90 percent of people in this anonymized metadata. Even when the data were less specific (for instance, recording purchases within a certain geographic area instead of at a certain shop, or within 15 days instead of on a single day), individuals could still be re-identified with a half-dozen or so additional data points. Adding one more piece of information, the price of a transaction, increased the chance of re-identification by 22 percent on average. Women and people in higher income brackets proved easier to identify, possibly because they have more distinctive patterns in how they divide their time among the shops they visit.
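The core measurement, how often a handful of known (when, where) points singles out exactly one person, can be illustrated with a short sketch. This is a simplified Python illustration of the "unicity" idea used in de Montjoye's work, not the researchers' actual code. The traces, shop names and the functions `is_unique` and `unicity` are hypothetical and invented for this example. Each trial picks a random user, draws p of that user's own transactions, and checks whether any other trace also contains all of them.

```python
import random

# Hypothetical toy data: each user's trace is a set of (day, shop) points.
# These records are invented for illustration, not taken from the study.
TRACES = {
    "user_a": {(1, "bakery"), (2, "cafe"), (5, "grocer"), (9, "bookshop")},
    "user_b": {(1, "bakery"), (3, "cafe"), (5, "grocer"), (8, "pharmacy")},
    "user_c": {(2, "cafe"), (4, "bakery"), (5, "grocer"), (9, "bookshop")},
    "user_d": {(1, "bakery"), (2, "cafe"), (6, "pharmacy"), (7, "grocer")},
}


def is_unique(traces, user, points):
    """Return True if `user` is the only trace containing every point in `points`."""
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]


def unicity(traces, p, trials=1000, seed=0):
    """Estimate the share of users singled out by p random points from their own trace."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    users = list(traces)
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        # random.sample needs a sequence, so sort the set of points first.
        points = set(rng.sample(sorted(traces[user]), p))
        unique += is_unique(traces, user, points)
    return unique / trials
```

On this toy data, `unicity(TRACES, 4)` returns 1.0, because each person's full set of four transactions matches no one else, while a single random transaction identifies someone only part of the time. Coarser data, such as areas instead of shops or 15-day windows instead of days, could be modeled by mapping each point to its bin before calling `unicity`. Real datasets are far larger, which is why the study's finding that four points suffice for 90 percent of 1.1 million people is striking.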

Although data sharing can provide invaluable services, these findings suggest "we ought to rethink and reform how we approach data protection," de Montjoye said. He and his colleagues are now developing tools known as OpenPDS and SafeAnswers to protect the privacy of metadata; the work recently won a SXSW Interactive Innovation Award.


About the Tech Talk blog

IEEE Spectrum’s general technology blog, featuring news, analysis, and opinions about engineering, consumer electronics, and technology and society, from the editorial staff and freelance contributors.