Why Pay to be an Identity Thief? Experimental Software Makes It Free

Thieves purchased sensitive personal data from ChoicePoint, but a Carnegie Mellon University researcher can get the same information free on the Web

11 March 2005--The U.S. database industry is under a legal microscope following the pilfering of information that could allow thieves to steal the identities of hundreds of thousands of people. In a hearing yesterday, senators threatened legislation to regulate large brokers of financial and other data such as LexisNexis, Bank of America, and ChoicePoint--all of which have disclosed major thefts in the last two months. It was the incident at Alpharetta, Ga.-based ChoicePoint that kindled the current concern in Washington, D.C. In mid-February the company, whose data is used to check the legitimacy of the potential customers of other companies, revealed that it had been tricked into selling the records of 145 000 people to thieves posing as legitimate ChoicePoint customers.

But why should an identity thief bother with an expensive charade? Latanya Sweeney, an associate professor of computer science at Carnegie Mellon University, has found a simpler way than paying a company in the personal database industry, which critics say is underregulated: she can extract all the data she wants for free from the World Wide Web. For over a decade, Sweeney has been exploring the intersection of technology and privacy. Her latest work builds on earlier Web-searching tools that create software agents to extract names, addresses, birth dates, and Social Security numbers from résumés posted online--everything you need to apply for a new credit card in someone else's name. Sweeney will report her findings at a symposium devoted to national security, sponsored by the American Association for Artificial Intelligence and held at Stanford University, in California, 21-23 March.

With her software, Sweeney can gather the key data with just a little Web surfing. She starts with a filter that searches for documents likely to be résumés and then extracts the key data values--name, Social Security number, address, and date of birth. Résumés are found in a two-part process. First, a program Sweeney wrote last year finds long lists of names. Then a specialized Google search filter looks for résumés associated with those names that contain Social Security numbers.

Social Security numbers and the other needed fields, such as birth date, are isolated using a combination of techniques. For example, dates can be formatted in several different ways, but there are now standard techniques for parsing them. If a résumé has all the needed data except a birth date, the software looks it up on one of the many sites that offer birth dates, such as Anybirthday.com. Social Security numbers have a distinctive format: nnn-nn-nnnn. Another program of Sweeney's, SSN Watch, checks the numbers that are found.
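To make the format matching concrete, here is a minimal Python sketch written from the defender's side: it finds strings in the nnn-nn-nnnn layout so they can be masked before a document is posted, and it tries several common date layouts in turn. It is an illustration only, not Sweeney's software. The function names, the list of date formats, and the structural check (which rejects only number ranges that are never issued) are all assumptions made for this example, and nothing here reflects how SSN Watch validates numbers.

```python
import re
from datetime import datetime

# Matches the nnn-nn-nnnn layout, but not digits embedded in longer numbers.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})-(\d{2})-(\d{4})(?!\d)")

# Hypothetical list of date layouts; real documents use many more.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y", "%d %B %Y")


def is_plausible_ssn(area, group, serial):
    """Structural check only: reject ranges that are never issued."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def mask_ssns(text):
    """Mask plausible SSN-formatted strings, keeping the last four digits."""
    def _mask(match):
        area, group, serial = match.groups()
        if not is_plausible_ssn(area, group, serial):
            return match.group(0)  # leave implausible numbers untouched
        return "XXX-XX-" + serial
    return SSN_PATTERN.sub(_mask, text)


def parse_date(text):
    """Try each known layout in order; return a date, or None if none fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

Run over a résumé before it goes online, a filter of this kind would blank out the one field that the rest of this article identifies as the key to identity theft.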

How important are those Social Security numbers? Last September, the commissioner of the U.S. Federal Trade Commission told Congress that they play "a pivotal role in identity theft. Identity thieves use the Social Security number as a key to access the financial benefits available to their victims."

Obviously, if people are posting their Social Security numbers to the Web, and if doing so leaves them highly vulnerable to identity theft, then they ought to stop. Sweeney's work addresses that issue as well. The Identity Angel project, which she launched earlier this year, looks for e-mail addresses in those résumés and sends individuals automated notices that their identity information was found online. She says a follow-up study showed that more than 90 percent of the people subsequently removed the information from the Web.
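The notification step can be pictured with a short Python sketch. This is not the Identity Angel code, whose internals the article does not describe. The address pattern is a deliberately simple approximation, and the function names, sender address, and message wording are hypothetical placeholders. The sketch only builds the message; it does not send anything.

```python
import re
from email.message import EmailMessage

# Simple approximation of an e-mail address; not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return unique addresses in order of first appearance (case-insensitive)."""
    seen_lower = set()
    found = []
    for addr in EMAIL_PATTERN.findall(text):
        if addr.lower() not in seen_lower:
            seen_lower.add(addr.lower())
            found.append(addr)
    return found


def build_notice(recipient, source_url, sender="notices@example.org"):
    """Compose (but do not send) a warning that personal data is exposed."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = sender  # hypothetical sender address
    msg["Subject"] = "Your personal information appears on a public Web page"
    msg.set_content(
        "A document that appears to contain your Social Security number "
        "is publicly visible at:\n\n"
        f"{source_url}\n\n"
        "Consider removing the number or taking the page down."
    )
    return msg
```

Note that the notice points to the page rather than repeating the exposed number, so the warning itself does not become another copy of the sensitive data.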

Nonetheless, even with a digital Samaritan patrolling the ether, U.S. identities remain at risk. A November study by the U.S. Government Accountability Office found that "Social Security numbers appear in any number of records exposed to public view almost everywhere in the nation, primarily at the state and local levels of government."

The GAO reported that many states and hundreds of the nation's 3141 counties put Social Security numbers directly on the Internet and that "this could affect millions of people." The agency concluded that the risk of exposure for Social Security numbers in public records "is highly variable and difficult for any one individual to anticipate or prevent."

That risk is much lower across the Atlantic, where a 1995 European Union directive on data privacy ensures that personal data is kept secret by default.

According to Stephen J. Kobrin, a professor of multinational management at the University of Pennsylvania, in Philadelphia, this represents a fundamental difference between the United States and Europe. "In America privacy is seen as an alienable commodity subject to the market," he wrote in a 2002 report. In contrast, he says, in Europe, privacy is considered to be "a fundamental human right." Not only do explicit privacy statutes exist there, but they are also enforced by dedicated regulatory agencies.

In other words, the current U.S. crisis of identity theft is a result of policy choices that Americans have made, sometimes implicitly, sometimes explicitly. They are choices that can be revisited at any time.
