7 June 2012—Depending on where you are, you may have recently noticed a dramatic change in your Google search results. For example, if you type in the keywords “Margaret Thatcher,” you’ll get the usual set of links to highly ranked sites about this former British prime minister. But to the right of that list, you’ll also see a new pane with information about Thatcher—her photograph, date of birth, children, education, books she’s written—along with links to similar sets of information about her husband, other British prime ministers, and even Meryl Streep, who played Thatcher in the movie The Iron Lady.
This new feature is the first visible outgrowth of something Google calls the Knowledge Graph, a vast collection of information about a half-billion entities and the relationships between them. It represents Google’s new push to make sense of the Web in terms of “things, not strings,” to use the company’s catchphrase. Instead of just indexing Web documents by the words they contain, “we really need to understand about things in the real world,” says Shashi Thakur, technical lead on the Knowledge Graph project, which some see as a stepping stone to a long-sought system called the Semantic Web.
The Knowledge Graph is very different from the basic search strategy Google was founded on, which was to crawl the Web and build up a giant index of the words contained in each document it found. With such an index, Google could easily return links to pages that included your search terms. The company’s secret sauce was the algorithm it used to rank results. This approach, while somewhat daunting to carry out at the scale required, is fundamentally straightforward. The computers doing the crawling, indexing, and ranking don’t need to have any sort of understanding of what the strings of letters you are searching for signify.
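To make the word-index idea concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not Google’s implementation: the page names and their text are made up, the function names are hypothetical, and ranking is left out entirely.

```python
from collections import defaultdict

# Made-up pages standing in for crawled Web documents (illustration only).
PAGES = {
    "page1": "margaret thatcher was born in grantham",
    "page2": "grantham is a town in lincolnshire",
    "page3": "margaret thatcher served as prime minister",
}

def build_index(pages):
    """Map each word to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the pages matching the first term, then narrow down.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "margaret thatcher")))
print(sorted(search(INDEX, "thatcher grantham")))
```

Nothing in this sketch knows that “thatcher” names a person or that “grantham” names a town. It only matches strings, which is the limitation the Knowledge Graph is meant to address.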
Google’s Knowledge Graph adds a new dimension to searches, because the company now keeps track of what many search terms mean. That’s what allows the system to recognize the connection between Margaret Thatcher (the person) and Grantham (her place of birth). It makes that connection because it has a record of the relationship between the two entities, not because the two strings show up together on a lot of Web pages.
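One common way to represent entities and their relationships is as a list of subject-predicate-object statements, often called triples. The sketch below assumes that representation. Google has not said this is how the Knowledge Graph is stored, and the predicate names (`type`, `place_of_birth`, `portrayed`) and the helper `facts_about` are invented for illustration.

```python
# Each statement is (subject, predicate, object). The facts come from the
# article; the predicate names are hypothetical.
TRIPLES = [
    ("Margaret Thatcher", "type", "Person"),
    ("Margaret Thatcher", "place_of_birth", "Grantham"),
    ("Grantham", "type", "Place"),
    ("Meryl Streep", "type", "Person"),
    ("Meryl Streep", "portrayed", "Margaret Thatcher"),
]

def facts_about(entity, triples):
    """Return (outgoing, incoming) relationships for an entity.

    outgoing: (predicate, object) pairs where the entity is the subject.
    incoming: (subject, predicate) pairs where the entity is the object.
    """
    outgoing = [(p, o) for s, p, o in triples if s == entity]
    incoming = [(s, p) for s, p, o in triples if o == entity]
    return outgoing, incoming

outgoing, incoming = facts_about("Margaret Thatcher", TRIPLES)
print(outgoing)
print(incoming)
```

Here the link between Thatcher and Grantham is an explicit statement about two entities, so it can be looked up directly rather than inferred from how often two strings appear together.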
Although many people are just now seeing this new addition to Google’s search results, efforts to construct a Web of “things, not strings” have been going on for many years. Indeed, the Semantic Web—a Web of information defined well enough that computer algorithms running anywhere could readily determine the meaning of each item of data—has been a goal of the World Wide Web’s originator, Sir Tim Berners-Lee, almost from the beginning of the Web. But the formal data model and protocols Berners-Lee and the World Wide Web Consortium have promoted to create the Semantic Web have been slow to catch on. That’s why the Web primarily contains text documents, which make sense to the people reading them but are hard for computers to interpret.
Your algorithm would have to be pretty smart to determine, for example, whether a description of a computer that includes the word Flash is referring to hardware (solid-state memory) or software (a package from Adobe). But increasingly, the Web has been including certain well-defined entities, such as recipes, calendar events, or a person’s contact information. Special tags are used within Web pages to make the meaning of such things clear. These developments, however, have mostly taken place in an ad hoc fashion.
“That evolution is still under way, and the Knowledge Graph is an important step in that direction,” says David Karger, a professor in MIT’s Computer Science and Artificial Intelligence Laboratory. That’s not to say that you can’t find lots of data sets on the Web where the various pieces of information are both well defined and nicely organized, but the only way to query or manipulate such information has been with applications designed specifically to work with one particular set of data. Take airline reservations: You need to use a highly engineered Web application, say from Orbitz or Travelocity, to search for available flights. But suppose you wanted to combine those data sets with one for theatrical performances at your destination city. Then you’d be out of luck. “Our traditional applications put data in silos,” says Karger, who notes that this makes it very difficult to accomplish tasks that require you to combine different data sets in novel mash-ups. “The people who make mash-ups these days are programmers,” says Karger. The Knowledge Graph could broaden the range of people with that power if Google provides tools to access it directly.
How has Google constructed its huge Knowledge Graph? One hint comes from Google’s 2010 acquisition of San Francisco–based Metaweb Technologies, the company that developed Freebase, a structured collection of public knowledge. Freebase allows for investigations that normal search engines struggle with. Say you wanted to know whether cinematographer Robert Burks and film editor George Tomasini had ever worked together on anything other than an Alfred Hitchcock movie. Googling their names would quickly turn up a raft of links to such Hitchcock masterpieces as Rear Window and North by Northwest. But you’d need something like the Freebase-enabled co5TARS Web application to easily find out that the two had never worked together without Hitchcock.
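The Burks and Tomasini question is easy to answer once film credits are stored as structured records rather than as text. The sketch below shows the idea with a handful of hand-entered credits. It does not use Freebase or co5TARS, the sample is far from complete, and the function names are hypothetical, so an empty answer here only reflects this small sample.

```python
# (film, role, person) credits: a small hand-entered sample for
# illustration, not data pulled from Freebase.
CREDITS = [
    ("Rear Window", "director", "Alfred Hitchcock"),
    ("Rear Window", "cinematographer", "Robert Burks"),
    ("Rear Window", "editor", "George Tomasini"),
    ("North by Northwest", "director", "Alfred Hitchcock"),
    ("North by Northwest", "cinematographer", "Robert Burks"),
    ("North by Northwest", "editor", "George Tomasini"),
    ("The Music Man", "cinematographer", "Robert Burks"),
    ("Cape Fear", "editor", "George Tomasini"),
]

def films_with(person, credits):
    """All films on which the person has any credit."""
    return {film for film, _role, who in credits if who == person}

def shared_films(a, b, credits, excluding_director=None):
    """Films crediting both a and b, optionally minus one director's films."""
    shared = films_with(a, credits) & films_with(b, credits)
    if excluding_director is not None:
        directed = {
            film
            for film, role, who in credits
            if role == "director" and who == excluding_director
        }
        shared -= directed
    return shared

together = shared_films("Robert Burks", "George Tomasini", CREDITS)
without_hitchcock = shared_films(
    "Robert Burks", "George Tomasini", CREDITS,
    excluding_director="Alfred Hitchcock",
)
print(sorted(together))
print(sorted(without_hitchcock))
```

A keyword search can find pages that mention both names, but it has no way to express “and not directed by Hitchcock.” With structured credits, that condition is a simple set difference.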
Freebase was just one building block Google used to create the Knowledge Graph. “We’re definitely open to using any data we have privileges to use,” says Thakur, who gives the CIA World Factbook and Wikipedia as examples of other collections of public knowledge that he and his colleagues tapped. He says that Google has also licensed certain data sets and points out that the company has some large data sets developed internally—the information collected for Google Maps, for example. “All of those go into the Knowledge Graph,” says Thakur.
“There’s tons and tons of this data available, but getting your hands on it in a form that’s useful is a bear,” says Robert Gonzalez, director of product management and marketing at Cambridge Semantics, a Massachusetts company that helps large enterprises harness Semantic Web technologies internally. Gonzalez notes that Google’s search engine has been able to track meaning in various ways for a while now—for numerical calculations, recipes, even in searches for restaurants. He says that Bing, too, has started to improve its search results with semantic components. And Gonzalez expects that these search engines will grow ever more impressive with time as they tap into the meaning of our searches. “We’re just scratching the surface,” he says.