The Making of Facebook’s Graph Search
Engineers who built Facebook’s Graph Search tell the inside story of how they made Zuckerberg’s dream come true
Earlier this month, Facebook gave hundreds of millions of its users a new tool that lets them search the links between the people they know and the places they go, the businesses they’re interested in, and other information stored in Facebook’s massive structured database. Users can now easily get answers to questions like “What restaurants do my friends like in New York?” and “Who do I know who works at Google?”
Graph Search—a name only a network engineer could love—is a search engine that crawls through people’s Internet connections—their so-called social graphs. The new tool may not immediately change the way people use the Internet, but it could be a big deal in the long run—big enough, perhaps, to challenge Google’s search hegemony.
For its part, Facebook needs Graph Search to work, if Facebook is going to be more than just a tool for entertainment and communication, a place for family photos and funny videos. Greg Sterling, senior analyst and program director with Opus Research’s Internet2Go division, says that kind of tool can be expendable (remember AOL Hometown, GeoCities, and MySpace?). With Graph Search, Facebook is trying to become more utilitarian, more embedded in daily life. “If you’re getting useful information about contractors and vacation planning and other advice,” Sterling says, “you start to rely on it in a way people rely on Google.”
“Graph Search is showing users that there’s a lot more value in Facebook than was previously obvious,” says Susan Etlinger, an industry analyst with the Altimeter Group, in San Mateo, Calif. “It demonstrates that your connections to other people have impact. You and I may have never talked about where to get our cars repaired, but when I need an auto body shop, I may discover that you had a great experience at one, which could be more meaningful than the fact that it gets four stars on Yelp.”
This illustration of the basic Graph Search approach shows nodes (such as people and places, shown here as dots) and the edges, or connections, between them (lines). Image: Facebook
Facebook indeed hopes that Graph Search will lead users to spend more time on the site and post more data. “One of my first successes with Graph Search was when I had a toothache. I looked for a dentist my friends liked. I went there, and it was a good experience,” says Lars Rasmussen, the engineering director of the project. “And this experience of having gotten a really useful task done by Graph Search made me want to go share more stuff.”
Facebook first showed off the Graph Search technology in January, when it allowed a controlled flow of Facebook users to opt into the technology. These eventually numbered in the millions. Mark Zuckerberg predicted that Graph Search would be as important to users as the Timeline, where users store their own photos and musings, and the News Feed, where they see a constant stream of updates by friends.
The announcement didn’t seem revolutionary. Can’t we already search for anything anywhere on the Web? That’s what Google’s for, isn’t it?
But Zuckerberg envisioned something very different. He wanted users to be able to tease out the kind of information about people, businesses, places, and issues that isn’t available in any public record or encyclopedia. With the kind of tool he had in mind, users could find people who live nearby who might want to get together for a game of softball, discover professional chefs’ recommendations for the best French restaurant in New York, or locate someone who knows someone working at a company that just posted an enticing job opening. And Zuckerberg wanted users to be able to ask for this information as if they were talking to a friend, slang and all.
As with a lot of projects at Facebook, the effort to build Graph Search didn’t start in a conference room. In April 2011, Zuckerberg invited Rasmussen to take a stroll. Rasmussen, a computer scientist with a Ph.D. from the University of California, Berkeley, had been at Facebook for only a few months, working on the team building Facebook’s Open Graph software. Open Graph allows folks listening to music on Spotify, for example, to let their Facebook friends automatically find out what they’re listening to. Rasmussen was well known in computing circles, having cofounded Where 2 Technologies in 2003 to develop mapping software and sold it just a year later to Google; the software eventually grew into Google Maps. When Google canceled a project that Rasmussen had worked on for three years, a collaboration tool called Google Wave, he left the company for a job at Facebook.
Zuckerberg hadn’t thought they’d need a lot of time to discuss what he had in mind; he’d planned a short walk through the residential area near Facebook’s headquarters, about 2 miles round trip.
He explained it succinctly. “Mark had formed this vision about building a structured search engine over all the data that people had shared with each other on Facebook,” Rasmussen recalls. “He thought that even with just the data we had at the time, it would be useful, and with all the different types of data people might share with each other in the future, it would become a supercompelling experience.
“He clearly had thought about it a bunch,” Rasmussen continues. “It would let you ask questions like ‘Which of my friends in New York like opera?’ or ‘What movies do my friends like?’ You could never answer such precise questions with a keyword search engine.”
Facebook already had 20 or so engineers working on search software, including tools that look through a user’s connections to find potential new friends. Those engineers had started prototyping some approaches to structured search, but they hadn’t quite figured out a way to give Zuckerberg what he wanted. Zuckerberg told Rasmussen to join the team and see what he could do.
“I remember very clearly saying, ‘Mark, you are aware that I never worked on search. I may have been at Google for six years, but I never did search.’ And he said, ‘Exactly.’ He wanted someone to do something different.”
Rasmussen immediately saw the scope of the challenge. Facebook adds more than 9 billion new pieces of data a day. About half of those are “likes,” and 350 million of these pieces are photos. The numbers, he says, are staggering, and they are growing fast, making the problem harder and harder. Facebook needed to build a search engine that was robust enough to handle not only today’s data but also the onslaught to come.
The good news was that at Facebook, unlike the Web in general, data was already classified as a particular type of information—such as a person, a town, a company, a photo, or an activity. In the branch of mathematics called graph theory, these classifications are known as nodes. The nodes can have a variety of connections—a person has a friend, a hometown, a workplace; in graph theory, these are called edges. Facebook users create another important connection that’s unique to the site when they click on a “like” button on a page put up by or about a company, a corporation, or a musical group, for example.
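The node-and-edge model described above can be sketched in a few lines of Python. This is only an illustration of the graph-theory idea, not Facebook’s schema: the class name, node IDs, and edge labels are all invented for the example.

```python
from collections import defaultdict


class TinyGraph:
    """Toy typed graph; every name here is hypothetical, not Facebook's."""

    def __init__(self):
        self.node_type = {}            # node id -> type, e.g. "person"
        self.edges = defaultdict(set)  # (source id, edge label) -> target ids

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, label, target):
        self.edges[(source, label)].add(target)

    def neighbors(self, source, label):
        # .get() avoids creating empty entries for unknown keys
        return self.edges.get((source, label), set())


graph = TinyGraph()
graph.add_node("alice", "person")
graph.add_node("bob", "person")
graph.add_node("nyc", "town")
graph.add_node("opera_page", "page")

graph.add_edge("alice", "friend", "bob")
graph.add_edge("bob", "friend", "alice")
graph.add_edge("bob", "lives_in", "nyc")
graph.add_edge("bob", "likes", "opera_page")  # a "like" is just another edge

print(graph.neighbors("alice", "friend"))
```

A question such as “Which of my friends like opera?” then amounts to following the “friend” edges from one node and keeping the people who also have a “likes” edge to the opera page.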
The bad news was that the software then operating on the site was a hodgepodge, and the search team was going frantic just keeping it all up and running. The patchwork of software grew out of an overall engineering strategy: In its haste to roll out new features, Facebook doesn’t spend much time testing software or making it compatible with other software.
“Moving fast,” Rasmussen says, “is an important factor in why the company is so successful.” But, he acknowledges, “each time you build a new thing, you incur a little bit of cost going forward maintaining that thing as things change around it, making sure the old things still work, and scaling it as the user base grows.” By the time he’d joined the search team, the group was focused on simply maintaining the existing collection of search products.
“When Zuck saw it, he said, ‘Oh, you’ll never make that work. But if you can, it’ll be awesome,’ ” says Rasmussen.
Rasmussen and others continued to extend the prototype as time allowed, with a lot of help from a summer intern who jumped into the project full time. Throughout that summer and into the fall, the team kept improving the basic package—and kept getting more frustrated.
“We would build a prototype, and we would think we had something really cool, and the folks on the search team would all use it and love it,” says Rasmussen. “And then we’d bring users into our user studies lab and put it in front of the uninitiated. And those ‘real users’ couldn’t make it work.”
For example, an engineer would tell a user to pretend to look for a friend to go running with. Says Rasmussen: “We knew that the right query was ‘Who are my friends who like running?’ But the poor users didn’t know this. They would say ‘Who are people who like to run?’ or ‘Who are runners?’ or ‘Which of my friends run?’ They would express it in every conceivable way that was not ‘friends who like running’—which was the only way Graph Search understood it.”
Facebook’s typical approach would have been to use a separate group to focus on the project. “I passionately argued against that,” Rasmussen says. “It’s not cool for most of the team to be working on old things while only a small team gets to work on the new, exciting thing.”
So with Zuckerberg’s approval, Rasmussen made the new project the main task of the entire search department, giving the maintenance of older systems a lower priority. The engineers soon realized that to make Graph Search work and keep other search functions on the site viable, they had to run them all off the same back-end system. That was the only way they’d be able to spend less time supporting the software and more time on the new project. Now the multiple programs that combed through the fields in Facebook’s database had to be unified.
The engineers had previously made some effort to extend one of the four existing back-end systems to support a wider variety of search functions. The team had code-named that extension project Unicorn; like the mythical creature, they thought, this system would heal all their woes if only it existed. Unicorn could do a number of useful things, including identifying intersections between sets of users: It could find friends of John who were also friends of Mary, or John’s friends who were also fans of Justin Bieber. But it wasn’t sophisticated enough to power Graph Search.
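The kind of intersection attributed to Unicorn here can be illustrated with a short Python sketch. It assumes each set of users is kept as an ascending list of numeric IDs and intersects two lists with a two-pointer merge; that storage layout, the function name, and the sample IDs are assumptions made for the example, not details from Facebook.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of user ids with a two-pointer merge."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return common


# Hypothetical id lists standing in for index entries
friends_of_john = [3, 8, 15, 21, 42]
friends_of_mary = [5, 8, 21, 30]
fans_of_bieber = [8, 9, 42, 77]

mutual_friends = intersect_sorted(friends_of_john, friends_of_mary)
john_friends_who_are_fans = intersect_sorted(friends_of_john, fans_of_bieber)
print(mutual_friends, john_friends_who_are_fans)
```

Because both lists are walked once, the cost grows with the combined length of the lists rather than with their product, which matters when the lists are long.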
For six months, the search team focused on improving Unicorn, making it more broadly applicable and scalable, and slowly bringing over the various existing search functions from the limited-purpose search engines, like type-ahead, to run on Unicorn. It was slow going.
A big problem was the dramatically growing data sets: If you have 200 friends, and each has 200 friends, following friends of friends involves 40 000 connections. The engineers needed to constrain the search to a manageable number of friend connections, somewhere in the thousands. Fortunately, the sheer effort of typing “friends of friends of friends of friends” apparently keeps users from trying to reach out too far.
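The arithmetic is easy to reproduce. The sketch below, in Python, computes the upper bound for a given number of hops and then shows one way a traversal might be capped; the function names, the toy friend lists, and the cap itself are illustrative assumptions, not Facebook’s implementation.

```python
from collections import deque


def upper_bound_connections(friends_per_user, hops):
    """Worst case: nobody shares any friends, so the count multiplies per hop."""
    return friends_per_user ** hops


def reachable_within(adjacency, start, max_hops, limit):
    """Breadth-first walk that stops at max_hops or after `limit` people."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue and len(found) < limit:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for friend in adjacency.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                found.append(friend)
                if len(found) == limit:
                    break
                queue.append((friend, hops + 1))
    return found


print(upper_bound_connections(200, 2))  # friends of friends
print(upper_bound_connections(200, 3))  # one hop further

toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["e"], "d": ["f"]}
print(reachable_within(toy, "a", max_hops=2, limit=10))
```

With 200 friends each, two hops give the 40 000 figure and a third hop gives 8 million, which is why a cap in the thousands is attractive.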
But Unicorn didn’t have to be perfect to replace the stand-alone search engines. The team finished moving all the existing search functions to Unicorn on 18 May 2012, the day Facebook went public. “We had our own little party that day, to celebrate moving everything to Unicorn,” says Rasmussen. “The rest of the company was celebrating going public. It was a total coincidence, but it was nice; it felt like they were celebrating us.”
With Graph Search looking more like a real tool, Zuckerberg gave it more resources. Rasmussen pulled engineers, computer scientists, and programmers from Facebook’s new hires. (At Facebook, people are rarely hired for specific jobs but instead go into a general pool.) There were specialists in user interfaces, and other people who did systems engineering. He also brought in the first linguists ever hired by the company, to expand the semantic prowess of the search engine. Users can now say “Which of my besties run?” Over a matter of months, by September 2012 or so, the search team doubled to 50. It’s far larger now, though Facebook isn’t saying just how large.
Kari Lee, an engineer who’d been working on Facebook’s other search tools since 2007 and working on Graph Search since its early days, took over management of the grammar and user interface teams. She began trying to whittle down the daunting idea of comprehensive natural-language search into something that could be implemented in a reasonable amount of time. “The joke on the team became ‘Kari says no,’ ” she recalls. “Lars came with big ideas, and I would say, ‘That doesn’t work for this and this reason.’ ” It wasn’t fun saying no all the time, but, Lee says, “I had to figure out how to get the project out of crazyland into something that was actually going to work.”
PATHFINDING: When a user types a search request into a browser, Facebook’s front-end server sends it on to an aggregator server, which analyzes it and sends it to the relevant index servers to fulfill. These index servers are updated constantly as billions of pieces of data are added daily.
This was a typical puzzle: Does someone who searches for “photos of engineers from Mumbai” want photos of engineers who are from Mumbai or photos of engineers from anywhere who are photographed in Mumbai? It’s not obvious to the search engine. (Facebook’s engineers are still working on that one.)
A key step turned out to be the addition of a drop-down set of possible search queries; these start popping up as users type the first letters of their questions and adjust as they continue. The drop-down queries allow users to select a possible interpretation of their questions. What’s more, the team found that when users saw these phrases, they were quietly “trained” to express themselves in the way Graph Search understands best. Along with the drop-downs, the team added a column on the right-hand side of the page that suggests possible ways to filter results, another way of training users to understand what is possible in natural-language searches.
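A drop-down of this kind can be approximated with simple prefix matching. The Python sketch below is a deliberately crude stand-in: the list of phrasings and the function name are invented for the example, and the real system relies on a grammar rather than a fixed list.

```python
# Hypothetical phrasings the engine is assumed to understand
CANONICAL_QUERIES = [
    "friends who like running",
    "friends who live in new york",
    "photos of my friends",
    "restaurants my friends like",
]


def suggest(typed, queries=CANONICAL_QUERIES, max_suggestions=3):
    """Return the known phrasings that start with what the user has typed."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [query for query in queries if query.startswith(prefix)]
    return matches[:max_suggestions]


print(suggest("fri"))
print(suggest("Photos "))
```

Even this toy version shows the training effect: a user who types “fri” is immediately shown “friends who like running,” a phrasing the engine can handle.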
Drop-downs were a brilliant move, say analysts, particularly because of the current limitations of Facebook’s search possibilities. “I love the suggestions, because I’m not sure exactly what it will let me search for,” says Matt McGee, editor in chief of Search Engine Land.
“Considering the relative simplicity of Graph Search compared to the complexity of data in Facebook, it’s an impressive engineering feat,” says Etlinger, of the Altimeter Group. “But natural language is really hard. Facebook has done a nice job, but it’s still a considerable challenge.”
By mid-2012, about a year after the project started, the developers were getting excited about new capabilities that they thought were awesome. But Rasmussen remained frustrated. Every time “we put it in front of [an outside] user,” he says, “it didn’t work very awesomely. We started worrying.”
But things were progressing, albeit slowly. Recalls Lee, “At first, our user testing sessions were really to help us understand whether this was even feasible. Whenever a user typed something in and it worked, we would be behind the glass cheering, because it was more unusual for it to work than not. Towards the end, we might still be secretly giving each other high fives every time it worked, but they would mess up much less often, and that was so much fun to see.”
A year in the Facebook world is a really, really long time. “I couldn’t imagine anything taking more than a year to build,” Rasmussen says. “But we were coming up on a year, and we weren’t getting close to being done.”
The engineers weren’t struggling just to make the software work. By March 2012 they knew that Graph Search was going to require changes on the hardware side. All of Facebook’s data are classified, indexed, and stored in Facebook’s gigantic data centers in Oregon, North Carolina, and Sweden. Overall, Facebook has hundreds of petabytes of information in the form of photos, videos, and status updates. When a search request comes in, aggregator servers parse the request and send it to the appropriate place: photo index servers, friend index servers, and so on.
Answering even a fairly simple request, say, “photos of John’s friends,” involves several sets of servers. The aggregator servers send out a request to an index server to find John’s friends, then to the photo index servers to find photos of those users, and finally to the servers that store the photos themselves to pull up the photos. Facebook replicates the indices based on the traffic that flows to them, so users can get reasonably quick responses. Increasing the number and complexity of search queries meant, at a minimum, that Facebook would need to add more index servers. Even with plenty of servers, however, delays can multiply when requests travel back and forth across Facebook’s network from one set of servers to another. So the developers designed Graph Search to have each server bring back only a few hundred results so that users would get their responses promptly—in less than a second.
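The three hops described above can be mimicked with in-memory dictionaries. In this Python sketch each dictionary stands in for one set of servers, and a small limit plays the role of the few-hundred-result cap; all of the names and data are hypothetical, and real servers would of course be reached over the network.

```python
# Each dictionary is a stand-in for one set of servers (all data invented)
FRIEND_INDEX = {"john": ["ann", "raj", "li"]}
PHOTO_INDEX = {"ann": ["p1", "p2"], "raj": ["p3"], "li": []}
PHOTO_STORE = {"p1": "ann_beach.jpg", "p2": "ann_hike.jpg", "p3": "raj_party.jpg"}


def photos_of_friends(user, limit):
    """Aggregator-style lookup: friends, then photo ids, then the photos."""
    # Step 1: ask the friend index, keeping at most `limit` results
    friends = FRIEND_INDEX.get(user, [])[:limit]

    # Step 2: ask the photo index for photos of those friends
    photo_ids = []
    for friend in friends:
        photo_ids.extend(PHOTO_INDEX.get(friend, []))
    photo_ids = photo_ids[:limit]

    # Step 3: fetch the photos themselves from storage
    return [PHOTO_STORE[photo_id] for photo_id in photo_ids]


print(photos_of_friends("john", limit=2))
print(photos_of_friends("john", limit=10))
```

Capping each step keeps the amount of data crossing the network bounded, at the price of possibly missing results that fall beyond the cap.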
Not only does fast index-data processing get responses to users quickly, it’s also crucial for user privacy. You can choose to show the information you share only to friends, but that means if you drop someone as a friend, the corresponding data must instantly be made unavailable to you. So the indexes must be updated constantly, not just with new privacy settings but also with the 4.75 billion pieces of content (like pictures or posts) and the 4.5 billion “likes” added to Facebook daily.
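Why stale data is a privacy problem can be seen in a small Python sketch. It assumes a visibility check that consults the current set of friendships at query time; the audience labels, field names, and functions are made up for the illustration and say nothing about how Facebook actually enforces privacy.

```python
def can_see(viewer, item, friendships):
    """True if `viewer` may see `item` under the current friendships."""
    owner = item["owner"]
    if viewer == owner or item["audience"] == "public":
        return True
    if item["audience"] == "friends":
        # A friendship is stored as an unordered pair of user names
        return frozenset((viewer, owner)) in friendships
    return False


def search_photos(viewer, items, friendships):
    """Return ids of the items the viewer is currently allowed to see."""
    return [item["id"] for item in items if can_see(viewer, item, friendships)]


friendships = {frozenset(("ann", "raj"))}
items = [
    {"id": "p1", "owner": "raj", "audience": "friends"},
    {"id": "p2", "owner": "raj", "audience": "public"},
]

before = search_photos("ann", items, friendships)

# The friendship ends; the very next search must reflect it
friendships.discard(frozenset(("ann", "raj")))
after = search_photos("ann", items, friendships)
print(before, after)
```

If the friendship set lagged behind reality, the friends-only photo would keep showing up in results after the friendship ended.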
At the time that Facebook announced the Graph Search project in January, all the index data were stored in conventional dynamic RAM. However, each server is limited in how much RAM it can address, so the engineers had to spread the index data among multiple servers, which slowed down searches. Since then, engineers have moved the majority of the photos index to flash memory, and they’ll eventually move other index data to flash as well. (Photos are by far the largest type of data stored on Facebook—by two orders of magnitude. Users have posted more than 240 billion photos, all of which are stored on hard disks.)
Because flash memory is denser than DRAM by about a factor of 10, moving the indexes to flash means that fewer servers need to be involved in each index search. Engineers are also steadily improving the design of the index, trying to optimize the way data is structured and retrieved. Doing both—squeezing more data into each machine and handling it more efficiently—means that each index server will be able to answer a growing number of queries per second, which is essential as Facebook gives more and more users access to the technology.
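The effect of that factor of 10 on server count is simple ceiling division, sketched here in Python. The index size and per-server capacities are invented round numbers chosen only to make the ratio visible; Facebook has not published these figures in this article.

```python
def servers_needed(index_bytes, bytes_per_server):
    """Smallest number of servers that can hold the index (ceiling division)."""
    return -(-index_bytes // bytes_per_server)


GB = 10**9

# Hypothetical figures, for illustration only
index_size = 20_000 * GB
dram_per_server = 100 * GB
flash_per_server = 10 * dram_per_server  # the article's rough factor of 10

dram_servers = servers_needed(index_size, dram_per_server)
flash_servers = servers_needed(index_size, flash_per_server)
print(dram_servers, flash_servers)
```

Fewer servers per index means fewer machines have to be consulted for each query, which is where the latency benefit comes from.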
Back in April 2012, the team, though confident it could make Graph Search work, had decided to reduce the scope of what the initial system would cover, recalls product manager Loren Cheng. They removed the ability to search for events, for example, and also the ability to search for posts, comments, and Open Graph actions that come through a user’s News Feed; the team limited searches essentially to information that the software can find by knowing who the users’ friends are and what the friends “like.” Now, more than a year later, they are facing questions about this or that search function that doesn’t exist but seems like it should. For example, you can’t search your friends’ posts or even your own posts yet, although all the developers agree that this would be hugely useful. “It broke our hearts to eliminate that,” Cheng says.
Making tough choices, though, meant that by August 2012 they were ready to open Graph Search to all employees. “People would play around with it, give us a bunch of criticisms, then leave. It was just too horrific,” says Rasmussen. But come the last few months of 2012, the team started to notice that new users weren’t leaving quite that fast and that a few had started using it regularly. Maybe, just maybe, Graph Search was turning into a tolerable experience. “That’s when we started regaining faith that this was actually going to go somewhere,” he says.
Having users, even just hundreds of them, helped the development process. “We could argue all day about things like how we rank results, for example,” Rasmussen says. “Do we rank by how often a user interacts with someone? For recruiting a new employee or finding someone to do an activity with, you probably want to search among your immediate network. But for dating, you probably want to look outside your circle of friends to their friends.”
Looking at real users stopped the arguments, and so development moved faster. Behind the scenes, automated systems monitored, anonymously, everything these users did on Graph Search and tried to determine when the search engine provided good answers or bad answers (based on what the user did afterward). Reports of failed searches could indicate problems with the natural-language processing, a ranking of results that buried useful answers, or simply that the user was looking for something known to be outside the system’s current capabilities.
By late 2012 Graph Search was working, sort of. It couldn’t handle every question users might come up with. It couldn’t even answer most questions. But it did enough, the developers believed, to be released to the public, or at least to a small percentage of Facebook’s users. The company formally announced Graph Search in January; the first million-plus users were early adopters who specifically requested it. This month, Facebook began rolling out Graph Search to everyone who uses the site in American English, several hundred million people. Eventually, Graph Search will be rolled out to all of Facebook’s 1.1 billion users. How long that takes will depend on both quantitative data (gathered from bug reports and as users type in their queries and click through the site) and qualitative data, from feedback forms on the site and from studies of users brought into the Facebook lab for observation. If it turns out that typical users, not just early adopters, love Graph Search in its evolving state, Rasmussen says, it will roll out quickly.
Whenever Facebook makes a change to its site, users raise questions of privacy: Are people going to find out things about me I really don’t want them to know? The Graph Search rollout has been no exception. And indeed, while Graph Search doesn’t change who sees your data (friends, friends of friends, or anybody, depending on the choices you make), it does make it easier for someone looking for something to find it. A Tumblr site, http://actualfacebookgraphsearches.tumblr.com, has demonstrated that vividly, with searches like “mothers of Jews who like bacon” and “current employers of people who like racism.” Analyst Etlinger expects that corporate legal teams may start using Graph Search to identify possible problems; she adds that other uses may lead to some individuals wishing they hadn’t been quite so public on the site. Unusual for Facebook, Graph Search’s rollout came with detailed reminders for users to review their privacy settings to make sure they were aware of what they were sharing publicly.
The developers admit that much work remains. By leaving out the ability to search wall posts and comments posted on Facebook by a user’s friends, Graph Search is ignoring the most active section on Facebook. “It’s a matter of scale,” Rasmussen says. Today, Graph Search can handle the hundreds of billions of photos and likes stored in its servers. But adding News Feed search would require upgrading hardware systems throughout Facebook’s server network and optimizing the software so that these systems could efficiently sift through the massive data.
It does have a way to go, say analysts. But, says Sterling of Opus Research, all signs point to “a serious commitment to provide a meaningful use of search on both its website and mobile app.”
“Right now, Graph Search is very limited,” says McGee. “But Facebook can afford to be in this for the long run. They have plenty of money.”
So far “it’s a bit of a curiosity,” says Etlinger. “I think it’s going to have a smooth adoption curve; people will start to use it more frequently, but it will take some time. It is a new type of behavior.”
Indeed, points out Facebook’s Lee, Graph Search is unique: “You can go to Yelp to get restaurant recommendations, or you can go to LinkedIn to see where people work. But we’re the only place you can go to find restaurant recommendations from people who worked as chefs. We’re tying together all this information people are sharing.”
Of course, if Facebook could make money off all these search queries, that wouldn’t be a bad thing for the company. But Rasmussen swears that any effort to use the software to generate income is in the distant future.
“Throughout Silicon Valley,” says Rasmussen, “we start by making something we think is really useful. And if that turns out to be true, then we get a certain amount of usage, which we’ll then find a way to monetize so that we have money to build the next thing. It always happens in that order.”