Yahoo primes its research pump
PHOTO: YAHOO Inc.
Prabhakar Raghavan, Yahoo Research guru
Yahoo Inc. is doing very well these days, having reported healthy revenue, operating income, and stock dividend numbers each fiscal quarter in 2005. But unlike, say, archrival Google, which seems to roll out a new product every month, Yahoo isn't known for its innovations. In July, the Sunnyvale, Calif., company hired Prabhakar Raghavan, an IEEE Fellow, to head Yahoo Research, and in December it opened an East Coast research center in New York City. IEEE Spectrum Senior Associate Editor Steven Cherry spoke with Raghavan in December and January by phone.
You joined Yahoo in July 2005, coming from a much smaller firm, Verity Inc., which specialized in corporate search tools. What's it like at a US $5 billion company?
At Yahoo, I'm with the company that's got the largest Internet presence. We get 400 million unique visitors monthly; one out of eight page views is a Yahoo page view.
At Yahoo Research we have five focus areas. First, there's search and text mining, including classic Web search. Second, there's data mining and machine learning. About 10 terabytes of data flow through our servers each day. Just for comparison, the world's first terabyte database, at Wal-Mart, a decade ago, was a big deal. Now we're doing 10 times that each and every day.
Next is computer-human interaction. We feel we need to be prepared for a billion people interacting through the Web, organizing themselves into communities. What drives this desire to congregate online? How can we improve people's experiences? Media are social media; for us, that includes the recent acquisitions of Flickr and del.icio.us. [Flickr, based in Vancouver, B.C., Canada, is a Web service for posting and commenting on photos; del.icio.us, based in New York City, similarly allows people to post files of Web bookmarks and comment on those of others.]
The fourth area is large-scale computing, which is something we've had to figure out and are always working on. Those daily 10 new terabytes get added to 20 000 terabytes of data stored on our servers, and all that data has to be made available 24/7, via desktops, laptops, even cellphones.
The fifth and final area is the intersection of microeconomics and computer science, especially auction theory--auctions and marketplaces. Information that gets sold can be everything from sponsored keywords to music to anything that gets annotated and improved and put on a market. How do you apportion rewards among interacting agents--in our case, one billion humans? How do you incentivize the right things--good content instead of spam, for example?
People congregate around things they find useful. That's one picture people have of open-source software--people contribute code and get the admiration of community members. That's one model for social media. We also think about open content, where one person creates something and others enhance it. For example, Yahoo and others have ways in which questions can be posed not to the system's centralized index but to other users. And recently, we have looked at ways to do that through social networks--like Friendster does--where the answers have a measure of trust. You trust your friends, your friends trust their friends, and so on. What happens when we add incentives such as small monetary rewards to good answers? What happens when there are incentives to just pass along the query to someone who can come up with a good answer? We've started to research those questions, and many more.
Your research group has been on something of a hiring spree.
We have a large staff, and we're hiring more people all the time. In November, we hired Andrei Broder as a research fellow and vice president of emerging search technology. He was, by the way, just named an IEEE Fellow for 2006.
How would you compare your group with similar corporate research groups, such as Microsoft Research?
Microsoft has a very well-established research group; it does a lot of long-term research. We feel the bar for us is higher. We want a comparable group with the same credibility in the scientific community, but we need short-term results as well.
We need to engage externally with the scientific community, for a lot of reasons. One, obviously, is branding and recruitment. We want researchers out there to see Yahoo as a serious place to work. But also, for ourselves, we can't become an insular group. Technical communities that are insular tend to fall behind. Whereas, if they're out there participating in IEEE and ACM [Association for Computing Machinery] conferences, sitting on conference committees, giving conference keynotes, and so on, then we know they're cutting their teeth on the right stuff. And lastly, we're participating in an intellectual commons. We do work, others build on it, and we build on the work of others. In the process, the whole pie grows, and our piece of the pie grows as well.