For Spectrum's January issue, I wrote about the Zillow Prize competition, in which nearly 4,000 teams were pitted against one another in a quest to come up with a computerized algorithm or machine-learning system that could predict the future sale price of homes. Real-estate giant Zillow organized the competition in hopes of using what it learned from these teams to improve its own system of predicting home prices, something the company calls the “Zestimate.” And today, Zillow has announced a winner: a team made up of Chahhou Mohamed of Morocco, Jordan Meyer of the United States, and Nima Shahbazi of Canada, whose predictions bettered the Zestimate by about 13 percent.
Stan Humphries, chief analytics officer for the Zillow Group, in Seattle, says that he and his colleagues have learned an enormous amount from the winning team and from others in the competition. With thousands of people working on the problem for two years, “That's a huge help,” he says.
Although he couldn't be too specific, Humphries said that one area of insight was “how you combine various models in an ensemble approach.” Even a single team will typically pursue several strategies or models in parallel, each generating its own estimates, which then have to be combined in some way to produce a final result. It's a little like what statistician Francis Galton famously did to estimate the weight of an ox at a 1906 country fair: He gathered the guesses of hundreds of fairgoers and combined them into a value that was more accurate than even the most expert individual's estimate.
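Humphries didn't describe how the winning team actually combined its models, so what follows is only a minimal sketch of the general idea, under stated assumptions: several hypothetical models each predict a price for the same set of homes, and those predictions are blended home by home, either with an (optionally weighted) average or with a median, which is less sensitive to one model that is far off. The function names `blend` and `blend_median` and all of the prices are made up for illustration. Real ensembles often learn the weights from held-out data rather than fixing them by hand.

```python
from statistics import median
from typing import List, Optional, Sequence


def blend(predictions: Sequence[Sequence[float]],
          weights: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical helper: weighted average of several models' predictions.

    predictions[m][h] is model m's predicted price for home h.
    With no weights, every model counts equally (a plain average).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    # zip(*predictions) regroups the data so each tuple holds
    # every model's prediction for one home.
    return [
        sum(w * p for w, p in zip(weights, per_home)) / total
        for per_home in zip(*predictions)
    ]


def blend_median(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: per-home median across models (robust to an outlier)."""
    return [median(per_home) for per_home in zip(*predictions)]


if __name__ == "__main__":
    # Three hypothetical models' predictions for two hypothetical homes.
    model_a = [300_000.0, 450_000.0]
    model_b = [320_000.0, 430_000.0]
    model_c = [310_000.0, 470_000.0]
    print(blend([model_a, model_b, model_c]))            # [310000.0, 450000.0]
    print(blend([model_a, model_b], weights=[3, 1]))     # [305000.0, 445000.0]
    print(blend_median([model_a, model_b, model_c]))     # [310000.0, 450000.0]
```

Averaging tends to cancel out errors that different models make in different directions, which is the same intuition behind pooling the fairgoers' guesses.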
According to Humphries, the winning team also put considerable effort into the ancillary data they fed into some of their models, such as the proximity of bodies of water and the prevailing level of road noise. So there was far more data to crunch than the usual real-estate statistics about square footage, number of bedrooms, and the sale prices of comparably sized houses in the area.
The competitors submitted their home-price predictions last July for homes that sold in September and October. Humphries and his Zillow colleagues have now spent a couple of months working out exactly what the top teams' algorithms did and deciding which parts of their systems should go into the Zestimate. “Some [parts] have already been put in; some will flow in later,” says Humphries.
Sifting through code submitted by so many teams to pick out the best parts is no doubt a huge job, and Humphries and his colleagues will likely be busy with it for a long time. That's a testament to how much output a technical competition like this can generate. As Humphries says, “It was a good use of a million dollars.”
David Schneider is a senior editor at IEEE Spectrum. His beat focuses on computing, and he contributes frequently to Spectrum's Hands On column. He holds a bachelor's degree in geology from Yale, a master's in engineering from UC Berkeley, and a doctorate in geology from Columbia.