A null pointer exception is like having the address to a house that was never built. It means a program has tried to use an object that doesn’t actually exist because it was never created. Null pointers are extremely common and relatively easy to fix; easy enough to be boring, in fact.
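To make the analogy concrete, here is a minimal sketch in Python, where the closest equivalent of a null pointer is a `None` value. The `House` class, the `BUILT` registry, and the function names are all hypothetical, invented for illustration; the apps discussed in this article are not written this way.

```python
class House:
    def __init__(self, address):
        self.address = address


# Only one house was ever "built" (instantiated).
BUILT = {"1 Main St": House("1 Main St")}


def look_up(address):
    """Return the House at this address, or None if it was never built."""
    return BUILT.get(address)


def unsafe_label(address):
    # Raises AttributeError when look_up returns None, the Python
    # counterpart of a null pointer exception.
    return look_up(address).address.upper()


def safe_label(address):
    # The typical fix: check for the missing object before using it.
    house = look_up(address)
    if house is None:
        return "UNKNOWN"
    return house.address.upper()
```

Calling `unsafe_label("2 Main St")` fails because no such house exists, while `safe_label` handles the same input with a one-line guard. That guard is the kind of small, repetitive fix the article describes.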
Unfortunately, the tedious work of finding and fixing errors like these still takes up much of a developer’s time and mental energy. A 2016 evaluation of 1,000 Android apps found that null pointers caused more crashes than any other kind of error, including illegal arguments, array index out of bounds exceptions, and bad tokens.
To make its developers’ jobs more rewarding, Facebook is now using two automated tools called Sapienz and SapFix to find and repair low-level bugs in its mobile apps. Sapienz runs the apps through many tests to figure out which actions will cause them to crash. Then, SapFix recommends a fix to developers, who review it and decide whether to accept the fix, come up with their own, or ignore the problem.
Engineers began using Sapienz to review the Facebook app in September 2017, and have gradually begun using it for the rest of the company’s apps (which include Messenger, Instagram, Facebook Lite, and Workplace). In May, the team will describe its more recent adoption of SapFix at the International Conference on Software Engineering in Montreal, Canada (and they’re hiring).
So far, the combination of Sapienz and SapFix has worked pretty well, according to the team. “It’s saving developers time, that’s our key goal,” says Mark Harman, an engineering manager at Facebook. Harman, who is also a professor of software engineering at University College London, built Sapienz with cofounders Yue Jia and Ke Mao at their former startup Majicke, which Facebook acquired in February 2017.
Since its debut, Sapienz has found hundreds of new ways to crash each of Facebook’s apps every month. Though some of those crashes are not a major concern for developers, about 75 percent have since been fixed.
For now, SapFix only recommends fixes for null pointers, but the team hopes to someday expand that system to assist with many different error types. As a starting point for its fixes, SapFix relies on templates served up from another tool developed at Facebook called Getafix. Those templates are based on previous fixes that human developers made for similar problems.
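A template-based fix can be pictured with a small sketch. The two templates, the function `propose_fixes`, and its parameters are assumptions made up for illustration; the article does not describe what Getafix templates actually look like or how SapFix applies them. The sketch simply inserts a null guard, written as Java text, ahead of the line that crashed, and produces one candidate patch per template.

```python
# Hypothetical fix templates. Per the article, real templates are derived
# from fixes that human developers previously made; these two are invented.
TEMPLATES = [
    "if ({var} == null) {{ return; }}",
    "if ({var} == null) {{ return null; }}",
]


def propose_fixes(source_lines, crash_line, null_var):
    """Return one patched copy of source_lines per template.

    crash_line is the 0-based index of the line that dereferenced null_var.
    The original list is left unmodified.
    """
    line = source_lines[crash_line]
    # Reuse the crashing line's indentation for the inserted guard.
    indent = line[: len(line) - len(line.lstrip())]
    candidates = []
    for template in TEMPLATES:
        guard = indent + template.format(var=null_var)
        patched = source_lines[:crash_line] + [guard] + source_lines[crash_line:]
        candidates.append(patched)
    return candidates
```

Given a method whose second line calls `user.getName()`, `propose_fixes(lines, 1, "user")` would return two candidate patches, each with a different guard inserted before that line. In the workflow the article describes, a developer then reviews the proposal and decides whether to accept it.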
Automating bug repair, at least for certain classes of bugs, should free up developers to write more code or to solve more difficult problems—akin to working on the logic and semantics of a story rather than fixing spelling and grammar errors.
“I think in the future, this is going to be mainstream because it makes total sense,” says Giovanni Vigna, a professor of computer science at the University of California, Santa Barbara. “Instead of paying expensive QA engineers to find trivial bugs, pay them to find super difficult bugs. And the simple ones, automate.”
So far, much of the work on automated bug repair has been confined to academic labs or DARPA competitions, but Facebook’s progress shows that these technologies are now beginning to make their way into workflows at major tech companies. “In terms of real deployment, Facebook’s system is the first I’ve heard of,” Vigna says.
Harman predicts that with tools like SapFix, particular classes of bugs could be eradicated within two to five years—meaning they will be found and fixed entirely by automated systems. Bugs that are localized to just a few lines of code are more likely to be on the extinction list than more complex ones.
Martin Monperrus, a professor of software technology at KTH Royal Institute of Technology, in Sweden, is more conservative in his outlook. He says autonomous bug repair is “only at the beginning” and compares it to “the level of early bicycles” in the history of transportation.
There’s certainly still room for improvement with SapFix. “It often will propose bad fixes,” says Harman. But, he adds, perfection isn’t necessarily the goal. “We don’t have to be right every time; it just needs to be right often enough to help developers.”
One task that Sapienz has helped with is deciding which tests to run in order to figure out what user actions will cause an app to crash. Such actions might include clicking on two buttons at once, applying multiple coupon codes at checkout, or selecting more than one option from a list. Coming up with these tests is a complicated problem because, as Harman says, “there are more test cases for even a simple app than there are stars in the universe.”
Though it’s possible to automatically generate tests through a technique known as random fuzzing, doing so is computationally expensive and can produce extremely long or otherwise unrealistic scenarios (such as touching 20 places on the screen at once). And when developers design these tests by hand, they often overlook scenarios they didn’t anticipate.
Instead of relying on randomly generated or developer-designed tests, Sapienz generates a list of possible tests and favors those that produce a crash in the fewest steps. It also prioritizes tests that have higher coverage, which means that they click through to more screens in the app. From there, it mutates these tests slightly (by changing a click or swapping a step) to come up with another batch of similar tests, and repeats the process.
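The loop described above can be sketched in a few dozen lines. Everything here is an assumption for illustration: the toy app, its screens, the planted bug, and the function names are invented, and the simple sorted ranking stands in for whatever selection method Sapienz really uses. The sketch only mirrors the steps named in the text: generate tests, favor short crashing tests and high coverage, mutate the best ones, and repeat.

```python
import random

# Hypothetical toy app: which tap moves from which screen to which.
TRANSITIONS = {
    ("home", "open_menu"): "menu",
    ("menu", "open_settings"): "settings",
    ("menu", "back"): "home",
    ("settings", "back"): "menu",
    ("home", "open_feed"): "feed",
    ("feed", "back"): "home",
}
ACTIONS = ["open_menu", "open_settings", "open_feed", "back", "clear_cache"]


def run_test(actions):
    """Replay a test; return (crash_step, screens_visited).

    crash_step is None when the test finishes without a crash.
    """
    screen = "home"
    visited = {screen}
    for step, action in enumerate(actions, start=1):
        # Planted bug: clearing the cache on the settings screen crashes.
        if screen == "settings" and action == "clear_cache":
            return step, visited
        screen = TRANSITIONS.get((screen, action), screen)
        visited.add(screen)
    return None, visited


def score(actions):
    """Sort key (lower is better): crashes first, fewer steps, more screens."""
    crash_step, visited = run_test(actions)
    if crash_step is None:
        return (1, 0, -len(visited))
    return (0, crash_step, -len(visited))


def mutate(actions, rng):
    """Return a copy with two steps swapped or one step changed."""
    child = list(actions)
    if len(child) >= 2 and rng.random() < 0.5:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    else:
        child[rng.randrange(len(child))] = rng.choice(ACTIONS)
    return child


def search(generations=30, population_size=20, test_length=6, seed=0):
    """Evolve a batch of tests and return the best one found."""
    rng = random.Random(seed)
    population = [
        [rng.choice(ACTIONS) for _ in range(test_length)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: population_size // 2]  # keep the best half
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return min(population, key=score)
```

Running `run_test(search())` shows whether the best surviving test reaches the planted bug and how many screens it visited along the way. Because the best half of each batch always survives, the top-ranked test never gets worse from one generation to the next.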
By the end of its evaluation, Sapienz will have found new ways to quickly and reliably crash an app. Using a predictive model, Sapienz prioritizes those crashes based on how common and severe they are likely to be. The system uses a series of heuristic rules to home in on the lines of code where the problem may have originated, and annotates this spot in Facebook’s code review tool, allowing a developer to see the system’s comments overlaid on the actual code they wrote.
Facebook’s developers make more than 100,000 commits every week, and the Facebook app for Android contains millions of lines of code. Sapienz runs hundreds of emulators around the clock to review code before and after it’s shipped, conducting tens of thousands of tests every day.
“This is a great example of how these kinds of techniques are slowly becoming mainstream,” says Vigna. “I think we will see more and more of this in the future.”