Fighting Buggy Code with Genetic Algorithms

Automating software tests is a great idea in theory. To make it practical, researchers are using evolution to automate the automation.


University of Saarland software engineer Andreas Zeller used automated test-generation software to ferret out bugs in the programming for automotive sensors.
Photo: Oliver Dietze/Universität des Saarlandes

The offspring of a genetic algorithm and a data-structure-reading spider could hunt down and eliminate the software bugs that plague development projects from websites to automotive sensors.

If a 2012 Cambridge University business school study has it right, programmers spend about one-quarter of their working lives debugging. (They devote another 25 percent of their workdays to designing and writing code; paperwork and meetings eat up the rest.)

With a shrug and a sigh, we’ll pass by the obvious implication (kill the meetings and shelve the status reports) and move on to bug hunting, the blood sport that consumes three months of the work year.

It’s clearly useful to automate software testing, and engineers are doing that whenever they can—particularly in complex applications. But they still have to develop input data sets and testing strategies, and those can be a bottleneck. The programmers can hand-build data sets and test the code unit by unit. They can formulate rules to generate random collections of well-formed data. They can search old files or try to cadge sets someone else once used for something that was sort of similar.

If only you could automate the automation: quickly generate suitable test inputs, then optimize them to cover as much code and uncover as many flaws as possible.

Software engineers at Germany’s University of Saarland are among the hunters pursuing this quarry. The school’s Software Engineering Department has developed a number of methods for automatically generating tests, particularly for applications with structured inputs. To date, they’ve optimized test-generators for Web pages (a package called WebMate, commercialized by TestFabrik); Java apps (Exsyst); Android apps (DroidMate, whose motto is: “Android apps are great. But when they crash, hang, or violate your privacy, they suck.”); and, most recently, for XML and JSON (XMLMate, “Search-based XML input generation”).

At first glance, the methods seem focused on Web services, but bugs in almost any software project can be fair game.

“The best thing is that we are completely independent from the application area,” says department chair Andreas Zeller. “With our framework, we are not only able to test computer networks, the processing of datasets, websites, or operating systems, but we can also examine software for sensors in cars.”

To test a so-called cyberphysical system, like an automotive sensor (photo), “All you need is a way to translate XML into the structured input formats required by the system under test, which usually is a straightforward translation process,” Zeller says.
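To make that concrete, here is a minimal sketch of such a translation layer. It is an illustration only, written in Python rather than XMLMate’s own Java, and the SensorReading record and its XML element names are invented for the example:

```python
# A minimal sketch (not XMLMate's actual code) of the translation Zeller
# describes: mapping a generated XML test input onto the structured record
# a hypothetical automotive sensor expects. All names here are invented.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class SensorReading:          # hypothetical input format of the system under test
    timestamp_ms: int
    wheel_speed_rpm: float
    temperature_c: float

def xml_to_reading(xml_text: str) -> SensorReading:
    root = ET.fromstring(xml_text)
    return SensorReading(
        timestamp_ms=int(root.findtext("timestamp")),
        wheel_speed_rpm=float(root.findtext("wheelSpeed")),
        temperature_c=float(root.findtext("temperature")),
    )

reading = xml_to_reading(
    "<reading><timestamp>1024</timestamp>"
    "<wheelSpeed>912.5</wheelSpeed><temperature>48.2</temperature></reading>"
)
print(reading)
```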

The packages are, well, evolving. The most recent test generators apply genetic algorithms to initial seed data and continually improve upon it.

And the latest, XMLMate, helps to automate and improve the seeding process. In a paper prepared for this June's International Symposium on Software Testing and Analysis (ISSTA 2014), Zeller, Nikolas Havrikov, and Matthias Hoeschele describe “search-based” input-set generation: the test-generation software combs through XML schemas (the files that define how objects are to be categorized and treated) and sample test-input files gathered from all over. These are processed into a number of input packages of validly formed data.
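As a toy illustration of that seeding step (not the XMLMate implementation, which works on real XML Schema files), here is a drastically simplified “schema” and a generator that emits random but validly structured input packages from it:

```python
# A toy sketch of schema-driven seed generation: a stripped-down "schema"
# says which child elements each element requires, and the generator emits
# random but structurally valid XML. The schema itself is invented.
import random
import xml.etree.ElementTree as ET

SCHEMA = {
    "feed":  {"children": ["title", "entry"], "repeat": "entry"},
    "title": {"text": lambda: random.choice(["news", "sports", "weather"])},
    "entry": {"children": ["title"]},
}

def generate(tag: str) -> ET.Element:
    elem = ET.Element(tag)
    rule = SCHEMA[tag]
    if "text" in rule:
        elem.text = rule["text"]()
    for child in rule.get("children", []):
        count = random.randint(1, 3) if child == rule.get("repeat") else 1
        for _ in range(count):
            elem.append(generate(child))
    return elem

seed = ET.tostring(generate("feed"), encoding="unicode")
print(seed)  # e.g. <feed><title>news</title><entry><title>sports</title></entry></feed>
```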

These packages are the “chromosomes” on which a genetic algorithm forces evolution. XMLMate swaps, recombines, substitutes, and just generally mutates the input-package data—following the rules of the schema, of course. Then it pumps input package after input package through the “subject”—the program being tested. XMLMate tracks the results and scores each input-package’s relative “fitness.” The fittest test inputs activate more lines of the evaluation program’s code…and provoke the greatest number of errors. The unfit data sets—the dull ones that don’t stray off the beaten path and don’t cause any trouble—lose the evolutionary struggle and are flushed from the gene pool. The trouble-making elite are saved for another round of evolution and reaping.
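In pseudocode terms, the loop looks roughly like the sketch below. Everything here is an illustrative stand-in: the chromosomes are plain strings rather than schema-valid XML packages, and run_subject() fakes the coverage and exception data that XMLMate would collect from an instrumented program under test:

```python
# A bare-bones sketch of the evolutionary loop described in the paper.
import random

def run_subject(chromosome):
    """Dummy harness: fake 'coverage' and 'exceptions' for an input."""
    covered = set(chromosome)                      # stand-in for lines covered
    exceptions = 1 if "!" in chromosome else 0     # stand-in for crashes provoked
    return covered, exceptions

def fitness(chromosome):
    covered, exceptions = run_subject(chromosome)
    return len(covered) + 10 * exceptions          # reward coverage and trouble

def crossover(a, b):
    cut = random.randint(0, min(len(a), len(b)))   # recombine two parents
    return a[:cut] + b[cut:]

def mutate(c, alphabet="abc!<>/"):
    i = random.randrange(len(c))                   # rewrite one position
    return c[:i] + random.choice(alphabet) + c[i + 1:]

def evolve(population, generations=50):
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: len(population) // 2]   # trouble-makers survive
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(len(population) - len(elite))]
        population = elite + children                # unfit inputs are flushed
    return population

print(evolve(["aaaa", "abab", "bbbb", "abcd"])[0])   # fittest input after evolution
```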

The Saarland researchers tested XMLMate on six XML-processing programs: Rome, a library for processing RSS and Atom syndication feeds; JEuclid, a library for rendering and converting MathML; Freedots, which renders MusicXML musical notation into Braille and MIDI audio; Apache Batik and SVG Salamander, libraries for modifying, converting, and rendering Scalable Vector Graphics; SVG Image, a “simple” Java SVG rendering app; and FlyingSaucer, an XHTML-and-SVG rendering library.

At this point, automated approaches haven’t achieved complete coverage of every line of the subject’s code. Indeed, coverage ranged from a few percent to just under half. The tests did find, however, that search-based test generation consistently increases coverage by anywhere from 10 percent to (in the extreme case of Freedots) more than 900 percent over randomly generated test inputs. Or, as the paper puts it, “Evolving sample inputs achieve coverage that would not be reached by random seeds.”

Oddly, adding the genetic algorithm to the mix did not uniformly raise coverage per se. What evolving the inputs clearly did do was increase the number of exception errors they could provoke (remember, exception generation factored into the fitness function). Overall, non-evolved, random-test-input packages generated 12 unique program exceptions in the six trial programs; tests that evolved from search-generated seeds turned up 28 errors. (Or, to put it another way, 16 fewer bugs for customers to find.)

Interestingly, the generated test inputs also turned up problems in programs that were not in the evaluation group: the researchers found “one small file” that shocked Firefox into consuming all available main memory and shutting down. Another input set crashed Opera. And some JEuclid test inputs crashed the Java 1.6 virtual machine on both Linux and Windows.

Right now, Zeller and his colleagues plan to make XMLMate available as open-source software…though it is also possible that TestFabrik, their WebMate partner, might commercialize the utility.

Illustration: Randi Klett; Images: iStockphoto
