Autonomous Security Bots Seek and Destroy Software Bugs in DARPA Cyber Grand Challenge

The team behind the victorious Cyber Reasoning System will receive a US $2 million prize

A cybersecurity illustration showing a keyboard with a graphic of a bug superimposed on top.
Photo: Steven Puetzer/Getty Images

The mission: to detect and patch as many software flaws as possible. The competitors: seven dueling supercomputers about the size of large vending machines, each emblazoned with a name like Jima or Crspy, and programmed by expert hacker teams to autonomously find and fix malicious bugs.

These seven “Cyber Reasoning Systems” took the stage on Thursday for DARPA’s Cyber Grand Challenge at the Paris Hotel and Conference Center in Las Vegas, Nev. They were competing for a $2 million grand prize in the world’s first fully autonomous “Capture the Flag” tournament. After eight hours of grueling bot-on-bot competition, DARPA declared a system named Mayhem, built by Pittsburgh, Pa.-based ForAllSecure, the unofficial winner. The Mayhem team was led by David Brumley. Xandra, produced by TECHx from GrammaTech and the University of Virginia, placed second to earn a $1 million prize; and Mechanical Phish by Shellphish, a student-led team from Santa Barbara, Calif., took third place, worth $750,000.

DARPA is verifying the results and will announce the official positions on Friday. The triumphant bot will then compete against human hackers in a “Capture the Flag” tournament at the annual DEF CON security conference. Though no one expects one of these reasoning systems to win that challenge, it could find and fix some types of bugs more quickly than human teams.

DARPA hopes the competition will pay off by bringing researchers closer to developing software repair bots that could constantly scan systems for flaws or bugs and patch them much faster and more effectively than human teams can. The agency says quickly fixing such flaws across billions of lines of code is critically important. It could help to harden infrastructure such as power lines and water treatment plants against cyberattacks, and to protect privacy as more personal devices come online.

But no such system has ever been available on the market. Instead, teams of security specialists constantly scan code for potential problems. On average, it takes specialists 312 days to discover a software vulnerability and often months or years to actually fix it, according to DARPA CGC host Hakeem Oluseyi.

“A final goal of all this is scalability,” says Michael Stevenson, Mission Manager for the Deep Red team from Raytheon. “If [the bots] discover something in one part of the network, these are the technologies that can quickly reach out and patch that vulnerability throughout that network.” The original 2005 DARPA Grand Challenge jumpstarted corporate and academic interest in autonomous cars.

This visualization shows network traffic flows for the bot Rubeus as it receives verification of software bugs from competitors. Image: DARPA

The teams were not told what types of defects their systems would encounter in the finale, so their bots had to reverse engineer DARPA’s challenge software, identify potential bugs, run tests to verify those bugs, and then apply patches that wouldn’t cause the software to run slowly or shut down altogether.

To test the limits of these Cyber Reasoning Systems, DARPA planted software bugs that were simplified versions of famous vulnerabilities, including the Heartbleed bug and the flaws exploited by the Morris worm. Scores were based on how quickly and effectively the bots deployed patches and verified competitors’ patches, and bots lost points if their patches slowed down the software. “If you fix the bug but it takes 10 hours to run something that should have taken 5 minutes, that's not really useful,” explains Corbin Souffrant, a Raytheon cyber engineer.

Members of the Deep Red team described how their system accomplished this in five basic steps: First, their machine (named Rubeus) used a technique called fuzzing to overload the program with data and cause it to crash. Then, it scanned the crash results to identify potential flaws in the program’s code. Next, it verified these flaws and looked for potential patches in a database of known bugs and appropriate fixes. It chose a patch from this repository and applied it, and then analyzed the results to see if it helped. For each patch, the system used artificial intelligence to compare its solution with the results and determine how it should fix similar bugs in the future.
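To make the first steps of that loop concrete, here is a minimal sketch in Python of fuzzing a program until it crashes and then checking a candidate patch against the crashing inputs. It is not Rubeus's code, which the team did not publish in this article. The target function parse_record, its patched counterpart, and the fuzz helper are all hypothetical names invented for illustration, and the "bug" is a deliberately simple unchecked length field.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical vulnerable target: trusts the length byte in its input."""
    if not data:
        return b""
    length = data[0]            # attacker-controlled length field
    payload = data[1:]
    buffer = bytearray(16)      # fixed-size buffer
    for i in range(length):     # bug: length is never checked against either size
        buffer[i] = payload[i]
    return bytes(buffer[:length])


def parse_record_patched(data: bytes) -> bytes:
    """Hypothetical patch: clamp the length to the payload and buffer sizes."""
    if not data:
        return b""
    length = min(data[0], len(data) - 1, 16)
    return bytes(data[1:1 + length])


def fuzz(target, trials=500, seed=0):
    """Throw random byte strings at target and record every input that crashes it."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    crashes = []
    for _ in range(trials):
        size = rng.randint(0, 32)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:  # in this toy model, any exception is a "crash"
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(f"crashing inputs found: {len(found)}")
    # Verify the candidate patch by replaying every crashing input against it.
    for data, _ in found:
        parse_record_patched(data)
    print(f"crashes after patch: {len(fuzz(parse_record_patched))}")
```

A real Cyber Reasoning System works on compiled binaries rather than Python functions, and it must also confirm that a patch does not slow the program down, which this sketch does not attempt.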

During the live competition, some bugs proved more difficult for the machines to handle than others. Several machines found and patched an SQL Slammer-like vulnerability within 5 minutes, garnering applause. But only two teams managed to repair an imitation of the crackaddr bug in Sendmail. And one bot, Xandra by the TECHx team, found a bug that the organizers hadn’t even intended to create.

Whether the competitors are humans or machines, it’s always nice to see the vanquished exhibit good sportsmanship in the face of a loss. As the night wound down, Mechanical Phish politely congratulated Mayhem on its first-place finish over the bots’ Twitter accounts.
