A Machine Learning Classifier Can Spot Serial Hijackers Before They Strike

How would you feel if, every time you had to send sensitive information somewhere, you relied on a chain of people playing the telephone game to get that information to where it needs to go? Sounds like a terrible idea, right? Well, too bad, because that’s how the Internet works.

Data is routed through the Internet’s various metaphorical tubes using what’s called the Border Gateway Protocol (BGP). Any data moving over the Internet needs a physical path of networks and routers to make it from A to B. BGP is the protocol that moves information through those paths—though the downside, like a person in a game of telephone, is that each junction in the path only knows what they’ve been told by their immediate neighbor.

Because a specific junction in a route knows only where the data it’s transmitting just came from and where it’s headed next, it’s relatively easy for someone to step in and divert the data. At these specific junctions, autonomous systems establish BGP connections. Like a party pooper intentionally ruining a game of telephone by whispering a completely different phrase than the one that was told to them, a hacker can potentially insert their own autonomous system to reroute information. The worst offenders are serial hijackers, who repeatedly divert data to skim information or enable distributed denial-of-service (DDOS) attacks. In 1998, several hackers testified to the U.S. Congress that the Internet could be taken down by a dedicated hacker in 30 minutes by deploying BGP hacking.

Historically, serial hijackers have been difficult to stop. One recent example was Bitcanal, a Portuguese web hosting firm that spent years assisting serial hijackers in their attacks. It took years of coordinated effort from legitimate service providers to shut down Bitcanal, and meanwhile, plenty of other serial hijackers still roam the Web. What’s worse, serial hijackers have to, as the name suggests, launch multiple attacks before it becomes clear that they’re a bad-faith actor.

“BGP [hacking] is one way to sniff at traffic, or steal traffic,” says Cecilia Testart, a graduate student at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL). ”Given that the Internet is becoming more and more critical, we should try and prevent these attacks.”

Testart is the lead author on a paper published today [PDF] by a group of researchers at CSAIL and the Center for Applied Internet Data Analysis (CAIDA). They’ve proposed that machine learning can be used to pro-actively stop serial hijackers from their hijinks. Serial hijackers, the researchers suggest, display some characteristic traits that make them stand out compared to ordinary network providers. They show that machine learning could sniff out serial hijackers more quickly than the standard method of identifying them only after many attacks.

The joint team used a machine learning technique called an extremely-randomized trees (extra-trees) classifier. In a test with their classifier, the classifier flagged 934 out of 19,103 autonomous systems it tested as potential serial hijackers.

You can think of extra-trees classifiers as though you were growing a forest of trees, where each tree represents a vote of confidence—for example, whether someone is a serial hijacker—based on a randomized subset of available information.

The resulting forest represents a consensus. If most trees have arrived at the decision that someone is a serial hijacker using the limited information available to them, then you likely have one on your hands.

Testart says extra-trees classifiers and other forest classifiers don’t have the same bias toward a set of training data that a machine learning technique like deep learning may have. Because the available data on known serial hijackers is so small, deep learning techniques may have skewed toward finding only ones most similar to known attackers and missed ones that might differ more.

Of course, for individual trees to cast a vote, they need to know what they’re looking for. The research group identified a few ways in which serial hijackers differ from authentic network providers that typically route Internet traffic. For example, authentic providers tend to be online more consistently, as they are providing Internet service to actual customers. Serial hijackers, on the other hand, would only be online while they’re skimming data.

Serial hijackers also usually have more varied Internet Protocol (IP) blocks—basically the street addresses of the Internet. Testart explains that an institution like MIT typically has a block of consecutive IP addresses that it uses. Hijackers, however, tend to pick up small strings of IP addresses as they become defunct from other users. One user with a strange selection of IP blocks, therefore, is more likely to be a serial hijacker.

These rules aren’t set in stone. Testart notes there are instances when a legitimate network provider could go offline—for example, during an earthquake or blackout. Fat finger errors can also lead to typos and misconfigurations that could make a legitimate provider look suspicious at first glance.

Testart says there’s still plenty to be done with the work the research team has published so far. She suggests that an extra-trees classifier like the one the group developed could give network operators a sort of reputation score, so that serial hijackers would see their reputations drop quickly as they went about their nefarious business.

The other option is to update BGP and turn the game of the telephone into something more secure. But Testart doesn’t think that’s likely. “The Internet is a huge network,” she says. “It’s running on infrastructure set up many years ago. If you update a major protocol, you need to update all that infrastructure.” Imagine the headaches of trying to get every network provider in the world to agree to change a protocol—it’s far easier just to build a tool that can sniff out serial hijackers.

internet security machine learning Internet security MIT IP Internet

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

A Machine Learning Classifier Can Spot Serial Hijackers Before They Strike

Serial hijackers routinely steal IP addresses to pilfer data and launch attacks. Extra-trees classifiers could sniff out these hackers sooner

The Legacy of the Datapoint 2200 Microcomputer

Announcing a Benchmark to Improve AI Safety

Related Stories

Why the AI Boom is a Windfall for Tiny Anguilla

Cory Doctorow: Interoperability Can Save the Open Web

Bob Kahn on the Birth of “Inter-networking”

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and talk to tech insiders — all free! For full access and benefits, join IEEE as a paying member.

A Machine Learning Classifier Can Spot Serial Hijackers Before They Strike

Serial hijackers routinely steal IP addresses to pilfer data and launch attacks. Extra-trees classifiers could sniff out these hackers sooner

The Legacy of the Datapoint 2200 Microcomputer

Announcing a Benchmark to Improve AI Safety

Prepping For Post-Quantum Cryptography

Related Stories

Why the AI Boom is a Windfall for Tiny Anguilla

Cory Doctorow: Interoperability Can Save the Open Web

Bob Kahn on the Birth of “Inter-networking”