It's very difficult, if not impossible, for us humans to understand how robots see the world. Their cameras work like our eyes do, but the space between the image that a camera captures and actionable information about that image is filled with a black box of machine learning algorithms that are trying to translate patterns of features into something that they're familiar with. Training these algorithms usually involves showing them a set of different pictures of something (like a stop sign), and then seeing if they can extract enough common features from those pictures to reliably identify stop signs that aren’t in their training set.
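The train-then-generalize loop described above can be sketched in a few lines. This is a deliberately tiny stand-in, not the deep networks the researchers used: a nearest-centroid "classifier" fit on a handful of invented feature vectors, then checked on examples it never saw during training.

```python
import numpy as np

# Toy illustration of training on labeled examples and testing on unseen
# ones.  The two-number "feature vectors" are invented for illustration;
# a real classifier would learn features from thousands of sign images.
train_stop  = np.array([[0.9, 0.1], [0.8, 0.2], [0.95, 0.15]])
train_other = np.array([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])

# "Training" here is just averaging each class's examples.
centroid_stop  = train_stop.mean(axis=0)
centroid_other = train_other.mean(axis=0)

def classify(x):
    """Label a feature vector by its nearest class centroid."""
    d_stop  = np.linalg.norm(x - centroid_stop)
    d_other = np.linalg.norm(x - centroid_other)
    return "stop" if d_stop < d_other else "other"

# Held-out examples that were not in the training set:
print(classify(np.array([0.85, 0.20])))   # -> "stop"
print(classify(np.array([0.25, 0.75])))   # -> "other"
```

If the held-out examples are classified correctly, the model has extracted something that generalizes beyond its training set, which is exactly the property the article describes.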
This works pretty well, but the common features that machine learning algorithms come up with generally are not “red octagons with the letters S-T-O-P on them.” Rather, they're looking for features that all stop signs share, but that would not be in the least bit comprehensible to a human looking at them. If this seems hard to visualize, that's because it reflects a fundamental disconnect between the way our brains and artificial neural networks interpret the world.
The upshot here is that slight alterations to an image that are invisible to humans can result in wildly different (and sometimes bizarre) interpretations from a machine learning algorithm. These "adversarial images" have generally required relatively complex analysis and image manipulation, but a group of researchers from the University of Washington, the University of Michigan, Stony Brook University, and the University of California Berkeley have just published a paper showing that it's also possible to trick visual classification algorithms by making slight alterations in the physical world. A little bit of spray paint or some stickers on a stop sign were able to fool a deep neural network-based classifier into thinking it was looking at a speed limit sign 100 percent of the time.
Here's an example of the kind of adversarial image we're used to seeing:
An image of a panda, when combined with an adversarial input, can convince a classifier that it’s looking at a gibbon. Image: OpenAI
Obviously, it's totally, uh, obvious to us that both images feature a panda. The differences between the first and third images are invisible to us, and even when the alterations are shown explicitly, there's nothing in there that looks all that much like a gibbon. But to a neural network-based classifier, the first image is probably a panda while the third image is almost definitely a gibbon. This kind of thing also works with street signs, causing signs that look like one thing to us to look like something completely different to the vision system of an autonomous car, which could be very dangerous for obvious reasons.
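The panda-to-gibbon trick works by nudging each pixel a tiny amount in whichever direction most increases the classifier's loss (the "fast gradient sign" idea). Here's a minimal sketch of that mechanism, assuming a toy logistic-regression classifier whose weights and inputs are invented for illustration; the real attacks do the same thing with the gradients of a deep network.

```python
import numpy as np

# Invented stand-in for a classifier: logistic regression with fixed
# weights w over a flattened 4-"pixel" image x.
w = np.array([1.0, -2.0, 1.5, -0.5])

def predict(x):
    """Probability that x belongs to class 1 (e.g. 'panda')."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def fgsm(x, true_label, eps):
    """Fast-gradient-sign step: move every pixel by eps in the
    direction that most increases the loss for the true label."""
    p = predict(x)
    grad = (p - true_label) * w   # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad)

x = np.array([0.6, -0.4, 0.9, 0.1])       # confidently class 1
x_adv = fgsm(x, true_label=1.0, eps=0.7)

print(predict(x))      # high confidence in class 1
print(predict(x_adv))  # drops below 0.5: the label flips
```

The perturbation is small and spread across every pixel, which is why adversarial images of this kind look unchanged to us even as the classifier's answer flips.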
Top row shows legitimate sample images, while the bottom row shows adversarial sample images, along with the output of a deep neural network classifier below each image. Images: Papernot et al
Adversarial attacks like these, while effective, are much harder to do in practice, because you usually don't have direct digital access to the inputs of the neural network you're trying to mess with. Also, in the context of something like an autonomous car, the neural network has the opportunity to analyze a whole bunch of images of a sign at different distances and angles as it approaches. And lastly, adversarial images tend to include introduced features over the entire image (both the sign and the background), which doesn't work in real life.
What's novel about this new technique is that it's based on physical adversarial perturbations: altering road signs in the real world in such a way that they reliably screw up neural network classifiers from multiple distances and angles while remaining discreet enough to be undetectable to casual observers. The researchers came up with several techniques for doing this, including subtle fading, camouflage graffiti, and camouflage art. Here's how the perturbed signs look when printed out as posters and stuck onto real signs:
Subtle perturbations cause a neural network to misclassify stop signs as speed limit 45 signs, and right turn signs as stop signs. Images: Evtimov et al
And here are two attacks that are easier to manage on a real-world sign, since they're stickers rather than posters:
Camouflage graffiti and art stickers cause a neural network to misclassify stop signs as speed limit 45 signs or yield signs. Images: Evtimov et al
Because the stickers have a much smaller area to work with than the posters, the perturbations they create have to be more significant, but it's certainly not obvious that they're not just some random graffiti. And they work almost as well. According to the researchers:
The Stop sign is misclassified into our target class of Speed Limit 45 in 100% of the images taken according to our evaluation methodology. For the Right Turn sign… Our attack reports a 100% success rate for misclassification with 66.67% of the images classified as a Stop sign and 33.33% of the images classified as an Added Lane sign. [The camouflage graffiti] attack succeeds in causing 73.33% of the images to be misclassified. In [the camouflage abstract art attack], we achieve a 100% misclassification rate into our target class.
In order to develop these attacks, the researchers trained their own road sign classifier in TensorFlow using a publicly available, labeled dataset of road signs. They assumed that an attacker would have “black box” access to the classifier, meaning that they can't mess with its training or its guts, but that they can feed things in and see what comes out— a reasonable assumption to make, since if you owned an autonomous car, you could show it whatever signs you wanted and see whether it recognized them or not. Even if you can't hack directly into the classifier itself, you can use that feedback to build a reasonably accurate model of how it classifies things. Finally, the attacker takes the image of the sign to be attacked and feeds it, along with that model, into an attack algorithm that outputs the adversarial image. Mischief managed.
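The sticker attacks differ from the whole-image kind in one key way: the perturbation is confined to small regions of the sign. A minimal sketch of that constraint, again using an invented toy linear classifier rather than the researchers' actual algorithm, is a gradient attack with a binary mask over which "pixels" are allowed to change:

```python
import numpy as np

# Invented toy model: positive score => "stop sign",
# negative score => "speed limit".
w = np.array([2.0, -1.0, 0.5, 3.0])

def score(x):
    return w @ x

def masked_attack(x, mask, step, n_iters):
    """Iteratively push the score toward the wrong class, but only
    by altering the masked 'sticker' region of the image."""
    x_adv = x.copy()
    for _ in range(n_iters):
        # For a linear model the input gradient is just w; step
        # against it, restricted to the masked pixels.
        x_adv -= step * np.sign(w) * mask
    return x_adv

x = np.array([0.5, 0.2, 0.1, 0.4])       # scored as a stop sign
mask = np.array([0.0, 0.0, 0.0, 1.0])    # only one "pixel" may change

x_adv = masked_attack(x, mask, step=0.1, n_iters=10)
print(score(x), score(x_adv))            # positive, then negative
```

Because fewer pixels are available, each one has to move further, which matches the article's observation that the sticker perturbations are more visually significant than the poster ones.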
It's probably safe to assume that the classifiers used by autonomous cars will be somewhat more sophisticated and robust than the one that these researchers managed to fool so successfully. (It used only about 4,500 signs as training input.) It's probably not safe to assume that attacks like these won't ever work, though, because even the most sophisticated deep neural network-based algorithms can be really, really dumb at times for reasons that aren't always obvious. The best defense is probably for autonomous cars to use a multi-modal system for road sign detection, for the same reason that they use multi-modal systems for obstacle detection: It's dangerous to rely on just one sensor (whether it's radar, lidar, or cameras), so you use them all at once and hope that they cover for each other's specific vulnerabilities. Got a visual classifier? Great, make sure to couple it with some GPS locations of signs. Or maybe add in something like a dedicated red octagon detection system. My advice, though, would just be to do away with signs altogether, at the same time that you do away with human drivers, and just give over all the roads completely to robots. Problem solved.
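The multi-modal cross-check suggested above can be sketched very simply: accept the camera classifier's label only when an independent source agrees. Everything here is invented for illustration — the map, the coordinates, and the function names are hypothetical, not any car maker's actual system.

```python
# Hypothetical map from known sign positions to sign types, as might be
# derived from GPS and survey data.
SIGN_MAP = {(47.61, -122.33): "stop"}

def fused_sign(camera_label, position):
    """Cross-check the camera's classification against the map; when
    they disagree, trust the map over a possibly fooled camera."""
    expected = SIGN_MAP.get(position)
    if expected is not None and expected != camera_label:
        return expected
    return camera_label

# A perturbed stop sign fools the camera, but the map overrules it:
print(fused_sign("speed_limit_45", (47.61, -122.33)))  # -> "stop"
```

A real system would need fuzzier position matching and a policy for stale map data, but the principle is the same: an attacker now has to defeat two independent signals instead of one.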
Robust Physical-World Attacks on Machine Learning Models, by Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song from the University of Washington, the University of Michigan Ann Arbor, Stony Brook University, and the University of California Berkeley, can be found on arXiv.