A growing unease has settled around evolving deepfake technologies that make it possible to create evidence of scenes that never happened. Celebrities have found themselves the unwitting stars of pornography, and politicians have turned up in videos appearing to speak words they never really said.
Concerns about deepfakes have led to a proliferation of countermeasures. New laws aim to stop people from making and distributing them. Earlier this year, social media platforms including Facebook and Twitter banned deepfakes from their networks. And computer vision and graphics conferences teem with presentations describing methods to defend against them.
So what exactly is a deepfake, and why are people so worried about them?
Deepfake technology can seamlessly stitch anyone in the world into a video or photo they never actually participated in. Such capabilities have existed for decades—that’s how the late actor Paul Walker was resurrected for Fast & Furious 7. But it used to take entire studios full of experts a year to create these effects. Now, deepfake technologies—new automatic computer-graphics or machine-learning systems—can synthesize images and videos much more quickly.
There’s a lot of confusion around the term “deepfake,” though, and computer vision and graphics researchers are united in their hatred of the word. It has become a catchall to describe everything from state-of-the-art videos generated by AI to any image that seems potentially fraudulent.
A lot of what’s being called a deepfake simply isn’t: For example, a controversial “crickets” video of the U.S. Democratic primary debate released by the campaign of former presidential candidate Michael Bloomberg was made with standard video editing skills. Deepfakes played no role.
The main ingredient in deepfakes is machine learning, which has made it possible to produce deepfakes much faster at a lower cost. To make a deepfake video of someone, a creator would first train a neural network on many hours of real video footage of the person to give it a realistic “understanding” of what he or she looks like from many angles and under different lighting. Then they’d combine the trained network with computer-graphics techniques to superimpose a copy of the person onto a different actor.
While the addition of AI makes the process faster than it ever would have been before, it still takes time for this process to yield a believable composite that places a person into an entirely fictional situation. The creator must also manually tweak many of the trained program’s parameters to avoid telltale blips and artifacts in the image. The process is hardly straightforward.
Many people assume that a class of deep-learning algorithms called generative adversarial networks (GANs) will be the main engine of deepfakes development in the future. GAN-generated faces are nearly impossible to tell from real faces. The first audit of the deepfake landscape devoted an entire section to GANs, suggesting they will make it possible for anyone to create sophisticated deepfakes.
However, the spotlight on this particular technique has been misleading, says Siwei Lyu of SUNY Buffalo. “Most deepfake videos these days are generated by algorithms in which GANs don’t play a very prominent role,” he says.
GANs are hard to work with and require a huge amount of training data. It takes the models longer to generate the images than it would with other techniques. And—most important—GAN models are good for synthesizing images, but not for making videos. They have a hard time preserving temporal consistency, or keeping the same image aligned from one frame to the next.
The best-known audio “deepfakes” also don’t use GANs. When Canadian AI company Dessa (now owned by Square) used the talk show host Joe Rogan’s voice to utter sentences he never said, GANs were not involved. In fact, the lion’s share of today’s deepfakes are made using a constellation of AI and non-AI algorithms.
The most impressive deepfake examples tend to come out of university labs and the startups they seed: a widely reported video showing soccer star David Beckham speaking fluently in nine languages, only one of which he actually speaks, is a version of code developed at the Technical University of Munich, in Germany.
And MIT researchers have released an uncanny video of former U.S. President Richard Nixon delivering the alternate speech he had prepared for the nation had Apollo 11 failed.
But these are not the deepfakes that have governments and academics so worried. Deepfakes don’t have to be lab-grade or high-tech to have a destructive effect on the social fabric, as illustrated by nonconsensual pornographic deepfakes and other problematic forms.
Indeed, deepfakes get their very name from the ur-example of the genre, which was created in 2017 by a Reddit user calling himself r/deepfakes, who used Google’s open-source deep-learning library to swap porn performers’ faces for those of actresses. The code inside DIY deepfakes found in the wild today is mostly descended from this original code, and while some of these deepfakes might be considered entertaining thought experiments, none can be called convincing.
So why is everyone so worried? “Technology always improves. That’s just how it works,” says Hany Farid, a digital forensics expert at the University of California, Berkeley. There’s no consensus in the research community about when DIY techniques will become refined enough to pose a true threat—predictions vary wildly, from 2 to 10 years. But eventually, experts concur, anyone will be able to pull up an app on their smartphone and produce realistic deepfakes of anyone else.
The clearest threat that deepfakes pose right now is to women—nonconsensual pornography accounts for 96 percent of deepfakes currently deployed on the Internet. Most target celebrities, but there are an increasing number of reports of deepfakes being used to create fake revenge porn, says Henry Ajder, who is head of research at the detection firm Deeptrace, in Amsterdam.
But women won’t be the sole targets of bullying. Deepfakes may well enable bullying more generally, whether in schools or workplaces, as anyone can place people into ridiculous, dangerous, or compromising scenarios.
Corporations worry about the role deepfakes could play in supercharging scams. There have been unconfirmed reports of deepfake audio being used in CEO scams to swindle employees into sending money to fraudsters. Extortion could become a major use case. Identity fraud was the top worry regarding deepfakes for more than three-quarters of respondents to a cybersecurity industry poll by the biometric firm iProov. Respondents’ chief concerns were that deepfakes would be used to make fraudulent online payments and hack into personal banking services.
For governments, the bigger fear is that deepfakes pose a danger to democracy. If you can make a female celebrity appear in a porn video, you can do the same to a politician running for reelection. In 2018, a video surfaced of João Doria, the governor of São Paulo, Brazil, who is married, participating in an orgy. He insisted it was a deepfake. There have been other examples. In 2018, the president of Gabon, Ali Bongo, who was long presumed unwell, surfaced on a suspicious video to reassure the population, sparking an attempted coup.
The ambiguity around these unconfirmed cases points to the biggest danger of deepfakes, whatever their current capabilities: the liar’s dividend, which is a fancy way of saying that the very existence of deepfakes provides cover for anyone to do anything they want, because they can dismiss any evidence of wrongdoing as a deepfake. It’s one-size-fits-all plausible deniability. “That is something you are absolutely starting to see: that liar’s dividend being used as a way to get out of trouble,” says Farid.
Several U.S. laws regarding deepfakes have taken effect over the past year. States are introducing bills to criminalize deepfake pornography and prohibit the use of deepfakes in the context of an election. Texas, Virginia, and California have criminalized deepfake porn, and in December, the president signed the first federal law as part of the National Defense Authorization Act. But these new laws only help when a perpetrator lives in one of those jurisdictions.
Outside the United States, however, the only countries taking specific actions to prohibit deepfake deception are China and South Korea. In the United Kingdom, the law commission is currently reviewing existing laws for revenge porn with an eye to addressing different ways of creating deepfakes. However, the European Union doesn’t appear to see this as an imminent issue compared with other kinds of online misinformation.
So while the United States is leading the pack, there’s little evidence that the laws being put forward are enforceable or have the correct emphasis.
And while many research labs have developed novel ways to identify and detect manipulated videos—incorporating watermarks or a blockchain, for example—it’s hard to make deepfake detectors that are not immediately gamed in order to create more convincing deepfakes.
Still, tech companies are trying. Facebook recruited researchers from Berkeley, Oxford, and other institutions to build a deepfake detector and help it enforce its new ban. Twitter also made big changes to its policies, going one step further and reportedly planning ways to tag any deepfakes that are not removed outright. And YouTube reiterated in February that it will not allow deepfake videos related to the U.S. election, voting procedures, or the 2020 U.S. census.
But what about deepfakes outside these walled gardens? Two programs, called Reality Defender and Deeptrace, aim to keep deepfakes out of your life. Deeptrace works on an API that will act like a hybrid antivirus/spam filter, prescreening incoming media and diverting obvious manipulations to a quarantine zone, much like how Gmail automatically diverts spam before it reaches your inbox. Reality Defender, a platform under construction by the company AI Foundation, similarly hopes to tag and bag manipulated images and video before they can do any damage. “We think it’s really unfair to put the responsibility of authenticating media on the individual,” says Ajder.