
Automatic Speaker Verification Systems Can Be Fooled by Disguising Your Voice

Such systems are used to build evidence in criminal cases and to grant access to personal information


Automatic speaker verification (ASV) systems are sometimes used to grant access to sensitive information and identify suspects in a court of law. Increasingly, they are being baked into consumer devices, such as Amazon’s Echo and Google’s Home, to respond to person-specific commands, such as “play my music” or “read my email.”

But such systems make mistakes when speakers disguise their voices to sound older or younger, according to a new study published in Speech Communication by researchers from the University of Eastern Finland. Earlier research by the same group has shown that some ASV systems can’t distinguish between a professional impersonator and the person they are imitating.  

It’s hard to tell how similar the systems tested for these studies are to commercial technologies, but Tomi Kinnunen, a coauthor and computer scientist at the University of Eastern Finland, says they’re probably not too far off. “There are many variants of how this is implemented in practice, but pretty much, they are still based on a lot of machine learning and signal processing,” he says.

Specifically, the researchers found that the equal error rate of an ASV system (the operating point at which the system is equally likely to reject a genuine speaker and to accept an impostor) increased elevenfold for male speakers and sixfold for female speakers who tried to sound younger than they were. When speakers tried to sound older, the equal error rate increased sevenfold for males and fivefold for females.
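To make the metric concrete, here is a minimal sketch, in Python, of how an equal error rate can be computed from a set of verification scores. The scores, threshold sweep, and function name are illustrative assumptions for this article, not the researchers' method or data.

    # Illustrative only: compute an equal error rate (EER) from hypothetical
    # ASV similarity scores. Higher scores mean "more likely the same speaker."
    import numpy as np

    def equal_error_rate(genuine_scores, impostor_scores):
        # Sweep candidate thresholds over all observed scores.
        thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
        best_gap, eer = 1.0, None
        for t in thresholds:
            frr = np.mean(genuine_scores < t)    # genuine trials wrongly rejected
            far = np.mean(impostor_scores >= t)  # impostor trials wrongly accepted
            if abs(frr - far) < best_gap:
                best_gap, eer = abs(frr - far), (frr + far) / 2.0
        return eer

    # Hypothetical scores: genuine trials tend to score higher than impostor trials.
    genuine = np.array([0.9, 0.8, 0.75, 0.6, 0.85])
    impostor = np.array([0.3, 0.5, 0.65, 0.2, 0.4])
    print(equal_error_rate(genuine, impostor))  # roughly 0.2 for this toy data

An elevenfold increase in this rate means the system is far more likely both to lock out the legitimate speaker and to admit an impostor once voices are disguised.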

This means that people can fool ASV systems by changing the sound of their own voice. Speaking at a higher frequency, which most speakers did to produce a youthful voice, proved a more effective disguise than imitating an older person’s voice. For the latter, speakers also raised the frequency of their own voice, but not as high as for the younger version.  

Rosa González Hautamäki, a coauthor who defended her dissertation on this topic last Thursday, says ASV systems can't easily detect changes to the fundamental (or lowest) frequency of a person's voice. That frequency naturally rises and falls as someone speaks, and when a speaker deliberately shifts it, the system can mistake the altered voice for a new speaker. "Including this feature would make these systems more robust to these kinds of challenges," she says.
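As a rough illustration of the feature she is referring to, the sketch below estimates the fundamental frequency of a single speech frame using simple autocorrelation. It is a textbook-style approximation in Python, not the method used in the study, and the sample rate and pitch search range are assumptions.

    # Illustrative only: estimate the fundamental frequency (F0) of one speech
    # frame by autocorrelation. Real pitch trackers use more robust methods.
    import numpy as np

    def estimate_f0(frame, sample_rate=16000, fmin=60.0, fmax=400.0):
        frame = frame - np.mean(frame)                  # remove DC offset
        corr = np.correlate(frame, frame, mode="full")  # autocorrelation
        corr = corr[len(corr) // 2:]                    # keep non-negative lags
        lo = int(sample_rate / fmax)                    # shortest lag to consider
        hi = int(sample_rate / fmin)                    # longest lag to consider
        lag = lo + np.argmax(corr[lo:hi])               # strongest periodicity
        return sample_rate / lag                        # lag (samples) -> Hz

    # Hypothetical frame: a 200 Hz tone should come back as roughly 200 Hz.
    t = np.arange(0, 0.03, 1 / 16000)
    print(estimate_f0(np.sin(2 * np.pi * 200 * t)))

A speaker who raises their pitch to sound younger shifts exactly this value, which is why a system that tracked it could flag the mismatch rather than treat the disguised voice as an unknown person.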

Historically, Kinnunen says, automatic speaker verification systems have shown higher error rates for female speakers (who often speak at a higher frequency) versus male speakers. In their test, the opposite was true—perhaps because both male and female speakers attempted to raise the frequencies of their voices to sound younger and older, and female speakers were already starting at a higher frequency.   

When combined with other authentication methods, a “voiceprint” may be a useful identifier. But Ben Fisher, CEO at Magic & Co., a technology consultancy in New York City, says it should generally not be trusted on its own. In addition to being susceptible to voice mimicry and disguise, voiceprints can be thwarted by simply recording someone saying a phrase (such as “open”) and replaying that recording.

Increasingly, software has made it easier to synthesize entire sentences and conversations in someone’s voice. Fisher points to Lyrebird, an AI-based service that can make a digital copy of a voice based on a 60-second clip, and use that copy to produce new sentences that the original speaker never uttered. Google is working on a similar project called WaveNet.

Fisher says fraud detection software can identify synthesized and recorded voices with relatively high accuracy—about 90 to 95 percent. But that’s still not good enough to trust automatic speaker verification systems to grant access to bank accounts and secure areas, or for other sensitive applications. And fraud detection software often requires advanced algorithms and processors not available on consumer devices, where voice is quickly becoming the primary interface.

There are also biological reasons why voice is an imperfect mode of identification—people’s voices change as they age, and even when they’re sick. These challenges have so far limited the utility of voice as a biometric. “It’s a real concern,” Fisher says. “One of the reasons you haven’t seen it everywhere is because of this problem.”

Realistically, a professional impersonator is unlikely to walk into someone’s home and imitate their voice in order to listen to their favorite Amazon Music playlist. And Kinnunen and Hautamäki say they don’t know of any case in which a criminal or impersonator has used voice disguise or mimicry to fool an ASV system for nefarious purposes.

However, Fisher believes the risk of voice-enabled hacking will worsen over time. “I think this will be a larger problem,” he says. “The ability to make a threat out of it is growing faster than the defenses.”
