
How to Silently Hack a Smart Speaker

A technique using ultrasound can deliver voice commands to systems such as Google Home and Amazon Echo without bystanders picking up on the cues

Strategically placed speakers that emit ultrasound can be used to hack voice-assisted devices.
Photo-illustration: Waseda University/IEEE Spectrum

“Okay, Google. Turn the volume up to max.”

Imagine this voice command being issued to your Google Home system without you ever hearing it. A group of researchers in Japan has shown that this is possible, using strategically placed speakers that emit ultrasound to hack voice-assisted devices.

Voice assistance systems such as Siri, Google Assistant, and Amazon Alexa have grown in popularity in recent years, but they remain vulnerable to several different forms of hacking, which attack either the software or the hardware that the systems rely on.

“An effective countermeasure against these attacks is to fix the vulnerability,” explains Ryo Iijima, a researcher at Waseda University. “In contrast, our new Audio Hotspot Attack technique leverages a physical phenomenon [that occurs as ultrasound waves travel] in the air, which cannot be changed in nature.”

His team’s new hacking technique, described in a study published in the 19 November edition of IEEE Transactions on Emerging Topics in Computing, relies on parametric speakers, which can emit ultrasound (inaudible to the human ear) in a highly directional beam. Placed in the right location and aimed at a voice assistance system, these speakers can emit ultrasound waves carrying a hidden voice command that becomes audible only at a certain distance.
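
To get a feel for the principle, here is a minimal sketch of the standard way a parametric speaker is driven: the audible command is amplitude-modulated onto an ultrasonic carrier. The 40-kilohertz carrier, 192-kilohertz sample rate, and modulation depth below are illustrative assumptions, not values from the study.

```python
import numpy as np

def modulate_for_parametric_speaker(command, sample_rate, carrier_hz=40_000):
    """Amplitude-modulate an audible command onto an ultrasonic carrier.

    `command` is a float array scaled to [-1, 1]. The 40 kHz carrier is an
    assumed typical value for parametric speakers, not taken from the paper.
    """
    t = np.arange(len(command)) / sample_rate
    carrier = np.sin(2 * np.pi * carrier_hz * t)
    # Double-sideband AM: keeping the envelope non-negative lets the
    # audible signal re-emerge when air nonlinearity demodulates the beam.
    envelope = 1.0 + 0.8 * command  # 0.8 = assumed modulation depth
    return envelope * carrier

# Example: a 1 kHz test tone standing in for a spoken command, sampled
# at 192 kHz so the ultrasonic carrier can be represented.
fs = 192_000
t = np.arange(fs) / fs  # one second of samples
tone = np.sin(2 * np.pi * 1_000 * t)
driven_signal = modulate_for_parametric_speaker(tone, fs)
```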

The delay in audibility happens because the ultrasonic waves distort as they travel through the air: nonlinearities in the air itself gradually demodulate the audible voice command out of the beam, so it can be heard only once the sound wave is in close proximity to the targeted voice-assisted system. What’s more, the researchers used an amplifier to ensure that the volume of the voice command is at a normal level by the time it reaches its target.
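
A standard way to model this self-demodulation is Berktay’s far-field approximation, under which the audible pressure that emerges from the beam is proportional to the second time derivative of the squared envelope of the ultrasonic signal. This is the textbook parametric-array model, not a formula taken from the paper:

```latex
% Berktay's far-field approximation for a parametric array:
% E(t) is the envelope of the ultrasonic signal; the demodulated,
% audible pressure follows the second derivative of its square.
p_{\text{audible}}(t) \propto \frac{\partial^2}{\partial t^2} E^2(t)
```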

The attack can be carried out with a single parametric speaker aimed at the voice assistance system, or with two speakers whose sound waves “cross” right at the location of the system. The first approach yields a stronger voice command, while the latter offers more precision and a lower chance of being overheard by someone standing close to the target.
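
The cross-beam variant amounts to simple aiming geometry: each speaker is pointed along the line from its own position to the target so that the two narrow beams intersect there. The positions in this sketch are made-up values for illustration:

```python
import math

def aim_bearing(speaker_xy, target_xy):
    """Bearing, in degrees, from a speaker's position to the target."""
    dx = target_xy[0] - speaker_xy[0]
    dy = target_xy[1] - speaker_xy[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical layout: two speakers 2 m apart, smart speaker 3.5 m away.
speaker_a, speaker_b = (0.0, 0.0), (2.0, 0.0)
target = (1.0, 3.5)

print(f"speaker A bearing: {aim_bearing(speaker_a, target):.1f} degrees")
print(f"speaker B bearing: {aim_bearing(speaker_b, target):.1f} degrees")
```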

The researchers tested their ultrasound approach with 18 volunteers in two different settings: a standard-size room and a long hallway. For their experiments, they attempted to impose their ultrasound commands on two smart speakers, Amazon Echo and Google Home, from several distances in 0.5-meter increments. A smartphone was used to generate the malicious voice commands played through the parametric loudspeakers.

An attack was considered successful if three consecutive voice commands were accepted at a given distance. The results suggest that attacks from 3.5 meters are the most successful, but the hallway experiments show that the technique is effective from distances as far as 12 meters. Google Home proved more vulnerable to the attack than Amazon Echo.
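
That evaluation logic can be summarized in a few lines. In this sketch, try_command is a hypothetical hook that plays one command at a given distance and reports whether the smart speaker accepted it; the scan bounds are assumptions based on the distances reported above:

```python
def attack_succeeds_at(distance_m, try_command, attempts=3):
    """Success criterion from the study: an attack counts as successful
    at a distance only if three consecutive commands are accepted."""
    return all(try_command(distance_m) for _ in range(attempts))

def max_effective_range(try_command, start=0.5, stop=12.0, step=0.5):
    """Scan distances in 0.5 m increments and return the farthest one
    at which the three-in-a-row criterion still holds."""
    best = None
    d = start
    while d <= stop + 1e-9:
        if attack_succeeds_at(d, try_command):
            best = d
        d += step
    return best

# Toy stand-in: pretend commands are accepted out to 12 m, as in the hallway.
print(max_effective_range(lambda d: d <= 12.0))  # -> 12.0
```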

“We knew the theory of directional sound beams before we started our experiments,” says Iijima. “But what was surprising to us after we performed the experiments was that such directional sound beams were actually non-recognizable by the participants of our experiments, while [the sound] successfully activated all speakers.”

In this study, the voiceprint technology required to authenticate a user was deactivated, so a generic, computer-generated voice was able to activate the systems and issue commands. However, the research team says it has done preliminary experiments with recorded voice clips of legitimate users, finding that these could be transmitted inaudibly via ultrasound and used to activate the voice assistance systems.

A key limitation of this approach is the fact that any objects between the parametric speakers and the system being hacked would interfere with the transmission of the sound waves. “One possible method of overcoming this limitation would be to install parametric loudspeakers on a ceiling, thus creating a ‘sound shower,’” Iijima explains.
