Noisy and Stressful? Or Noisy and Fun? Your Phone Can Tell the Difference

For several years now, smartphones have been able to listen nonstop for wake words, like “Hey Siri” and “OK Google,” without draining the battery. These wake-up systems run on special low-power processors embedded within a phone’s larger chipset. They rely on neural-network algorithms trained to recognize a broad spectrum of voices, accents, and speech patterns. But they recognize only their wake words; more generalized speech recognition algorithms require the involvement of a phone’s more powerful processors.

Today, Qualcomm announced that the Snapdragon 888 5G, its latest chipset for mobile devices, will incorporate an extra piece of software in the bit of semiconductor real estate that houses the wake word recognition engine. Created by Cambridge, U.K.-based startup Audio Analytic, the ai3-nano will use the Snapdragon’s low-power AI processor to listen for sounds beyond speech. Depending on the applications made available by smartphone manufacturers, the phones will be able to react to such sounds as a doorbell, water boiling, a baby’s cry, and fingers tapping on a keyboard. The library covers some 50 sounds and is expected to grow to 150 to 200 in the near future.

The first application available for this sound recognition system will be what Audio Analytic calls Acoustic Scene Recognition AI. Instead of listening for just one sound, the scene recognition technology listens for the characteristics of all the ambient sounds to classify an environment as chaotic, lively, boring, or calm. Audio Analytic CEO and founder Chris Mitchell explains.

“There are two aspects to an environment,” he says, “eventfulness, which refers to how many individual sounds are going on, and how pleasant we find it. Say I went for a run, and there were lots of bird sounds. I would likely find that pleasant, so that would be categorized as ‘lively.’ You could also have an environment with a lot of sounds that are not pleasant. That would be ‘chaotic.’”
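
The two axes Mitchell describes can be sketched as a simple lookup. The snippet below is only an illustration, not Audio Analytic’s implementation: the function name classify_scene, the scores between 0 and 1, and the 0.5 threshold are all hypothetical. The placement of “calm” and “boring” on the low-eventfulness side is also an assumption, since Mitchell defines only “lively” and “chaotic” explicitly.

```python
def classify_scene(eventfulness: float, pleasantness: float,
                   threshold: float = 0.5) -> str:
    """Map two scores in [0, 1] to one of the four scene labels.

    Hypothetical sketch: the real system infers the scene with a
    neural network, not with a fixed threshold on two numbers.
    """
    eventful = eventfulness >= threshold
    pleasant = pleasantness >= threshold
    if eventful:
        # From the article: many sounds + pleasant = lively,
        # many sounds + unpleasant = chaotic.
        return "lively" if pleasant else "chaotic"
    # Assumption: few sounds + pleasant = calm,
    # few sounds + unpleasant = boring.
    return "calm" if pleasant else "boring"
```

Under these assumptions, a run through birdsong (many sounds, pleasant) would come out as “lively,” matching Mitchell’s example.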

Mitchell’s team selected those four categories after reviewing studies about perceptions of sound. The team then used the company’s custom-built dataset of 30 million audio recordings to train the neural network.

What a mobile device will do with its newfound awareness of ambient sounds will be up to the manufacturers that use the Qualcomm platform. But Mitchell has a few ideas.

“A train, for example, is boring,” he says. “So you might want to increase the active noise cancellation on your headphones to remove the typical low hum. But when you get off the tube, you want more transparency—so you can hear bike messengers, so noise cancellation should be reduced. On a smartphone you could also adjust notifications based on the type of environment, whether it vibrates or rings, or what sort of ring tone is used.”
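
One way a manufacturer might act on a scene label is a small policy table. The sketch below is hypothetical throughout: the DeviceSettings fields, the numeric levels, and the settings_for helper are invented for illustration and do not correspond to any Qualcomm or Audio Analytic API. Only the direction of the noise-cancellation change (more on a “boring” train, less in a busy street) comes from Mitchell’s example; treating a busy street as “chaotic” or “lively” is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSettings:
    noise_cancellation: float  # 0.0 = full transparency, 1.0 = maximum
    alert: str                 # "ring" or "vibrate"


# Hypothetical policy; every value here is an illustrative guess.
POLICY = {
    "boring": DeviceSettings(noise_cancellation=1.0, alert="ring"),
    "calm": DeviceSettings(noise_cancellation=0.5, alert="vibrate"),
    "lively": DeviceSettings(noise_cancellation=0.2, alert="ring"),
    "chaotic": DeviceSettings(noise_cancellation=0.2, alert="vibrate"),
}


def settings_for(scene: str) -> DeviceSettings:
    """Return the settings for a scene label, rejecting unknown labels."""
    try:
        return POLICY[scene]
    except KeyError:
        raise ValueError(f"unknown scene label: {scene!r}") from None
```

Keeping the policy in a table separate from the classifier means a manufacturer could change how the device reacts without touching the recognition engine.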

I first met Mitchell two years ago, when the company was demonstrating prototypes of how its audio analysis technology would work in smart speakers. Since then, Mitchell reports, products using the company’s technology have become available in some 150 countries. Most are security and safety systems that recognize sounds such as breaking glass, a smoke alarm, or a baby’s cry.

Audio Analytic’s approach, Mitchell explained to me, involves using deep learning to break sounds into standard components. He uses the word “ideophones” to refer to these components. The term also refers to the representation of a sound in speech, like “quack.” Once sounds are coded as ideophones, each can be recognized just as digital assistants’ systems recognize their wake words. This approach allows the ai3-nano engine to take up just 40 KB and run completely on the phone without connecting to a cloud-based processor.

Once the technology is established in smartphones, Mitchell expects its applications to grow beyond security and scene recognition. Early examples, he says, will include media tagging, games, and accessibility.

For media tagging, he says, the system can search phone-captured video by sound, so a parent, for example, can easily find a clip of a child laughing. Children could use the technology in a game that has them make the sounds of an animal, say a duck or a pig; as a reward for completing the task, the display could put a virtual costume on them.

As for accessibility, Mitchell sees the technology as a boon to the hard of hearing, who already rely on mobile phones as assistive devices. “This can allow them to detect [and specifically identify] a knock on the door, a dog barking or a smoke alarm,” he says.

After rolling out additional sound recognition capabilities, the company expects to work next on identifying context beyond specific events or scenes. “We have started doing early stage research in that area,” Mitchell says. “So our system can say ‘It sounds like you are making breakfast’ or ‘It sounds like you are getting ready to leave the house.’” Apps could then use that information to arm a security system or adjust lights or heat.

Qualcomm’s latest smart phone chips will be able to identify soundscapes thanks to a small UK startup