Soccer aficionados will never forget the headbutt by French soccer great Zinedine Zidane during the 2006 World Cup final. Caught on camera, Zidane’s attack on Italian player Marco Materazzi after a verbal exchange earned him a red card. He left the field, making it easier for Italy to become world champions. The world found out later about Materazzi’s abusive words about Zidane’s female relatives.
“If we had good lip-reading technology, Zidane’s reaction could have been explained or they would’ve both gotten sent out,” says Helen Bear, a computer scientist at the University of East Anglia in Norwich, UK. “Maybe the match outcome would be different.”
Bear and her colleague Richard Harvey have come up with a new lip-reading algorithm that improves a computer’s ability to differentiate between sounds—such as ‘p’, ‘b’, and ‘m’—that all look similar on the lips. The researchers presented their work at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Shanghai.
A machine that reliably reads lips would have uses beyond sport rulings, of course. It could be used to solve crimes or analyze car and airplane accidents based on recorded footage, Bear says. It could help people who go deaf later in life, for whom lip-reading doesn’t come as easily as it does to those born deaf. It could also be used for better movie dubbing.
Lip-reading, or visual speech recognition, involves identifying the shapes that the mouth makes and then mapping those to words. It is more challenging than the audio speech recognition that is common today. That’s because the mouth assumes only between 10 and 14 shapes, called visemes, while speech has 50 different sounds, called phonemes. So the same viseme can correspond to multiple phonemes.
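The many-to-one ambiguity described above can be pictured as a simple lookup table. This is only an illustrative sketch: the viseme names and phoneme groupings below are hypothetical examples, not the actual inventory used by Bear and Harvey.

```python
# Hypothetical viseme-to-phoneme table: one mouth shape can stand for
# several sounds, which is what makes lip-reading ambiguous.
viseme_to_phonemes = {
    "bilabial": ["p", "b", "m"],   # lips pressed together
    "labiodental": ["f", "v"],     # lower lip against upper teeth
    "rounded": ["w", "r"],         # lips rounded
}

def candidate_phonemes(viseme):
    """Return every phoneme a given viseme could represent."""
    return viseme_to_phonemes.get(viseme, [])

print(candidate_phonemes("bilabial"))  # ['p', 'b', 'm']
```

Seeing a “bilabial” shape alone cannot tell the system whether the speaker said ‘p’, ‘b’, or ‘m’; extra context or training is needed to resolve it.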
Bear and Harvey have developed a new machine-learning algorithm that more precisely maps a viseme to one particular phoneme. The algorithm involves two training steps. In the first, the computer learns to map a viseme to the multiple phonemes it can represent. In the second, the viseme is duplicated (three times, say, if it could be ‘p’, ‘b’, or ‘m’), and each copy trains on just one of those sounds.
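The duplication step above can be sketched in a few lines. This is a minimal illustration of the idea only, not the researchers’ implementation; the data structures and names here are assumptions made for clarity.

```python
# Step 1 (assumed representation): a coarse model that maps each viseme
# to the set of phonemes it might represent.
coarse_model = {"V_bilabial": {"p", "b", "m"}}

def split_viseme_units(coarse):
    """Step 2: duplicate each viseme unit, one copy per phoneme, so each
    copy can then be trained on examples of just that one sound."""
    fine_units = {}
    for viseme, phonemes in coarse.items():
        for ph in sorted(phonemes):
            # e.g. the "V_bilabial/p" unit trains only on 'p' examples
            fine_units[f"{viseme}/{ph}"] = ph
    return fine_units

units = split_viseme_units(coarse_model)
print(units)
# {'V_bilabial/b': 'b', 'V_bilabial/m': 'm', 'V_bilabial/p': 'p'}
```

After the split, each fine-grained unit is responsible for exactly one phoneme, which is how the ambiguity of the shared mouth shape gets resolved during the second training pass.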
The data to train the algorithm came from audio and video recordings of 12 speakers (7 men and 5 women) speaking 200 sentences. Bear used known computer-vision algorithms to extract the shapes of the speakers’ mouths. She then labeled the extracted visual data with the appropriate visemes, labeled the audio data with phonemes, and fed both to her training algorithm.
The algorithm identifies sounds correctly 25 percent of the time, an improvement over past methods, Bear says. And word recognition improved by 5 percent on average across all speakers, which, Bear says, is a significant increase given the low accuracy of the lip-reading systems developed so far.