“I had to double check I wasn’t playing the wrong audio file.”
The first time Abe Davis coaxed intelligible speech from a silent video of a bag of crab chips (an impassioned recitation of “Mary Had a Little Lamb”), he could hardly believe it was possible. Davis is a Ph.D. candidate at MIT, and his group’s image-processing algorithm can turn everyday objects into visual microphones, deciphering the tiny vibrations they undergo as captured on video.
The research, which will be presented at the computer graphics conference SIGGRAPH 2014 next week, builds on earlier work from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) on capturing motion in video far smaller than a single pixel. By tracking how the color of pixels along an object’s edges fluctuates, the group’s algorithm can measure the object’s minuscule movements, and even magnify them: earlier demonstrations amplified a wine glass’s oscillations as a tone played and visually revealed a heartbeat beneath the skin.
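The paper’s actual pipeline isn’t spelled out here, but the core intuition is simple enough to simulate. The toy Python sketch below is not the CSAIL algorithm (the frame rate, blur, and edge profile are all invented for illustration); it only shows why the brightness of a single edge pixel encodes sub-pixel motion: shifting a blurred edge by a hundredth of a pixel nudges the brightness a fixed pixel records, and that fluctuation, traced over time, is the sound.

```python
# Toy illustration, not the CSAIL method: a blurred edge vibrates by a
# fraction of a pixel, and the brightness one fixed pixel sees tracks
# that vibration almost linearly.
import numpy as np

def edge_intensity(shift, blur=2.0):
    # Logistic edge profile: the brightness a fixed camera pixel records
    # when a blurred edge is displaced by `shift` pixels.
    return 1.0 / (1.0 + np.exp(-shift / blur))

# Hypothetical setup: a 440 Hz tone shakes the object by at most
# 1/100 of a pixel, filmed at 2,200 frames per second for one second.
fps, tone_hz, n_frames = 2200, 440.0, 2200
t = np.arange(n_frames) / fps
displacement = 0.01 * np.sin(2 * np.pi * tone_hz * t)

# The "video": one border pixel's brightness in every frame.
pixel = edge_intensity(displacement)

# Subtracting the mean and rescaling recovers a signal proportional
# to the tone -- the intensity fluctuation is the sound.
recovered = pixel - pixel.mean()
recovered /= np.abs(recovered).max()

corr = np.corrcoef(recovered, displacement)[0, 1]
print(f"correlation with the source tone: {corr:.4f}")  # ~1.0000
```

A single pixel’s signal would be hopelessly noisy in a real camera; the actual system pools motion signals like this from many points across the frame, which is what pushes its sensitivity so far below a single pixel.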
“It was clear for us quickly that there’s a strong relation between sound and visual motion,” says Michael Rubinstein, a postdoc at Microsoft Research who worked on this and the earlier CSAIL research. “We had this crazy idea: can we actually use videos to recover sound?”
The first speech recovered from the chip bag can be played below. (Later recordings were much clearer, but probably less funny.)