New AI Dupes Humans Into Believing Synthesized Sound Effects Are Real

Imagine you are watching a scary movie: The heroine creeps through a dark basement, on high alert. Suspenseful music plays in the background, while some unseen, sinister creature lurks in the shadows… and then, BANG! It knocks over an object.

Such scenes would hardly be as captivating and scary without the intense but perfectly timed sound effects, like the loud bang that sent our main character wheeling around in fear. Usually these sound effects are recorded by Foley artists in the studio, who produce the sounds using oodles of objects at their disposal. Recording the sound of glass breaking may involve actually breaking glass repeatedly, for example, until the sound closely matches the video clip.

In a more recent plot twist, researchers have created an automated program that analyzes the movement in video frames and creates its own artificial sound effects to match the scene. In a survey, the majority of people polled indicated that they believed the fake sound effects were real. The model, AutoFoley, is described in a study published 25 June in IEEE Transactions on Multimedia.

“Adding sound effects in postproduction using the art of Foley has been an intricate part of movie and television soundtracks since the 1930s,” explains Jeff Prevost, a professor at the University of Texas at San Antonio who cocreated AutoFoley. “Movies would seem hollow and distant without the controlled layer of a realistic Foley soundtrack. However, the process of Foley sound synthesis therefore adds significant time and cost to the creation of a motion picture.”

Intrigued by the thought of an automated Foley system, Prevost and his Ph.D. student, Sanchita Ghose, set about creating a multilayered machine-learning program. They created two different models that could be used in the first step, which involves identifying the actions in a video and determining the appropriate sound.

The first machine-learning model extracts image features (such as color and motion) from the frames of fast-moving action clips to determine an appropriate sound effect.
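To make the idea of "image features" concrete, here is a toy sketch in Python. It is not the AutoFoley model, which the article describes only at a high level; it assumes a hypothetical representation in which each frame is a small grid of grayscale values between 0 and 1, and it computes two hand-made features: average brightness and the average change between consecutive frames.

```python
from typing import List

# Hypothetical frame format: rows of grayscale pixel intensities in [0, 1].
Frame = List[List[float]]


def mean_intensity(frame: Frame) -> float:
    """Average brightness of a single frame."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel change between two consecutive frames."""
    diffs = [
        abs(c - p)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    ]
    return sum(diffs) / len(diffs)


def clip_features(frames: List[Frame]) -> List[float]:
    """Return [average brightness, average motion] for a clip."""
    brightness = sum(mean_intensity(f) for f in frames) / len(frames)
    motions = [motion_energy(a, b) for a, b in zip(frames, frames[1:])]
    motion = sum(motions) / len(motions) if motions else 0.0
    return [brightness, motion]
```

A real system would learn its features with a neural network rather than compute them by hand, but the input and output have the same shape: frames go in, a compact numeric description of the clip comes out.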

The second model analyzes the temporal relationship of an object in separate frames. By using relational reasoning to compare different frames across time, the second model can anticipate what action is taking place in the video.
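The idea of comparing frames across time can be illustrated with a very small Python sketch. This is an assumption-laden toy, not the researchers' network: it supposes that some earlier step has already produced a hypothetical horizontal position for the object in each frame, compares every pair of frames, and guesses the action from the average rate of change. The labels and the threshold are invented for illustration.

```python
from itertools import combinations
from typing import List


def pairwise_rates(positions: List[float]) -> List[float]:
    """Displacement per frame for every pair of frames (i, j) with i < j."""
    return [
        (positions[j] - positions[i]) / (j - i)
        for i, j in combinations(range(len(positions)), 2)
    ]


def guess_action(positions: List[float], threshold: float = 0.1) -> str:
    """Aggregate the pairwise relations into a coarse action label."""
    rates = pairwise_rates(positions)
    if not rates:
        return "unknown"
    average = sum(rates) / len(rates)
    if average > threshold:
        return "moving right"
    if average < -threshold:
        return "moving left"
    return "still"
```

For example, `guess_action([0, 1, 2, 3])` returns `"moving right"`, because every pair of frames shows the object advancing by one unit per frame.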

In a final step, sound is synthesized to match the activity or motion predicted by one of the models. Prevost and Ghose used AutoFoley to create sound for 1,000 short movie clips capturing a number of common actions, like falling rain, a galloping horse, and a ticking clock.
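As a rough picture of what "synthesizing" a sound for a predicted action could look like, the Python sketch below generates a ticking-clock track from scratch. It does not reflect how AutoFoley produces audio; the function name, the sample rate, and the tick shape (a short, linearly fading sine burst) are all assumptions chosen to keep the example small.

```python
import math
from typing import List


def tick_track(
    duration_s: float,
    ticks_per_s: float = 1.0,
    sample_rate: int = 8000,
    tick_len_s: float = 0.01,
    freq_hz: float = 2000.0,
) -> List[float]:
    """Return audio samples in [-1, 1]: silence with a short burst per tick."""
    n = int(duration_s * sample_rate)
    period = int(sample_rate / ticks_per_s)   # samples between tick starts
    tick_len = int(tick_len_s * sample_rate)  # samples in one tick
    samples = [0.0] * n
    for start in range(0, n, period):
        for k in range(min(tick_len, n - start)):
            fade = 1.0 - k / tick_len  # linear fade-out over the tick
            phase = 2 * math.pi * freq_hz * k / sample_rate
            samples[start + k] = fade * math.sin(phase)
    return samples
```

Calling `tick_track(2.0)` yields two seconds of audio with a tick at the start of each second. A sound like this is only convincing if the ticks line up with the moving second hand on screen, so timing matters as much as the sound itself.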

Analysis shows, unsurprisingly, that AutoFoley is best at producing sounds where the timing doesn’t need to align perfectly with the video (such as falling rain or a crackling fire). But the program is more likely to be out of sync with the video when visual scenes contain random actions with variation in time (such as typing or thunderstorms).

Next, Prevost and Ghose surveyed 57 local college students on which movie clips they thought included original soundtracks. In assessing soundtracks generated by the first model, 73 percent of students surveyed chose the synthesized AutoFoley clip as the original piece, over the true original sound clip. In assessing the second model, 66 percent of respondents chose the AutoFoley clip over the original sound clip.
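For a sense of scale, the percentages can be turned into approximate head counts. The short Python sketch below assumes that all 57 students answered both questions and that the reported percentages were rounded, so the counts are estimates rather than figures from the study.

```python
def approx_count(percent: float, respondents: int) -> int:
    """Estimate how many respondents a rounded percentage corresponds to."""
    return round(percent / 100 * respondents)


RESPONDENTS = 57  # students surveyed, per the article

first_model = approx_count(73, RESPONDENTS)   # chose the AutoFoley clip, model 1
second_model = approx_count(66, RESPONDENTS)  # chose the AutoFoley clip, model 2
print(first_model, second_model)
```

Under those assumptions, roughly 42 of the 57 students picked the synthesized clip for the first model and roughly 38 did so for the second.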

“One limitation in our approach is the requirement that the subject of classification is present in the entire video frame sequence,” says Prevost, also noting that AutoFoley currently relies on a data set with limited Foley categories. While a patent for AutoFoley is still in the early stages, Prevost says these limitations will be addressed in future research.
