This AI Watched 100 Films to Learn How to Recognize a Kiss

IEEE Spectrum

Like someone who has never been kissed, AI began learning the basics by binge-watching romantic film clips to see how Hollywood stars lock lips. By training deep learning algorithms that have already proven adept at recognizing faces and objects to also recognize steamy kissing scenes dramatized by professional actors, a data scientist has shown how AI systems could gain greater insight into the most intimate human activities.

The study of AI-based kiss detection came from Amir Ziai, a senior data scientist at Netflix, as he was completing coursework to obtain an AI graduate certificate at Stanford University. Ziai handpicked a representative sample of 100 films from a database of Hollywood films spanning the past century. Then he manually labeled different film segments as either kissing or non-kissing scenes, and used still frames and sound clips from those segments to train deep learning algorithms to detect both the sights and sounds of smooching.

Lest anyone get the wrong impression, it’s still unclear whether or not the kiss detection method works with more sexual scenes that go beyond kissing. “In my training set, I’ve stayed away from overly sexual scenes to make sure that the model is not confusing kissing and sex,” Ziai says.

Ziai’s current employer Netflix was not involved in the Stanford-based research that is detailed in a paper published on the preprint server arXiv. And Ziai has not investigated any possible applications of such technology for Netflix. But it’s not hard to imagine the possible commercial applications that could interest Netflix or other companies such as YouTube, Facebook, Instagram, and TikTok that handle huge amounts of streaming or stored video.

Back in April 2019, Google announced that its Pixel smartphones had received a Photobooth feature update that allowed the phones to automatically snap photos whenever they detected kissing in a single frame taken by the smartphone camera. Ziai’s demonstration of kiss detection technology that works with videos hints at future applications that could automatically categorize video content, create personalized video recommendations for viewers, and possibly even screen out certain videos as part of online content moderation.

“This is a good example of how modern computer vision techniques make it fairly easy to develop specific ‘sense and respond’ software, cued to qualitative/unstructured things (like the presence of kissing in a scene),” said Jack Clark, strategy and communications director at OpenAI, in his Import AI newsletter, which recently highlighted the kiss detection study. “I think this is one of the most under-hyped aspects of how AI is changing the scope of individual software development.”

When it came time to visually identify kissing scenes, the deep learning model that proved most successful was ResNet-18, an image classification algorithm that was already pre-trained on more than one million images from the popular ImageNet database. To listen for the sounds of kissing, a deep learning model known as VGGish trained on the last 960 milliseconds of audio from one-second segments of each scene.
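
As a rough illustration of the audio windowing described above, the following sketch keeps the final 960 milliseconds of every one-second segment. The function name, the 16 kHz sample rate in the usage lines, and the choice to drop a trailing partial segment are assumptions made for this example, not details confirmed by the study.

```python
from typing import List, Sequence

SEGMENT_MS = 1000  # scenes are cut into one-second segments (from the article)
WINDOW_MS = 960    # only the last 960 ms of each segment is used (from the article)


def last_960ms_windows(samples: Sequence[float], sample_rate: int) -> List[Sequence[float]]:
    """Return the final 960 ms of each complete one-second segment.

    Hypothetical helper: a trailing segment shorter than one second is dropped.
    """
    seg_len = sample_rate * SEGMENT_MS // 1000  # samples per one-second segment
    win_len = sample_rate * WINDOW_MS // 1000   # samples per 960 ms window
    windows = []
    for start in range(0, len(samples) - seg_len + 1, seg_len):
        segment_end = start + seg_len
        windows.append(samples[segment_end - win_len:segment_end])
    return windows


# Usage: three seconds of silence at an assumed 16 kHz sample rate.
clip = [0.0] * (16000 * 3)
windows = last_960ms_windows(clip, sample_rate=16000)
print(len(windows), len(windows[0]))
```

Each returned window could then be passed to an audio model such as VGGish; that step is omitted here because it depends on the specific model implementation.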

That two-pronged approach of training AI to process both images and audio of kissing helped the overall model achieve a fairly impressive F1 score of 0.95. The F1 score is the harmonic mean of precision and recall, so it accounts for both false positives and false negatives.
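
For readers unfamiliar with the metric, an F1 score is conventionally computed as the harmonic mean of precision and recall. The sketch below shows that calculation. The counts in the usage lines are invented to land on 0.95 and are not the study's actual results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts.

    tp: kissing segments correctly flagged
    fp: non-kissing segments wrongly flagged (false positives)
    fn: kissing segments that were missed (false negatives)
    """
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)  # share of flagged segments that really are kisses
    recall = tp / (tp + fn)     # share of real kisses that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen for illustration only.
balanced = f1_score(tp=95, fp=5, fn=5)
lopsided = f1_score(tp=90, fp=10, fn=30)
print(round(balanced, 2), round(lopsided, 2))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by doing well on precision alone or on recall alone.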

But the model still stumbled when it encountered trickier video editing and camera perspectives in some film scenes. For example, wide shots of actors kissing sometimes confused the algorithm because most of the camera frame consisted of background scenery. Fast-paced video cuts and shots that didn’t include both actors also proved challenging.

It’s always difficult to figure out which particular data patterns lead deep learning models to make their predictions. One way for humans to try to understand AI logic involves using saliency maps to highlight the data that received the most attention from the AI during its analysis. In the case of the Hollywood kissing scenes, the deep learning models seemed to pay more attention to image pixels related to the actors’ faces.
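
One simple way to build a saliency map is occlusion: blank out one region at a time and record how much the model's score drops. The sketch below applies that idea to a tiny grayscale image. It is an illustration of the general technique, not necessarily the method used in the study, and `toy_score` is a hypothetical stand-in for a real classifier.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def occlusion_saliency(image: Image, score: Callable[[Image], float]) -> Image:
    """Map each pixel to the drop in score caused by blanking that pixel."""
    baseline = score(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(px_row) for px_row in image]  # copy; input stays untouched
            occluded[r][c] = 0.0                           # blank out a single pixel
            saliency_row.append(baseline - score(occluded))
        saliency.append(saliency_row)
    return saliency


def toy_score(image: Image) -> float:
    """Hypothetical classifier: mean intensity of the central 2x2 patch."""
    centre = [image[r][c] for r in (1, 2) for c in (1, 2)]
    return sum(centre) / len(centre)


frame = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_saliency(frame, toy_score)
for row in heatmap:
    print(row)
```

Here only the four central pixels receive a nonzero value, because they are the only ones the toy scorer looks at. With a real model, the analogous map would highlight regions such as the actors' faces.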

Some “limited experimentation” also suggests that the AI relied more heavily on visual features than on audio to identify kissing scenes, Ziai says. He observed that the kiss detection system could benefit from a “more carefully crafted dataset” and perhaps make use of more contextual information beyond just still images to detect kissing.

It’s still unclear how well an AI model trained on just 100 Hollywood films such as Anna Karenina (1935), Ghost (1990), and Casino Royale (2006) would work on a larger dataset of films. But the model saw only “marginal improvement” after the training dataset grew beyond 80 videos, Ziai says. The Hollywood film dataset and some of the computing resources were provided by the lab of Kayvon Fatahalian, an assistant professor of computer science at Stanford University.

Another question is whether such an AI model could perform with similar accuracy in detecting kiss scenes in the types of videos commonly shared on social media. That challenge would probably require additional training on a much larger video dataset with examples going beyond on-screen Hollywood couples such as Patrick Swayze and Demi Moore. Still, some very preliminary testing suggests that this broader application of AI-powered kiss detection shows promise.

“The attempt in this study was to use a diverse dataset so that the model does not overfit to any particular type of movie,” Ziai says. “Anecdotally, the model seems to work reasonably well on a few YouTube videos that I found.”
