A wearable for kids on the autism spectrum provides behavioral therapy via augmented reality
Imagine this scene: It's nearly dinnertime, and little Jimmy is in the kitchen. His mom is rushing to get dinner on the table, and she puts all the silverware in a pile on the counter. Jimmy, who's on the autism spectrum, wants the silverware to be more orderly, and while his mom is at the stove he carefully begins to put each fork, knife, and spoon back in its slot in the silverware drawer. Suddenly Jimmy hears shouting. His mom is loud; her face looks different. He continues what he's doing.
Now imagine that Jimmy is wearing a special kind of Google Glass, the augmented-reality headset that Google introduced in 2013. When he looks up at his mom, the head-up display lights up with a green box, which alerts Jimmy that he's “found a face.” As he focuses on her face, an emoji pops up, which tells Jimmy, “You found an angry face.” He thinks about why his mom might be annoyed. Maybe he should stop what he's doing with the silverware and ask her.
Our team has been working for six years on this assistive technology for children with autism, which the kids themselves named Superpower Glass. Our system provides behavioral therapy to the children in their homes, where social skills are first learned. It uses the glasses' outward-facing camera to record the children's interactions with family members; then our software detects the faces in those videos and interprets their expressions of emotion. Through an app, caregivers can review auto-curated videos of social interactions.
Over the years we've refined our prototype and run clinical trials to prove its beneficial effects: We've found that its use increases kids' eye contact and social engagement and also improves their recognition of emotions. Our team at Stanford University has worked with coauthor Dennis Wall's spinoff company, Cognoa, to earn a “breakthrough therapy” designation for Superpower Glass, which puts the technology on a fast track toward approval by the U.S. Food and Drug Administration (FDA). We aim to get health insurance plans to cover the costs of the technology as an augmented-reality therapy.
When Google Glass first came out as a consumer device, many people didn't see a need for it. Faced with lackluster reviews and sales, Google stopped making the consumer version in 2015. But when the company returned to the market in 2017 with a second iteration of the device, Glass Enterprise Edition, a variety of industries began to see its potential. Here we'll tell the story of how we used the technology to give kids with autism a new way to look at the world.
When Jimmy puts on the glasses, he quickly gets accustomed to the head-up display (a prism) in the periphery of his field of view. When Jimmy begins to interact with family members, the glasses send the video data to his caregiver’s smartphone. Our app, enabled by the latest artificial-intelligence (AI) techniques, detects faces and emotions and sends the information back to the glasses. The boundary of the head-up display lights up green whenever a face is detected, and the display then identifies the facial expression via an emoticon, emoji, or written word. The users can also choose to have an audio cue—a voice identifying the emotion—from the bone-conducting speaker within the glasses, which sends sound waves through the skull to the inner ear. The system recognizes seven facial expressions—happiness, anger, surprise, sadness, fear, disgust, and contempt, which we labeled “meh” to be more child friendly. It also recognizes a neutral baseline expression.
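The feedback step described above can be sketched in a few lines of Python. This is an illustration only, not the project's code: the dictionary, the function name `display_cue`, and the returned fields are hypothetical, while the expression names and the child-friendly words come from the article.

```python
from typing import Optional

# Child-friendly words for the seven expressions the article lists.
# The dictionary layout and the function name are hypothetical.
CHILD_FRIENDLY = {
    "happiness": "happy",
    "anger": "angry",
    "surprise": "surprised",
    "sadness": "sad",
    "fear": "afraid",
    "disgust": "disgusted",
    "contempt": "meh",
}


def display_cue(face_found: bool, expression: Optional[str]) -> dict:
    """Return what the head-up display should show for one frame."""
    if not face_found:
        return {"green_box": False, "label": None}
    # A neutral (or unrecognized) expression shows the green box alone.
    return {"green_box": True, "label": CHILD_FRIENDLY.get(expression)}
```

Under these assumptions, `display_cue(True, "contempt")` yields the green box with the label "meh", and a neutral face yields the green box with no label.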
To encourage children to wear Superpower Glass, the app currently offers two games: “Capture the Smile,” in which the child tries to elicit happiness or another emotion in others, and “Guess the Emotion,” in which people act out emotions for the child to name. The app also logs all activity within a session and tags moments of social engagement. That gives Jimmy and his mom the ability to watch together the video of their conflict in the kitchen, which could prompt a discussion of what happened and what they can do differently next time.
The three elements of our Superpower Glass system—face detection, emotion recognition, and in-app review—help autistic children learn as they go. The kids are motivated to seek out social interactions, they learn that faces are interesting, and they realize they can gather valuable information from the expressions on those faces. But the glasses are not meant to be a permanent prosthesis. The kids do 20-minute sessions a few times a week in their own homes, and the entire intervention currently lasts for six weeks. Children are expected to quickly learn how to detect the emotions of their social partners and then, after they’ve gained social confidence, stop using the glasses.
Our system is intended to ameliorate a serious problem: limited access to intensive behavioral therapy. Although there’s some evidence that such therapy can diminish or even eliminate core symptoms associated with autism, kids must start receiving it before the age of 8 to see real benefits. Currently the average age of diagnosis is between 4 and 5, and waitlists for therapy can stretch over 18 months. Part of the reason for the shortage is the shocking 600 percent rise since 1990 in diagnoses of autism in the United States, where about one in 40 kids is now affected; less dramatic surges have occurred in some parts of Asia and Europe.
Because of the increasing imbalance between the number of children requiring care and the number of specialists able to provide therapy, we believe that clinicians must look to solutions that can scale up in a decentralized fashion. Rather than relying on the experts for everything, we think that the tools needed to help all these children, namely data capture, monitoring, and therapy, must be placed in the hands of the patients and their parents.
Efforts to provide in situ learning aids for autistic children date back to the 1990s, when Rosalind Picard, a professor at MIT, designed a system with a headset and minicomputer that displayed emotional cues. However, the wearable technology of the day was clunky and obtrusive, and the emotion-recognition software was primitive. Today, we have discreet wearables, such as Google Glass, and powerful AI tools that leverage massive amounts of publicly available data about facial expressions and social interactions.
The design of Google Glass was an impressive feat, as the company’s engineers essentially packed a smartphone into a lightweight frame resembling a pair of eyeglasses. But with that form factor comes an interesting challenge for developers: We had to make trade-offs among battery life, video streaming performance, and heat. For example, on-device processing can generate too much heat and automatically trigger a cutback in operations. When we tried running our computer-vision algorithms on the device, that automatic system often reduced the frame rate of the video being captured, which seriously compromised our ability to quickly identify emotions and provide feedback.
Our solution was to pair Glass with a smartphone via Wi-Fi. The glasses capture video, stream the frames to the phone, and deliver feedback to the wearer. The phone does the heavy computer-vision work of face detection and tracking, feature extraction, and facial-expression recognition, and also stores the video data.
But the Glass-to-phone streaming posed its own problem: While the glasses capture video at a decent resolution, we could stream it only at low resolution. We therefore wrote a protocol to make the glasses zoom in on each newly detected face so that the video stream is detailed enough for our vision algorithms.
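One plausible way to picture the zoom step is as a crop computed around the detected face before the frame is scaled down for streaming. The Python sketch below is an assumption about how such a crop could be calculated, not a description of the actual protocol; the function names, the margin value, and the resolutions in the usage note are all hypothetical.

```python
def zoom_region(face, frame_w, frame_h, margin=0.5):
    """Return (left, top, right, bottom) of a crop around a face box.

    face is (x, y, w, h) in full-resolution pixels. margin is the fraction
    of the face size added on every side; 0.5 is an arbitrary choice.
    The crop is clamped so it never leaves the frame.
    """
    x, y, w, h = face
    pad_x, pad_y = int(w * margin), int(h * margin)
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(frame_w, x + w + pad_x)
    bottom = min(frame_h, y + h + pad_y)
    return left, top, right, bottom


def streamed_face_width(face_w, source_w, stream_w):
    """Width of the face, in pixels, after source_w is scaled to stream_w."""
    return face_w * stream_w / source_w
```

With made-up numbers, a face 200 pixels wide in a 1280-pixel-wide frame shrinks to 50 pixels when the whole frame is streamed at a width of 320. If the glasses first crop to a 400-pixel-wide region around the face, the same face occupies 160 pixels of the stream.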
Our computer-vision system originally used off-the-shelf tools. The software pipeline was composed of a face detector, a face tracker, and a facial-feature extractor; it fed data into an emotion classifier trained on both standard data sets and our own data sets. When we started developing our pipeline, it wasn’t yet feasible to run deep-learning algorithms that can handle real-time classification tasks on mobile devices. But the past few years have brought remarkable advances, and we’re now working on an updated version of Superpower Glass with deep-learning tools that can simultaneously track faces and classify emotions.
This update isn’t a simple task. Emotion-recognition software is primarily used in the advertising industry to gauge consumers’ emotional responses to ads. Our software differs in a few key ways. First, it won’t be used in computers but rather in wearables and mobile devices, so we have to keep its memory and processing requirements to a minimum. The wearable form factor also means that video will be captured not by stable webcams but by moving cameras worn by kids. We’ve added image stabilizers to cope with the jumpy video, and the face detector also reinitializes frequently to find faces that suddenly shift position within the scene.
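The idea of a detector that reinitializes frequently can be illustrated with a small loop that falls back to full detection whenever the tracker loses the face or a fixed number of frames has passed. This is a generic sketch, not the Superpower Glass pipeline: `detect`, `track`, and the 10-frame interval are hypothetical stand-ins supplied by the caller.

```python
def follow_face(frames, detect, track, redetect_every=10):
    """Yield one face box (or None) per frame.

    detect(frame) -> box or None runs the full face detector.
    track(frame, last_box) -> box or None is a cheaper frame-to-frame tracker.
    Both callables and the default interval are hypothetical.
    """
    box = None
    for i, frame in enumerate(frames):
        if box is None or i % redetect_every == 0:
            # No face currently held, or the interval elapsed: start over.
            box = detect(frame)
        else:
            # May return None if the face jumped out of the tracked region.
            box = track(frame, box)
        yield box
```

A shorter interval recovers faster from sudden head movement at the cost of running the more expensive detector more often, which matters on a device with tight memory and processing budgets.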
Now and Later: When an autistic child wearing Superpower Glass sees the face of a caregiver, the glasses’ head-up display lights up with a green box, which indicates that a face is present. The software identifies the facial expression, and the display shows either an emoji (happy, sad, angry, disgusted, surprised, afraid, or “meh”) or, to indicate a neutral expression, nothing at all. Later, the caregiver can use a smartphone app to look at the video captured by the smart glasses and review the interactions. Illustration: Chris Philpot
Failure modes are also a serious concern. A commercial emotion-recognition system might claim, for example, a 98 percent accuracy rate; such statistics usually mean that the system works well on most people but consistently fails to recognize the expressions of a handful of individuals. That situation might be fine for studying the aggregate sentiments of people watching an ad. But in the case of Superpower Glass, the software must interpret a child’s interactions with the same people on a regular basis. If the system consistently fails on two people who happen to be the child’s parents, that child is out of luck.
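A small worked example shows how an aggregate figure can hide this kind of failure. The data below are synthetic and chosen only to reproduce a 98 percent headline number; the function names are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(records):
    """records: list of (person, predicted, actual) tuples."""
    return sum(pred == act for _, pred, act in records) / len(records)


def accuracy_by_person(records):
    """Return {person: fraction of that person's expressions classified correctly}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for person, pred, act in records:
        totals[person] += 1
        hits[person] += pred == act
    return {p: hits[p] / totals[p] for p in totals}


# Synthetic data: 49 people classified perfectly, one person always wrong.
records = [(f"person{i}", "happy", "happy") for i in range(49) for _ in range(10)]
records += [("parent", "happy", "angry")] * 10

print(overall_accuracy(records))              # 0.98
print(accuracy_by_person(records)["parent"])  # 0.0
```

The headline accuracy is 98 percent, yet the system is wrong every time for the one person the child may see most often. Reporting accuracy per person makes that visible.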
We’ve developed a number of customizations to address these problems. In our “neutral subtraction” method, the system first keeps a record of a particular person’s neutral-expression face. Then the software classifies that person’s expressions based on the differences it detects between the face he or she currently displays and the recorded neutral estimate. For example, the system might come to learn that just because Grandpa has a furrowed brow, it doesn’t mean he’s always angry. And we’re going further: We’re working on machine-learning techniques that will rapidly personalize the software for each user. Making a human–AI interaction system that adapts robustly, without too much frustration for the user, is a considerable challenge. We’re experimenting with several ways to gamify the calibration process, because we think the Superpower Glass system must have adaptive abilities to be commercially successful.
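The neutral-subtraction idea can be sketched as follows. The article says only that the system records a person's neutral face and classifies expressions from the difference; representing a face as a list of numeric features, averaging several neutral samples into a baseline, and the class and method names are all assumptions made for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class NeutralSubtractor:
    """Keeps a per-person neutral baseline and subtracts it from new features."""

    def __init__(self):
        self._neutral = {}  # person id -> list of neutral feature vectors

    def record_neutral(self, person, features):
        self._neutral.setdefault(person, []).append(list(features))

    def subtract(self, person, features):
        samples = self._neutral.get(person)
        if not samples:
            # No baseline recorded yet: pass the features through unchanged.
            return list(features)
        baseline = mean_vector(samples)
        return [f - b for f, b in zip(features, baseline)]
```

In this sketch, a downstream classifier would receive the output of `subtract` rather than the raw features, so a brow that is always furrowed contributes roughly zero once that person's baseline has been recorded.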
The App: The Superpower Glass smartphone app runs the software for facial and emotional recognition and serves as an interface for the family. Parents and children can review videos together that are color-coded by the emotions detected, and the app also launches games that encourage the kids to practice identifying emotions. Images: Stanford University
We realized from the start that the system would be imperfect, and we’ve designed feedback to reflect that reality. The green box face-detection feature was originally intended to mitigate frustration: If the system isn’t tracking a friend’s face, at least the user knows that and isn’t waiting for feedback that will never come. Over time, however, we came to think of the green box as an intervention in itself, as it provides feedback whenever the wearer looks at a face, a behavior that can be noticeably different for children on the autism spectrum.
To evaluate Superpower Glass, we conducted three studies over the past six years. The first one took place in our lab with a very rudimentary prototype, which we used to test how children on the autism spectrum would respond to wearing Google Glass and receiving emotional cues. Next, we built a proper prototype and ran a design trial in which families with autistic kids took the devices home for several weeks. We interacted with these families regularly and made changes to the prototype based on their feedback.
With a refined prototype in hand, we then set out to test the device’s efficacy in a rigorous way. We ran a randomized controlled trial in which one group of children received typical at-home behavioral therapy, while a second group received that therapy plus a regimen with Superpower Glass. We used four tests that are commonly deployed in autism research to look for improvement in emotion recognition and broader social skills. As we described in our 2019 paper in JAMA Pediatrics, the intervention group showed significant gains over the control group in one test (the socialization portion of the Vineland Adaptive Behavior Scales).
We also asked parents to tell us what they had noticed. Their observations helped us refine the prototype’s design, as they commented on technical functionality, user frustrations, and new features they’d like to see. One email from the beginning of our at-home design trial stands out. The parent reported an immediate and dramatic improvement: “[Participant] is actually looking at us when he talks through google glasses during a conversation…it’s almost like a switch was turned.… Thank you!!! My son is looking into my face.”
This email was extremely encouraging, but it sounded almost too good to be true. Yet comments about increased eye contact continued throughout our studies, and we documented this anecdotal feedback in a publication about that design study. To this day, we continue to hear similar stories from a small group of “light switch” participants.
We’re confident that the Superpower Glass system works, but to be honest, we don’t really know why. We haven’t been able to determine the primary mechanism of action that leads to increased eye contact, social engagement, and emotion recognition. This unknown informs our current research. Is it the emotion-recognition feedback that most helps the children? Or is our device mainly helping by drawing attention to faces with its green box? Or are we simply providing a platform for increased social interaction within the family? Is the system helping all the kids in the same way, or does it meet the needs of various parts of the population differently? If we can answer such questions, we can design interventions in a more pointed and personalized way.
The startup Cognoa, founded by coauthor Dennis Wall, is now working to turn our Superpower Glass prototype into a clinical therapy that doctors can prescribe. The FDA breakthrough therapy designation for the technology, which we earned in February 2019, will speed the journey toward regulatory approval and acceptance by health insurance companies. Cognoa’s augmented-reality therapy will work with most types of smartphones, and it will be compatible not only with Google Glass but also with new brands of smart glasses that are beginning to hit the market. In a separate project, the company is working on a digital tool that physicians can use to diagnose autism in children as young as 18 months, which could prepare these young kids to receive treatment during a crucial window of brain development.
Ultimately, we feel that our treatment approach can be used for childhood concerns beyond autism. We can design games and feedback for kids who struggle with speech and language, for example, or who have been diagnosed with attention deficit hyperactivity disorder. We’re imagining all sorts of ubiquitous AI-powered devices that deliver treatment to users, and which feed into a virtuous cycle of technological improvement; while acting as learning aids, these devices can also capture data that helps us understand how to better personalize the treatment. Maybe we’ll even gain new scientific insights into the disorders in the process. Most important of all, these devices will empower families to take control of their own therapies and family dynamics. Through Superpower Glass and other wearables, they’ll see the way forward.
This article appears in the April 2020 print issue as “Making Emotions Transparent.”
About the Authors
When Stanford professor Dennis Wall met Catalin Voss and Nick Haber in 2013, it felt like a “serendipitous alignment of the stars,” Wall says. He was investigating new therapies for autism, Voss was experimenting with the Google Glass wearable, and Haber was working on machine learning and computer vision. Together, the three embarked on the Superpower Glass project to encourage autistic kids to interact socially and help them recognize emotions.