Record Broken: Decoding Words From Brain Signals

Two brain-computer interfaces give speechless people more words per minute than ever before


A woman who lost the ability to talk 18 years ago uses a speech prosthesis system to translate her brain signals into synthesized speech.

Noah Berger

Two new brain-computer interfaces (BCIs), electronic bridges between mind and machine, decode words from people’s unspoken but intended speech. The two systems, presented in papers published this week in the journal Nature, demonstrate performance breakthroughs for speech-decoding BCIs developed by scientists at Stanford University and the University of California, San Francisco.

Both papers present new research resulting from years of work with patients who have lost their ability to speak. The first paper presents a speech BCI developed by a research team led by Jaimie Henderson at Stanford University, working with an ALS patient referred to as “T12” to protect her privacy. The Stanford team developed a BCI that determines the words T12 intends to say based on recordings of electrical activity collected from her brain. The recordings are made using electrode arrays implanted into a region of T12’s cortex believed to play a role in the articulation and vocalization of speech.

The Stanford team’s brain-computer interface achieved an average word decoding rate of 62 words per minute.

Steve Fisch

The electrode recordings collected from T12 were then used to train a deep-learning model to associate patterns of neural activity with the intention to vocalize individual words. This was done in two stages: the first maps brain recordings to sequences of distinct phonemes, the individual units of sound within words, and the second assembles those phoneme sequences into words. The resulting system can be thought of as a digital prosthesis for human speech, converting one’s intention to vocalize into a series of sounds and then converting those sounds into known words.
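To make the two-stage idea concrete, here is a toy Python sketch of such a pipeline: a placeholder linear model stands in for the deep-learning stage that maps neural features to phonemes, and a tiny lexicon stands in for the word-assembly stage. Every name, shape, and value below is invented for illustration; this is not the Stanford team’s actual architecture.

```python
# Illustrative sketch only: a toy two-stage decoder in the spirit described
# above (neural features -> phonemes -> words). The linear "model", the
# phoneme set, and the tiny lexicon are hypothetical placeholders.
import numpy as np

PHONEMES = ["HH", "EH", "L", "OW", "_"]          # "_" marks a blank/pause
LEXICON = {("HH", "EH", "L", "OW"): "hello"}     # phoneme sequence -> word

def stage1_phonemes(neural_features: np.ndarray, weights: np.ndarray) -> list[str]:
    """Map each time step of neural features to the most likely phoneme."""
    logits = neural_features @ weights           # (time, channels) x (channels, phonemes)
    ids = logits.argmax(axis=1)
    # Collapse repeats and drop blanks, as CTC-style decoders do.
    out, prev = [], None
    for i in ids:
        p = PHONEMES[i]
        if p != prev and p != "_":
            out.append(p)
        prev = p
    return out

def stage2_words(phoneme_seq: list[str]) -> str:
    """Assemble a phoneme sequence into a word with a (toy) lexicon lookup."""
    return LEXICON.get(tuple(phoneme_seq), "<unknown>")

# Toy usage: random "recordings" from 8 channels over 20 time steps.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))
weights = rng.normal(size=(8, len(PHONEMES)))
print(stage2_words(stage1_phonemes(features, weights)))
```

In the real system the first stage is a trained neural network and the second stage uses a language model rather than a simple lookup, but the division of labor is the same: sounds first, words second.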

This approach appears to have paid off: The device achieved an average word decoding rate of 62 words per minute, more than three times as fast as the previous record of 18 wpm, which was set by the same Stanford research group with a separate system that decoded handwriting from neural activity.

Francis Willett, a researcher at Stanford and the first author of the new study, indicated that T12 may be able to use the device to communicate even faster: “The rate limit isn’t the algorithm,” says Willett. “If we asked her to go as fast as she could, or tried to train her to go faster—how much faster she could go is an open question.”

An ECoG Array Catches Neural Signals, Machine Learning Decodes Them

The second paper, authored by researchers led by Edward Chang at UC San Francisco, presents a different BCI device built to decode speech from brain activity. Working with a participant who lost her ability to speak after a brain-stem stroke 18 years ago, the team developed a device that converts recordings of the participant’s neural activity into text and audio reconstructions of her intended speech. The publication is a continuation of previous research from the lab into the neural basis of speech production.

How a Brain Implant and AI Gave a Woman with Paralysis Her Voice Back

UCSF

Although this BCI uses a machine-learning approach similar to Stanford’s to map brain activity patterns to speech reconstructions, it differs by using electrocorticography (ECoG) electrodes placed on the brain’s surface instead of the electrode arrays the Stanford team implanted into the cortex. The UC San Francisco team also targeted brain areas involved with the movements of the vocal tract, picking up the participant’s intended movements of her lips, tongue, and jaw.
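As a rough illustration of what a first processing step for surface recordings like these might look like, here is a small Python sketch that extracts band-power features from multichannel signals before any decoding. The 70 to 150 Hz band, the sampling rate, and the windowing are common choices for ECoG speech work assumed purely for illustration; they are not details taken from the UC San Francisco paper.

```python
# Illustrative sketch: extracting band-power features from surface (ECoG)
# recordings before decoding. All parameters are assumed, not from the paper.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_power_features(ecog: np.ndarray, fs: float = 1000.0,
                        band: tuple[float, float] = (70.0, 150.0),
                        window: int = 50) -> np.ndarray:
    """Band-pass each channel, then average the squared signal in short windows.

    ecog: (time, channels) raw voltage traces sampled at fs Hz.
    Returns a (time // window, channels) feature matrix a decoder could use.
    """
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, ecog, axis=0)
    power = filtered ** 2
    n_win = power.shape[0] // window
    trimmed = power[: n_win * window]
    return trimmed.reshape(n_win, window, -1).mean(axis=1)

# Toy usage: 2 seconds of simulated 16-channel ECoG at 1 kHz.
rng = np.random.default_rng(1)
features = band_power_features(rng.normal(size=(2000, 16)))
print(features.shape)  # (40, 16)
```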

The woman using UC San Francisco’s system achieved an output rate of 78 wpm, which was faster than the Stanford team’s device and more than four times as fast as the previous record. UC San Francisco’s system is also able to reconstruct the user’s speech in both text and audio, while the Stanford team’s device outputs only text.

Doctoral student Sean Metzger, the first author of the UC San Francisco paper, says that ECoG signals collected from the brain’s surface enable more stable speech decoding than signals from individual neurons collected through implanted electrode arrays. “I think the big advantage of ECoG is the stability we see,” says Metzger. “We saw we could stop training the decoder and it would work really well for a long time after that. With the single units, you have to retrain your model every day.”

The UC San Francisco team also developed, in association with the animation firm Speech Graphics, a facial avatar system to accompany audio reconstructions. The avatar, a programmable image of a human face, moves to re-create the experience of seeing the BCI user speak. The avatar’s facial motions are controlled by a machine-learning model trained to recognize articulations of particular sounds and facial movements from patterns in neural activity. When asked how she felt about the avatar system, the UC San Francisco study participant said it could help her achieve her dream of becoming a counselor, stating that the avatar would ease communication with clients through the BCI: “I think the avatar would make them more at ease.”
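One way to picture the last link in that chain is a mapping from decoded speech sounds to per-frame facial poses that an avatar renderer consumes. The Python sketch below shows that idea in miniature; the viseme table and weight names are invented for illustration, and Speech Graphics’ actual animation pipeline is not described at this level of detail in the article.

```python
# Illustrative sketch: turning a decoded phoneme stream into per-frame facial
# "blendshape" weights for an avatar. All names and values are hypothetical.
PHONEME_TO_VISEME = {
    "HH": {"jaw_open": 0.2, "lips_round": 0.0},
    "EH": {"jaw_open": 0.5, "lips_round": 0.0},
    "L":  {"jaw_open": 0.3, "lips_round": 0.0},
    "OW": {"jaw_open": 0.4, "lips_round": 0.8},
}

def animate(phonemes: list[str], frames_per_phoneme: int = 3) -> list[dict]:
    """Expand each phoneme into a few identical animation frames."""
    frames = []
    for p in phonemes:
        weights = PHONEME_TO_VISEME.get(p, {"jaw_open": 0.0, "lips_round": 0.0})
        frames.extend([weights] * frames_per_phoneme)
    return frames

print(animate(["HH", "EH", "L", "OW"])[:4])
```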

Though both of these devices still deliver less than half of an average person’s speaking rate of roughly 160 wpm, they mark substantial improvements in speech prosthetic capabilities. Moving forward, both research teams are working to boost performance and accuracy by using more electrodes and better hardware: “Right now we’re getting one out of every four words wrong,” says Willett. “I hope the next time you talk to us we’re getting one out of every 10 words wrong.”
