
Can AI and fMRI “Hear” Unspoken Thoughts?

Hopes for paralyzed patients counterbalance calls for mental privacy

In this stylized depiction of the language-decoding process, the decoder generates multiple word sequences [paper strips], and predicts how similar each candidate word sequence is to the actual word sequence [beads of light] by comparing predictions of the user's brain responses against the actual recorded brain responses.

Jerry Tang and Alexander Huth

What opportunities and challenges arise when AI meets functional MRI data? For one, researchers at the University of Texas at Austin have found that reconstructing language from brain activity now seems possible, potentially without surgical interventions or other hardware interfaces. The group’s paper, published 1 May in Nature Neuroscience, describes how the team decoded the gist of perceived speech, imagined speech, and even silent videos, using only functional MRI data.

The group’s findings could someday help restore communication abilities for paralyzed patients. But—as the authors also note—the continuing development of this technology will need to negotiate a balancing act with increasingly prominent calls for mental privacy.

The science of mind reading isn’t new. There is already thriving research on brain-computer interfaces (BCIs) that can convert brain activity into speech. In fact, a clinical trial is currently underway for an implantable neuroprosthesis that does just that with great accuracy. What is new about the Texas research is that it does not require brain surgery.

The group’s decoder is a set of algorithms that uses data from functional MRI (fMRI) scans, explains study leader Alexander Huth, assistant professor of neuroscience and computer science at the University of Texas at Austin. “We put people inside an MRI scanner and used that to record their brain activity,” he says. “Then we take that data, and we go and do things with it on our computers.”

How to Hear the Unspoken

To gather brain-activity data, researchers had study subjects listen to naturally spoken stories inside the fMRI scanner and recorded how their brains responded to the words. Altogether, about 16 hours of data was collected from each subject. Next, the researchers trained an encoding model on this data set. “It takes the input that’s coming into the person’s brain, and tries to predict how their brain responds to that,” says Huth.

[Photo: A person in a plaid shirt fits a helmet apparatus on top of the head of another person who is lying at the entrance of an MRI machine.]

Researchers trained their semantic decoder on dozens of hours of brain-activity data from participants, collected in an fMRI scanner.

Nolan Zunk/University of Texas at Austin

Finally, for the decoding part, they did the reverse: recovering the words from brain activity. For that they used a decoding framework based on Bayesian statistics, an approach pioneered by Shinji Nishimoto of Osaka University. The large language model GPT-1 handled the natural-language processing part of the system. Because this neural language model was trained on a large data set of natural English word sequences, it was good at predicting the most likely ones.

“So more or less, we guess what the stimulus might have been, we guess what the words might have been, and then we check how good our guesses were by using the encoding model to predict the brain activity, and then comparing the predicted brain activity to the actual one,” Huth says.
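The guess-and-check loop Huth describes can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `TOY_WEIGHTS`, `predict_response`, `similarity`, and `best_guess` are hypothetical names, the "encoding model" is a made-up lookup table rather than a model fitted to fMRI data, and the "recorded" response is simulated.

```python
import math

# Hypothetical toy "encoding model": each word maps to a made-up
# three-number brain response. A real model is fitted to hours of fMRI data.
TOY_WEIGHTS = {
    "dog": [1.0, 0.0, 0.2],
    "cat": [0.9, 0.1, 0.0],
    "ran": [0.0, 1.0, 0.3],
    "sat": [0.1, 0.2, 1.0],
}

def predict_response(words):
    """Predict a response to a word sequence by summing per-word vectors."""
    response = [0.0, 0.0, 0.0]
    for word in words:
        for i, weight in enumerate(TOY_WEIGHTS[word]):
            response[i] += weight
    return response

def similarity(predicted, recorded):
    """Negative Euclidean distance: higher means a closer match."""
    return -math.dist(predicted, recorded)

def best_guess(candidates, recorded):
    """Return the candidate whose predicted response best matches the recording."""
    return max(candidates, key=lambda c: similarity(predict_response(c), recorded))

# Simulated "recording": pretend the subject actually heard "dog ran".
recorded = predict_response(["dog", "ran"])
candidates = [["cat", "sat"], ["dog", "ran"], ["dog", "sat"]]
print(best_guess(candidates, recorded))
```

The point of the sketch is the direction of inference: the code never maps brain activity to words directly. It only predicts activity from guessed words and keeps the guess that fits best.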

They were surprised at how well their system worked. Not only did it recover the meaning of the stimuli; the decoded word sequences often captured words and phrases verbatim. They also found that they could retrieve continuous language separately from different regions of the brain. Huth feels particularly vindicated by this: “It’s a drum I’ve been beating for a long time that the right hemisphere does process language,” he says.

To test if the decoded text accurately captured the meaning of the stories, they also conducted a behavioral experiment by asking subjects who had only read the decoded words a set of questions. The subjects could answer more than half the questions correctly without having seen the videos.

“One of the coolest and most surprising aspects was that it works despite the fundamental limitations of fMRIs,” Huth says. An fMRI scan measures blood flow in the brain as a proxy for brain activity, in the form of the blood-oxygen-level-dependent (BOLD) signal. “Even an infinitesimal impulse of neural activity is a 10-second wave of BOLD,” Huth says. Language, on the other hand, is much faster. “If there’s some neural activity elicited by every word that you hear, what you see in one snapshot of brain activity is the smeared mess of what you heard over the last 10 or 15 seconds.”
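The smearing Huth describes can be illustrated by convolving brief events with a slow response curve. The sketch below is a toy, not the study's signal model: the `kernel` values are invented to mimic a response lasting about 10 seconds, sampled once per second, and `convolve`, `neural`, and `bold` are hypothetical names.

```python
def convolve(signal, kernel):
    """Discrete convolution: each event adds a scaled copy of the kernel."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Assumed slow response to a single brief event, one value per second.
kernel = [0.0, 0.1, 0.4, 0.8, 1.0, 0.8, 0.5, 0.3, 0.1, 0.0]

# Three brief neural events (say, three words) in the first three seconds.
neural = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

bold = convolve(neural, kernel)

# Count the seconds during which the measured signal is nonzero.
print(sum(1 for value in bold if value > 0))
```

In this toy, three seconds of events produce about ten seconds of overlapping signal, so any single sample mixes contributions from several words.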

A big part of the process was disentangling that. But because they were looking not at individual words but at sentence-level meanings, they were working with the slower-moving gist behind the words. They used a technique called beam search, common in natural-language processing, which keeps several of the most likely word sequences at each step, scoring each new word together with the probabilities of all the words that came before it. “Because we have multiple word sequences on our beam, the model can later discard the versions that weren’t right,” says Huth. This, he says, is crucial for accuracy.
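A minimal sketch of beam search shows why keeping several candidates matters. Everything here is assumed for illustration: `NEXT_WORD` is a made-up table of next-word probabilities standing in for a language model, `beam_search` is a hypothetical helper, and the score uses language-model probability only, whereas the decoder described above also weighs how well each candidate matches the recorded brain activity.

```python
import math

# Hypothetical next-word probabilities (invented numbers, not from GPT-1).
NEXT_WORD = {
    "<s>": {"i": 0.6, "we": 0.4},
    "i": {"saw": 0.55, "heard": 0.45},
    "we": {"saw": 0.9, "heard": 0.1},
    "saw": {"her": 0.6, "him": 0.4},
    "heard": {"her": 0.5, "him": 0.5},
}

def beam_search(width, steps):
    """Keep the `width` best partial sequences at every step."""
    beam = [(0.0, ["<s>"])]  # (log probability, words so far)
    for _ in range(steps):
        extended = []
        for log_prob, words in beam:
            for word, prob in NEXT_WORD[words[-1]].items():
                extended.append((log_prob + math.log(prob), words + [word]))
        # Sort by score, best first, and discard everything beyond the beam.
        extended.sort(key=lambda item: item[0], reverse=True)
        beam = extended[:width]
    # Drop the start marker before returning.
    return [(score, words[1:]) for score, words in beam]

greedy = beam_search(width=1, steps=3)[0][1]
wider = beam_search(width=2, steps=3)[0][1]
print(greedy)
print(wider)
```

With a beam of one, the search commits to the most probable first word and cannot recover. With a beam of two, the initially less likely opening survives long enough to win once later words are scored, which is the behavior Huth's quote points to.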

The researchers’ AI algorithm infers groups of words from patterns displayed in fMRI scans of brain activation, as depicted here.

For their future work, the researchers hope that fast-moving advances in natural-language neural networks will result in better accuracy. So far, they have found that the bigger, modern language models work a lot better for the encoding part at least. They would also like to work with larger data sets—say, 100 or 200 hours per subject.

Despite the obvious benefits, Huth and his team’s paper once again brings to the forefront the ethical implications of “mind-reading” technologies. In this study, the researchers showed that the decoder worked only with the knowledge and cooperation of the subject—and a model trained for one person wouldn’t work with another.

But problems may lie ahead as the models and brain imaging improve and become more broadly generalizable from person to person, Huth says. “We think it would be better to be proactive about this, to have maybe laws that explicitly protect mental privacy from the government, from employers…before it [becomes] a thing that’s causing harm to real people.”
