Researchers at UC San Francisco have successfully developed a “speech neuroprosthesis” that has enabled a man with severe paralysis to communicate in sentences, translating signals from his brain to the vocal tract directly into words that appear as text on a screen.
The achievement, which was developed in collaboration with the first participant of a clinical research trial, builds on more than a decade of effort by UCSF neurosurgeon Edward Chang, MD, to develop a technology that allows people with paralysis to communicate even if they are unable to speak on their own. The study appears July 15 in the New England Journal of Medicine.
“To our knowledge, this is the first successful demonstration of direct decoding of full words from the brain activity of someone who is paralyzed and cannot speak,” said Chang, the Joan and Sanford Weill Chair of Neurological Surgery at UCSF, Jeanne Robertson Distinguished Professor, and senior author on the study. “It shows strong promise to restore communication by tapping into the brain’s natural speech machinery.”
Each year, thousands of people lose the ability to speak due to stroke, accident, or disease. With further development, the approach described in this study could one day enable these people to fully communicate.
Translating Brain Signals into Speech
Previously, work in the field of communication neuroprosthetics has focused on restoring communication through spelling-based approaches to type out letters one-by-one in text. Chang’s study differs from these efforts in a critical way: his team is translating signals intended to control muscles of the vocal system for speaking words, rather than signals to move the arm or hand to enable typing. Chang said this approach taps into the natural and fluid aspects of speech and promises more rapid and organic communication.
“With speech, we normally communicate information at a very high rate, up to 150 or 200 words per minute,” he said, noting that spelling-based approaches using typing, writing, and controlling a cursor are considerably slower and more laborious. “Going straight to words, as we’re doing here, has great advantages because it’s closer to how we normally speak.”
Over the past decade, Chang’s progress toward this goal was facilitated by patients at the UCSF Epilepsy Center who were undergoing neurosurgery to pinpoint the origins of their seizures using electrode arrays placed on the surface of their brains. These patients, all of whom had normal speech, volunteered to have their brain recordings analyzed for speech-related activity. Early success with these patient volunteers paved the way for the current trial in people with paralysis.
Previously, Chang and colleagues in the UCSF Weill Institute for Neurosciences mapped the cortical activity patterns associated with vocal tract movements that produce each consonant and vowel. To translate those findings into speech recognition of full words, David Moses, PhD, a postdoctoral engineer in the Chang lab and one of the lead authors of the new study, developed new methods for real-time decoding of those patterns and statistical language models to improve accuracy.
But their success in decoding speech in participants who were able to speak didn’t guarantee that the technology would work in a person whose vocal tract is paralyzed. “Our models needed to learn the mapping between complex brain activity patterns and intended speech,” said Moses. “That poses a major challenge when the participant can’t speak.”
In addition, the team didn’t know whether brain signals controlling the vocal tract would still be intact for people who haven’t been able to move their vocal muscles for many years. “The best way to find out whether this could work was to try it,” said Moses.
The First 50 Words
To investigate the potential of this technology in patients with paralysis, Chang partnered with colleague Karunesh Ganguly, MD, PhD, an associate professor of neurology, to launch a study known as “BRAVO” (Brain-Computer Interface Restoration of Arm and Voice). The first participant in the trial is a man in his late 30s who suffered a devastating brainstem stroke more than 15 years ago that severely damaged the connection between his brain and his vocal tract and limbs. Since his injury, he has had extremely limited head, neck, and limb movements, and communicates by using a pointer attached to a baseball cap to poke letters on a screen.
The participant, who asked to be referred to as BRAVO1, worked with the researchers to create a 50-word vocabulary that Chang’s team could recognize from brain activity using advanced computer algorithms. The vocabulary — which includes words such as “water,” “family,” and “good” — was sufficient to create hundreds of sentences expressing concepts applicable to BRAVO1’s daily life.
For the study, Chang surgically implanted a high-density electrode array over BRAVO1’s speech motor cortex. After the participant’s full recovery, his team recorded 22 hours of neural activity in this brain region over 48 sessions and several months. In each session, BRAVO1 attempted to say each of the 50 vocabulary words many times while the electrodes recorded brain signals from his speech cortex.
Translating Attempted Speech into Text
To translate the patterns of recorded neural activity into specific intended words, the other two lead authors of the study, Sean Metzger, MS and Jessie Liu, BS, both bioengineering doctoral students in the Chang Lab used custom neural network models, which are forms of artificial intelligence. When the participant attempted to speak, these networks distinguished subtle patterns in brain activity to detect speech attempts and identify which words he was trying to say.
To test their approach, the team first presented BRAVO1 with short sentences constructed from the 50 vocabulary words and asked him to try saying them several times. As he made his attempts, the words were decoded from his brain activity, one by one, on a screen.
Then the team switched to prompting him with questions such as “How are you today?” and “Would you like some water?” As before, BRAVO1’s attempted speech appeared on the screen. “I am very good,” and “No, I am not thirsty.”
The team found that the system was able to decode words from brain activity at rate of up to 18 words per minute with up to 93 percent accuracy (75 percent median). Contributing to the success was a language model Moses applied that implemented an “auto-correct” function, similar to what is used by consumer texting and speech recognition software.
Moses characterized the early trial results as a proof of principle. “We were thrilled to see the accurate decoding of a variety of meaningful sentences,” he said. “We’ve shown that it is actually possible to facilitate communication in this way and that it has potential for use in conversational settings.”
Looking forward, Chang and Moses said they will expand the trial to include more participants affected by severe paralysis and communication deficits. The team is currently working to increase the number of words in the available vocabulary, as well as improve the rate of speech.
Both said that while the study focused on a single participant and a limited vocabulary, those limitations don’t diminish the accomplishment. “This is an important technological milestone for a person who cannot communicate naturally,” said Moses, “and it demonstrates the potential for this approach to give a voice to people with severe paralysis and speech loss.”
Co-authors on the paper include Sean L. Metzger, MS; Jessie R. Liu; Gopala K. Anumanchipalli, PhD; Joseph G. Makin, PhD; Pengfei F. Sun, PhD; Josh Chartier, PhD; Maximilian E. Dougherty; Patricia M. Liu, MA; Gary M. Abrams, MD; and Adelyn Tu-Chan, DO, all of UCSF. Funding sources included National Institutes of Health (U01 NS098971-01), philanthropy, and a sponsored research agreement with Facebook Reality Labs (FRL), which completed in early 2021.
UCSF researchers conducted all clinical trial design, execution, data analysis and reporting. Research participant data were collected solely by UCSF, are held confidentially, and are not shared with third parties. FRL provided high-level feedback and machine learning advice.