Using brain scan technology, artificial intelligence, and speech synthesizers, scientists have converted brain patterns into intelligible verbal speech – an advance that could eventually give voice to those who have none.
It's a shame Stephen Hawking isn't alive to see this, as he might have gotten a real kick out of it. The new speech system, developed by researchers at the Neural Acoustic Processing Lab at Columbia University in New York, is something the late physicist could have benefited from.
Hawking had amyotrophic lateral sclerosis (ALS), a motor neuron disease that took away his ability to speak, but he continued to communicate using a computer and a speech synthesizer. Using a cheek switch attached to his glasses, Hawking could pre-select words on a computer, which were then read aloud by a voice synthesizer. It was a bit tedious, but it allowed Hawking to produce about a dozen words per minute.
Imagine, though, if Hawking didn't have to manually select and trigger the words. Some individuals, whether they have ALS, locked-in syndrome, or are recovering from a stroke, may lack the motor skills required to control a computer, even by a twitch of the cheek. Ideally, an artificial voice system would capture an individual's thoughts directly to produce speech, eliminating the need to control a computer at all.
New research published today in Scientific Reports takes an important step toward that goal, but instead of capturing a person's inner thoughts to reconstruct speech, it uses the brain patterns produced while a person listens to speech.
To create such a speech neuroprosthesis, neuroscientist Nima Mesgarani and his colleagues combined recent advances in deep learning with speech synthesis technologies. The resulting brain-computer interface, though still rudimentary, captured brain patterns directly from the auditory cortex, which were then decoded by an AI-powered vocoder, or speech synthesizer, to produce intelligible speech. The speech was very robotic-sounding, but nearly three in four listeners were able to discern its content. It's an exciting advance – one that could eventually help people who have lost the ability to speak.
To be clear, Mesgarani's neuroprosthetic device doesn't translate an individual's covert speech – the thoughts in our heads, also called imagined speech – directly into words. Sadly, the science isn't quite there yet. Instead, the system captured an individual's distinct cognitive responses as they listened to recordings of people speaking. A deep neural network then decoded, or translated, these patterns, allowing the system to reconstruct the speech.
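The core idea – learning a mapping from recorded neural activity to the parameters a vocoder needs to synthesize sound – can be sketched in a few lines. This is a toy illustration only: the data here is simulated, the array sizes are invented, and a plain linear least-squares readout stands in for the study's deep neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 1000 time frames of neural activity from
# 128 auditory-cortex electrodes, paired with the 32 vocoder (spectral)
# parameters describing the sound the subject was hearing at each frame.
n_frames, n_electrodes, n_vocoder_params = 1000, 128, 32
neural = rng.normal(size=(n_frames, n_electrodes))
true_map = rng.normal(size=(n_electrodes, n_vocoder_params))
vocoder_targets = neural @ true_map + 0.1 * rng.normal(
    size=(n_frames, n_vocoder_params)
)

# Simplest possible decoder: a linear map fit by least squares.
# (The study used a deep neural network; a linear readout is the
# baseline such models are typically compared against.)
weights, *_ = np.linalg.lstsq(neural, vocoder_targets, rcond=None)

# Decode neural activity into vocoder parameters; a vocoder would then
# turn these frames into an audible waveform.
decoded = neural @ weights
mse = float(np.mean((decoded - vocoder_targets) ** 2))
print(f"reconstruction MSE: {mse:.4f}")
```

The design choice the paper highlights is exactly this target: the decoder predicts the continuous acoustic representation fed to a vocoder, rather than discrete labels like phonemes or words.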
"This study continues the recent developments in the application of deep learning practices for decoding neural signals," said Andrew Jackson, a neural interfacing professor at Newcastle University, who did not participate in the new study. Gizmodo. "In this case, neural signals are recorded from the human brain surface during epilepsy, and listeners listen to different words and sentences that read actors, and neural networks are trained to learn the relationship between brain signals and sounds, and as a result they can reconstruct comprehensible word / sentences based only on brain signals. "
Patients with epilepsy were chosen for the study because they often have to undergo brain surgery. Mesgarani, with the help of Ashesh Dinesh Mehta, a neurosurgeon at the Northwell Health Physician Partners Neuroscience Institute and a co-author of the new study, recruited five volunteers for the experiment. The team used invasive electrocorticography (ECoG) to measure neural activity as the patients listened to continuous speech sounds. The patients listened, for example, to speakers reciting digits from zero to nine. Their brain patterns were then fed into the AI-enabled vocoder, resulting in synthesized speech.
The results were very robotic-sounding, but fairly intelligible. In tests, listeners could correctly identify the spoken digits about 75% of the time. They could even tell if a speaker was male or female. Not bad – and a result that even came as a "surprise" to Mesgarani, as he told Gizmodo in an email.
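To put that 75% figure in context: with ten possible digits, random guessing would succeed only 10% of the time. A quick back-of-the-envelope check – assuming a hypothetical 100 listening trials, since the exact trial count isn't given here – shows how implausible that accuracy would be by chance alone.

```python
from math import comb

# Assumed numbers for illustration: 100 trials, ten-digit vocabulary
# (10% chance level), and the reported ~75% listener accuracy.
n_trials, chance, observed_correct = 100, 0.10, 75

# Exact binomial tail: probability of getting at least 75 of 100
# trials right if listeners were guessing at random.
p_tail = sum(
    comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
    for k in range(observed_correct, n_trials + 1)
)
print(f"P(>=75/100 correct by chance) = {p_tail:.3g}")
```

The tail probability is vanishingly small, so even with a modest number of trials, 75% accuracy is far beyond what guessing could produce.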
You can listen to recordings from the speech synthesizer here (the scientists tested several techniques, but the best results came from the combination of deep neural networks with the vocoder).
Using a voice synthesizer in this context, as opposed to a system that can match and recite pre-recorded words, was important to Mesgarani. As he explained to Gizmodo, there's more to speech than just the right words.
"Since the goal of this work is to restore speech communication to those who have lost the ability to speak, we have focused on learning from the brain signal for the sound of speech itself," he said Gizmodo. "It is also possible to decode phonemes [distinct units of sound] or words, however, speech has much more information than just content – such as a speaker [with their distinct voice and style], intonation, emotional tone, and so on. Therefore, our goal in this particular document is to restore the sound itself. "
Looking ahead, Mesgarani would like to synthesize more complicated words and sentences, and to collect brain signals from people who are simply thinking or imagining the act of speaking.
Jackson was impressed by the new study, but he said it's still not clear whether this approach will apply directly to brain-computer interfaces.
"In the post, the decoded signals reflect the true words that the brain hears." To be useful, the communications device would have to decode the words the user would introduce, "said Jackson Gizmodo. "Although there are often overlapping brain areas that are involved in hearing, speaking and speaking, we still do not know exactly how similar brain signals will look like."
William Tatum, a neurologist at the Mayo Clinic who was also not involved in the new study, said the research is important for being the first to use artificial intelligence to reconstruct speech from the brainwaves associated with hearing known acoustic stimuli. The significance is notable "because it advances the application of deep learning in the next generation of better-designed speech-producing systems," he told Gizmodo. That said, he felt the sample size of participants was too small, and that the use of data extracted directly from the human brain during surgery is not ideal.
Another limitation of the study is that the neural networks, in order to do more than just reproduce words from zero to nine, would have to be trained on a large number of brain signals from each participant. The system is patient-specific, because we all produce different brain patterns when we listen to speech.
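In practice, that patient-specificity amounts to training and storing a separate decoder for every participant. A toy sketch of that bookkeeping, with simulated data and a linear decoder again standing in for the study's deep network (all names and sizes here are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated dimensions: 500 time frames, 64 electrodes, 16 vocoder
# parameters per frame.
n_frames, n_electrodes, n_params = 500, 64, 16

decoders = {}
for subject in ["S1", "S2", "S3", "S4", "S5"]:
    neural = rng.normal(size=(n_frames, n_electrodes))
    # Each subject gets their own (simulated) neural-to-sound mapping,
    # reflecting that brain responses to speech differ across people.
    targets = neural @ rng.normal(size=(n_electrodes, n_params))
    weights, *_ = np.linalg.lstsq(neural, targets, rcond=None)
    decoders[subject] = weights  # usable only for this subject

print(sorted(decoders), decoders["S1"].shape)
```

A decoder fit to one subject's electrode data can't simply be applied to another subject – which is exactly the generalization question Jackson raises next.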
"In the future it will be interesting to see how well decoders are trained for one person generalized for other people," Jackson said. "It's a bit like speech recognition systems that need to be individually trained by the user, unlike today's technologies such as Siri and Alex, which make sense to someone's voice and use neural networks again, only time will show if these technologies can one day do the same for brain signals. "
There's no doubt lots of work still remains. But the new paper is an encouraging step toward achieving implantable speech neuroprosthetics. [Scientific Reports]