Google's Voice Recognition Tool First Benefited Its Own Employees

Recently, the Google Brain team released Parrotron, a project to help both people and devices understand speakers with speech disorders more accurately. Parrotron approaches the problem from the level of the speech signal itself: a single end-to-end deep neural network directly converts the speech of people with speech disorders into fluent synthetic speech, helping them overcome communication barriers.
Dimitri Kanevsky was born in Russia in the 1950s. He grew up in the Soviet Union during the Cold War, yet completed his studies and earned a doctorate in mathematics.
His studies and career began in Russia and took him through Israel and Germany before he settled in the United States, where he became a research scientist at Google focusing on speech recognition algorithms.
On paper, it reads like the path of an academic elite: a good education, a US green card, a prestigious job, 152 US technology patents, and a career peak in Silicon Valley.

But the story is not that simple. Dimitri Kanevsky is no ordinary person: few would guess that he is deaf.
Dimitri Kanevsky lost his hearing to medication at the age of one, but his family still chose a mainstream education for him. He learned lip reading and pronunciation from childhood and attended ordinary schools. In his teens, he began learning English, relying on Russian phonetics.
However, because of his hearing impairment and the mismatch with Russian pronunciation, spoken English was a great struggle. His sentences were indistinct and often incomprehensible to others; even affectionate words to his family sometimes failed to get through.
Simply put, most people find it difficult to understand the English he speaks directly. In order to solve his own problem and help more people facing similar problems, Dimitri Kanevsky has been working on the topic of speech recognition.

In medicine, this condition of unclear speech is called "dysarthria". According to statistics, as many as one million people worldwide suffer from dysarthria as a result of physical illness.
Dysarthria is a speech disorder caused by nerve damage, or by paralysis, weakened contraction, or poor coordination of the speech muscles; colloquially, it is "slurred speech".
For example, stroke, cerebral palsy, Parkinson's disease, Down syndrome, ALS (amyotrophic lateral sclerosis) and many other diseases can cause this condition.

Also at Google, a brand marketing manager named Aubrie Lee was diagnosed with a rare form of muscular dystrophy, which means she spends much of her time in a wheelchair.
The progressive loss of muscle throughout her body also made communication difficult. Aubrie struggled with hearing and pronunciation, and was often misunderstood because she could not smile. Her articulation was unclear as well, so listeners frequently could not follow her in conversation.
To help colleagues like Dimitri Kanevsky and Aubrie Lee communicate, articulation difficulties gradually became a research direction for the Google AI team.
Caring for people with speech impairments: Google launches breakthrough tools
A few years ago, Kanevsky, with 30 years of experience in speech recognition, joined Google's AI research group. At the time, there was no convenient tool that let him communicate with others normally. For every meeting, he had to book a CART (Communication Access Real-time Translation) service in advance and rely on a captioner to attend and type the spoken content onto a screen.
Similarly, Aubrie and her colleagues had to spend great effort on workplace communication that most people manage with ease. But this predicament is slowly becoming history.
In February 2019, Google launched Live Transcribe, an app that brought the dawn of portable speech conversion. It transcribes real-world speech instantly, using the phone's built-in microphone to convert speech into text displayed in real time.
Then, at the Google I/O conference in May, Project Euphonia was announced, a program providing speech-to-text solutions for people with speech impairments caused by ALS.

This month, Google launched a new AI tool, Parrotron, that directly converts unclear speech into standard synthetic speech, taking the technology for overcoming speech barriers a step further.
Parrotron tackles the problem from the perspective of audio analysis, using an end-to-end deep neural network. In use, the speaker talks into a phone or other device and quickly hears the utterance repeated back in standard pronunciation.
In the paper "Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation", Parrotron performed well, with new breakthroughs in the accuracy of speech recognition and conversion.
Paper address: https://arxiv.org/abs/1904.04169
Parrotron: Translating vague words into clear speech
So how is this seemingly high-tech technology achieved?
Parrotron is an end-to-end sequence-to-sequence model trained on a parallel corpus of input/output speech pairs, mapping unclear speech to fluent speech.

The network consists of an encoder and an attention-based decoder, followed by a vocoder that synthesizes a time-domain waveform from the predicted features. The encoder converts the sequence of acoustic frames into a hidden feature representation, which the decoder attends over to predict a spectrogram.
Training proceeds in two steps: first, a speech-to-speech model is built on standard fluent audio; then the model parameters are fine-tuned with unclear speech as input, so that the model learns to recognize and normalize it.
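The encoder-attention-decoder pipeline described above can be sketched in miniature. Below is a toy NumPy forward pass: all names (TinySpeechConverter), dimensions, and the untrained random weights are illustrative assumptions, not the paper's implementation, which uses trained recurrent layers and a neural vocoder to render the predicted spectrogram as a waveform.

```python
import numpy as np

# Illustrative sizes only -- the real model is far larger.
N_MELS, HIDDEN = 80, 64
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinySpeechConverter:
    """Toy spectrogram-to-spectrogram model: encoder + attention + readout."""

    def __init__(self):
        self.W_enc = rng.normal(0, 0.1, (N_MELS, HIDDEN))   # encoder projection
        self.W_attn = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # attention weights
        self.W_out = rng.normal(0, 0.1, (HIDDEN, N_MELS))   # spectrogram readout

    def encode(self, frames):
        # frames: (T_in, N_MELS) -> hidden representation (T_in, HIDDEN)
        return np.tanh(frames @ self.W_enc)

    def decode(self, hidden, t_out):
        # Each output step attends over all encoder states; the attended
        # context is read out as one predicted spectrogram frame.
        query = np.zeros(HIDDEN)
        frames = []
        for _ in range(t_out):
            scores = hidden @ (self.W_attn @ query)  # (T_in,) alignment scores
            weights = softmax(scores)                # attention distribution
            context = weights @ hidden               # (HIDDEN,) context vector
            frames.append(context @ self.W_out)      # predicted mel frame
            query = context                          # feed context back as query
        return np.stack(frames)                      # (t_out, N_MELS)

    def convert(self, frames, t_out):
        return self.decode(self.encode(frames), t_out)

# Two-phase training idea (sketched in comments, not implemented here):
#   1. pretrain on pairs (typical speech, synthesized fluent target)
#   2. fine-tune the same weights on (unclear speech, fluent target) pairs
model = TinySpeechConverter()
noisy = rng.normal(size=(50, N_MELS))  # stand-in for an input spectrogram
clean = model.convert(noisy, t_out=40)
print(clean.shape)  # (40, 80)
```

Because the output length is chosen by the decoder loop rather than fixed to the input length, the same structure can shorten or lengthen utterances, which is what lets a sequence-to-sequence model re-time disfluent speech.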

To simulate the speech characteristics of ALS patients, the team used the ALS speech corpus from Project Euphonia and synthesized unclear utterances as additional training data.
When the model is personalized for a specific individual, the recordings are provided by that person.
After training, the conversion model can remove confounding factors in speech, such as the effects of stress, rhythm, and background noise. It likewise ignores non-linguistic variation, including speaker characteristics, environment, and speaking style, analyzing and processing only the content of the utterance.
Parrotron's first two testers: an obvious choice
To verify Parrotron's actual effect, we naturally have to see how it performs in practice, and the best candidates for testing were undoubtedly Dimitri Kanevsky and Aubrie Lee.
In the experiment, Dimitri recorded a 15-hour corpus so the model could learn the subtleties of his speech. After training, the error rate on the final test set dropped from an initial 89% to 32%.
In other words, with speech converted by Parrotron, other people, or an ASR (automatic speech recognition) system, can understand him far more easily.
Details of Kanevsky's use of Parrotron
Later, Aubrie Lee also tested the system. From the 1.5 hours of speech she contributed, the model produced accurate conversions, allowing her to express herself clearly.
AI for Social Good: The mission of artificial intelligence
Barrier-free projects created by artificial intelligence have been frequently proposed in recent years. Many caring technologies have emerged, trying to help people with disabilities open new doors.
Of course, while technology serves these groups, it is also driven by them. Dimitri Kanevsky, who knows first-hand the difficulties caused by dysarthria, has devoted himself to research in speech recognition and communication; Aubrie Lee, with her energetic attitude toward life, encourages and pushes for more research investment on behalf of people with disabilities.

The current data are not encouraging: only one in ten people with disabilities worldwide has access to assistive technology tools. Thankfully, much of that is changing, with some promising progress.
As a technology giant, Google is still carrying out its AI for Social Good plans, and tools such as Parrotron are likely steps toward that vision.
At a time when artificial intelligence is sweeping the world, we have seen AI's transformative creativity in art and its positive impact on social life. But we have also seen some people use AI maliciously, swapping faces, splicing images, and fabricating things out of nothing.
I hope AI can return to its original scientific purpose, help more people in need, and make the world a better place!