Women's Day Special Issue | Wu Mengyue of Shanghai Jiao Tong University: Using Speech Intelligence Technology to Issue the First Screening Checklist for Mental Illness

From a child's babbling, to the endless chatter of youth, and on into middle age, we hear the care and concern of parents and elders alongside the constant push of life's pressures... Looking back, each stage of life is steeped in its own sounds. The chirping of cicadas in summer and the noise of the basketball court are youth; sighs in the evening and the ping of message notifications on a phone are growing up. People cannot completely shut out sound even if they cover their ears tightly or wear noise-canceling headphones: sound is everywhere.
From a physics perspective, sound is a wave produced by a vibrating object and transmitted through a medium such as air, which is why sound can be completely blocked only in a vacuum. From another perspective, as the carrier of language, sound is not only an important medium of communication but also an outlet for expressing emotion. As AI develops rapidly and analytical techniques grow ever more powerful, sound waves that once seemed to vanish into thin air have become data that can be studied and can even reflect a person's health.
In recent years, speech-based disease detection technology has begun to emerge in respiratory medicine, cardiology, gastroenterology and other fields. For mental illness, by comparison, using audio in diagnosis and treatment brings both worries and hopes. The worry is a serious shortage of relevant audio data, mainly because mental illness is so sensitive from a privacy standpoint; the hope is that audio-based online diagnosis can ease patients' sense of shame at the initial diagnosis stage and help them find out as early as possible whether they are ill.
Lu Lin, an academician of the Chinese Academy of Sciences, noted in an interview that by the end of 2021 there were 6.6 million patients with severe mental disorders registered in the national database, while the treatment rate was no higher than 20%. Expanding and promoting audio-based diagnostic methods could therefore go a long way toward addressing the low treatment rate for mental illness.
Wu Mengyue, an associate professor and doctoral supervisor in the Department of Computer Science at Shanghai Jiao Tong University, has released a speech intelligence model for the diagnosis and treatment of mental illness. Using a large language model to simulate the roles of doctor and patient, and with the participation of human psychiatric diagnosticians, her team built the world's first open-source depression consultation dialogue dataset that meets clinical standards.

On the occasion of International Women’s Day, HyperAI had the honor of conducting an in-depth interview with Ms. Wu Mengyue to learn more about the appeal of voice and the remarkable chemistry it has with AI. We also saw how this determined, free-spirited woman followed her interests and gradually built a record of achievement in her field.
Keep doing research that can be put into practice
Everyone's vocal tract, oral cavity, nasal cavity and so on are slightly different, so each person's voiceprint is as distinctive as a fingerprint or a face. Since childhood, Wu Mengyue has been fascinated by the distinctive way each person speaks. Recognizing people by their voices in daily life, you might say, was the ladder that led her into the world of audio.
As an undergraduate at Beijing Normal University, Wu Mengyue, who was highly sensitive to and curious about sound, studied psychoacoustics. During this time she came to appreciate the importance of explaining differences between sounds technically through their physical acoustic characteristics, and in the final stage of her studies she chose computational modeling to analyze acoustic data.

If her early forays into audio analysis in psychoacoustics as an undergraduate planted a seed in Wu Mengyue's heart, then two related research projects she saw up close during her doctoral studies acted as a catalyst.
During her doctoral studies, one classmate who was doing voice analysis research on schizophrenia encountered many homeless people with schizophrenia during fieldwork. Another, her roommate, went to work in a Melbourne prison after earning a doctorate in clinical psychology, assessing prisoners for mental illness in order to determine whether their criminal behavior had been influenced by it.
The experiences of these two classmates deeply inspired and influenced Wu Mengyue. After returning to China, she began more in-depth research on the diagnosis and treatment of mental illness.
She explained that in phonetics, speech and language are external manifestations of brain function. Therefore, whether the problem is an emotional disorder or cognitive dysfunction, it eventually leaves traces in how a person speaks and uses language. In other words, relevant biomarkers can be found in audio and used to screen people with mental or emotional disorders effectively and conveniently.

In a sense, when people hesitate over whether or how to seek medical help, audio analysis can serve as the first screening checklist in diagnosing mental illness and, to some extent, ease patients' sense of shame.
Wu Mengyue has always insisted on "doing practical research." After earning her Ph.D., she accepted an invitation from Nuance Communications, an AI speech recognition company, to work in industry on the research and deployment of in-car human-machine interaction. Later, in a conversation with Yu Kai, a professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, his experience showed her the research advantages of universities and new ways to turn research results into real-world applications, and she returned from industry to academia.
Wu Mengyue said frankly that whether in academia or in industry, her original commitment to "doing practical research" has never changed. During the pandemic, her research team developed a depression consultation app based on real needs, which students could use directly.
Rich audio analysis and overcoming the data shortage
After returning to academia, Wu Mengyue kept audio analysis as her research direction and brought more AI techniques into it. At present, her research group focuses mainly on Rich Audio Analysis, which broadly covers all audio processing other than speech recognition.
Wu Mengyue explained that sound can be analyzed at three levels. The first is what people say, which is the focus of speech recognition. The second is how people say it: the same sentence can be expressed in many different ways, each carrying a different meaning, and this can be used to assess a person's mental state or cognitive function. The third is the understanding of environmental audio, which is key to making machines understand audio more like humans do. Together these make up rich audio analysis. Since speech recognition is relatively mature, her research focuses on the latter two.
Wu Mengyue's research group currently has more than 20 students working in these two directions: computational psychiatry and pathological speech on the one hand, and audio understanding on the other.

As for applications, take driving as an example. Today, voice recognition in the car means the interaction system passively recognizes commands and executes the corresponding controls. If the system could interact proactively, it could judge the driver's mood or fatigue from their tone of voice and then help adjust that mood through the car's lighting or sound effects. Likewise, when the machine senses that the user sounds irritated, it can adapt its strategy and approach when responding to commands.
Another example: during the pandemic, microphones were used to capture the ambient sounds of doors opening and closing, from which a person's at-home or away status was determined. Compared with traditional camera surveillance, this approach better protects people's privacy and safety.
It is commonly assumed that audio data is abundant and sample sizes are large, but in disease diagnosis and treatment, especially for mental illness, data is a major challenge. On the one hand, doctor and patient privacy and the stigma patients face make recordings of psychological consultations extremely hard to obtain. On the other hand, although some hospitals or doctors do record consultations, these recordings have not been standardized and their audio quality is often poor. Moreover, privacy protections usually make it impossible to share them outside the institution.
To address this, Wu Mengyue led her research team in building the world's first open-source depression consultation dialogue dataset that meets clinical standards. First, the team held long, in-depth discussions with doctors and patients at a mental health center to pin down the consultation process and key dialogue points, organized this content into a decision-tree-structured dialogue flow, and refined it repeatedly with professional doctors. Next, they simulated doctor-patient dialogues through role-playing. Finally, professional doctors screened the resulting data, yielding data close to real clinical consultations, which became this open-source dataset.
Dataset link: https://x-lance.github.io/D4/
Like many other researchers working in AI for Science, Wu Mengyue has an interdisciplinary background, in her case spanning psychology and computer science. This enables her to address patients' real pain points as she advances AI-enabled diagnosis and treatment of mental illness, and to adjust her research strategy flexibly, for example by turning to simulated data when the research runs into obstacles. Interdisciplinary backgrounds often bring creative breakthroughs to scientific research.

Driven by interest, you can achieve great things
During the interview with Wu Mengyue, I heard the word "interest" several times: she focused on audio research out of interest; she is interested in research on diagnosing mental illness; she moved to a computer science department not to follow a trend but out of her own interest; and she hopes to put the interests of the students in her group first...
Whether in rigorous academic research or the fast pace of the workplace, interest is undeniably especially fertile soil, and if the seed is sown in childhood, the drive to grow is all the stronger. Although Professor Wu Mengyue follows her interests and takes a laid-back, "Buddhist" approach, she is by no means slacking off. Both her steady record of publications and the practical applications that come from integrating industry and research are strong evidence of her commitment to "doing practical research."

In recent years, more and more women have become active in science, technology and research, bringing innovations that have amazed the world. UN Women has set the theme of this year's International Women's Day as "Invest in Women: Accelerate Progress," which to some extent highlights the important role women play in the progress of society.
I don’t want to overemphasize the differences between the sexes, but pressure does exist in the real social environment. Still, as Wu Mengyue said, “We should focus on happiness and start from our interests.” In particular, when society does not hold very high expectations of women, that actually means fewer restrictions, which may leave room to build up strength and choose the right moment to break through.
Finally, on this special day, I hope all women can, like Professor Wu Mengyue, grow through their interests, draw nourishment with confidence, and live fuller, freer lives!