
Japanese AI System J-Moshi Mimics Natural Conversation Patterns, Enabling Simultaneous Speaking and Listening


Researchers at Nagoya University's Higashinaka Laboratory have developed J-Moshi, the first publicly available AI system designed to mimic natural Japanese conversational patterns, including the short verbal responses known as "aizuchi." These interjections, such as "Sou desu ne" (that's right) and "Naruhodo" (I see), signal active engagement in Japanese conversation. Conventional spoken-dialogue systems must alternate between speaking and listening; J-Moshi can do both simultaneously, which makes it particularly well suited to Japanese dialogue.

J-Moshi was built by adapting Moshi, an English-language model from Kyutai, a non-profit research laboratory. Development took around four months and involved extensive training on Japanese speech datasets. The primary dataset was J-CHAT, the largest public Japanese dialogue corpus, containing approximately 67,000 hours of audio from podcasts and YouTube. The team also drew on smaller, high-quality datasets, some collected within the lab and others dating back 20 to 30 years. To expand the training data further, the researchers converted written chat conversations into artificial speech using custom-built text-to-speech programs.

In January 2025, J-Moshi gained widespread attention when demonstration videos went viral on social media. Its natural conversational flow impressed many Japanese speakers, underscoring both its technical novelty and its practical value. For instance, J-Moshi can help non-native speakers practice and understand natural Japanese conversation patterns. The research team, led by Professor Ryuichiro Higashinaka, sees potential uses in call centers, healthcare settings, and customer service, where the system could handle routine interactions and hand off more complex queries to human operators.
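The simultaneous speak-and-listen behaviour described above can be pictured as a frame-by-frame loop: the system continuously monitors the user's speech and decides, even mid-utterance, when a brief backchannel is appropriate. The sketch below is a deliberately simplified illustration, not J-Moshi's actual architecture; the function name, frame-based representation, and pause threshold are all assumptions made for the example.

```python
# Toy illustration of full-duplex backchannel timing (NOT J-Moshi's
# real model): scan per-frame voice-activity flags for the user's
# channel and mark where a short lull invites an aizuchi.

def backchannel_frames(user_speaking, min_pause=2):
    """user_speaking: per-frame booleans ('is the user speaking?').
    Returns frame indices where a pause of min_pause frames has just
    formed -- a natural slot for a short response like 'Sou desu ne'."""
    cues = []
    pause = 0
    for i, speaking in enumerate(user_speaking):
        if speaking:
            pause = 0          # user resumed; reset the lull counter
        else:
            pause += 1
            if pause == min_pause:  # lull just reached threshold
                cues.append(i)
    return cues

# Example: two short lulls in the user's speech yield two cue points.
cues = backchannel_frames([True, True, False, False, True,
                           False, False, False])
```

In a real full-duplex system the decision is learned rather than rule-based, and generation runs in parallel with listening; the point here is only that output timing depends on the input stream frame by frame, not on waiting for the user's turn to end.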
Professor Higashinaka, who previously worked in corporate AI research at NTT Corporation, established the Higashinaka Lab at Nagoya University's Graduate School of Informatics in 2020. His 20-member team tackles challenges such as the scarcity of Japanese speech data and the privacy concerns it raises. To make better use of the data that does exist, they developed programs that separate the mixed voices in podcast recordings into individual speaker tracks.

Despite its advances, J-Moshi still struggles with complex social situations and with factors in the physical environment. For example, visual obstacles such as masks or hats can hinder performance by obscuring cues like facial expressions. To mitigate these problems, the lab is building dialogue summarization and breakdown detection systems that alert human operators to potential trouble and enable quick intervention.

Beyond J-Moshi, the Higashinaka Lab explores human-robot interaction more broadly, including robots that coordinate speech, gestures, and movement for natural communication. In collaboration with Unitree Robotics, the lab aims to create systems that work seamlessly alongside humans. These efforts form part of a broader national Cabinet Office Moonshot Project aimed at enhancing service quality through advanced AI-human collaboration.

The research on J-Moshi has been accepted for publication at Interspeech, the largest international conference on speech technology, and will be presented in Rotterdam, the Netherlands, in August 2025. Professor Higashinaka envisions a future in which AI systems interact naturally and effectively with humans, helping to transform society. Industry experts have praised J-Moshi's innovative approach to Japanese natural language processing, a breakthrough that could accelerate AI adoption in Japan, where cultural nuances play a crucial role in communication.
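The voice-separation step can be illustrated in miniature. Assuming a diarization model has already labelled who speaks when (the segment format, function name, and silence fill below are hypothetical, not the lab's actual pipeline), splitting a mixed recording into per-speaker tracks amounts to copying each labelled span into that speaker's track and leaving silence everywhere else:

```python
# Minimal sketch of turning diarization output into per-speaker tracks.
# 'mixed' is a mono recording as a list of samples; 'segments' is
# (start, end, speaker_index) triples produced by some diarization model.

def split_tracks(mixed, segments, num_speakers):
    """Return one track per speaker: the mixed audio inside that
    speaker's segments, silence (0.0) everywhere else."""
    tracks = [[0.0] * len(mixed) for _ in range(num_speakers)]
    for start, end, spk in segments:
        tracks[spk][start:end] = mixed[start:end]
    return tracks

# Example: a 4-sample "recording" where speaker 0 talks first, then 1.
tracks = split_tracks([0.1, 0.2, 0.3, 0.4],
                      [(0, 2, 0), (2, 4, 1)],
                      num_speakers=2)
```

Real podcast audio also contains overlapping speech, so production systems pair diarization with source separation rather than simple span copying; this sketch shows only the bookkeeping that follows once speaker labels exist.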
As Professor Higashinaka emphasizes, this work matters not only for improving conversational AI but also for fostering better human-AI collaboration across fields. The lab's focus on pairing AI with human operators positions it as a leader in hybrid systems that balance automation with human oversight, addressing practical limitations and keeping applications robust.
