2 months ago

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Yuhao Zhang, Yuhao Du, Zhanchen Dai, Xiangnan Ma, Kaiqi Kou, Benyou Wang, Haizhou Li

Abstract

Speech-to-speech large language models (SLLMs) are attracting increasing attention. Derived from text-based large language models (LLMs), SLLMs often exhibit degradation in knowledge and reasoning capabilities. We hypothesize that this limitation arises because current training paradigms for SLLMs fail to bridge the acoustic-semantic gap in the feature representation space. To address this issue, we propose EchoX, which leverages semantic representations and dynamically generates speech training targets. This approach integrates both acoustic and semantic learning, enabling EchoX to preserve strong reasoning abilities as a speech LLM. Experimental results demonstrate that EchoX, with about six thousand hours of training data, achieves advanced performance on multiple knowledge-based question-answering benchmarks. The project is available at https://github.com/FreedomIntelligence/EchoX.
