C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

Chengqian Ma, Wei Tao, Yiwen Guo
Abstract

Spoken Dialogue Models (SDMs) have recently attracted significant attention for their ability to generate voice responses directly to users' spoken queries. Despite their increasing popularity, there exists a gap in research focused on comprehensively understanding their practical effectiveness in comprehending and emulating human conversations. This is especially true compared to text-based Large Language Models (LLMs), which benefit from extensive benchmarking. Human voice interactions are inherently more complex than text due to characteristics unique to spoken dialogue. Ambiguity poses one challenge, stemming from semantic factors like polysemy, as well as phonological aspects such as heterographs, heteronyms, and stress patterns. Additionally, context-dependency, like omission, coreference, and multi-turn interaction, adds further complexity to human conversational dynamics. To illuminate the current state of SDM development and to address these challenges, we present a benchmark dataset in this paper, which comprises 1,079 instances in English and Chinese. Accompanied by an LLM-based evaluation method that closely aligns with human judgment, this dataset facilitates a comprehensive exploration of the performance of SDMs in tackling these practical challenges.