Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

Recent advances in large language models (LLMs) have significantly improved text-to-SQL systems. However, most LLM-based methods focus narrowly on SQL generation, neglecting the complexities of real-world conversational queries. This oversight can lead to unreliable responses, particularly for ambiguous questions that cannot be directly addressed with SQL. To bridge this gap, we propose MMSQL, a comprehensive test suite designed to evaluate the question classification and SQL generation capabilities of LLMs by simulating real-world scenarios with diverse question types and multi-turn Q&A interactions. Using MMSQL, we assess the performance of popular LLMs, including both open-source and closed-source models, and identify key factors affecting their performance in such scenarios. Moreover, we introduce an LLM-based multi-agent framework that employs specialized agents to identify question types and determine appropriate answering strategies. Our experiments demonstrate that this approach significantly enhances the model's ability to navigate the complexities of conversational dynamics, handling the diverse and complex nature of user queries effectively. Our dataset and code are publicly available at https://mcxiaoxiao.github.io/MMSQL.