CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

To address the challenges of large language model (LLM) performance on Text-to-SQL tasks, we introduce CHASE-SQL, a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection. CHASE-SQL leverages LLMs' intrinsic knowledge to generate diverse and high-quality SQL candidates using different LLM generators with: (1) a divide-and-conquer method that decomposes complex queries into manageable sub-queries in a single LLM call; (2) chain-of-thought reasoning based on query execution plans, reflecting the steps a database engine takes during execution; and (3) a unique instance-aware synthetic example generation technique, which offers specific few-shot demonstrations tailored to test questions. To identify the best candidate, a selection agent is employed to rank the candidates through pairwise comparisons with a fine-tuned binary-candidates selection LLM. This selection approach has been demonstrated to be more robust than alternatives. The proposed generators-selector framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods. Overall, our proposed CHASE-SQL achieves state-of-the-art execution accuracy of 73.0% and 73.01% on the test set and development set, respectively, of the notable BIRD Text-to-SQL dataset benchmark, rendering CHASE-SQL the top submission on the leaderboard (at the time of paper submission).
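To make the selection step concrete, the following is a minimal sketch of ranking candidates by pairwise comparison, under stated assumptions. The function names `select_by_pairwise_comparison` and `toy_judge` are hypothetical and do not come from the paper. In CHASE-SQL the pairwise judge is a fine-tuned binary-candidates selection LLM; here a trivial placeholder that prefers the shorter query stands in for it so that the example runs on its own. The scoring rule (one point per pairwise win, first candidate wins ties) is likewise an illustrative assumption.

```python
from itertools import combinations
from typing import Callable, List


def select_by_pairwise_comparison(
    candidates: List[str],
    prefer_first: Callable[[str, str], bool],
) -> str:
    """Return the candidate that wins the most pairwise comparisons.

    `prefer_first(a, b)` is an assumed interface: it returns True when
    the judge prefers candidate `a` over candidate `b`.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    scores = [0] * len(candidates)
    # Compare every unordered pair once and credit the preferred candidate.
    for i, j in combinations(range(len(candidates)), 2):
        if prefer_first(candidates[i], candidates[j]):
            scores[i] += 1
        else:
            scores[j] += 1

    # Ties are broken in favor of the earliest candidate (illustrative choice).
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best]


def toy_judge(a: str, b: str) -> bool:
    """Placeholder judge that prefers the shorter query.

    This is NOT the paper's selector; it only stands in for a call to a
    fine-tuned selection model.
    """
    return len(a) <= len(b)


if __name__ == "__main__":
    sql_candidates = [
        "SELECT name FROM users WHERE age > 30",
        "SELECT name FROM users",
        "SELECT * FROM users u JOIN orders o ON u.id = o.uid",
    ]
    print(select_by_pairwise_comparison(sql_candidates, toy_judge))
```

With n candidates this scheme makes n(n-1)/2 judge calls, which is manageable for the small candidate pools a handful of generators would produce; a real judge would also take the question and database schema as input.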