2 months ago

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

Gao, Yingqi ; Liu, Yifu ; Li, Xiaoxia ; Shi, Xiaorong ; Zhu, Yin ; Wang, Yiming ; Li, Shiqi ; Li, Wei ; Hong, Yuntao ; Luo, Zhiling ; Gao, Jinyang ; Mou, Liyu ; Li, Yu
Abstract

To tackle the challenges of large language model performance in natural language to SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL integrates the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On the one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, our proposed XiYan-SQL achieves the state-of-the-art execution accuracy of 75.63% on the Bird benchmark, 89.65% on the Spider test set, 69.86% on SQL-Eval, and 41.20% on NL2GQL. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.
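The candidate-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' method: XiYan-SQL fine-tunes a selection model, whereas the code below substitutes a much simpler stand-in, a majority vote over execution results. The table `singer`, the hard-coded candidate queries (standing in for the output of several generators), and the helper names `run_query` and `select_candidate` are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict


def run_query(conn, sql):
    """Return the result rows as a sorted tuple, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_candidate(conn, candidates):
    """Pick the first candidate whose execution result is the most common one.

    Hypothetical stand-in for a learned selection model: a plain vote
    over execution results, ignoring candidates that fail to execute.
    """
    groups = defaultdict(list)
    for sql in candidates:
        result = run_query(conn, sql)
        if result is not None:  # drop candidates that do not execute
            groups[result].append(sql)
    if not groups:
        return None
    # On ties, max() keeps the group that was inserted first.
    best = max(groups.values(), key=len)
    return best[0]


# Toy database (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bo', 41), (3, 'Cy', 25);
""")

# Hard-coded candidates standing in for the output of several generators.
candidates = [
    "SELECT COUNT(*) FROM singer WHERE age > 28",
    "SELECT COUNT(id) FROM singer WHERE age > 28",
    "SELECT COUNT(*) FROM singer WHERE age >= 41",
    "SELECT COUNT(*) FROM singers WHERE age > 28",  # wrong table name
]
chosen = select_candidate(conn, candidates)
print(chosen)
```

In this toy run, the first two candidates return the same result and outvote the third, while the fourth fails to execute and is discarded. A learned selector, as proposed in the abstract, would presumably be able to tell apart candidates that such a vote cannot separate.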
