2 months ago

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Lei, Fangyu ; Chen, Jixuan ; Ye, Yuxiao ; Cao, Ruisheng ; Shin, Dongchan ; Su, Hongjin ; Suo, Zhaoqing ; Gao, Hongcheng ; Hu, Wenjing ; Yin, Pengcheng ; Zhong, Victor ; Xiong, Caiming ; Sun, Ruoxi ; Liu, Qian ; Wang, Sida ; Yu, Tao
Abstract

Real-world enterprise text-to-SQL workflows often involve complex cloud or local data across various database systems, multiple SQL queries in various dialects, and diverse operations from data transformation to analytics. We introduce Spider 2.0, an evaluation framework comprising 632 real-world text-to-SQL workflow problems derived from enterprise-level database use cases. The databases in Spider 2.0 are sourced from real data applications, often containing over 1,000 columns and stored in local or cloud database systems such as BigQuery and Snowflake. We show that solving problems in Spider 2.0 frequently requires understanding and searching through database metadata, dialect documentation, and even project-level codebases. This challenge calls for models to interact with complex SQL workflow environments, process extremely long contexts, perform intricate reasoning, and generate multiple SQL queries with diverse operations, often exceeding 100 lines, which goes far beyond traditional text-to-SQL challenges. Our evaluations indicate that based on o1-preview, our code agent framework successfully solves only 21.3% of the tasks, compared with 91.2% on Spider 1.0 and 73.0% on BIRD. Our results on Spider 2.0 show that while language models have demonstrated remarkable performance in code generation -- especially in prior text-to-SQL benchmarks -- they require significant improvement in order to achieve adequate performance for real-world enterprise usage. Progress on Spider 2.0 represents crucial steps towards developing intelligent, autonomous, code agents for real-world enterprise settings. Our code, baseline models, and data are available at https://spider2-sql.github.io
