HyperAI초신경

Text-to-SQL on BIRD (BIg Bench for Large-Scale Database Grounded Text-to-SQLs)

Evaluation Metrics

Execution Accuracy % (Dev)
Execution Accuracy % (Test)
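
Execution accuracy is commonly understood as the percentage of examples where the predicted SQL, when run against the database, returns the same result as the reference SQL. The snippet below is a minimal sketch of that idea using Python's standard `sqlite3` module. It is not the benchmark's official evaluation script: the function names (`execution_match`, `execution_accuracy`, `demo_connection`), the toy `singer` table, and the choice to compare results as unordered sets of rows are all assumptions made for illustration.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of rows.

    Assumption: results are compared as unordered sets, so row order
    and duplicate rows are ignored.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) SQL pairs whose results match."""
    if not pairs:
        return 0.0
    matches = [execution_match(conn, pred, gold) for pred, gold in pairs]
    return 100.0 * sum(matches) / len(matches)


def demo_connection():
    """Build a small in-memory database (hypothetical toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [(1, "Ana", 30), (2, "Bo", 41), (3, "Cy", 25)],
    )
    return conn
```

A query that is written differently from the reference but returns the same rows still counts as correct under this definition, which is why the metric is based on execution rather than string matching.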

Evaluation Results

Performance results of each model on this benchmark

| Model Name | Execution Accuracy % (Dev) | Execution Accuracy % (Test) | Paper Title | Repository |
|---|---|---|---|---|
| PURPLE + RED + GPT-4o | 68.12 | 70.21 | - | - |
| MSc-SQL | 65.6 | - | MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation | |
| Dubo-SQL, v1 | 59.71 | 60.71 | - | - |
| SFT CodeS-15B | 58.47 | 60.37 | - | - |
| PURPLE + GPT-4o | 62.97 | 64.51 | - | - |
| DAIL-SQL + GPT-4 | 54.76 | 57.41 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | |
| ChatGPT (Baseline) | 37.22 | 39.30 | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | - |
| SENSE | 55.48 | 63.39 | - | - |
| ExSL + granite-34b-code | 72.43 | 73.17 | - | - |
| Human Performance | - | - | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | - |
| SENSE-13B | 55.48 | 63.39 | - | - |
| Claude-2 (Baseline) | 42.70 | 49.02 | Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? | |
| SCL-SQL | 64.73 | 65.23 | - | - |
| Arcwise + GPT-4o | 67.99 | 66.21 | - | - |
| ByteBrain | 65.45 | 68.87 | - | - |
| MCS-SQL + GPT-4 | 63.36 | 65.45 | - | - |
| Codex (Baseline) | 34.35 | 36.47 | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | - |
| CHASE-SQL + Gemini | 73.14 | 74.06 | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL | - |
| XiYan-SQL | 73.34 | 75.63 | A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | |
| OpenSearch-SQL+ v2 + GPT-4o | 69.3 | 72.28 | - | - |