HyperAI超神经

Text-to-SQL on BIRD (Big Bench for Large-Scale)

Evaluation Metrics

Execution Accuracy % (Dev)
Execution Accuracy % (Test)
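
As a rough illustration of what an execution accuracy score measures, the sketch below runs each predicted query and its reference query against the same database and counts a hit when the two result sets are equal. This is a minimal sketch and not the benchmark's official evaluation script. The function names (`execution_match`, `execution_accuracy`), the toy `singer` table, and the choice to compare results as unordered sets are all assumptions made for illustration.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a prediction that fails to execute counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical toy database, used only to exercise the functions above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 30), (2, 'Ben', 25), (3, 'Cy', 41);
    """
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Valid SQL, different rows: counted as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Equivalent aggregates: counted as correct.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(id) FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Because the comparison is made on query results and not on SQL text, two differently written queries score as a match whenever they return the same rows on the evaluation database.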

Evaluation Results

Performance of each model on this benchmark

Comparison Table
| Model Name | Execution Accuracy % (Dev) | Execution Accuracy % (Test) |
| --- | --- | --- |
| Model 1 | 68.12 | 70.21 |
| msc-sql-multi-sample-critiquing-small | 65.6 | - |
| Model 3 | 59.71 | 60.71 |
| Model 4 | 58.47 | 60.37 |
| Model 5 | 62.97 | 64.51 |
| text-to-sql-empowered-by-large-language | 54.76 | 57.41 |
| can-llm-already-serve-as-a-database-interface | 37.22 | 39.30 |
| Model 8 | 55.48 | 63.39 |
| Model 9 | 72.43 | 73.17 |
| can-llm-already-serve-as-a-database-interface | - | - |
| Model 11 | 55.48 | 63.39 |
| can-llms-effectively-leverage-structural | 42.70 | 49.02 |
| Model 13 | 64.73 | 65.23 |
| Model 14 | 67.99 | 66.21 |
| Model 15 | 65.45 | 68.87 |
| Model 16 | 63.36 | 65.45 |
| can-llm-already-serve-as-a-database-interface | 34.35 | 36.47 |
| chase-sql-multi-path-reasoning-and-preference | 73.14 | 74.06 |
| xiyan-sql-a-multi-generator-ensemble | 73.34 | 75.63 |
| Model 20 | 69.3 | 72.28 |
| chess-contextual-harnessing-for-efficient-sql | 65 | 66.69 |
| can-llms-effectively-leverage-structural | 46.35 | 54.89 |
| Model 23 | 60.5 | 64.84 |
| Model 24 | 62.58 | 63.22 |
| Model 25 | 57.17 | 59.25 |
| Model 26 | 58.5 | 62.66 |
| mac-sql-multi-agent-collaboration-for-text-to | 57.56 | 59.59 |
| Model 28 | 66.82 | 64.00 |
| Model 29 | 65.38 | 67.86 |
| Model 30 | 64.62 | - |
| knowledge-to-sql-enhancing-sql-generation | 48.92 | - |
| the-death-of-schema-linking-text-to-sql-in | 67.21 | 71.83 |
| can-llm-already-serve-as-a-database-interface | 36.64 | 40.08 |
| Model 34 | 37.68 | 47.74 |
| din-sql-decomposed-in-context-learning-of-1 | 50.72 | 55.90 |
| Model 36 | 66.95 | 69.03 |
| Model 37 | 72.16 | 70.26 |
| can-llm-already-serve-as-a-database-interface | 27.38 | 33.04 |
| Model 39 | 74.32 | 74.12 |
| Model 40 | 61.34 | 64.95 |