HyperAI超神经

Long Context Understanding on Ada-LEval

Evaluation Metrics

Scores are reported at each of the following context lengths:

1k
2k
4k
6k
8k
12k
16k

Evaluation Results

Performance of each model on this benchmark

| Model | 1k | 2k | 4k | 6k | 8k | 12k | 16k | Paper |
|---|---|---|---|---|---|---|---|---|
| Vicuna-7b-v1.5-16k | 37.0 | 11.1 | 5.8 | 3.2 | 1.8 | 1.9 | 1.0 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| Claude-2 | 65.0 | 43.5 | 23.5 | 15.0 | 17.0 | 12.0 | 11.0 | -- |
| LongChat-7b-v1.5-32k | 32.4 | 10.7 | 5.7 | 3.1 | 1.9 | 1.6 | 0.8 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| Vicuna-13b-v1.5-16k | 53.4 | 29.2 | 13.1 | 4.3 | 2.2 | 1.4 | 0.9 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| GPT-3.5-Turbo-1106 | 61.5 | 48.5 | 41.5 | 29.5 | 17.0 | 2.5 | 2.5 | -- |
| ChatGLM3-6b-32k | 39.8 | 18.8 | 9.0 | 5.0 | 3.4 | 0.9 | 0.5 | GLM-130B: An Open Bilingual Pre-trained Model |
| ChatGLM2-6b-32k | 31.2 | 10.9 | 4.5 | 1.6 | 1.6 | 0.0 | 0.3 | GLM-130B: An Open Bilingual Pre-trained Model |
| InternLM2-7b | 58.6 | 49.5 | 33.9 | 12.3 | 13.4 | 2.0 | 0.8 | InternLM2 Technical Report |
| GPT-4-Turbo-0125 | 73.5 | 73.5 | 65.5 | 63.0 | 56.5 | 52.0 | 44.5 | GPT-4 Technical Report |
| GPT-4-Turbo-1106 | 74.0 | 73.5 | 67.5 | 59.5 | 53.5 | 49.5 | 44.0 | GPT-4 Technical Report |