HyperAI

Long Context Understanding on Ada-LEval

Metrics

1k
2k
4k
6k
8k
12k
16k

Results

Performance results of various models on this benchmark

| Model Name | 1k | 2k | 4k | 6k | 8k | 12k | 16k | Paper Title |
|---|---|---|---|---|---|---|---|---|
| Vicuna-7b-v1.5-16k | 37.0 | 11.1 | 5.8 | 3.2 | 1.8 | 1.9 | 1.0 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| Claude-2 | 65.0 | 43.5 | 23.5 | 15.0 | 17.0 | 12.0 | 11.0 | -- |
| LongChat-7b-v1.5-32k | 32.4 | 10.7 | 5.7 | 3.1 | 1.9 | 1.6 | 0.8 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| Vicuna-13b-v1.5-16k | 53.4 | 29.2 | 13.1 | 4.3 | 2.2 | 1.4 | 0.9 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena |
| GPT-3.5-Turbo-1106 | 61.5 | 48.5 | 41.5 | 29.5 | 17.0 | 2.5 | 2.5 | -- |
| ChatGLM3-6b-32k | 39.8 | 18.8 | 9.0 | 5.0 | 3.4 | 0.9 | 0.5 | GLM-130B: An Open Bilingual Pre-trained Model |
| ChatGLM2-6b-32k | 31.2 | 10.9 | 4.5 | 1.6 | 1.6 | 0.0 | 0.3 | GLM-130B: An Open Bilingual Pre-trained Model |
| InternLM2-7b | 58.6 | 49.5 | 33.9 | 12.3 | 13.4 | 2.0 | 0.8 | InternLM2 Technical Report |
| GPT-4-Turbo-0125 | 73.5 | 73.5 | 65.5 | 63.0 | 56.5 | 52.0 | 44.5 | GPT-4 Technical Report |
| GPT-4-Turbo-1106 | 74.0 | 73.5 | 67.5 | 59.5 | 53.5 | 49.5 | 44.0 | GPT-4 Technical Report |
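For comparing models across all context lengths at once, a single mean score per model is a convenient summary. The sketch below (not an official HyperAI script; the `scores` dictionary is a hand-copied subset of the table above) ranks a few models by their mean accuracy over the seven context lengths:

```python
# Minimal sketch: rank a subset of models from the Ada-LEval table
# by mean accuracy across context lengths. Scores copied from the table.

scores = {
    "GPT-4-Turbo-0125":   {"1k": 73.5, "2k": 73.5, "4k": 65.5, "6k": 63.0, "8k": 56.5, "12k": 52.0, "16k": 44.5},
    "GPT-4-Turbo-1106":   {"1k": 74.0, "2k": 73.5, "4k": 67.5, "6k": 59.5, "8k": 53.5, "12k": 49.5, "16k": 44.0},
    "GPT-3.5-Turbo-1106": {"1k": 61.5, "2k": 48.5, "4k": 41.5, "6k": 29.5, "8k": 17.0, "12k": 2.5,  "16k": 2.5},
    "Claude-2":           {"1k": 65.0, "2k": 43.5, "4k": 23.5, "6k": 15.0, "8k": 17.0, "12k": 12.0, "16k": 11.0},
}

def mean_score(per_length):
    """Average a model's scores over all context lengths."""
    return sum(per_length.values()) / len(per_length)

# Sort model names by mean score, best first.
ranking = sorted(scores, key=lambda m: mean_score(scores[m]), reverse=True)
for model in ranking:
    print(f"{model}: {mean_score(scores[model]):.1f}")
```

Note how the averages flatten the per-length picture: GPT-4-Turbo-0125 and -1106 end up close overall even though one leads at short contexts and the other at long ones.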