HyperAI

Long Context Understanding on Ada-LEval

Metrics (context-length settings)

1k
2k
4k
6k
8k
12k
16k

Results

Performance of various models on this benchmark

Comparison table
Model name                                        1k    2k    4k    6k    8k   12k   16k
judging-llm-as-a-judge-with-mt-bench-and-1      37.0  11.1   5.8   3.2   1.8   1.9   1.0
Model 2                                         65.0  43.5  23.5  15.0  17.0  12.0  11.0
judging-llm-as-a-judge-with-mt-bench-and-1      32.4  10.7   5.7   3.1   1.9   1.6   0.8
judging-llm-as-a-judge-with-mt-bench-and-1      53.4  29.2  13.1   4.3   2.2   1.4   0.9
Model 5                                         61.5  48.5  41.5  29.5  17.0   2.5   2.5
glm-130b-an-open-bilingual-pre-trained-model    39.8  18.8   9.0   5.0   3.4   0.9   0.5
glm-130b-an-open-bilingual-pre-trained-model    31.2  10.9   4.5   1.6   1.6   0.0   0.3
internlm2-technical-report                      58.6  49.5  33.9  12.3  13.4   2.0   0.8
gpt-4-technical-report-1                        73.5  73.5  65.5  63.0  56.5  52.0  44.5
gpt-4-technical-report-1                        74.0  73.5  67.5  59.5  53.5  49.5  44.0
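As a rough illustration of how scores decay as the context grows, the Python sketch below copies the table values, reordered by increasing context length, and computes two summary numbers per row: the mean across all seven lengths and the ratio of the 16k score to the 1k score. Both statistics are this sketch's own choice, not an official Ada-LEval or HyperAI ranking metric, and the names `ROWS`, `LENGTHS` and `summarize` are hypothetical. Rows are identified by their position in the table because several entries share the same name.

```python
from statistics import mean

# Context-length settings, in increasing order (hypothetical constant for this sketch).
LENGTHS = ["1k", "2k", "4k", "6k", "8k", "12k", "16k"]

# Scores copied from the comparison table, reordered by increasing context length.
# Table order is preserved, so row N here is the N-th row of the table.
ROWS = [
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [37.0, 11.1, 5.8, 3.2, 1.8, 1.9, 1.0]),
    ("Model 2", [65.0, 43.5, 23.5, 15.0, 17.0, 12.0, 11.0]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [32.4, 10.7, 5.7, 3.1, 1.9, 1.6, 0.8]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [53.4, 29.2, 13.1, 4.3, 2.2, 1.4, 0.9]),
    ("Model 5", [61.5, 48.5, 41.5, 29.5, 17.0, 2.5, 2.5]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [39.8, 18.8, 9.0, 5.0, 3.4, 0.9, 0.5]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [31.2, 10.9, 4.5, 1.6, 1.6, 0.0, 0.3]),
    ("internlm2-technical-report", [58.6, 49.5, 33.9, 12.3, 13.4, 2.0, 0.8]),
    ("gpt-4-technical-report-1", [73.5, 73.5, 65.5, 63.0, 56.5, 52.0, 44.5]),
    ("gpt-4-technical-report-1", [74.0, 73.5, 67.5, 59.5, 53.5, 49.5, 44.0]),
]


def summarize(rows):
    """Return (row_number, name, mean_score, retention) tuples, best mean first.

    retention is the 16k score divided by the 1k score: an illustrative
    measure of how much accuracy survives the longest setting.
    """
    summary = []
    for row_no, (name, scores) in enumerate(rows, start=1):
        avg = mean(scores)
        retention = scores[-1] / scores[0]  # last column (16k) over first column (1k)
        summary.append((row_no, name, avg, retention))
    # Sort by mean score, highest first.
    return sorted(summary, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for row_no, name, avg, ret in summarize(ROWS):
        print(f"row {row_no:2d}  mean={avg:5.1f}  16k/1k={ret:4.2f}  {name}")
```

With these values, the two gpt-4-technical-report-1 rows have the highest means (about 61.2 and 60.2) and keep roughly 60% of their 1k score at 16k, whereas most other rows retain less than 10%.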