HyperAI

Long Context Understanding on Ada-LEval

Metrics

Accuracy (%) at each input context length, in tokens:
1k
2k
4k
6k
8k
12k
16k

Results

Performance results of various models on this benchmark

Comparison Table
Model name                                        1k    2k    4k    6k    8k   12k   16k
judging-llm-as-a-judge-with-mt-bench-and-1      37.0  11.1   5.8   3.2   1.8   1.9   1.0
Model 2                                         65.0  43.5  23.5  15.0  17.0  12.0  11.0
judging-llm-as-a-judge-with-mt-bench-and-1      32.4  10.7   5.7   3.1   1.9   1.6   0.8
judging-llm-as-a-judge-with-mt-bench-and-1      53.4  29.2  13.1   4.3   2.2   1.4   0.9
Model 5                                         61.5  48.5  41.5  29.5  17.0   2.5   2.5
glm-130b-an-open-bilingual-pre-trained-model    39.8  18.8   9.0   5.0   3.4   0.9   0.5
glm-130b-an-open-bilingual-pre-trained-model    31.2  10.9   4.5   1.6   1.6   0.0   0.3
internlm2-technical-report                      58.6  49.5  33.9  12.3  13.4   2.0   0.8
gpt-4-technical-report-1                        73.5  73.5  65.5  63.0  56.5  52.0  44.5
gpt-4-technical-report-1                        74.0  73.5  67.5  59.5  53.5  49.5  44.0
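To see how quickly each entry degrades as the context grows, one option is to compare its score at 16k with its score at 1k. The following is a minimal Python sketch that assumes the scores are percentages transcribed from the comparison table. The helper names `mean_score` and `retention` are hypothetical illustrations; neither is a metric defined by the benchmark.

```python
"""Sketch: summarise how each leaderboard entry degrades with context length.

Scores (%) are transcribed from the comparison table. Model names are kept
exactly as listed, so repeated names denote distinct rows.
"""

# Context lengths in ascending order; each score list follows this order.
LENGTHS = ["1k", "2k", "4k", "6k", "8k", "12k", "16k"]

ROWS = [
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [37.0, 11.1, 5.8, 3.2, 1.8, 1.9, 1.0]),
    ("Model 2", [65.0, 43.5, 23.5, 15.0, 17.0, 12.0, 11.0]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [32.4, 10.7, 5.7, 3.1, 1.9, 1.6, 0.8]),
    ("judging-llm-as-a-judge-with-mt-bench-and-1", [53.4, 29.2, 13.1, 4.3, 2.2, 1.4, 0.9]),
    ("Model 5", [61.5, 48.5, 41.5, 29.5, 17.0, 2.5, 2.5]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [39.8, 18.8, 9.0, 5.0, 3.4, 0.9, 0.5]),
    ("glm-130b-an-open-bilingual-pre-trained-model", [31.2, 10.9, 4.5, 1.6, 1.6, 0.0, 0.3]),
    ("internlm2-technical-report", [58.6, 49.5, 33.9, 12.3, 13.4, 2.0, 0.8]),
    ("gpt-4-technical-report-1", [73.5, 73.5, 65.5, 63.0, 56.5, 52.0, 44.5]),
    ("gpt-4-technical-report-1", [74.0, 73.5, 67.5, 59.5, 53.5, 49.5, 44.0]),
]


def mean_score(scores: list[float]) -> float:
    """Hypothetical summary: unweighted mean over all context lengths."""
    return sum(scores) / len(scores)


def retention(scores: list[float], short_len: str = "1k", long_len: str = "16k") -> float:
    """Hypothetical summary: fraction of the short-context score kept at the long context."""
    short = scores[LENGTHS.index(short_len)]
    long_ = scores[LENGTHS.index(long_len)]
    # Guard against division by zero when the short-context score is 0.
    return long_ / short if short else 0.0


if __name__ == "__main__":
    for name, scores in ROWS:
        print(f"{name:46s} mean={mean_score(scores):5.1f}  kept@16k={retention(scores):.2f}")
```

The unweighted mean treats every length equally, so it favours entries that are strong at short contexts. The retention ratio isolates the long-context drop instead. Other weightings, such as emphasising the 12k and 16k columns, could rank the entries differently.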