HyperAI

Long-Context Understanding on Ada-LEval (TSort)

Evaluation Metrics

128k
16k
2k
32k
4k
64k
8k

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
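| Model name | 128k | 16k | 2k | 32k | 4k | 64k | 8k |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4-technical-report-1 | 2.0 | 5.5 | 15.5 | 2.0 | 16.5 | 4.0 | 8.5 |
| Model 2 | - | 5.5 | 4.0 | - | 4.5 | - | 4.5 |
| glm-130b-an-open-bilingual-pre-trained-model | - | 0.9 | 0.9 | - | 0.2 | - | 0.7 |
| glm-130b-an-open-bilingual-pre-trained-model | - | 0.7 | 2.3 | - | 2.4 | - | 2.0 |
| judging-llm-as-a-judge-with-mt-bench-and-1 | - | 2.5 | 5.3 | - | 5.0 | - | 3.1 |
| judging-llm-as-a-judge-with-mt-bench-and-1 | - | 1.7 | 5.3 | - | 2.2 | - | 2.3 |
| Model 7 | - | 3.0 | 5.0 | 0.0 | 5.0 | 0.0 | 4.5 |
| judging-llm-as-a-judge-with-mt-bench-and-1 | - | 3.1 | 5.4 | - | 5.0 | - | 2.4 |
| gpt-4-technical-report-1 | 6.0 | 3.5 | 18.5 | 6.0 | 15.5 | 6.0 | 7.5 |
| internlm2-technical-report | - | 4.3 | 5.1 | - | 3.9 | - | 5.1 |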