
Long-Context Understanding on Ada-LEval TSort

Evaluation Metrics

128k
16k
2k
32k
4k
64k
8k

Evaluation Results

Performance of each model on this benchmark

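| Model | 128k | 16k | 2k | 32k | 4k | 64k | 8k | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4-Turbo-0125 | 2.0 | 5.5 | 15.5 | 2.0 | 16.5 | 4.0 | 8.5 | GPT-4 Technical Report | |
| GPT-3.5-Turbo-1106 | - | 5.5 | 4.0 | - | 4.5 | - | 4.5 | - | - |
| ChatGLM2-6b-32k | - | 0.9 | 0.9 | - | 0.2 | - | 0.7 | GLM-130B: An Open Bilingual Pre-trained Model | |
| ChatGLM3-6b-32k | - | 0.7 | 2.3 | - | 2.4 | - | 2.0 | GLM-130B: An Open Bilingual Pre-trained Model | |
| LongChat-7b-v1.5-32k | - | 2.5 | 5.3 | - | 5.0 | - | 3.1 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | |
| Vicuna-7b-v1.5-16k | - | 1.7 | 5.3 | - | 2.2 | - | 2.3 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | |
| Claude-2 | - | 3.0 | 5.0 | 0.0 | 5.0 | 0.0 | 4.5 | - | - |
| Vicuna-13b-v1.5-16k | - | 3.1 | 5.4 | - | 5.0 | - | 2.4 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | |
| GPT-4-Turbo-1106 | 6.0 | 3.5 | 18.5 | 6.0 | 15.5 | 6.0 | 7.5 | GPT-4 Technical Report | |
| InternLM2-7b | - | 4.3 | 5.1 | - | 3.9 | - | 5.1 | InternLM2 Technical Report | |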