HyperAI超神经

Long Context Understanding on MMNeedle

Evaluation Metrics

1 Image, 2*2 Stitching, Exact Accuracy
1 Image, 4*4 Stitching, Exact Accuracy
1 Image, 8*8 Stitching, Exact Accuracy
10 Images, 1*1 Stitching, Exact Accuracy
10 Images, 2*2 Stitching, Exact Accuracy
10 Images, 4*4 Stitching, Exact Accuracy
10 Images, 8*8 Stitching, Exact Accuracy

Evaluation Results

Performance of each model on this benchmark

| Model Name | 1 Image, 2*2 Stitching, Exact Accuracy | 1 Image, 4*4 Stitching, Exact Accuracy | 1 Image, 8*8 Stitching, Exact Accuracy | 10 Images, 1*1 Stitching, Exact Accuracy | 10 Images, 2*2 Stitching, Exact Accuracy | 10 Images, 4*4 Stitching, Exact Accuracy | 10 Images, 8*8 Stitching, Exact Accuracy | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CogVLM2-Llama-3 | 7.3 | 0.9 | 0.1 | 0 | 0 | 0 | 0 | CogVLM: Visual Expert for Pretrained Language Models | |
| InstructBLIP-Vicuna-13B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| InstructBLIP-Flan-T5-XXL | 3.8 | 6.2 | 2.2 | 0 | 0 | 0 | 0 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| GPT-4V | 86.09 | 54.72 | 7.3 | 72.36 | 34.24 | 7.58 | 0 | GPT-4 Technical Report | |
| IDEFICS2-8B | 18.9 | 7.8 | 0.9 | 0 | 0 | 0 | 0 | What matters when building vision-language models? | - |
| Gemini Pro 1.0 | 29.53 | 24.78 | 2.11 | 16.25 | 4.82 | 0.4 | 0 | Gemini: A Family of Highly Capable Multimodal Models | |
| Gemini Pro 1.5 | 90.34 | 39.85 | 29.81 | 89.94 | 45.21 | 6.09 | 0.62 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| CogVLM-17B | 0 | 0.1 | 0.3 | 0 | 0 | 0 | 0 | CogVLM: Visual Expert for Pretrained Language Models | |
| Claude 3 Opus | 52.25 | 12.3 | 1.66 | 6.93 | 4.6 | 0.4 | 0 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| mPLUG-Owl-v2 | 1.9 | 0.3 | 0.7 | 0.4 | 0.1 | 0 | 0 | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | |
| LLaVA-Llama-3 | 43.8 | 17.5 | 3.3 | 0 | 0 | 0 | 0 | LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | |
| GPT-4o | 94.6 | 83 | 19 | 97 | 81.8 | 26.9 | 1 | GPT-4 Technical Report | |