Long-Context Understanding on MMNeedle
Metrics
1 Image, 2*2 Stitching, Exact Accuracy
1 Image, 4*4 Stitching, Exact Accuracy
1 Image, 8*8 Stitching, Exact Accuracy
10 Images, 1*1 Stitching, Exact Accuracy
10 Images, 2*2 Stitching, Exact Accuracy
10 Images, 4*4 Stitching, Exact Accuracy
10 Images, 8*8 Stitching, Exact Accuracy
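
The metric names above combine a number of images, an N*N stitching grid, and an exact-accuracy score. As a minimal sketch, assuming that a prediction counts as exact only when the image index, row, and column all match the ground truth, and that scores are reported as percentages, the metric could be computed roughly as follows. The names `Location`, `haystack_size`, and `exact_accuracy` are hypothetical and are not taken from the benchmark's code.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical location type (an assumption): (image_index, row, column).
Location = Tuple[int, int, int]


def haystack_size(num_images: int, grid: int) -> int:
    """Number of candidate sub-images when each image is a grid x grid stitch."""
    return num_images * grid * grid


def exact_accuracy(
    predictions: Sequence[Optional[Location]],
    targets: Sequence[Location],
) -> float:
    """Percentage of samples whose predicted location matches the target exactly.

    A prediction of None (for example, an unparseable model answer) is
    counted as a miss.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for pred, target in zip(predictions, targets) if pred == target)
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    preds = [(1, 2, 2), (3, 1, 4), None, (2, 1, 1)]
    golds = [(1, 2, 2), (3, 1, 3), (5, 1, 1), (2, 1, 1)]
    print(haystack_size(10, 8))
    print(exact_accuracy(preds, golds))
```

Under this assumption, the "10 Images, 8*8 Stitching" setting asks the model to pick one location out of 640 candidates, which is consistent with scores dropping sharply as the grid grows.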
Results
Performance of various models on this benchmark
Comparison table
Model name | 1 Image, 2*2 Stitching, Exact Accuracy | 1 Image, 4*4 Stitching, Exact Accuracy | 1 Image, 8*8 Stitching, Exact Accuracy | 10 Images, 1*1 Stitching, Exact Accuracy | 10 Images, 2*2 Stitching, Exact Accuracy | 10 Images, 4*4 Stitching, Exact Accuracy | 10 Images, 8*8 Stitching, Exact Accuracy |
---|---|---|---|---|---|---|---|
cogvlm-visual-expert-for-pretrained-language | 7.3 | 0.9 | 0.1 | 0 | 0 | 0 | 0 |
instructblip-towards-general-purpose-vision | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
instructblip-towards-general-purpose-vision | 3.8 | 6.2 | 2.2 | 0 | 0 | 0 | 0 |
gpt-4-technical-report-1 | 86.09 | 54.72 | 7.3 | 72.36 | 34.24 | 7.58 | 0 |
what-matters-when-building-vision-language | 18.9 | 7.8 | 0.9 | 0 | 0 | 0 | 0 |
gemini-a-family-of-highly-capable-multimodal-1 | 29.53 | 24.78 | 2.11 | 16.25 | 4.82 | 0.4 | 0 |
gemini-1-5-unlocking-multimodal-understanding | 90.34 | 39.85 | 29.81 | 89.94 | 45.21 | 6.09 | 0.62 |
cogvlm-visual-expert-for-pretrained-language | 0 | 0.1 | 0.3 | 0 | 0 | 0 | 0 |
the-claude-3-model-family-opus-sonnet-haiku | 52.25 | 12.3 | 1.6 | 66.93 | 4.6 | 0.4 | 0 |
mplug-owl2-revolutionizing-multi-modal-large | 1.9 | 0.3 | 0.7 | 0.4 | 0.1 | 0 | 0 |
llava-uhd-an-lmm-perceiving-any-aspect-ratio | 43.8 | 17.5 | 3.3 | 0 | 0 | 0 | 0 |
gpt-4-technical-report-1 | 94.6 | 83 | 19 | 97 | 81.8 | 26.9 | 1 |