VCGBench-Diverse on VideoInstruct
Metrics
Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
mean
Results
Performance results of various models on this benchmark
Comparison table
Model name | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | mean |
---|---|---|---|---|---|---|---|---|---|
one-for-all-video-conversation-is-feasible | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 |
videogpt-integrating-image-and-video-encoders | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 |
chat-univi-unified-visual-representation | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 |
mvbench-a-comprehensive-multi-modal-video | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 |
vtimellm-empower-llm-to-grasp-video-moments | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 |
video-chatgpt-towards-detailed-video | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 |
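
The page does not say how the mean column is derived. As a rough sanity check, the sketch below recomputes it from the table under one assumption: that the mean averages only five of the eight metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding). That subset is inferred from the numbers themselves, not stated in the source. The helper names (`recomputed_mean`, `check_means`, `rank_by`) are hypothetical and exist only for this illustration.

```python
# Scores copied verbatim from the comparison table; column order matches the header.
COLUMNS = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Dense Captioning", "Detail Orientation", "Reasoning",
    "Spatial Understanding", "Temporal Understanding", "mean",
]

ROWS = {
    "one-for-all-video-conversation-is-feasible": [2.27, 2.59, 2.20, 1.03, 2.62, 3.62, 2.35, 1.29, 2.19],
    "videogpt-integrating-image-and-video-encoders": [2.59, 2.81, 2.46, 1.38, 2.73, 3.63, 2.80, 1.78, 2.47],
    "chat-univi-unified-visual-representation": [2.36, 2.66, 2.29, 1.33, 2.56, 3.59, 2.36, 1.56, 2.29],
    "mvbench-a-comprehensive-multi-modal-video": [2.27, 2.51, 2.13, 1.26, 2.42, 3.13, 2.43, 1.66, 2.20],
    "vtimellm-empower-llm-to-grasp-video-moments": [2.35, 2.48, 2.16, 1.13, 2.41, 3.45, 2.29, 1.46, 2.17],
    "video-chatgpt-towards-detailed-video": [2.06, 2.46, 2.07, 0.89, 2.42, 3.60, 2.25, 1.39, 2.08],
}

# ASSUMPTION (inferred from the data, not stated on the page):
# only these five metrics contribute to the reported mean.
MEAN_METRICS = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Detail Orientation", "Temporal Understanding",
]


def recomputed_mean(scores, metrics=MEAN_METRICS):
    """Average the chosen metrics for one table row."""
    by_name = dict(zip(COLUMNS, scores))
    return sum(by_name[m] for m in metrics) / len(metrics)


def check_means(tolerance=0.005):
    """Map each model to True if the recomputed mean is within rounding distance of the reported one."""
    return {
        model: abs(recomputed_mean(scores) - scores[-1]) <= tolerance
        for model, scores in ROWS.items()
    }


def rank_by(metric):
    """Return model names sorted from highest to lowest score on one column."""
    idx = COLUMNS.index(metric)
    return sorted(ROWS, key=lambda model: ROWS[model][idx], reverse=True)


if __name__ == "__main__":
    for model, ok in check_means().items():
        print(f"{model}: reported mean consistent with five-metric average = {ok}")
    print("Ranking by mean:", rank_by("mean"))
```

If the assumption holds, every row should pass the check; averaging all eight columns instead does not reproduce the reported values, which is why the five-metric subset is used here.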