VCGBench-Diverse on VideoInstruct
Metrics
Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
mean
Results
Performance of various models on this benchmark.
Comparison Table
Model Name | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | mean |
---|---|---|---|---|---|---|---|---|---|
one-for-all-video-conversation-is-feasible | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 |
videogpt-integrating-image-and-video-encoders | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 |
chat-univi-unified-visual-representation | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 |
mvbench-a-comprehensive-multi-modal-video | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 |
vtimellm-empower-llm-to-grasp-video-moments | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 |
video-chatgpt-towards-detailed-video | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 |
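
The page does not define the mean column. Judging from the figures, it appears to be the average of five of the eight metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding), leaving out Dense Captioning, Reasoning, and Spatial Understanding. This is inferred from the numbers in the table, not stated by the source. A minimal Python sketch of the check follows; the names `COLUMNS`, `ROWS`, `MEAN_COLUMNS`, `five_metric_mean`, and `max_deviation` are illustrative and not part of any benchmark tooling.

```python
# Column order as it appears in the comparison table.
COLUMNS = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Dense Captioning", "Detail Orientation", "Reasoning",
    "Spatial Understanding", "Temporal Understanding", "mean",
]

# Scores copied verbatim from the comparison table.
ROWS = {
    "one-for-all-video-conversation-is-feasible":
        [2.27, 2.59, 2.20, 1.03, 2.62, 3.62, 2.35, 1.29, 2.19],
    "videogpt-integrating-image-and-video-encoders":
        [2.59, 2.81, 2.46, 1.38, 2.73, 3.63, 2.80, 1.78, 2.47],
    "chat-univi-unified-visual-representation":
        [2.36, 2.66, 2.29, 1.33, 2.56, 3.59, 2.36, 1.56, 2.29],
    "mvbench-a-comprehensive-multi-modal-video":
        [2.27, 2.51, 2.13, 1.26, 2.42, 3.13, 2.43, 1.66, 2.20],
    "vtimellm-empower-llm-to-grasp-video-moments":
        [2.35, 2.48, 2.16, 1.13, 2.41, 3.45, 2.29, 1.46, 2.17],
    "video-chatgpt-towards-detailed-video":
        [2.06, 2.46, 2.07, 0.89, 2.42, 3.60, 2.25, 1.39, 2.08],
}

# Assumption (inferred from the data, not stated by the source):
# the reported mean covers only these five metrics.
MEAN_COLUMNS = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Detail Orientation", "Temporal Understanding",
]


def five_metric_mean(scores):
    """Average the five assumed metrics for one table row."""
    by_name = dict(zip(COLUMNS, scores))
    return sum(by_name[name] for name in MEAN_COLUMNS) / len(MEAN_COLUMNS)


def max_deviation():
    """Largest gap between the recomputed and the reported mean over all rows."""
    return max(abs(five_metric_mean(s) - s[-1]) for s in ROWS.values())


if __name__ == "__main__":
    for model, scores in ROWS.items():
        print(f"{model}: computed {five_metric_mean(scores):.3f}, "
              f"reported {scores[-1]:.2f}")
```

Under this assumption each recomputed value agrees with the reported mean to within rounding (at most 0.005). Averaging all eight columns does not reproduce the column: for the first row it gives about 2.25 rather than 2.19.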