VCGBench-Diverse on VideoInstruct
Evaluation Metrics
Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
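Mean (the reported values match the unweighted average of Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, and Consistency; Dense Captioning, Reasoning, and Spatial Understanding do not enter it)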
Evaluation Results
Performance of each model on this benchmark
Model | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | Mean | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|---|
BT-Adapter | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | |
VideoGPT+ | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | |
Chat-UniVi | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | |
VideoChat2 | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
VTimeLLM | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 | VTimeLLM: Empower LLM to Grasp Video Moments | |
Video-ChatGPT | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | |
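The Mean column is consistent with a plain average of five of the eight scores: Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, and Consistency. For BT-Adapter, (2.20 + 2.62 + 2.59 + 1.29 + 2.27) / 5 = 2.194, which rounds to the listed 2.19. Averaging all eight columns would give about 2.25 instead. This reading is inferred from the numbers, not stated on the page. The minimal sketch below assumes that reading and checks it against every row. It then orders the models by Mean. The helper names (`five_metric_mean`, `check_reported_means`, `rank_by_mean`) are hypothetical. The scores are copied from the table.

```python
# Scores copied from the table. Column order (an assumption about which
# metrics feed the Mean): Correctness of Information, Detail Orientation,
# Contextual Understanding, Temporal Understanding, Consistency.
FIVE_METRICS = {
    "BT-Adapter":    (2.20, 2.62, 2.59, 1.29, 2.27),
    "VideoGPT+":     (2.46, 2.73, 2.81, 1.78, 2.59),
    "Chat-UniVi":    (2.29, 2.56, 2.66, 1.56, 2.36),
    "VideoChat2":    (2.13, 2.42, 2.51, 1.66, 2.27),
    "VTimeLLM":      (2.16, 2.41, 2.48, 1.46, 2.35),
    "Video-ChatGPT": (2.07, 2.42, 2.46, 1.39, 2.06),
}

# The "Mean" column as reported in the table.
REPORTED_MEAN = {
    "BT-Adapter": 2.19, "VideoGPT+": 2.47, "Chat-UniVi": 2.29,
    "VideoChat2": 2.20, "VTimeLLM": 2.17, "Video-ChatGPT": 2.08,
}


def five_metric_mean(scores: tuple[float, ...]) -> float:
    """Unweighted average of the given scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


def check_reported_means() -> dict[str, float]:
    """Recompute each model's mean and assert it matches the reported value."""
    computed = {}
    for model, scores in FIVE_METRICS.items():
        value = five_metric_mean(scores)
        assert abs(value - REPORTED_MEAN[model]) < 1e-9, model
        computed[model] = value
    return computed


def rank_by_mean(means: dict[str, float]) -> list[str]:
    """Model names sorted from highest to lowest mean."""
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    means = check_reported_means()
    for model in rank_by_mean(means):
        print(f"{model:<14} {means[model]:.2f}")
```

If the assertion holds for all six rows, the five-metric reading explains the published Mean. Under that reading, VideoGPT+ leads with 2.47 and Video-ChatGPT is last with 2.08.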