VCGBench-Diverse on VideoInstruct

Evaluation Metrics

Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
mean

Evaluation Results

Performance results of each model on this benchmark.

| Model | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | mean | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BT-Adapter | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | - |
| VideoGPT+ | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | - |
| Chat-UniVi | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | - |
| VideoChat2 | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - |
| VTimeLLM | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 | VTimeLLM: Empower LLM to Grasp Video Moments | - |
| Video-ChatGPT | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | - |
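
The page does not say how the mean column is computed. Going only by the six rows listed here, it appears to be the average of five of the eight metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding), with Dense Captioning, Reasoning, and Spatial Understanding left out. The Python sketch below checks that reading against the listed scores. The choice of `CORE` metrics is an assumption inferred from the numbers, and the names `ROWS`, `CORE`, `core_mean`, and `check_rows` are illustrative only.

```python
# Metric names in the same order as the score columns of the leaderboard.
METRICS = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Dense Captioning", "Detail Orientation", "Reasoning",
    "Spatial Understanding", "Temporal Understanding",
]

# Scores copied from the leaderboard rows: (eight metric scores, reported mean).
ROWS = {
    "BT-Adapter":    ([2.27, 2.59, 2.20, 1.03, 2.62, 3.62, 2.35, 1.29], 2.19),
    "VideoGPT+":     ([2.59, 2.81, 2.46, 1.38, 2.73, 3.63, 2.80, 1.78], 2.47),
    "Chat-UniVi":    ([2.36, 2.66, 2.29, 1.33, 2.56, 3.59, 2.36, 1.56], 2.29),
    "VideoChat2":    ([2.27, 2.51, 2.13, 1.26, 2.42, 3.13, 2.43, 1.66], 2.20),
    "VTimeLLM":      ([2.35, 2.48, 2.16, 1.13, 2.41, 3.45, 2.29, 1.46], 2.17),
    "Video-ChatGPT": ([2.06, 2.46, 2.07, 0.89, 2.42, 3.60, 2.25, 1.39], 2.08),
}

# ASSUMPTION: the subset of metrics that feeds the mean column, inferred
# from the listed numbers rather than stated anywhere on the page.
CORE = [
    "Consistency", "Contextual Understanding", "Correctness of Information",
    "Detail Orientation", "Temporal Understanding",
]


def core_mean(scores):
    """Average the scores of the assumed core metrics for one model."""
    by_name = dict(zip(METRICS, scores))
    return sum(by_name[name] for name in CORE) / len(CORE)


def check_rows(tolerance=0.005 + 1e-9):
    """Map each model to True if its recomputed mean matches the reported
    value to within two-decimal rounding."""
    return {
        model: abs(core_mean(scores) - reported) <= tolerance
        for model, (scores, reported) in ROWS.items()
    }


if __name__ == "__main__":
    for model, ok in check_rows().items():
        print(f"{model}: recomputed {core_mean(ROWS[model][0]):.3f}, match={ok}")
```

Averaging all eight metrics instead does not reproduce the reported values (for BT-Adapter it gives roughly 2.25 rather than 2.19), which is why the five-metric subset is used in the sketch.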