HyperAI초신경

VCGBench-Diverse on VideoInstruct

Evaluation Metrics

Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
mean

Evaluation Results

Performance of each model on this benchmark

| Model Name | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | mean | Paper Title |
|---|---|---|---|---|---|---|---|---|---|---|
| BT-Adapter | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| VideoGPT+ | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
| Chat-UniVi | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding |
| VideoChat2 | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| VTimeLLM | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 | VTimeLLM: Empower LLM to Grasp Video Moments |
| Video-ChatGPT | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |
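The page does not say how the mean column is computed, and it does not match a plain average of all eight metric columns. As a sanity check, the sketch below tests one assumption inferred from the numbers alone: that the mean averages five metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding) and leaves out Dense Captioning, Reasoning, and Spatial Understanding. The names `ROWS`, `five_metric_mean`, and `check_means` are hypothetical helpers written for this illustration.

```python
# Scores copied from the results table. Each tuple holds:
# (Consistency, Contextual Understanding, Correctness of Information,
#  Detail Orientation, Temporal Understanding, reported mean)
ROWS = {
    "BT-Adapter":    (2.27, 2.59, 2.20, 2.62, 1.29, 2.19),
    "VideoGPT+":     (2.59, 2.81, 2.46, 2.73, 1.78, 2.47),
    "Chat-UniVi":    (2.36, 2.66, 2.29, 2.56, 1.56, 2.29),
    "VideoChat2":    (2.27, 2.51, 2.13, 2.42, 1.66, 2.20),
    "VTimeLLM":      (2.35, 2.48, 2.16, 2.41, 1.46, 2.17),
    "Video-ChatGPT": (2.06, 2.46, 2.07, 2.42, 1.39, 2.08),
}


def five_metric_mean(scores):
    """Arithmetic mean of the given metric scores."""
    return sum(scores) / len(scores)


def check_means(rows, tol=0.005):
    """Map each model to True if the reported mean is within rounding
    distance (two decimal places) of the five-metric average.

    The five-metric assumption is an inference from the numbers,
    not something the source page states.
    """
    result = {}
    for model, (*scores, reported) in rows.items():
        diff = abs(five_metric_mean(scores) - reported)
        result[model] = diff <= tol + 1e-9  # small slack for float error
    return result


if __name__ == "__main__":
    for model, ok in check_means(ROWS).items():
        status = "consistent" if ok else "inconsistent"
        print(f"{model}: {status}")
```

If the assumption holds for every row, the mean column should be read as a summary of those five metrics only, so a model's Dense Captioning, Reasoning, and Spatial Understanding scores need to be compared separately.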