Zero-Shot Video Question Answering on TVQA
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
Model | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
VideoChat2 (no speech) | 40.6 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
FrozenBiLM (with speech) | 59.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
FrozenBiLM (no speech) | 29.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
InternVideo (no speech) | 35.9 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
IG-VLM (no speech, GPT-4V) | 57.8 | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | |
VideoChat_HD_mistral (no speech) | 50.6 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
VideoChat_mistral (no speech) | 46.4 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
SeViLA (no speech) | 38.2 | Self-Chained Image-Language Model for Video Localization and Question Answering | |
MiniGPT4-video-7B | 54.21 | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | |