Zero-Shot Video Question Answering on Zero Shot
Metrics
Accuracy (%)
Results
Performance results of various models on this benchmark
Model name | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
GPT-4o | 64.0 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
Video-RAG (based on LLaVA-Video) | 65.4 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | |
LLaVA-Video | 61.9 | Video Instruction Tuning With Synthetic Data | - |
Gemini 1.5 Pro | 66.7 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |