Zero Shot Video Question Answer On Zero Shot

Accuracy (%)

Results

Performance results of various models on this benchmark

Model	Accuracy (%)	Paper Title
Gemini 1.5 Pro	66.7	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Video-RAG (based on LLaVA-Video)	65.4	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
GPT-4o	64.0	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
LLaVA-Video	61.9	Video Instruction Tuning With Synthetic Data
