Zero-Shot Video Question Answering
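The Zero-Shot Video Question Answering task aims to enable large language models to answer questions about video content accurately without task-specific training. It belongs to the field of computer vision: by improving a model's cross-modal understanding, it allows previously unseen video data to be parsed and responded to immediately. The task has significant practical value, particularly in intelligent dialogue systems, video content retrieval, and automated question answering.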
ActivityNet-QA
IG-VLM
CinePile: A Long Video Question Answering Dataset and Benchmark
EgoSchema (fullset)
BIMBA-LLaVA-Qwen2-7B
EgoSchema (subset)
Tarsier (34B)
IntentQA
IG-VLM
MSRVTT-QA
Flash-VStream
MSVD-QA
Video-LLaVA-7B
MVBench
TS-LLaVA-34B
NExT-GQA
NExT-QA
Tarsier (34B)
STAR Benchmark
VideoChat2
TGIF-QA
PLLaVA
TVQA
FrozenBiLM (with speech)
Video-MME
Gemini 1.5 Pro
Video-MME (w/o subs)
Video-RAG (based on LLaVA-Video)
LongVideoBench
Gemini 1.5 Pro