Video Question Answering on TVQA
Metrics
Accuracy
Results
Performance of different models on this benchmark
Model Name | Accuracy | Paper Title | Repository |
---|---|---|---|
Hero w/ pre-training | 74.24 | HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | |
iPerceive (Chadha et al., 2020) | 76.96 | iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | |
LLaMA-VQA | 82.2 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering | |
STAGE (Lei et al., 2019) | 70.50 | TVQA+: Spatio-Temporal Grounding for Video Question Answering | |
VindLU | 79.0 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
FrozenBiLM | 82 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |