HyperAI
HyperAI超神経
Video Question Answering On Situated
Evaluation metric
Average Accuracy
Evaluation results
Performance of each model on this benchmark
| Model | Average Accuracy | Paper Title | Repository |
|---|---|---|---|
| MIST | 51.13 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | - |
| TraveLER (0-shot) | 44.9 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | - |
| SHG-VQA (trained from scratch) | 39.47 | Learning Situation Hyper-Graphs for Video Question Answering | - |
| Flamingo-9B (4-shot) | 42.8 | Flamingo: a Visual Language Model for Few-Shot Learning | - |
| SeViLA | 64.9 | Self-Chained Image-Language Model for Video Localization and Question Answering | - |
| All-in-one | 47.5 | All in One: Exploring Unified Video-Language Pre-training | - |
| GF(sup) | 53.94 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | - |
| VLAP (4 frames) | 67.1 | ViLA: Efficient Video-Language Alignment for Video Question Answering | - |
| SeViLA (0-shot) | 44.6 | Self-Chained Image-Language Model for Video Localization and Question Answering | - |
| Flamingo-80B (0-shot) | 39.7 | Flamingo: a Visual Language Model for Few-Shot Learning | - |
| LLaMA-VQA | 65.4 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering | - |
| InternVideo | 58.7 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | - |
| Flamingo-9B (0-shot) | 41.8 | Flamingo: a Visual Language Model for Few-Shot Learning | - |
| Temp[ATP] | 48.37 | Revisiting the "Video" in Video-Language Understanding | - |
| AnyMAL-70B (0-shot) | 48.2 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | - |
| Flamingo-80B (4-shot) | 42.4 | Flamingo: a Visual Language Model for Few-Shot Learning | - |
| GF(uns) | 53.86 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering | - |