HyperAI
Video Question Answering On Situated
Evaluation Metric
Average Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model Name | Average Accuracy | Paper Title |
| --- | --- | --- |
| MIST | 51.13 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
| TraveLER (0-shot) | 44.9 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering |
| SHG-VQA (trained from scratch) | 39.47 | Learning Situation Hyper-Graphs for Video Question Answering |
| Flamingo-9B (4-shot) | 42.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| SeViLA | 64.9 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| All-in-one | 47.5 | All in One: Exploring Unified Video-Language Pre-training |
| GF(sup) | 53.94 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| VLAP (4 frames) | 67.1 | ViLA: Efficient Video-Language Alignment for Video Question Answering |
| SeViLA (0-shot) | 44.6 | Self-Chained Image-Language Model for Video Localization and Question Answering |
| Flamingo-80B (0-shot) | 39.7 | Flamingo: a Visual Language Model for Few-Shot Learning |
| LLaMA-VQA | 65.4 | Large Language Models are Temporal and Causal Reasoners for Video Question Answering |
| InternVideo | 58.7 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| Flamingo-9B (0-shot) | 41.8 | Flamingo: a Visual Language Model for Few-Shot Learning |
| Temp[ATP] | 48.37 | Revisiting the "Video" in Video-Language Understanding |
| AnyMAL-70B (0-shot) | 48.2 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model |
| Flamingo-80B (4-shot) | 42.4 | Flamingo: a Visual Language Model for Few-Shot Learning |
| GF(uns) | 53.86 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |