Video Question Answering on AGQA 2.0 Balanced
Evaluation Metric
Average Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model | Average Accuracy | Paper Title |
|---|---|---|
| GF (sup) - Faster RCNN | 55.08 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| MIST - CLIP | 54.39 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
| GF (uns) - S3D | 53.33 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| SViTT | 52.7 | SViTT: Temporal Learning of Sparse Video-Text Transformers |
| MIST - AIO | 50.96 | MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering |
| SHG-VQA (trained from scratch) | 49.2 | Learning Situation Hyper-Graphs for Video Question Answering |
| AIO - ViT | 48.59 | Glance and Focus: Memory Prompting for Multi-Event Video Question Answering |
| MMTF | 44.36 | MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering |