Video Question Answering on How2QA
Metrics
Accuracy
Results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title |
|---|---|---|
| Text + Text (no Multimodal Pretext Training) | 93.2 | Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval |
| FrozenBiLM | 86.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models |
| Just Ask | 84.4 | Just Ask: Learning to Answer Questions from Millions of Narrated Videos |
| SeViLA | 83.7 | - |
| Hero w/ pre-training | 77.75 | HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training |
| ATP | 65.1 | Revisiting the "Video" in Video-Language Understanding |
| FrozenBiLM (0-shot) | 58.4 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models |
| Just Ask (0-shot) | 51.1 | Just Ask: Learning to Answer Questions from Millions of Narrated Videos |