Visual Question Answering on VCR (Q-AR) Test
Metrics
Accuracy
Results
Performance results of each model on this benchmark
| Model | Accuracy | Paper Title |
|---|---|---|
| GPT4RoI | 81.6 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| ERNIE-ViL-large (ensemble of 15 models) | 70.5 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph |
| UNITER (Large) | 62.8 | UNITER: UNiversal Image-TExt Representation Learning |
| KVL-BERT (Large) | 60.3 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning |
| VL-BERT (Large) | 59.7 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
| VL-T5 | 58.9 | Unifying Vision-and-Language Tasks via Text Generation |
| VisualBERT | 52.4 | VisualBERT: A Simple and Performant Baseline for Vision and Language |