Visual Question Answering on VCR (Q-A) Test
Metrics
Accuracy
Results
Performance of various models on this benchmark
Comparison table
Model name | Accuracy (%) |
---|---|
vl-bert-pre-training-of-generic-visual | 75.8 |
multimodal-adaptive-distillation-for | 79.6 |
uniter-learning-universal-image-text-1 | 77.3 |
gpt4roi-instruction-tuning-large-language | 89.4 |
visualbert-a-simple-and-performant-baseline | 71.6 |
ernie-vil-knowledge-enhanced-vision-language | 81.6 |
uniter-learning-universal-image-text-1 | 79.8 |
harnessing-the-power-of-multi-task | 71.2 |
harnessing-the-power-of-multi-task | 62 |
unifying-vision-and-language-tasks-via-text | 75.3 |
kvl-bert-knowledge-enhanced-visual-and | 76.4 |