Visual Question Answering on VQA v2 val 1
Metrics
Accuracy
Results
Performance results of various models on this benchmark
Model name | Accuracy | Paper Title | Repository |
---|---|---|---|
BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 81.55 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 82.19 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
LocVLM-L | 55.9 | Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | |
BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 81.59 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |