Visual Question Answering on VQA v2 val
Task: Visual Question Answering (VQA)
Metrics
Accuracy
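
The Accuracy column presumably follows the standard VQA v2 scoring protocol: each question comes with 10 human answers, and a predicted answer scores min(#matching human answers / 3, 1), averaged over all leave-one-out subsets of 9 annotators. A minimal sketch of that scoring rule (assuming answers are already normalized the way the official evaluation script normalizes them):

```python
from typing import List

def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Standard VQA v2 accuracy for a single question.

    Each question has 10 human answers; the score is averaged over all
    leave-one-out subsets of 9 annotators, with min(#matches / 3, 1)
    per subset. Assumes `prediction` and `human_answers` are already
    normalized (lowercased, punctuation stripped, etc.).
    """
    scores = []
    for i in range(len(human_answers)):
        # Leave annotator i out, score against the remaining answers.
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(a == prediction for a in others)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)

# Example: 4 of 10 annotators answered "2", so every 9-annotator
# subset contains at least 3 matches and the score is 1.0.
print(vqa_accuracy("2", ["2"] * 4 + ["two"] * 3 + ["3"] * 3))
```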
Results
Performance results of various models on this benchmark.
| Model name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| MetaLM | 41.1 | Language Models are General-Purpose Interfaces | |
| BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| Frozen | 29.5 | Multimodal Few-Shot Learning with Frozen Language Models | - |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| VLKD(ViT-B/16) | 38.6 | Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | - |
| BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| PNP-VQA | 63.3 | Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | |
| Few VLM (zero-shot) | 47.7 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
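
Most of the top entries are BLIP-2 variants evaluated zero-shot: the frozen image encoder and language model are queried directly with a question prompt, with no VQA fine-tuning. A minimal sketch of what such a zero-shot query might look like with the Hugging Face transformers port of BLIP-2; the checkpoint name, image path, and generation settings below are illustrative assumptions, not the exact evaluation setup behind these numbers:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint; the leaderboard covers several OPT/FlanT5 variants.
model_id = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id).to(device)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path

# "Question: ... Answer:" is the zero-shot VQA prompt format
# described in the BLIP-2 paper.
prompt = "Question: how many dogs are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(generated[0], skip_special_tokens=True).strip())
```

The short `max_new_tokens` budget reflects that VQA v2 answers are typically one or two words; the generated string would still need the benchmark's answer normalization before scoring.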