Visual Question Answering on VQA v2 val
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| MetaLM | 41.1 | Language Models are General-Purpose Interfaces | |
| BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| Frozen | 29.5 | Multimodal Few-Shot Learning with Frozen Language Models | - |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| VLKD(ViT-B/16) | 38.6 | Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | - |
| BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| PNP-VQA | 63.3 | Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | |
| Few VLM (zero-shot) | 47.7 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |