HyperAI
Visual Question Answering On Vip Bench
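Metrics
GPT-4 score (bbox): score assigned by a GPT-4 judge when the visual prompts are bounding boxes
GPT-4 score (human): score assigned by a GPT-4 judge when the visual prompts are human-drawn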
Results
Performance of various models on this benchmark, sorted by GPT-4 score (bbox) in descending order ("-" = no score reported)
| Model | GPT-4 score (bbox) | GPT-4 score (human) | Paper title |
| --- | --- | --- | --- |
| GPT-4V-turbo-detail:high (Visual Prompt) | 60.7 | 59.9 | GPT-4 Technical Report |
| GPT-4V-turbo-detail:low (Visual Prompt) | 52.8 | 51.4 | GPT-4 Technical Report |
| LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | 50.5 | 49.0 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
| ViP-LLaVA-13B (Visual Prompt) | 48.3 | 48.2 | ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts |
| LLaVA-1.5-13B (Coordinates) | 47.1 | - | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat (Coordinates) | 45.3 | - | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | 45.1 | 48.2 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning |
| LLaVA-1.5-13B (Visual Prompt) | 41.8 | 42.9 | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat (Visual Prompt) | 39.2 | 41.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| InstructBLIP-13B (Visual Prompt) | 35.8 | 35.2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| GPT4ROI 7B (ROI) | 35.1 | - | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest |
| Shikra-7B (Coordinates) | 33.7 | - | Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic |
| Kosmos-2 (Discrete Token) | 26.9 | - | Kosmos-2: Grounding Multimodal Large Language Models to the World |
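To compare the two prompt settings locally, the scores can be loaded into a small structure and re-ranked. The following is a minimal sketch, assuming you copy the values by hand; `RESULTS`, `rank_by`, `bbox_minus_human`, `BBOX` and `HUMAN` are hypothetical names for illustration, not a HyperAI or ViP-Bench API. Missing ("-") entries are stored as `None` and skipped.

```python
from typing import Optional

# Scores copied from the ViP-Bench results table; None stands for "-" (no human-prompt score listed).
RESULTS: dict[str, tuple[float, Optional[float]]] = {
    "GPT-4V-turbo-detail:high (Visual Prompt)": (60.7, 59.9),
    "GPT-4V-turbo-detail:low (Visual Prompt)": (52.8, 51.4),
    "LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt)": (50.5, 49.0),
    "ViP-LLaVA-13B (Visual Prompt)": (48.3, 48.2),
    "LLaVA-1.5-13B (Coordinates)": (47.1, None),
    "Qwen-VL-Chat (Coordinates)": (45.3, None),
    "LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt)": (45.1, 48.2),
    "LLaVA-1.5-13B (Visual Prompt)": (41.8, 42.9),
    "Qwen-VL-Chat (Visual Prompt)": (39.2, 41.7),
    "InstructBLIP-13B (Visual Prompt)": (35.8, 35.2),
    "GPT4ROI 7B (ROI)": (35.1, None),
    "Shikra-7B (Coordinates)": (33.7, None),
    "Kosmos-2 (Discrete Token)": (26.9, None),
}

BBOX, HUMAN = 0, 1  # hypothetical column indices into each score tuple


def rank_by(column: int) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted by descending score, skipping missing entries."""
    rows = [(name, scores[column]) for name, scores in RESULTS.items() if scores[column] is not None]
    # sorted() is stable even with reverse=True, so tied models keep their table order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def bbox_minus_human() -> dict[str, float]:
    """Gap between the bbox and human-prompt scores for models that report both."""
    return {
        name: round(bbox - human, 1)
        for name, (bbox, human) in RESULTS.items()
        if human is not None
    }


if __name__ == "__main__":
    # Print the three best models under human-drawn visual prompts.
    for name, score in rank_by(HUMAN)[:3]:
        print(f"{score:5.1f}  {name}")
```

A negative gap means the model scores higher with human-drawn prompts than with bounding boxes; in this table that holds for LLaVA-NeXT-Inst-IT-Vicuna-7B (-3.1), Qwen-VL-Chat (-2.5) and LLaVA-1.5-13B (-1.1), all in the Visual Prompt setting.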