HyperAI

Visual Question Answering on ViP-Bench

Metrics

GPT-4 score (bbox)
GPT-4 score (human)

Results

Performance of various models on this benchmark

| Model | GPT-4 score (bbox) | GPT-4 score (human) | Paper title | Repository |
| --- | --- | --- | --- | --- |
| LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | 50.5 | 49.0 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | - |
| InstructBLIP-13B (Visual Prompt) | 35.8 | 35.2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | - |
| GPT-4V-turbo-detail:high (Visual Prompt) | 60.7 | 59.9 | GPT-4 Technical Report | - |
| GPT4ROI 7B (ROI) | 35.1 | - | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | - |
| Kosmos-2 (Discrete Token) | 26.9 | - | Kosmos-2: Grounding Multimodal Large Language Models to the World | - |
| Qwen-VL-Chat (Coordinates) | 45.3 | - | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | - |
| LLaVA-1.5-13B (Visual Prompt) | 41.8 | 42.9 | Improved Baselines with Visual Instruction Tuning | - |
| Shikra-7B (Coordinates) | 33.7 | - | Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | - |
| LLaVA-1.5-13B (Coordinates) | 47.1 | - | Improved Baselines with Visual Instruction Tuning | - |
| GPT-4V-turbo-detail:low (Visual Prompt) | 52.8 | 51.4 | GPT-4 Technical Report | - |
| Qwen-VL-Chat (Visual Prompt) | 39.2 | 41.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | - |
| LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | 45.1 | 48.2 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | - |
| ViP-LLaVA-13B (Visual Prompt) | 48.3 | 48.2 | Making Large Language Models Better Data Creators | - |
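The table mixes prompting interfaces (visual prompt, coordinates, ROI, discrete token), and the human-prompt column is empty ("-") for several entries, so a ranking only makes sense per column. The following is a minimal Python sketch, assuming the scores are transcribed by hand from the table above; the names `LEADERBOARD`, `rank` and `human_minus_bbox` are hypothetical helpers for illustration, not part of any HyperAI or ViP-Bench tooling.

```python
from typing import Optional

# Hypothetical hand transcription of the leaderboard table above.
# Each entry: (model, GPT-4 score (bbox), GPT-4 score (human)); None marks a "-" cell.
LEADERBOARD: list[tuple[str, float, Optional[float]]] = [
    ("LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt)", 50.5, 49.0),
    ("InstructBLIP-13B (Visual Prompt)", 35.8, 35.2),
    ("GPT-4V-turbo-detail:high (Visual Prompt)", 60.7, 59.9),
    ("GPT4ROI 7B (ROI)", 35.1, None),
    ("Kosmos-2 (Discrete Token)", 26.9, None),
    ("Qwen-VL-Chat (Coordinates)", 45.3, None),
    ("LLaVA-1.5-13B (Visual Prompt)", 41.8, 42.9),
    ("Shikra-7B (Coordinates)", 33.7, None),
    ("LLaVA-1.5-13B (Coordinates)", 47.1, None),
    ("GPT-4V-turbo-detail:low (Visual Prompt)", 52.8, 51.4),
    ("Qwen-VL-Chat (Visual Prompt)", 39.2, 41.7),
    ("LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt)", 45.1, 48.2),
    ("ViP-LLaVA-13B (Visual Prompt)", 48.3, 48.2),
]


def rank(rows, setting: str) -> list[tuple[str, float]]:
    """Sort (model, score) pairs best-first for "bbox" or "human", skipping missing scores.

    sorted() is stable, so tied scores keep their original table order.
    """
    idx = {"bbox": 1, "human": 2}[setting]
    scored = [(row[0], row[idx]) for row in rows if row[idx] is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def human_minus_bbox(rows) -> dict[str, float]:
    """Human-prompt score minus bbox score, only for models scored in both settings."""
    return {name: round(human - bbox, 1) for name, bbox, human in rows if human is not None}


if __name__ == "__main__":
    for name, score in rank(LEADERBOARD, "bbox")[:3]:
        print(f"{score:5.1f}  {name}")
    print(human_minus_bbox(LEADERBOARD))
```

Keeping `None` for the "-" cells, rather than substituting 0, avoids silently ranking models that were simply not evaluated with human-drawn prompts below those that were.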