Natural Language Visual Grounding On

Evaluation Metrics

Accuracy (%)

Evaluation Results

Performance results of each model on this benchmark

| Model Name | Accuracy (%) | Paper Title | Repository |
| OS-Atlas-Base-7B | 82.47 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | - |
| Aguvis-7B | 83.0 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | - |
| Qwen-GUI | 28.6 | GUICourse: From General Vision Language Models to Versatile GUI Agents | - |
| UGround | 73.3 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | - |
| OS-Atlas-Base-4B | 68.0 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | - |
| ShowUI | 75.1 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent | |
| UGround-V1-7B | 86.34 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | - |
| Groma | 5.2 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | - |
| ShowUI-G | 75.0 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent | |
| MiniGPT-v2 | 5.7 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | - |
| Aria-UI | 81.1 | Aria-UI: Visual Grounding for GUI Instructions | - |
| Qwen2-VL-7B | 42.1 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | - |
| OmniParser | 73.0 | OmniParser for Pure Vision Based GUI Agent | - |
| UGround-V1-2B | 77.67 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | - |
| Qwen-VL | 5.2 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | - |
| SeeClick | 53.4 | SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | - |
| Aguvis-G-7B | 81.0 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | - |
| CogAgent | 47.4 | CogAgent: A Visual Language Model for GUI Agents | - |