HyperAI

Natural Language Visual Grounding On

Evaluation Metric

Accuracy (%)
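
As a rough illustration of how an accuracy percentage like this can be computed for visual grounding, here is a minimal Python sketch. It assumes a prediction counts as correct when the predicted click point falls inside the ground-truth bounding box; this page does not state the benchmark's scoring rule, so that criterion is an assumption. The names `point_in_box` and `grounding_accuracy` are hypothetical helpers, not part of any benchmark toolkit.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Percentage of predictions that hit their target box.

    ASSUMPTION: "correct" means the predicted point falls inside the
    ground-truth box; the actual benchmark may use a different rule.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = [(5.0, 5.0), (50.0, 50.0)]
    boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    print(f"Accuracy: {grounding_accuracy(preds, boxes):.1f}%")
```

In this example one of the two predicted points lands inside its box, so the function reports half of the predictions as correct.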

Evaluation Results

Performance of each model on this benchmark

Comparison Table
Model Name                                     Accuracy (%)
os-atlas-a-foundation-action-model-for         82.47
aguvis-unified-pure-vision-agents-for          83.0
guicourse-from-general-vision-language-models  28.6
navigating-the-digital-world-as-humans-do      73.3
os-atlas-a-foundation-action-model-for         68.0
showui-one-vision-language-action-model-for    75.1
navigating-the-digital-world-as-humans-do      86.34
groma-localized-visual-tokenization-for        5.2
showui-one-vision-language-action-model-for    75.0
minigpt-v2-large-language-model-as-a-unified   5.7
aria-ui-visual-grounding-for-gui-instructions  81.1
qwen2-vl-enhancing-vision-language-model-s     42.1
omniparser-for-pure-vision-based-gui-agent     73.0
navigating-the-digital-world-as-humans-do      77.67
qwen-vl-a-frontier-large-vision-language       5.2
seeclick-harnessing-gui-grounding-for          53.4
aguvis-unified-pure-vision-agents-for          81.0
cogagent-a-visual-language-model-for-gui       47.4