a month ago

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Abstract

In the field of AI-driven human-GUI interaction automation, while rapid advances in multimodal large language models and reinforcement fine-tuning techniques have yielded remarkable progress, a fundamental challenge persists: their interaction logic significantly deviates from natural human-GUI communication patterns. To fill this gap, we propose "Blink-Think-Link" (BTL), a brain-inspired framework for human-GUI interaction that mimics the human cognitive process between users and graphical interfaces. The system decomposes interactions into three biologically plausible phases: (1) Blink - rapid detection and attention to relevant screen areas, analogous to saccadic eye movements; (2) Think - higher-level reasoning and decision-making, mirroring cognitive planning; and (3) Link - generation of executable commands for precise motor control, emulating human action selection mechanisms. Additionally, we introduce two key technical innovations for the BTL framework: (1) Blink Data Generation - an automated annotation pipeline specifically optimized for blink data, and (2) BTL Reward - the first rule-based reward mechanism that enables reinforcement learning driven by both process and outcome. Building upon this framework, we develop a GUI agent model named BTL-UI, which demonstrates consistent state-of-the-art performance across both static GUI understanding and dynamic interaction tasks in comprehensive benchmarks. These results provide conclusive empirical validation of the framework's efficacy in developing advanced GUI Agents.
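
The abstract does not specify how the BTL Reward is computed, so the following is only an illustrative sketch of what a rule-based reward "driven by both process and outcome" could look like. Everything here is an assumption made for illustration: the `Prediction` fields, the IoU threshold of 0.5, the equal weighting, and the function name `toy_btl_reward` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Prediction:
    """Hypothetical structured output, one field per BTL phase."""
    blink_box: Box               # Blink: screen region the model attends to
    thought: str                 # Think: free-form reasoning (not scored here)
    action: str                  # Link: action type, e.g. "click"
    point: tuple[float, float]   # Link: target coordinate of the action


def toy_btl_reward(pred: Prediction, gt_region: Box, gt_action: str,
                   gt_box: Box, w_process: float = 0.5) -> float:
    """Toy rule-based reward mixing a process term and an outcome term."""
    # Process term (assumed): did the Blink phase attend to the right region?
    process = 1.0 if iou(pred.blink_box, gt_region) >= 0.5 else 0.0
    # Outcome term (assumed): right action type and point inside the target.
    x, y = pred.point
    hit = gt_box[0] <= x <= gt_box[2] and gt_box[1] <= y <= gt_box[3]
    outcome = 1.0 if (pred.action == gt_action and hit) else 0.0
    return w_process * process + (1.0 - w_process) * outcome


good = Prediction((0, 0, 10, 10), "the button is top-left", "click", (5, 5))
off_region = Prediction((20, 20, 30, 30), "guess", "click", (5, 5))
print(toy_btl_reward(good, (0, 0, 10, 10), "click", (4, 4, 6, 6)))
print(toy_btl_reward(off_region, (0, 0, 10, 10), "click", (4, 4, 6, 6)))
```

In this sketch a prediction that reaches the correct outcome while attending to the wrong region earns only part of the reward, which is one way a reward can supervise the intermediate process as well as the final action.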
