2 days ago

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, Jitong Liao, Qi Zheng, Fei Huang, Jingren Zhou, Ming Yan
Abstract

This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation and correctness validation, leveraging GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.
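
The abstract names Trajectory-aware Relative Policy Optimization but does not spell out its update rule. As a rough illustration only, the sketch below assumes that "trajectory-aware relative" means a single trajectory-level reward is normalized against the other rollouts of the same task and then shared by every step of that trajectory. That assumption, the function name `trajectory_relative_advantages`, and the `eps` parameter are all hypothetical and not taken from the paper or its repository.

```python
from statistics import mean, pstdev


def trajectory_relative_advantages(rewards, steps_per_trajectory, eps=1e-6):
    """Hypothetical sketch: group-normalized, trajectory-level advantages.

    rewards: one scalar reward per rollout of the same task.
    steps_per_trajectory: number of actions taken in each rollout.
    Returns one list per rollout, with the same advantage repeated per step.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    advantages = []
    for reward, n_steps in zip(rewards, steps_per_trajectory):
        # Assumption: advantage is relative to the group, not to a critic.
        adv = (reward - mu) / (sigma + eps)
        # Assumption: every step in the rollout receives the same credit.
        advantages.append([adv] * n_steps)
    return advantages


# Four rollouts of one task: two succeeded (reward 1.0), two failed (0.0).
group = trajectory_relative_advantages([1.0, 0.0, 0.0, 1.0], [3, 5, 4, 2])
```

Under these assumptions, successful rollouts get a positive advantage on all of their steps and failed rollouts a negative one, which sidesteps per-step reward labels for long GUI episodes. When every rollout in the group has the same reward, all advantages are zero and the group contributes no gradient signal.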