Mobile-Agent-v3: Fundamental Agents for GUI Automation

This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state of the art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, which enables our Self-Evolving GUI Trajectory Production framework. This framework generates high-quality interaction data through automated query generation and correctness validation, and uses GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces the need for manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.
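To make the shape of the self-improving loop described above concrete, here is a minimal toy sketch, not the authors' implementation. Every name in it (`generate_queries`, `rollout`, `validate`, `self_evolving_loop`, the scalar `skill`) is hypothetical: the environment rollout is simulated with a random draw, the correctness validator simply reads a success flag, and "fine-tuning" is stood in for by nudging a skill value upward in proportion to the amount of accepted data. It only illustrates the cycle of generating queries, rolling out the current model, keeping validated trajectories, and updating the model on them.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    query: str
    actions: list
    success: bool


def generate_queries(n, round_idx):
    # Hypothetical stand-in for automated query generation.
    return [f"round{round_idx}-task{i}" for i in range(n)]


def rollout(skill, query, rng):
    # Hypothetical environment rollout: success is a coin flip weighted by skill.
    success = rng.random() < skill
    return Trajectory(query, ["tap", "type", "done"], success)


def validate(traj):
    # Hypothetical correctness validator; a real one would judge the final GUI state.
    return traj.success


def self_evolving_loop(rounds, queries_per_round, seed=0):
    rng = random.Random(seed)
    skill = 0.2  # toy proxy for model quality (assumption)
    dataset, history = [], []
    for r in range(rounds):
        queries = generate_queries(queries_per_round, r)
        trajectories = [rollout(skill, q, rng) for q in queries]
        kept = [t for t in trajectories if validate(t)]
        dataset.extend(kept)
        # Stand-in for fine-tuning on accepted data: skill grows, capped at 0.95.
        skill = min(0.95, skill + 0.01 * len(kept))
        history.append((r, len(kept), skill))
    return dataset, history


dataset, history = self_evolving_loop(rounds=5, queries_per_round=20, seed=1)
for r, kept, skill in history:
    print(f"round {r}: kept {kept} trajectories, skill now {skill:.2f}")
```

The key design point the sketch tries to capture is that only validated trajectories feed back into training, so the data pool improves as the model improves, without per-trajectory manual annotation.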