UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Abstract
Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution for lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
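The abstract's reward shaping can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: each step's return adds its immediate reward to the discounted sum of future step rewards, and the policy advantage blends a step-level and an episode-level term. The names `gamma` and `w_step` are illustrative assumptions.

```python
def discounted_returns(step_rewards, gamma=0.9):
    """Return-to-go for each step: R_t = r_t + gamma * R_{t+1}.

    Computed by a single backward pass over the trajectory's
    per-step rewards (illustrative; the paper's exact reward
    definition may differ).
    """
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_advantage(step_adv, episode_adv, w_step=0.5):
    """Weighted mix of step-level and episode-level advantages,
    as the abstract describes (weighting scheme assumed)."""
    return w_step * step_adv + (1.0 - w_step) * episode_adv


# Example: a three-step offline trajectory with per-step rewards.
rewards = [1.0, 0.0, 1.0]
print(discounted_returns(rewards))
```

With `gamma=0.9`, the last step's return is just its own reward, while earlier steps accumulate discounted credit from later successes, which is how long-term signals reach early actions without live environment interaction.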