UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Abstract
Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution for lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
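The abstract's reward shaping can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: each step's return adds its immediate reward to the discounted sum of future step rewards, and the policy advantage blends a step-level and an episode-level term. The names `gamma` and `w_step` are illustrative assumptions.

```python
def discounted_returns(step_rewards, gamma=0.9):
    """Return-to-go for each step: R_t = r_t + gamma * R_{t+1}.

    Computed by a single backward pass over the trajectory's
    per-step rewards (illustrative; the paper's exact reward
    definition may differ).
    """
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_advantage(step_adv, episode_adv, w_step=0.5):
    """Weighted mix of step-level and episode-level advantages,
    as the abstract describes (weighting scheme assumed)."""
    return w_step * step_adv + (1.0 - w_step) * episode_adv


# Example: a three-step offline trajectory with per-step rewards.
rewards = [1.0, 0.0, 1.0]
print(discounted_returns(rewards))
```

With `gamma=0.9`, the last step's return is just its own reward, while earlier steps accumulate discounted credit from later successes, which is how long-term signals reach early actions without live environment interaction.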