6 months ago

Shunyu Liu Minghao Liu Huichi Zhou Zhenyu Cui Yang Zhou Yuhao Zhou Wendong Fan Ge Zhang Jiajun Shi Weihao Xuan

Abstract

Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask, while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents.
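To make the contrast between outcome-only and subtask-level verification concrete, here is a minimal Python sketch. It is not the VeriGUI data format or evaluation code: the `Subtask` and `Task` classes, the exact-match check, the `start` parameter, and the toy task and agent are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    instruction: str
    expected: str  # hypothetical ground-truth result for this subtask


@dataclass
class Task:
    goal: str
    subtasks: List[Subtask]


# Hypothetical agent interface: (instruction, prior results) -> answer string.
Agent = Callable[[str, List[str]], str]


def run_with_subtask_checks(task: Task, agent: Agent, start: int = 0) -> List[bool]:
    """Run the agent from subtask `start` and check every subtask result.

    Starting mid-chain is modelled by seeding the context with the expected
    results of the skipped subtasks (an assumption of this sketch).
    """
    context = [s.expected for s in task.subtasks[:start]]
    results = []
    for sub in task.subtasks[start:]:
        answer = agent(sub.instruction, context)
        # Exact string match is an illustrative stand-in for a real verifier.
        results.append(answer.strip().lower() == sub.expected.strip().lower())
        context.append(answer)
    return results


def outcome_only(results: List[bool]) -> bool:
    """Outcome-only verification looks at the final result alone."""
    return bool(results) and results[-1]


# Toy data, invented for this example.
toy_task = Task(
    goal="Report the release year of a product's latest version",
    subtasks=[
        Subtask("find product page", "page-a"),
        Subtask("find latest version", "v2"),
        Subtask("find release year", "2024"),
    ],
)


def toy_agent(instruction: str, context: List[str]) -> str:
    canned = {
        "find product page": "page-a",
        "find latest version": "v3",  # deliberately wrong intermediate step
        "find release year": "2024",
    }
    return canned[instruction]
```

In this toy run, outcome-only checking would pass the agent because the final answer matches, while the per-subtask results expose the wrong intermediate step. Calling `run_with_subtask_checks(toy_task, toy_agent, start=2)` evaluates only the last subtask, which mirrors the idea that any subtask can serve as a starting point.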

VeriGUI: Verifiable Long-Chain GUI Dataset | Papers | HyperAI

Shunyu Liu Minghao Liu Huichi Zhou Zhenyu Cui Yang Zhou Yuhao Zhou Wendong Fan Ge Zhang Jiajun Shi Weihao Xuan, 22 more
