3 months ago

Aarash Feizi, Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Kaixin Li, Rabiul Awal, Xing Han Lù, Johan Obando-Ceron, Juan A. Rodriguez, Nicolas Chapados, and 7 more

Abstract

Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. To address this gap, we introduce GroundCUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. Using GroundCUA, we develop the GroundNext family of models that map instructions to their target UI elements. At both 3B and 7B scales, GroundNext achieves state-of-the-art results across five benchmarks using supervised fine-tuning, while requiring less than one-tenth the training data of prior work. Reinforcement learning post-training further improves performance. When evaluated in an agentic setting on the OSWorld benchmark using o3 as planner, GroundNext attains results comparable or superior to those of models trained with substantially more data. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.
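To make the notion of grounding concrete, the following is a minimal sketch of how a predicted click point might be checked against an annotated UI element. Everything here is an assumption for illustration: the `ElementAnnotation` fields, the `is_hit` and `grounding_accuracy` helpers, and the point-in-box criterion are hypothetical and are not taken from the GroundCUA schema or from the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ElementAnnotation:
    """Hypothetical record for one annotated on-screen element.

    Coordinates are pixels, with (left, top) as the upper-left corner.
    This layout is an assumption, not the dataset's actual format.
    """
    label: str
    left: float
    top: float
    right: float
    bottom: float


def is_hit(point: Tuple[float, float], target: ElementAnnotation) -> bool:
    """Return True if the predicted point lies inside the target's box."""
    x, y = point
    return target.left <= x <= target.right and target.top <= y <= target.bottom


def grounding_accuracy(
    points: Sequence[Tuple[float, float]],
    targets: Sequence[ElementAnnotation],
) -> float:
    """Fraction of predicted points that land inside their target element."""
    if len(points) != len(targets):
        raise ValueError("points and targets must have the same length")
    if not points:
        raise ValueError("at least one prediction is required")
    hits = sum(is_hit(p, t) for p, t in zip(points, targets))
    return hits / len(points)


if __name__ == "__main__":
    # Instruction: "Click the Save button" (hypothetical example).
    save_button = ElementAnnotation("Save", left=10, top=20, right=110, bottom=60)
    predictions = [(50, 40), (5, 40)]  # first lands inside the box, second misses
    print(grounding_accuracy(predictions, [save_button, save_button]))
```

Under this assumed criterion, a model is counted as correct on an instruction when its predicted point falls within the bounding box of the intended element; other criteria, such as overlap between predicted and annotated boxes, are equally plausible.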

Grounding Computer Use Agents on Human Demonstrations
