ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Abstract

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.