4 months ago

Dunjie Lu, Yiheng Xu, Junli Wang, Haoyuan Wu, Xinyuan Wang, Zekun Wang, Junlin Yang, Hongjin Su, Jixuan Chen, Junda Chen, and 5 more

Abstract

Training computer-use agents requires massive amounts of GUI interaction data, but manually annotating action trajectories at scale is prohibitively expensive. We present VideoAgentTrek, a scalable pipeline that automatically mines training data from publicly available screen-recorded videos at web scale, eliminating the need for manual annotation. Our approach addresses a key challenge: raw videos contain implicit demonstrations but lack explicit action labels. To solve this, we develop Video2Action, an inverse dynamics module (IDM) with two components: (1) a video grounding model that detects and localizes GUI actions with precise temporal boundaries and context, and (2) an action-content recognizer that extracts structured parameters like click coordinates and typed text with high fidelity. Applied to 39,000 YouTube tutorial videos, our pipeline generates 1.52 million interaction steps automatically. We leverage this data through continued pretraining followed by supervised fine-tuning. On OSWorld-Verified, our approach improves task success rates from 9.3% (SFT-only baseline) to 15.8%, a 70% relative improvement. On AgentNetBench, step accuracy increases from 64.1% to 69.3%. Our results demonstrate that passive internet videos can be transformed into high-quality supervision for computer-use agents, providing a scalable alternative to expensive manual annotation.
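The abstract describes Video2Action only at a high level. The following is a minimal sketch of how its two stages could compose into training steps, assuming a grounding model that returns typed segments with start and end times, and a content recognizer that returns a parameter dict (or None when it cannot read the parameters). All names here (`ActionSegment`, `InteractionStep`, `video_to_steps`, and the stub models) are hypothetical illustrations, not the authors' code or data format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stage-1 output: one GUI action localized in time.
@dataclass
class ActionSegment:
    action_type: str   # illustrative labels such as "click" or "type"
    start_s: float     # segment start, in seconds from video start
    end_s: float       # segment end, in seconds from video start

# Hypothetical training record: stage-1 timing merged with stage-2 parameters.
@dataclass
class InteractionStep:
    frame_time_s: float                  # timestamp of the observation frame
    action_type: str
    params: Dict[str, object] = field(default_factory=dict)

# Assumed interfaces for the two (unspecified) models.
Grounder = Callable[[str], List[ActionSegment]]
Recognizer = Callable[[str, ActionSegment], Optional[Dict[str, object]]]

def video_to_steps(video_id: str, ground: Grounder, recognize: Recognizer) -> List[InteractionStep]:
    """Ground actions in time, then recover each action's parameters.

    Segments with non-positive duration, or whose parameters the recognizer
    cannot extract, are dropped rather than emitted with missing labels.
    """
    steps: List[InteractionStep] = []
    for seg in sorted(ground(video_id), key=lambda s: s.start_s):
        if seg.end_s <= seg.start_s:
            continue  # degenerate segment: no usable time span
        params = recognize(video_id, seg)
        if params is None:
            continue  # parameters could not be recovered
        # Assumption: the frame at segment start serves as the pre-action observation.
        steps.append(InteractionStep(seg.start_s, seg.action_type, params))
    return steps

# Usage with stub models standing in for the real grounder and recognizer.
def fake_ground(video_id: str) -> List[ActionSegment]:
    return [
        ActionSegment("type", 4.0, 6.5),
        ActionSegment("click", 1.2, 1.5),
        ActionSegment("scroll", 8.0, 8.0),  # zero-length, will be skipped
    ]

def fake_recognize(video_id: str, seg: ActionSegment) -> Optional[Dict[str, object]]:
    if seg.action_type == "click":
        return {"x": 640, "y": 360}
    if seg.action_type == "type":
        return {"text": "hello"}
    return None

steps = video_to_steps("demo", fake_ground, fake_recognize)
for step in steps:
    print(step)
```

Filtering out segments the recognizer cannot parse trades recall for label quality; whether the actual pipeline does this is not stated in the abstract.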


Action Recognition

Human-Computer Interaction

Multimodal Representation


VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
