2 months ago

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Abstract

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents in complex information-seeking tasks, matching proprietary agents' performance and closing the capability gap.
