WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Abstract
Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents in complex information-seeking tasks, matching proprietary agents' performance and closing the capability gap.