WebSailor: Navigating Super-human Reasoning for Web Agent

Transcending human cognitive limitations represents a critical frontier inLLM training. Proprietary agentic systems like DeepResearch have demonstratedsuperhuman capabilities on extremely complex information-seeking benchmarkssuch as BrowseComp, a feat previously unattainable. We posit that their successhinges on a sophisticated reasoning pattern absent in open-source models: theability to systematically reduce extreme uncertainty when navigating vastinformation landscapes. Based on this insight, we introduce WebSailor, acomplete post-training methodology designed to instill this crucial capability.Our approach involves generating novel, high-uncertainty tasks throughstructured sampling and information obfuscation, RFT cold start, and anefficient agentic RL training algorithm, Duplicating Sampling PolicyOptimization (DUPO). With this integrated pipeline, WebSailor significantlyoutperforms all opensource agents in complex information-seeking tasks,matching proprietary agents' performance and closing the capability gap.