5 months ago

Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Yida Zhao, Liwen Zhang, Litu Ou, Dingchu Zhang, Xixi Wu, Jialong Wu, and 7 more

Abstract

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to systematically reduce extreme uncertainty when navigating vast information landscapes. Based on this insight, we introduce WebSailor, a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation, RFT cold start, and an efficient agentic RL training algorithm, Duplicating Sampling Policy Optimization (DUPO). With this integrated pipeline, WebSailor significantly outperforms all open-source agents in complex information-seeking tasks, matching proprietary agents' performance and closing the capability gap.

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
