3 months ago

Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, and 6 more

Abstract

Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. We further develop Efficiency-Aware Policy Optimization (EAPO), a reinforcement learning framework that incentivizes efficient exploration through geometric reward discounting and enables stable distributed training via adaptive downsampling. Extensive experiments demonstrate that IterResearch achieves substantial improvements over existing open-source agents, with an average gain of 14.5 percentage points (pp) across six benchmarks, and narrows the gap with frontier proprietary systems. Remarkably, our paradigm exhibits unprecedented interaction scaling, extending to 2048 interactions with dramatic performance gains (from 3.5% to 42.5%), and serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks. These findings position IterResearch as a versatile solution for long-horizon reasoning, effective both as a trained agent and as a prompting paradigm for frontier models.
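To make the two mechanisms named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no interfaces or formulas, so the `Workspace` fields, the `policy` and `tool` callables, the function names, and the exact discounting rule (a terminal reward scaled by `gamma` for each round that precedes the final one) are all assumptions made for illustration. The point it tries to show is that each round sees only a rebuilt workspace (question, evolving report, latest observation) rather than the full accumulated history.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Workspace:
    """Hypothetical per-round state: everything the agent sees in one round."""
    question: str
    report: str            # evolving report that serves as memory
    last_observation: str  # only the most recent tool result


# Hypothetical policy signature: workspace -> (updated report, next action, answer or None)
Policy = Callable[[Workspace], Tuple[str, str, Optional[str]]]
Tool = Callable[[str], str]


def run_iterative_research(
    question: str, policy: Policy, tool: Tool, max_rounds: int
) -> Tuple[Optional[str], List[Workspace]]:
    """Run rounds until the policy answers or the round budget is exhausted."""
    workspace = Workspace(question=question, report="", last_observation="")
    trajectory: List[Workspace] = []
    for _ in range(max_rounds):
        report, action, answer = policy(workspace)
        trajectory.append(workspace)
        if answer is not None:
            return answer, trajectory
        observation = tool(action)
        # Reconstruct the workspace instead of appending to one growing context:
        # earlier observations survive only through the synthesized report.
        workspace = Workspace(question, report, observation)
    return None, trajectory


def discounted_rewards(final_reward: float, num_rounds: int, gamma: float = 0.9) -> List[float]:
    """Assumed geometric discounting: round t (0-indexed) of T rounds receives
    final_reward * gamma ** (T - 1 - t), so longer trajectories give early
    rounds less credit."""
    return [final_reward * gamma ** (num_rounds - 1 - t) for t in range(num_rounds)]
```

Under this assumed rule, a trajectory that reaches the same final reward in fewer rounds assigns a larger reward to its first round, which is one simple way a discount could favor efficient exploration.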

Reinforcement Learning

Reasoning

Retrieval-Augmented Generation

Method/Architecture

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
