LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Xingyu Wu; Yuchen Yan; Shangke Lyu; Linjuan Wu; Yiwen Qiu; Yongliang Shen; Weiming Lu; Jian Shao; Jun Xiao; Yueting Zhuang
Abstract

Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.
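The abstract only outlines the two stages, so the sketch below is an illustrative interpretation rather than the paper's actual training pipeline: stage 1 estimates a "natural" length per problem from successful rollouts (the median statistic here is an assumption), and stage 2 embeds that budget in the reasoning context and shapes the reward around it (the prompt wording, `length_adaptive_reward` formula, and tolerance parameter are all hypothetical).

```python
import statistics
from typing import Dict, List


def stage1_target_lengths(rollouts: Dict[str, List[dict]]) -> Dict[str, int]:
    """Stage 1 (sketch): for each problem, estimate a natural reasoning length
    from the lengths of successful rollouts (median used here as an assumption)."""
    targets = {}
    for problem_id, samples in rollouts.items():
        success_lengths = [s["num_tokens"] for s in samples if s["correct"]]
        if success_lengths:  # only keep problems the model can already solve
            targets[problem_id] = int(statistics.median(success_lengths))
    return targets


def stage2_prompt(question: str, target_len: int) -> str:
    """Stage 2 (sketch): embed the discovered length budget directly in the
    reasoning context as meta-cognitive guidance (wording is illustrative)."""
    return f"{question}\nI will reason within about {target_len} tokens.\n"


def length_adaptive_reward(correct: bool, num_tokens: int, target_len: int,
                           tolerance: float = 0.2) -> float:
    """Hypothetical reward shaping: full reward for a correct answer near the
    target length, decaying as the response over- or under-shoots the budget."""
    if not correct:
        return 0.0
    deviation = abs(num_tokens - target_len) / max(target_len, 1)
    return max(0.0, 1.0 - max(0.0, deviation - tolerance))


# Toy usage: one problem with two successful and one failed rollout.
rollouts = {"p1": [{"correct": True, "num_tokens": 180},
                   {"correct": True, "num_tokens": 240},
                   {"correct": False, "num_tokens": 900}]}
targets = stage1_target_lengths(rollouts)          # {'p1': 210}
print(stage2_prompt("What is 17 * 23?", targets["p1"]))
print(length_adaptive_reward(True, 200, targets["p1"]))  # close to budget -> 1.0
```

In this reading, the length budget is not enforced as a hard cap: it only enters through the prompt and the reward signal, which is consistent with the abstract's claim that length control becomes an intrinsic capability rather than an external constraint.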