2 months ago
LLM
Reasoning

Efficient Reasoning with Balanced Thinking

Yulin Li Tengyao Tu Li Ding Junjie Wang Huiling Zhen Yixin Chen Yong Li Zhuotao Tian

Abstract

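Large Reasoning Models (LRMs) have demonstrated remarkable reasoning capabilities, but they often suffer from overthinking, where they spend redundant computation steps on simple problems, or underthinking, where they fail to explore sufficient reasoning paths despite having the inherent capability. These issues lead to inefficiency and potential accuracy loss, limiting practical deployment in resource-constrained settings. Existing methods for mitigating overthinking, such as suppressing reflective keywords or adjusting reasoning length, can inadvertently induce underthinking and compromise accuracy. This work therefore proposes ReBalance, a training-free framework that achieves efficient reasoning through balanced thinking. ReBalance uses confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking through consistent overconfidence. It aggregates hidden states from a small-scale dataset into reasoning mode prototypes and computes a steering vector that guides the reasoning trajectories of LRMs. A dynamic control function modulates the strength and direction of this vector based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. Extensive experiments on four models ranging from 0.5B to 32B and nine benchmarks spanning math reasoning, general question answering, and coding tasks show that ReBalance effectively reduces output redundancy while improving accuracy. It offers a general, training-free, plug-and-play strategy for efficient and robust LRM deployment. The code is available at https://github.com/yu-lin-li/ReBalance.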

One-sentence Summary

Researchers from Harbin Institute of Technology and collaborating institutes propose ReBalance, a training-free framework that uses confidence-based steering vectors to dynamically balance reasoning depth. This approach mitigates overthinking and underthinking in Large Reasoning Models, improving accuracy and efficiency across math, coding, and general question-answering benchmarks without requiring fine-tuning.

Key Contributions

  • The paper introduces ReBalance, a training-free framework that achieves efficient reasoning by using confidence as a continuous indicator, identifying overthinking through high variance and underthinking through consistent overconfidence.
  • A steering vector is computed by aggregating hidden states into reasoning mode prototypes, which a dynamic control function modulates in real-time to prune redundancy or promote exploration based on the model's confidence levels.
  • Extensive experiments across four models ranging from 0.5B to 32B and nine benchmarks demonstrate that the method effectively reduces output redundancy while simultaneously improving accuracy in math reasoning, general question answering, and coding tasks.

Introduction

Large Reasoning Models (LRMs) excel at complex tasks but often suffer from inefficiency due to overthinking on simple problems or underthinking on difficult ones, which hinders their deployment in resource-constrained environments. Prior attempts to fix overthinking by suppressing reflection or shortening reasoning chains frequently backfire by inducing underthinking, leading to premature and inaccurate conclusions. The authors use confidence as a continuous signal to distinguish between these two states and propose ReBalance, a training-free framework that dynamically steers the model's hidden states to prune redundancy during overthinking while encouraging exploration during underthinking.

Dataset

  • The authors curate a diverse evaluation suite spanning mathematics, science, and coding, drawing from established benchmarks like MATH-500, AIME, GSM8K, GPQA Diamond, and LiveCodeBench.
  • The dataset composition includes three difficulty tiers: simple sets like GSM8K (1,319 problems) and AMC23 (40 problems); moderate sets like MATH-500 (500 problems); and hard sets including AIME24/AIME25 (30 problems each), GPQA Diamond (198 problems), OlympiadBench (675 problems), and LiveCodeBench v1 (400 problems).
  • Specific filtering and sourcing rules apply to each subset, such as using the official 2024/2025 AIME cycles, selecting expert-authored graduate-level questions for GPQA, and ensuring contamination awareness in LiveCodeBench by using version v1 with execution-based unit tests.
  • The authors use standard splits where available, such as the ~7.5k-problem training split and 1,319-problem test split for GSM8K, while treating the other benchmarks as held-out test sets for assessing reasoning capabilities.
  • The processing pipeline applies a unified prompt template across all math-related subsets, instructing the model to reason step by step and format the final answer within a boxed notation.
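The unified math prompt described in the last bullet can be sketched as follows. The exact prompt wording is not given in this summary, so the template string, `build_prompt`, and `extract_boxed` are hypothetical names and text chosen for illustration only.

```python
from typing import Optional

# Hypothetical wording: the summary only states that the prompt asks for
# step-by-step reasoning and a final answer in boxed notation.
MATH_PROMPT = (
    "Please reason step by step, and put your final answer within \\boxed{{}}.\n\n"
    "Problem: {problem}"
)


def build_prompt(problem: str) -> str:
    """Fill the shared template with one problem statement."""
    return MATH_PROMPT.format(problem=problem.strip())


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, allowing nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


if __name__ == "__main__":
    print(build_prompt("What is 1/4 + 1/4?"))
    print(extract_boxed("So the answer is \\boxed{\\frac{1}{2}}."))
```

Using one template for every math subset keeps answer extraction uniform, so differences in accuracy are less likely to come from parsing differences between benchmarks.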

Method

The authors propose ReBalance, a training-free framework designed to dynamically balance overthinking and underthinking in Large Reasoning Models (LRMs) to improve efficiency without compromising accuracy. The framework operates through a two-stage process involving offline data collection and online inference with dynamic steering. Refer to the framework diagram for a comprehensive overview of the system architecture.

To effectively control the reasoning process, the method first explicitly models reasoning states prone to overthinking or underthinking using stepwise confidence and confidence variance. Overthinking is identified as a state characterized by low confidence and high variance, reflecting unstable or oscillating reasoning trajectories. Conversely, underthinking is defined by persistently high confidence and low variance, indicating premature convergence. Refer to the examples illustrating these distinct reasoning behaviors and the target balanced state.
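A minimal sketch of this state classification is shown below. It assumes stepwise confidence is the mean probability of the generated tokens within a step and that variance is taken over a sliding window of recent steps. The window size, the thresholds, and the function names are illustrative assumptions, not values from the paper.

```python
from statistics import mean, pvariance


def step_confidence(token_probs):
    """Assumed definition: mean probability of the chosen tokens in one step."""
    return mean(token_probs)


def classify_state(step_confs, window=4, low_conf=0.6, high_conf=0.9,
                   high_var=0.02, low_var=0.002):
    """Label the current reasoning state from recent stepwise confidences.

    All thresholds are hypothetical placeholders for illustration.
    """
    if not step_confs:
        raise ValueError("need at least one step confidence")
    recent = step_confs[-window:]
    var = pvariance(recent)  # 0.0 when only one step is available
    # Overthinking: low current confidence with an oscillating trajectory.
    if recent[-1] < low_conf and var > high_var:
        return "overthinking"
    # Underthinking: confidence stays high across a full window with little change.
    if len(recent) == window and min(recent) >= high_conf and var < low_var:
        return "underthinking"
    return "balanced"


if __name__ == "__main__":
    print(classify_state([0.9, 0.3, 0.85, 0.4]))
    print(classify_state([0.97, 0.98, 0.97, 0.99]))
    print(classify_state([0.7, 0.75, 0.8, 0.78]))
```

Requiring a full window before labeling a trajectory as underthinking is one way to encode "persistently" high confidence, so a single confident step is not enough to trigger the label.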

The framework extracts steering vectors from the hidden states of the LRM to guide the model away from these undesirable modes. During the offline stage, a one-pass data collection is performed on a small seen dataset to identify prototypes for overthinking and underthinking. The authors analyze the linear decodability of confidence signals across layers to automatically select the optimal deep layer for intervention, as visualized in the layer-wise $R^2$ analysis. The steering vector is then constructed as the normalized difference between the overthinking and underthinking prototypes, establishing a direction in the latent space for behavior modulation.
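The prototype and steering-vector construction can be sketched in plain Python as follows. The sketch assumes each prototype is the mean of the hidden states collected for that mode at the selected layer, and that the vector points from the underthinking prototype toward the overthinking prototype. The sign convention and the function names are assumptions for illustration.

```python
import math


def prototype(hidden_states):
    """Mean hidden state over all collected vectors for one reasoning mode."""
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[j] for h in hidden_states) / n for j in range(dim)]


def steering_vector(over_states, under_states):
    """Unit-norm difference between the two prototypes.

    Assumed sign convention: overthinking prototype minus underthinking prototype.
    """
    p_over = prototype(over_states)
    p_under = prototype(under_states)
    diff = [o - u for o, u in zip(p_over, p_under)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("prototypes coincide; no steering direction exists")
    return [d / norm for d in diff]


if __name__ == "__main__":
    over = [[1.0, 0.0], [3.0, 0.0]]   # toy 2-dimensional hidden states
    under = [[0.0, 0.0], [0.0, 0.0]]
    print(steering_vector(over, under))
```

Normalizing the difference separates direction from magnitude, which leaves the steering strength to be set entirely at inference time.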

During online inference, a dynamic control function adaptively modulates the steering strength and direction based on real-time model states. This function takes the current stepwise confidence and confidence variance as inputs to compute a steering weight. The weight is designed to push the model's state away from the nearest reasoning boundary, ensuring the trajectory remains within a balanced region. Refer to the visualization of the control function surface, which demonstrates how the steering strength varies non-linearly based on confidence and variance levels to mitigate both overthinking and underthinking.
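One possible shape for such a control function is sketched below. The functional form (a difference of `tanh` terms), every threshold, and the names `steering_weight` and `apply_steering` are hypothetical; the summary does not give the paper's actual formula. The sketch assumes the steering vector points from the underthinking prototype toward the overthinking prototype, so a negative weight moves the state away from overthinking and a positive weight moves it away from underthinking.

```python
import math


def steering_weight(conf, var, low_conf=0.6, high_conf=0.9, var_ref=0.02,
                    max_strength=4.0, sharpness=10.0):
    """Signed, bounded steering weight from confidence and its variance.

    Hypothetical form: zero inside the balanced region, saturating smoothly
    toward +/- max_strength as the state drifts into either failure mode.
    """
    # How far the state sits inside the overthinking region (low conf, high var).
    over_score = max(0.0, low_conf - conf) * min(1.0, var / var_ref)
    # How far the state sits inside the underthinking region (high conf, low var).
    under_score = max(0.0, conf - high_conf) * max(0.0, 1.0 - var / var_ref)
    return max_strength * (math.tanh(sharpness * under_score)
                           - math.tanh(sharpness * over_score))


def apply_steering(hidden, vector, weight):
    """Add the scaled steering vector to one hidden state."""
    return [h + weight * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    print(steering_weight(0.30, 0.05))   # overthinking-like state, negative weight
    print(steering_weight(0.99, 0.00))   # underthinking-like state, positive weight
    print(steering_weight(0.75, 0.01))   # balanced state, zero weight
```

In a real model the addition in `apply_steering` would be performed on the hidden state of the selected layer during decoding. How that hook is implemented depends on the inference framework and is not shown here.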

Experiment

  • Analysis of reasoning length distributions reveals that existing overthinking mitigation methods often induce underthinking by prematurely truncating necessary steps, whereas the proposed ReBalance method achieves a balanced reduction that preserves accuracy while shortening outputs.
  • Experiments demonstrate that confidence variance and step-level confidence serve as reliable indicators for distinguishing between overthinking (high variance, low confidence) and underthinking (persistently high confidence), enabling fine-grained behavioral control without auxiliary models.
  • Evaluations across diverse benchmarks in mathematics, science, code, and commonsense reasoning confirm that ReBalance significantly reduces token usage and inference latency while improving or maintaining Pass@1 accuracy, outperforming prompt-based and external verifier-based baselines.
  • Ablation studies validate that dynamic control based on confidence signals is superior to static adjustments, and that steering vectors extracted from medium-difficulty datasets generalize effectively across different domains and model sizes.
  • Additional tests on NPU devices and creative writing tasks show that the method maintains robust performance on specialized hardware and preserves or enhances the model's creative expressiveness and linguistic diversity.
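The accuracy and length metrics referenced in the list above can be computed along the following lines. This sketch assumes Pass@1 is measured with a single sampled response per problem and that token reduction is the relative drop in mean output length against a baseline. The function names are illustrative, and the numbers in the usage example are made up, not results from the paper.

```python
def pass_at_1(correct_flags):
    """Fraction of problems solved, assuming one response per problem."""
    if not correct_flags:
        raise ValueError("need at least one problem")
    return sum(1 for ok in correct_flags if ok) / len(correct_flags)


def token_reduction(baseline_lengths, method_lengths):
    """Relative reduction in mean output length versus the baseline."""
    if not baseline_lengths or not method_lengths:
        raise ValueError("need lengths for both runs")
    base = sum(baseline_lengths) / len(baseline_lengths)
    ours = sum(method_lengths) / len(method_lengths)
    return 1.0 - ours / base


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(pass_at_1([True, False, True, True]))
    print(token_reduction([1000, 3000], [800, 1200]))
```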
