A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Zhi Zhou, Yuhao Tan, Zenan Li, Yuan Yao, Lan-Zhe Guo, Yu-Feng Li, Xiaoxing Ma

Abstract
Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach within the field is sampling-based test-time scaling methods, which enhance reasoning by generating multiple reasoning paths for a given input during inference. However, despite their practical success, the theoretical foundations remain underexplored. In this paper, we provide the first theoretical framework for analyzing sampling-based test-time scaling methods, grounded in the perspective of confidence estimation. Based on this framework, we analyze two dominant paradigms, self-consistency and perplexity, and reveal key limitations: self-consistency suffers from high estimation error, while perplexity exhibits substantial modeling error and possible degradation of the estimation error convergence. To address these limitations, we introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning. Perplexity Consistency combines the strengths of self-consistency and perplexity, boosting the convergence rate of estimation error from linear to exponential while preserving modeling error. Reasoning Pruning prevents degradation by eliminating low-probability reasoning paths. Both theoretical analysis and empirical results across seven benchmark datasets demonstrate that RPC has strong potential for reducing reasoning error. Notably, RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%. The code and resources are available at https://wnjxyk.github.io/RPC.
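To make the two components concrete, the sketch below shows one plausible way an RPC-style aggregation could look in code: Reasoning Pruning discards low-probability reasoning paths before aggregation, and Perplexity Consistency weights each surviving answer by its path probability rather than counting all votes equally. This is a minimal illustration under stated assumptions, not the paper's implementation; the quantile-based pruning rule and the function name `rpc_confidence` are hypothetical.

```python
import math
from collections import defaultdict

def rpc_confidence(samples, prune_quantile=0.5):
    """Illustrative sketch of an RPC-style aggregation (not the authors' code).

    `samples` is a list of (answer, log_prob) pairs, where `log_prob` is the
    model's total log-probability of the sampled reasoning path.
    """
    # Reasoning Pruning (sketch): drop low-probability paths before voting.
    # The quantile threshold is an illustrative assumption.
    log_probs = sorted(lp for _, lp in samples)
    cutoff = log_probs[int(len(log_probs) * prune_quantile)]
    kept = [(ans, lp) for ans, lp in samples if lp >= cutoff]

    # Perplexity Consistency (sketch): weight each vote by the path's
    # internal probability instead of counting all votes equally.
    # Shifting by the max log-prob avoids underflow; the normalized
    # confidence is unchanged by this shift.
    shift = max(lp for _, lp in kept)
    weights = defaultdict(float)
    for ans, lp in kept:
        weights[ans] += math.exp(lp - shift)

    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total  # predicted answer and its confidence
```

For contrast, plain self-consistency corresponds to setting every weight to 1 (pure majority voting), while a pure perplexity criterion would simply return the single path with the highest `log_prob`; the sketch interpolates between the two in the spirit of the abstract's description.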