
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu
Abstract

Recently, Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in code reasoning by scaling up the length of Chain-of-Thought (CoT). However, excessively long reasoning traces introduce substantial challenges in terms of training cost, inference latency, and deployment feasibility. While various CoT compression approaches have emerged to address this challenge, they face inherent trade-offs: token-level methods often disrupt syntactic and logical coherence, while step-level methods based on perplexity fail to reliably capture the logically critical reasoning steps. In this paper, we propose ASAP (Anchor-guided, Surprisal-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP first performs anchor-guided pruning to preserve the core reasoning structure, which efficiently reduces the search space for subsequent processing. It then enables a logic-aware pruning by selecting logically essential reasoning steps based on a novel first-token surprisal metric. Finally, ASAP teaches models to autonomously generate and leverage these concise CoTs at inference time, enabling efficient reasoning in coding tasks. Experiments show that ASAP achieves state-of-the-art accuracy across multiple code generation benchmarks while substantially reducing training and inference costs. On the challenging LiveCodeBench v4_v5 benchmark, our approach reduces token generation by 23.5% and inference latency by 43.5% compared to the strongest baseline, while achieving a competitive accuracy of 36.19% in Pass@1. Our results highlight a promising direction for building powerful and efficient LRMs.
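The core selection signal described above, first-token surprisal, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes surprisal is computed as the negative log-probability of each reasoning step's first token given the preceding context, and the keep ratio and log base are hypothetical parameters (in practice, the token probability would come from a language model's output distribution).

```python
import math

def first_token_surprisal(first_token_prob: float) -> float:
    """Surprisal of a step's first token: -log2 p(first token | context).

    A low-probability ("surprising") first token suggests the step adds
    information the model could not infer from context alone.
    """
    return -math.log2(first_token_prob)

def prune_steps(steps, first_token_probs, keep_ratio=0.5):
    """Keep the most surprising reasoning steps, preserving original order.

    steps: list of reasoning-step strings.
    first_token_probs: model probability of each step's first token.
    keep_ratio: hypothetical parameter; fraction of steps to retain.
    """
    scored = sorted(
        zip(steps, first_token_probs),
        key=lambda sp: first_token_surprisal(sp[1]),
        reverse=True,  # highest surprisal first
    )
    k = max(1, int(len(steps) * keep_ratio))
    kept = {s for s, _ in scored[:k]}
    return [s for s in steps if s in kept]

# Steps whose first tokens were nearly certain (p = 0.9) are pruned;
# low-probability, logically "surprising" steps survive.
compressed = prune_steps(
    ["restate problem", "key insight", "routine algebra", "edge case"],
    [0.9, 0.01, 0.5, 0.05],
    keep_ratio=0.5,
)
print(compressed)  # → ['key insight', 'edge case']
```

In this toy ranking, the predictable steps score near zero bits of surprisal and are dropped, while the two low-probability steps are kept, mirroring the paper's intuition that "unsurprising" steps contribute little to the reasoning chain.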