Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
Qinglin Zhu, Yizhen Yao, Runcong Zhao, Yanzheng Xiang, Amrutha Saseendran, Chen Jin, Philip Alexander Teare, Bin Liang, Yulan He, Lin Gui

Abstract
Autoregressive (AR) models remain the standard for natural language generation but still suffer from high latency due to strictly sequential decoding. Recent diffusion-inspired approaches, such as LlaDA and Dream, mitigate this by generating in parallel, yet they suffer from two core limitations: information loss, as predictive distributions for non-finalized tokens are discarded at each step, and premature commitment, where local decisions are made without sufficient global coordination. We introduce Latent Refinement Decoding (LRD), a two-stage framework with Latent Refinement and a Predictive Feedback Loop. The first stage maintains masked positions as distributional mixtures of predicted tokens and the mask embedding, allowing the model to establish more globally consistent beliefs. The second stage progressively finalizes confident tokens while retaining uncertain ones for iterative feedback. KL-divergence dynamics provide a principled and reliable criterion for convergence and early stopping. Experiments across coding (HumanEval +6.3, MBPP +2.6) and reasoning (GSM8K +2.9, MATH500 +3.8) show that LRD improves accuracy while delivering speedups of up to 10.6x, making it a strong and versatile alternative for parallel sequence generation.
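The two ideas in the abstract — representing a masked position as a soft mixture of predicted-token embeddings and the mask embedding, and stopping refinement once consecutive belief distributions stop changing under a KL criterion — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`mixture_embedding`, `kl_divergence`), the mixing weight `alpha`, and the stand-in belief update are all hypothetical, and a real LRD step would query the diffusion language model instead.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical belief distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def mixture_embedding(probs, token_embeds, mask_embed, alpha=0.5):
    """Soft input for a masked position: blend the expected token
    embedding under the current belief with the [MASK] embedding
    (hypothetical weighting; the paper's exact rule may differ)."""
    expected = probs @ token_embeds          # (V,) @ (V, d) -> (d,)
    return alpha * expected + (1.0 - alpha) * mask_embed

# Toy refinement loop for one masked position: the belief drifts toward
# a fixed target distribution (a stand-in for a model forward pass), and
# we stop early once consecutive beliefs agree to within a KL tolerance.
rng = np.random.default_rng(0)
V, d = 8, 4                                  # toy vocab size and embed dim
token_embeds = rng.normal(size=(V, d))
mask_embed = rng.normal(size=d)

belief = softmax(rng.normal(size=V))
target = softmax(rng.normal(size=V))
tol = 1e-6
for step in range(100):
    new_belief = 0.5 * belief + 0.5 * target  # stand-in for a model update
    soft_input = mixture_embedding(new_belief, token_embeds, mask_embed)
    if kl_divergence(new_belief, belief) < tol:
        break                                 # beliefs converged: finalize
    belief = new_belief

# Once converged, a confident position would be committed to its argmax
# token, while low-confidence positions stay masked for further feedback.
final_token = int(np.argmax(belief))
```

The KL test here plays the role the abstract assigns to "KL-divergence dynamics": refinement halts as soon as successive belief states are nearly identical, which is what permits early stopping.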