Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work reveals a critical phenomenon, temporal oscillation, where correct answers often emerge during intermediate denoising steps but are overwritten later in the process. To address this issue, we introduce two complementary methods that exploit temporal consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time decoding strategy that aggregates predictions across denoising steps to select the most consistent output; and 2) a post-training method termed Temporal Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generations. Empirical results across multiple benchmarks demonstrate the effectiveness of our approach. Using the negative TSE reward alone, we observe a remarkable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with the accuracy reward, we achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown. Our findings underscore the untapped potential of temporal dynamics in dLLMs and offer two simple yet effective tools to harness them.
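
To make the first method concrete, here is a minimal sketch of temporal self-consistency voting over the decoded outputs of each denoising step. The function names (`temporal_self_consistency_vote`, `extract_answer`) and the exponential step weighting are illustrative assumptions, not the paper's exact recipe; the core idea is simply that the answer appearing most consistently across steps wins.

```python
# Sketch only: names and the decay-weighted vote are assumptions,
# not the paper's specified procedure.
from collections import defaultdict

def extract_answer(text: str) -> str:
    """Hypothetical helper: pull the final answer span out of a decoded string."""
    return text.strip().splitlines()[-1]

def temporal_self_consistency_vote(intermediate_texts: list[str], decay: float = 0.9) -> str:
    """Aggregate decoded outputs from all denoising steps.

    intermediate_texts: one decoded string per denoising step, ordered from
    the noisiest step to the final step. Later steps receive higher weight
    via an (assumed) exponential schedule.
    """
    num_steps = len(intermediate_texts)
    scores = defaultdict(float)
    for step, text in enumerate(intermediate_texts):
        answer = extract_answer(text)
        # Weight grows toward the final step; decay < 1 downweights early steps.
        scores[answer] += decay ** (num_steps - 1 - step)
    # Return the answer that is most consistent across denoising steps.
    return max(scores, key=scores.get)
```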
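Similarly, a rough sketch of the TSE reward under simplifying assumptions: per-step answers are grouped into semantic clusters (here, by exact string match, which stands in for whatever semantic-equivalence grouping the method actually uses), and the reward is the negative entropy of the cluster distribution, so trajectories whose intermediate predictions stay semantically stable score higher.

```python
# Sketch only: exact-match clustering replaces the paper's semantic grouping.
import math
from collections import Counter

def temporal_semantic_entropy(intermediate_answers: list[str]) -> float:
    """Entropy over clusters of per-step answers (exact match as a proxy)."""
    counts = Counter(intermediate_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def tse_reward(intermediate_answers: list[str]) -> float:
    # Negative TSE: stable (low-entropy) trajectories receive higher reward.
    return -temporal_semantic_entropy(intermediate_answers)
```

In post-training, this reward would be combined with (or used in place of) a task-accuracy reward, as in the reported Countdown and math-benchmark experiments.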