Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang

Abstract
We introduce a variational reasoning framework for language models that treats thinking traces as latent variables and optimizes them through variational inference. Starting from the evidence lower bound (ELBO), we extend it to a multi-trace objective for tighter bounds and propose a forward-KL formulation that stabilizes the training of the variational posterior. We further show that rejection sampling finetuning and binary-reward RL, including GRPO, can be interpreted as local forward-KL objectives, where an implicit weighting by model accuracy naturally arises from the derivation and reveals a previously unnoticed bias toward easier questions. We empirically validate our method on the Qwen 2.5 and Qwen 3 model families across a wide range of reasoning tasks. Overall, our work provides a principled probabilistic perspective that unifies variational inference with RL-style methods and yields stable objectives for improving the reasoning ability of language models. Our code is available at https://github.com/sail-sg/variational-reasoning.
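As a reference point for the ELBO mentioned in the abstract, the following is a minimal sketch in standard variational-inference notation, assuming x denotes the question, y the final answer, z the latent thinking trace, p_theta the language model, and q_phi the variational posterior; the paper's multi-trace and forward-KL objectives build on a bound of this form, but its exact parameterization is not reproduced here.

\[
\log p_\theta(y \mid x)
\;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(y \mid x, z)\right]
\;-\;
\mathrm{KL}\!\left(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid x)\right)
\]

Maximizing the right-hand side jointly over the model p_theta and the posterior q_phi tightens the bound on the marginal likelihood of the answer while keeping sampled thinking traces close to the model's own prior over traces.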