Command Palette
Search for a command to run...
생성적 재귀적 추론
생성적 재귀적 추론
Junyeob Baek Mingyu Jo Minsu Kim Mengye Ren Yoshua Bengio Sungjin Ahn
초록
재귀적 추론 모델(Recursive Reasoning Models, RRMs)은 공유 전이 함수(transition functions)를 사용하여 반복적인 잠재 상태(latent-state) 정교화 과정을 수행함으로써 자기회귀적 서열 확장(autoregressive sequence extension)에 대한 유망한 대안을 제시한다. 그러나 기존 RRMs는 대부분 결정론적(deterministic)이며, 단일 잠재 궤적(latent trajectory)을 따르고 단일 예측으로 수렴하는 경향이 있다. 본 논문은 재귀적 잠재 추론을 확률적 다중 궤적 계산(probabilistic multi-trajectory computation)으로 전환하는 프레임워크인 생성적 재귀 추론 모델(Generative Recursive reAsoning Models, GRAM)을 소개한다. GRAM은 추론을 확률적 잠재 궤적으로 모델링하여, 재귀적 깊이(recursive depth)와 병렬 궤적 샘플링(parallel trajectory sampling) 모두를 통해 다중 가설, 대체 해결 전략, 그리고 추론 시간 추론(inference-time scaling)을 가능하게 한다. 이를 통해 GRAM은 조건부 추론을 위한 pθ(y∣x)와 고정되거나 결측된 입력에 대한 비조건부 생성(unconditional generation)을 위한 pθ(x)를 지원하는 잠재 변수 생성 모델(latent-variable generative model)을 구성한다. 사변적 변분 추론(amortized variational inference)으로 학습된 GRAM은 구조화된 추론 및 다중 솔루션 제약 만족(multi-solution constraint satisfaction) 작업에서 결정론적 순환 및 재귀적 베이스라인(deterministic recurrent and recursive baselines) 대비 성능을 개선하며, 비조건부 생성 능력을 입증한다.
One-sentence Summary
Generative Recursive reAsoning Models (GRAM), trained with amortized variational inference, transform deterministic recursive reasoning into probabilistic multi-trajectory computation via stochastic latent trajectories to enable multiple hypotheses and inference-time scaling through recursive depth and parallel trajectory sampling, improving over deterministic recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks while supporting conditional reasoning and unconditional generation.
Key Contributions
- The paper introduces Generative Recursive Reasoning Models (GRAM), a framework that formulates recursive reasoning as a latent-variable generative process using stochastic latent trajectories. This approach allows solutions to be obtained by marginalizing over multiple sampling paths instead of converging to a single deterministic prediction.
- This work establishes width-based inference-time scaling, enabling the model to scale computation through both recursive depth and the number of parallel sampled trajectories. This mechanism facilitates the maintenance of multiple hypotheses and the exploration of alternative solution strategies during inference.
- Empirical evaluations on benchmarks such as Sudoku-Extreme, ARC-AGI, and N-Queens demonstrate improvements over deterministic recurrent and recursive baselines. Results confirm the framework yields advantages in structured reasoning, multi-solution constraint satisfaction, and unconditional generation capabilities.
Introduction
Future neural reasoning systems require extended computation mechanisms beyond standard autoregressive sequence generation. Recursive Reasoning Models address this by iteratively refining a persistent latent state to decouple reasoning depth from parameter scale. However, existing implementations remain fundamentally deterministic, causing reasoning paths to converge on a single prediction and failing to explore alternative hypotheses. The authors introduce Generative Recursive Reasoning Models (GRAM) to transform recursive latent reasoning into probabilistic multi-trajectory computation. This framework treats reasoning as a stochastic latent trajectory, enabling the maintenance of multiple solutions and inference scaling through parallel trajectory sampling alongside recursive depth.
Dataset
-
Dataset Composition and Sources
- The authors utilize a suite of discrete reasoning and generation tasks including Sudoku, ARC-AGI, N-Queens, Graph Coloring, and MNIST.
- Synthetic data for N-Queens and Graph Coloring is algorithmically generated to control solution counts and difficulty.
- Vision tasks draw from the MNIST dataset and the ARC-AGI benchmark.
-
Subset Details and Processing
- N-Queens
- Source: All valid complete solutions for N=8 and N=10 boards serve as the base.
- Filtering: Puzzle instances form by removing k queens (5 to 7 for N=8, 7 to 9 for N=10).
- Split: An 85:15 train-test split operates on unique input configurations to prevent memorization.
- Encoding: Boards flatten into 1D sequences with a vocabulary of padding, empty cells, and queens.
- Graph Coloring
- Source: Graphs sample from an Erdos-Renyi random model with 8 or 10 nodes and 3 colors.
- Filtering: Only 3-colorable graphs retain, with canonical forms kept to remove color permutation redundancy.
- Size: The dataset includes 7,002 training and 255 test instances for N=8, plus 13,465 training and 192 test instances for N=10.
- Encoding: Inputs encode the flattened upper triangular adjacency matrix while outputs map node positions to color IDs.
- Other Tasks
- Sudoku: 9×9 grids flatten row-by-row into 81 tokens with a vocabulary size of 11.
- ARC-AGI: Variable grids pad to a fixed 30×30 canvas with EOS markers and task-specific embeddings.
- MNIST: Images quantize and process via CNN-based patchification into 14×14 flattened sequences with a vocabulary size of 3.
- N-Queens
Method
Generative Recursive Reasoning Models (GRAM) introduce a probabilistic framework for recursive reasoning, distinguishing itself from deterministic approaches by modeling a distribution over latent reasoning trajectories. Rather than following a single fixed path, GRAM allows the latent state to evolve stochastically, enabling the exploration of multiple solution paths for a given input. This conceptual difference is illustrated in the figure below, where deterministic models converge to a single trajectory, whereas GRAM samples diverse paths to reach potential solutions.
The architecture of GRAM is organized into an encoder, a hierarchical recursive core, and a decoder, as shown in the architecture schematic below. The encoder first computes an embedding ex from the input x, which is reused throughout the computation. The core maintains a latent state z=(h,l) composed of a high-level component h and a low-level component l. The high-level state accumulates abstract reasoning information across transitions, while the low-level state undergoes rapid refinement within each transition. Specifically, the low-level component is updated K times via a function fL while the high-level component remains fixed. Subsequently, the high-level component is updated via fH and augmented with a learnable stochastic guidance signal ϵt. This hierarchical structure allows for multi-scale reasoning, separating fine-grained computation from high-level trajectory steering.
Training GRAM involves optimizing a variational evidence lower bound (ELBO) to approximate the conditional likelihood pθ(y∣x). The model is treated as a latent-variable probabilistic model where the full latent trajectory τ consists of a sequence of latent variables. To handle the intractable marginalization over trajectories, a variational posterior qϕ(τ∣x,y) is introduced alongside the prior pθ(τ∣x). The optimization objective includes a reconstruction term and a Kullback-Leibler (KL) divergence term that regularizes the posterior to be close to the prior. During training, deep supervision is applied over multiple supervision steps, where the terminal state of one step serves as the initial state of the next. Gradients are propagated through the final transition of each step to ensure memory efficiency.
The probabilistic nature of the model enables it to capture multimodal distributions over outputs. For tasks with multiple valid solutions, such as graph coloring, GRAM can sample distinct valid configurations by traversing different latent trajectories, as demonstrated in the figure below.
Furthermore, the same recursive mechanism can be adapted for unconditional generative modeling pθ(x) by replacing the input conditioning with an empty embedding. In this setting, the model generates data iteratively through the latent space, with the quality of the generated samples improving as the recursive depth increases. This capability allows GRAM to function as a generative model for tasks such as image synthesis, where it progressively refines noise into coherent structures, as shown in the generation process below.
Experiment
The evaluation assesses GRAM across structured reasoning benchmarks, multi-solution puzzles, and unconditional generation tasks to validate its probabilistic recursive architecture. Findings indicate that stochastic transitions allow the model to explore diverse reasoning paths and scale effectively through parallel sampling, allowing it to outperform deterministic baselines that suffer from mode collapse on complex problems. Furthermore, recursive refinement enables stricter constraint satisfaction than generative sampling alone, with ablation studies confirming that stochastic guidance is essential for navigating multi-solution spaces and improving overall performance.
The authors evaluate GRAM on unconditional Sudoku generation, comparing it against D3PM baselines of varying sizes. The results demonstrate that GRAM achieves superior validity rates while utilizing significantly fewer parameters and inference steps compared to the diffusion-based baselines. GRAM achieves the highest validity rate among all tested methods. GRAM requires substantially fewer computational steps compared to the D3PM variants. GRAM operates with a smaller parameter count than the competing baselines.
The the the table outlines the training configuration and computational costs for the GRAM model across various structured reasoning and generation benchmarks. It indicates that ARC-AGI is the most resource-intensive task, requiring substantially more training epochs and time compared to other benchmarks like Sudoku or N-Queens. ARC-AGI requires the most extensive training resources, taking significantly longer than other tasks. N-Queens tasks are the most computationally efficient, requiring the fewest epochs and shortest training duration. Graph Coloring training time increases notably as the number of nodes grows, despite maintaining the same number of training epochs.
The the the table presents an ablation study evaluating the impact of stochastic guidance and stochasticity on the GRAM model's performance on Sudoku and N-Queens tasks. The full GRAM model achieves the highest accuracy on both benchmarks, demonstrating the effectiveness of combining stochastic transitions with learned guidance. Removing stochastic guidance significantly degrades performance, particularly on the multi-solution N-Queens task, while removing stochasticity entirely causes the model to fail completely. Full GRAM outperforms all ablated variants and baselines on both benchmarks. Stochasticity alone maintains high Sudoku performance but collapses on N-Queens, highlighting the need for structured guidance in multi-solution spaces. Deterministic guidance without stochasticity results in total failure, indicating that stochastic transitions are essential for navigating the solution space.
The the the table presents an evaluation of unconditional image generation on binarized MNIST, comparing the proposed GRAM model against VAE, D3PM, and TRM baselines. Results demonstrate that GRAM significantly outperforms the deterministic TRM baseline, which exhibits mode collapse, while achieving performance comparable to the diffusion-based D3PM model. Furthermore, the findings indicate that increasing the number of recursion steps during inference leads to consistent improvements in generation quality. The deterministic TRM baseline suffers from mode collapse, yielding significantly poorer generation quality than GRAM. GRAM produces images with quality comparable to the D3PM diffusion model. Generation quality improves monotonically as the number of recursion steps increases.
The authors evaluate GRAM on multi-solution puzzles like N-Queens and Graph Coloring to test its ability to capture diverse valid solutions. Results indicate that GRAM outperforms deterministic recursive baselines in coverage and surpasses generative models in accuracy and constraint satisfaction. This suggests that combining recursive refinement with stochastic sampling allows the model to navigate complex solution spaces more effectively than using either approach alone. GRAM achieves superior accuracy and solution coverage on N-Queens compared to both recursive and autoregressive baselines. The method significantly reduces constraint violations in Graph Coloring tasks while maintaining high diversity in generated solutions. Deterministic recursive models struggle with multi-solution coverage, whereas GRAM leverages stochastic transitions to explore diverse reasoning paths.
The authors evaluate GRAM on structured reasoning benchmarks such as Sudoku, N-Queens, and ARC-AGI alongside unconditional image generation, comparing performance against diffusion and deterministic baselines. Results indicate that GRAM achieves superior validity and accuracy with fewer parameters and inference steps while avoiding the mode collapse observed in deterministic recursive models. Ablation studies further confirm that combining stochastic transitions with learned guidance is essential for navigating complex solution spaces and ensuring robust constraint satisfaction.