
ManCAR: Manifold-Constrained Latent Reasoning and Adaptive Test-Time Computation for Sequential Recommendation

Kun Yang Yuxuan Zhu Yazhe Chen Siyao Zheng Bangyang Hong Kangle Wu Yabo Ni Anxiang Zeng Cong Fu Hui Li

Abstract

Sequential recommendation increasingly leverages latent multi-step reasoning to improve performance through test-time computation. Despite strong empirical results, prior approaches drive intermediate reasoning states with target-centric objectives while imposing no explicit feasibility constraints. As a result, reasoning trajectories can wander into implausible regions, a failure mode we call latent drift. We argue that valid recommendation reasoning should be cast as navigation on a collaborative manifold rather than free-form refinement in latent space. To this end, we propose ManCAR (Manifold-Constrained Adaptive Reasoning), a principled framework that grounds reasoning in the topology of the global interaction graph. Concretely, ManCAR constructs a local intent prior from the collaborative neighborhood of a user's recent behavior, represented as a probability distribution on the item simplex. During training, the model progressively aligns its latent predictive distribution with this prior, forcing the reasoning trajectory to remain within the valid manifold. At test time, reasoning proceeds adaptively until the predictive distribution stabilizes, avoiding over-refinement. We further validate ManCAR's drift suppression and adaptive test-time stopping mechanisms theoretically through a variational interpretation. Experiments on seven benchmarks show that ManCAR consistently outperforms state-of-the-art baselines, achieving up to a 46.88% relative improvement in NDCG@10. Our code is available at https://github.com/FuCongResearchSquad/ManCAR.

One-sentence Summary

Researchers from institutions including Tsinghua University and Alibaba propose ManCAR, a manifold-constrained reasoning framework that grounds sequential recommendation in collaborative graph topology, preventing latent drift via adaptive test-time refinement and outperforming baselines by up to 46.88% in NDCG@10.

Key Contributions

  • ManCAR introduces a manifold-constrained reasoning framework for sequential recommendation, using collaborative neighborhoods from an interaction graph to define feasible regions on the item probability simplex, thereby preventing latent drift during multi-step latent refinement.
  • The method incorporates a variational interpretation that theoretically validates its drift-prevention mechanism and enables adaptive test-time stopping via convergence of the predictive distribution, avoiding unnecessary over-refinement.
  • Evaluated on seven benchmarks, ManCAR consistently outperforms state-of-the-art baselines, achieving up to a 46.88% relative gain in NDCG@10, demonstrating both effectiveness and efficiency in real-world recommendation scenarios.

Introduction

The authors leverage latent multi-step reasoning to enhance sequential recommendation, treating it as a process of navigating a collaborative manifold defined by user-item interaction graphs rather than unconstrained latent refinement. Prior methods suffer from latent drift—where reasoning trajectories wander into implausible regions—because they optimize only for target alignment without enforcing feasibility constraints. ManCAR addresses this by grounding reasoning within a manifold derived from a user’s collaborative neighborhood, using a variational framework to align latent states with a graph-induced prior and adaptively stopping computation when predictions stabilize. This approach prevents drift, improves generalization, and achieves up to 46.88% relative gains in NDCG@10 across seven benchmarks.

Dataset

  • The authors use seven sub-category datasets from the Amazon 2023 Reviews corpus: CDs & Vinyl, Video Games, Office Products, Arts, Crafts & Sewing, Grocery & Gourmet Food, Musical Instruments, and Toys & Games.
  • Each dataset treats ratings above 3 as positive feedback. Users with fewer than 10 interactions are filtered from CDs & Vinyl; users with fewer than 5 are filtered from all other datasets.
  • The official absolute-timestamp split is adopted, and each user’s interaction history is truncated to a maximum length of 50 items.
  • The datasets are used for training and evaluation with Recall@K and NDCG@K (K=5,10) to measure retrieval coverage and ranking quality.
  • No mixture ratios are applied; each subset is evaluated independently.
  • No cropping or image-based processing is involved — all data is textual interaction sequences.
  • Metadata includes user-item interactions, timestamps, and ratings; no additional metadata is constructed beyond filtering and truncation.
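
The filtering and truncation steps above can be sketched as follows; the tuple format and function name are illustrative, not taken from the authors' pipeline:

```python
from collections import defaultdict

def preprocess(interactions, min_count, max_len=50):
    """Filter sparse users and truncate histories, per the paper's setup.

    `interactions` is a list of (user, item, rating, timestamp) tuples;
    ratings above 3 are treated as positive feedback.
    """
    by_user = defaultdict(list)
    for user, item, rating, ts in interactions:
        if rating > 3:  # keep positives only
            by_user[user].append((ts, item))
    sequences = {}
    for user, events in by_user.items():
        if len(events) < min_count:
            continue  # drop sparse users (10 for CDs & Vinyl, 5 elsewhere)
        events.sort()  # chronological order by timestamp
        sequences[user] = [item for _, item in events][-max_len:]
    return sequences
```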

Method

The authors leverage a geometrically grounded framework called ManCAR to constrain latent reasoning trajectories within a collaborative manifold derived from the user’s recent interaction history and a global item interaction graph. The core insight is that unconstrained latent reasoning in high-dimensional spaces leads to drift, whereas user intent is inherently localized within a sparse, graph-defined neighborhood of collaboratively plausible items. ManCAR operationalizes this by treating latent reasoning as approximate inference over an intent variable, regularized by a graph-conditioned teacher prior that enforces local smoothness and feasibility.

As shown in the figure below, the reasoning trajectory in ManCAR (dashed black path) is confined to a low-dimensional region, the graph-induced manifold, defined by the k-hop neighborhood of recent items. This contrasts sharply with unconstrained reasoning (dashed red path), which wanders freely across the latent space and risks diverging from plausible intent. The manifold is instantiated via a finite candidate set $C(k) = C(I_n, \mathcal{G}, k)$, which includes the most recent items $I_n$ and their k-hop neighbors on the global item graph $\mathcal{G}$. Each latent reasoning state is projected onto the item probability simplex, but only distributions concentrating mass on $C(k)$ are considered valid, thereby restricting the reasoning space to a structured subregion.
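
A minimal sketch of the k-hop candidate-set construction, assuming the global graph is stored as an adjacency dictionary (the function and variable names are illustrative, not from the authors' code):

```python
def candidate_set(recent_items, graph, k):
    """k-hop expansion on the global item graph: C(k) = I_n with its k-hop neighbors.

    `graph` maps each item to its directly co-interacted neighbors.
    """
    frontier = set(recent_items)
    candidates = set(recent_items)
    for _ in range(k):
        # expand one hop, excluding items already collected
        frontier = {nbr for item in frontier
                    for nbr in graph.get(item, ())} - candidates
        candidates |= frontier
    return candidates
```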

The training objective is derived from a variational perspective. The model introduces a discrete latent intent variable $c \in C(k)$, and the conditional likelihood of the next item $i^*$ is factorized as $p_\theta(i^*|H) = \sum_{c \in C(k)} p_\theta(c|H)\, p_\theta(i^*|c,H)$. A graph-conditioned teacher prior $q(c|I_n, \mathcal{G})$ is constructed independently of model parameters and encodes prior knowledge about plausible intents reachable from recent interactions. The authors derive an ELBO-like objective:

logpθ(iH)Eq(cIn,G)[logpθ(ic,H)]DKL(q(cIn,G)pθ(cH)).\log p_\theta(i^*|H) \geq \mathbb{E}_{q(c|I_n, \mathcal{G})} \left[ \log p_\theta(i^*|c,H) \right] - D_{\mathrm{KL}}(q(c|I_n, \mathcal{G}) \parallel p_\theta(c|H)).logpθ(iH)Eq(cIn,G)[logpθ(ic,H)]DKL(q(cIn,G)pθ(cH)).

The first term encourages accurate prediction conditioned on graph-feasible intents, while the KL term regularizes the model’s inferred intent distribution to align with the teacher prior. This regularization explicitly limits the freedom of latent refinement and mitigates drift.

Refer to the framework diagram for the full implementation. The SeqRec model, typically a Transformer-based encoder, processes the user's interaction history $H$ to produce an initial latent state $\mathbf{r}_1 = \mathbf{h}_{T-1}$. At each reasoning step $t'$, the model refines the latent state via a shared reasoning module and projects it onto the item space using the item embedding matrix $\mathbf{E}$ to produce logits $\mathbf{z}_{t'} = \mathbf{r}_{t'}^\top \mathbf{E}$. The resulting distribution $p_\theta^{(t')}(i|H)$ is used for both target prediction and intent modeling.
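
The refine-and-project loop can be sketched as follows; `refine` stands in for the shared reasoning module, and the orientation of `E` (items by dimensions) is an assumption for illustration:

```python
import numpy as np

def reason(h_init, E, refine, n_steps):
    """Iteratively refine a latent state and project it onto the item simplex.

    E is the (n_items x d) item embedding matrix; each step produces logits
    z = E @ r and a softmax distribution over items. Illustrative sketch only.
    """
    r = h_init
    dists = []
    for _ in range(n_steps):
        r = refine(r)                       # shared reasoning module
        z = E @ r                           # logits for every item
        z = z - z.max()                     # numerical stability
        p = np.exp(z) / np.exp(z).sum()     # projection onto the simplex
        dists.append(p)
    return dists
```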

The main prediction loss at step $t'$ is approximated by injecting the candidate set $C(k)$ as auxiliary context, yielding $\mathcal{L}_{\mathrm{main}}^{(t')} = -\log p_\theta^{(t')}(i^*|H, C(k))$. The regularization loss is the KL divergence between the teacher prior and the induced intent distribution: $\mathcal{L}_{\mathrm{reg}}^{(t')} = D_{\mathrm{KL}}\big(q(c|I_n, \mathcal{G}) \,\|\, p_\theta^{(t')}(c|H)\big)$. The overall objective sums these losses across all reasoning steps, weighted by a regularization coefficient $\lambda$.
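
Assuming both distributions are already restricted to the candidate set, the summed objective can be sketched as (a hypothetical helper, not the authors' code):

```python
import numpy as np

def mancar_loss(step_dists, teacher_dists, target_idx, lam):
    """Sum per-step losses: L = sum_t [ -log p^(t)(i*) + lam * KL(q_t || p^(t)) ].

    `step_dists` are the student distributions over the candidate set at each
    reasoning step; `teacher_dists` are the scheduled teacher priors.
    """
    total = 0.0
    for p, q in zip(step_dists, teacher_dists):
        main = -np.log(p[target_idx])                       # prediction loss
        kl = float(np.sum(q * (np.log(q) - np.log(p))))     # KL(q || p)
        total += main + lam * kl
    return total
```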

To guide reasoning progressively, the teacher prior is scheduled to sharpen over time. Using the Rank-Based Distribution Mass Assignment (RDMA) strategy, the teacher distribution is defined as $q_{t'}(c) \propto \exp(-\mathrm{rank}(c)/\gamma_{t'})$, where $\gamma_{t'} = \gamma_{\mathrm{base}} \cdot (T' - t' + 1)$ decreases linearly, transitioning from a diffuse prior to a sharply peaked distribution centered on the target. This scheduling ensures bounded teacher drift and enables adaptive test-time termination: reasoning halts when the KL divergence between consecutive student distributions falls below a threshold $\epsilon$, indicating convergence.
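
A minimal sketch of the RDMA schedule and the KL-based stopping rule, assuming `ranks` gives each candidate's rank under the teacher's scoring (0 = best):

```python
import numpy as np

def rdma_teacher(ranks, gamma_base, t, T):
    """Rank-based teacher prior q_t(c) ~ exp(-rank(c) / gamma_t),
    with gamma_t = gamma_base * (T - t + 1) shrinking linearly over steps."""
    gamma = gamma_base * (T - t + 1)
    w = np.exp(-np.asarray(ranks, dtype=float) / gamma)
    return w / w.sum()

def should_stop(p_prev, p_curr, eps=1e-3):
    """Adaptive halt: stop when KL(p_curr || p_prev) falls below eps."""
    kl = float(np.sum(p_curr * (np.log(p_curr) - np.log(p_prev))))
    return kl < eps
```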

Additionally, to stabilize multi-step reasoning, the authors apply latent state norm rescaling after each refinement step: $\mathbf{h} \gets \boldsymbol{\phi} \cdot \frac{\mathbf{h}}{\|\mathbf{h}\|} \cdot \operatorname{avg}(\mathbf{E})$, where $\operatorname{avg}(\mathbf{E})$ is the average norm of item embeddings and $\boldsymbol{\phi}$ is a learnable scaling parameter. This aligns latent state norms with the embedding space, improving softmax conditioning and supporting the stepwise contraction behavior assumed in the theoretical analysis.
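
The rescaling rule is straightforward to sketch; a scalar `phi` is assumed here for simplicity (in the paper it is a learnable parameter):

```python
import numpy as np

def rescale(h, E, phi=1.0):
    """Latent norm rescaling: h <- phi * (h / ||h||) * avg_norm(E),
    matching the latent state's scale to the item embedding space."""
    avg_norm = np.linalg.norm(E, axis=1).mean()  # mean row norm of embeddings
    return phi * (h / np.linalg.norm(h)) * avg_norm
```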

Experiment

  • ManCAR consistently outperforms all baselines across datasets and metrics, especially in ranking quality (NDCG), demonstrating superior intent refinement via structured multi-step reasoning.
  • Graph-conditioned context enhances performance, but explicit latent reasoning (as in ManCAR) provides further gains over context-only models, showing iterative refinement resolves uncertainty more effectively.
  • ManCAR adapts reasoning depth dynamically at inference, using deeper steps on complex datasets and halting early on simpler ones, achieving near-oracle performance while balancing efficiency and accuracy.
  • Ablation studies confirm that graph-driven manifold constraints, context injection, latent state rescaling, and scheduled loss are critical; removing any degrades performance, with teacher-guided manifold control being most vital.
  • The model is sensitive to graph neighbor count and training steps but robust to other hyperparameters, allowing reliable tuning via simple grid search.
  • KL divergence analysis shows reasoning trajectories stabilize over steps, confirming convergence and consistent refinement behavior.
  • Attention visualizations reveal reasoning states repeatedly anchor to graph-conditioned context, with later layers consolidating prior steps—supporting a constrained, data-driven refinement path grounded in recent interactions and graph signals.

ManCAR adapts its reasoning depth dynamically during training and inference, unlike prior methods that use fixed step counts across datasets. It employs deeper reasoning on complex datasets like CDs and Toys while stopping early on simpler ones like Arts and Grocery, achieving better performance with efficient computation. This data-aware flexibility enables ManCAR to balance expressiveness and efficiency across varying data characteristics.

The authors use ablation experiments to isolate the contribution of key components in ManCAR, showing that removing the teacher prior causes the largest performance drop, indicating its critical role in guiding stable latent reasoning. Results show that each component—context injection, norm rescaling, and loss scheduling—contributes meaningfully to performance, with their absence leading to consistent degradation across metrics. This confirms that ManCAR’s design elements work synergistically to constrain and refine the reasoning trajectory effectively.


ManCAR outperforms all baseline models across multiple datasets and metrics, with particularly strong gains in ranking quality as measured by NDCG. Results show that its integration of graph-conditioned context, scheduled teacher supervision, and adaptive inference enables more stable and effective multi-step reasoning than prior methods. The model achieves near-ceiling performance through data-aware step allocation, demonstrating that its structured refinement process adapts to dataset complexity while maintaining computational efficiency.

