3 years ago

Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation

Jibang Wu Renqin Cai Hongning Wang

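Overview

Title: Sequential Recommendation with a Contextualized Temporal Attention Mechanism

Abstract: Predicting users' preferences from their past sequential behavior is an important and challenging task for modern recommender systems. Many existing sequential recommendation algorithms focus on the transition structure among sequential actions but largely ignore temporal and contextual information when modeling the influence of past events on the current prediction. This paper argues that the influence of past events on a user's current action should vary over time and across contexts. We therefore propose the Contextualized Temporal Attention Mechanism, which learns to weigh the influence of past actions based not only on what the action was but also on when and how it took place. More specifically, to dynamically calibrate the relative input dependence from the self-attention mechanism, we introduce multiple parameterized kernel functions that learn various temporal dynamics. Context information is then used to determine which reweighing kernel each input should follow. In empirical evaluations on two large public recommendation datasets, the model consistently outperformed a wide range of state-of-the-art sequential recommendation methods.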

One-sentence Summary

Déjà vu introduces a contextualized temporal attention mechanism for sequential recommendation that dynamically weights historical interactions via parameterized kernel functions to capture time-varying and context-dependent influences, consistently outperforming state-of-the-art sequential recommendation methods on two large public datasets.

Key Contributions

  • The paper introduces a Contextualized Temporal Attention Mechanism for sequential recommendation that explicitly models the time- and context-dependent influence of historical user actions. This approach replaces standard transition-focused modeling with a framework that dynamically weights past interactions based on their specific timing and contextual conditions.
  • To calibrate self-attention dependencies, the model employs multiple parameterized kernel functions that capture diverse temporal dynamics. Contextual signals then select the appropriate reweighing kernel for each input, enabling adaptive adjustment of historical action impacts according to their temporal and contextual attributes.
  • Empirical evaluations on two large public recommendation datasets demonstrate that the proposed framework consistently outperforms an extensive set of state-of-the-art sequential recommendation methods. These results validate the effectiveness of integrating adaptive temporal and contextual weighting into attention-based sequence modeling.

Introduction

Predicting user preferences from historical interaction sequences is a foundational challenge for modern recommender systems. While existing sequential recommendation models effectively capture item transitions, they largely overlook how temporal dynamics and contextual factors modulate the true influence of past actions. Prior approaches often rely on static time adjustments, struggle with sparse behavioral data, or fail to scale across diverse event types. To address these gaps, the authors propose a Contextualized Temporal Attention Mechanism that dynamically recalibrates historical influence using parameterized kernel functions. By selecting among these kernels based on contextual signals, the model adaptively weighs past interactions according to both timing and situational factors, delivering consistent performance gains over current baselines.

Dataset

  • Dataset Composition and Sources: The authors utilize two public datasets to capture user behavior across distinct application domains: a professional networking platform and an e-commerce marketplace.
  • XING Subset: Sourced from the RecSys Challenge 2016 collection, this dataset tracks job posting interactions. Each entry records user ID, item ID, timestamp, and interaction type. The authors exclude "delete" actions and ignore specific interaction categories. They apply popularity thresholds by removing items with fewer than 50 actions and restricting users to between 10 and 1,000 actions. Interactions with a dwell time under 10 seconds for the same item and action type are also discarded.
  • UserBehavior Subset: Provided by Alibaba, this dataset contains commercial product interactions with identical metadata fields. To ensure computational tractability, the authors randomly sub-sample 100,000 user sequences. They filter out items with fewer than 20 actions, limit user activity to between 20 and 300 actions, and remove any interactions falling outside the dataset's native 9-day temporal window.
  • Processing and Model Usage: The authors clean and structure both subsets into sequential interaction records for model training. The preprocessing pipeline emphasizes temporal constraints, engagement thresholds, and dwell-time validation to eliminate noise and low-quality signals. The exact train-test split is not specified here; the filtered sequences are used to train and evaluate the deep learning architecture, with priority given to data quality and manageable model complexity.
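The threshold filtering described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `filter_interactions`, the tuple layout, and the choice to filter items before users in a single pass are all assumptions, and the dwell-time and time-window rules are left out.

```python
from collections import Counter


def filter_interactions(records, min_item_actions, min_user_actions, max_user_actions):
    """Apply item-popularity and user-activity thresholds in a single pass.

    `records` is a list of (user_id, item_id, timestamp, action) tuples.
    Filtering items first and users second is an assumption of this sketch.
    """
    # Drop items that appear in too few interactions.
    item_counts = Counter(item for _, item, _, _ in records)
    kept = [r for r in records if item_counts[r[1]] >= min_item_actions]

    # Keep only users whose remaining activity falls inside the allowed range.
    user_counts = Counter(user for user, _, _, _ in kept)
    kept = [r for r in kept if min_user_actions <= user_counts[r[0]] <= max_user_actions]

    # Return each user's interactions in chronological order.
    return sorted(kept, key=lambda r: (r[0], r[2]))
```

With the thresholds quoted above, the XING subset would correspond to `filter_interactions(records, 50, 10, 1000)` and the UserBehavior subset to `filter_interactions(records, 20, 20, 300)`.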

Method

The proposed model, Contextualized Temporal Attention Mechanism (CTA), employs a three-stage pipeline to capture content, temporal, and contextual information for sequential recommendation. The framework processes a user's historical interaction sequence of item and timestamp pairs, denoted as $\{(t_i, s_i)\}_{i=1}^{L}$, along with the current prediction time $t_{L+1}$. The input sequence of items is mapped to an embedding space using $E_{\text{input}} \in \mathbb{R}^{N \times d_{\text{in}}}$, resulting in $X = [s_1, \ldots, s_L] \cdot E_{\text{input}} \in \mathbb{R}^{L \times d_{\text{in}}}$. The timestamps are transformed into intervals relative to the prediction time, forming $T = [t_{L+1} - t_1, \ldots, t_{L+1} - t_L] \in \mathbb{R}^{L \times 1}$. The model then processes these inputs through three sequential stages: $M^{\alpha}$, $M^{\beta}$, and $M^{\gamma}$, which respectively model content, temporal, and contextual dependencies.
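As a small illustration of how the two inputs are prepared, the following Python sketch builds the embedded sequence and the interval vector from plain lists. The function names and the list-of-lists representation are assumptions made for readability; a real implementation would use a tensor library.

```python
def embed_items(item_ids, embedding_table):
    """Row lookup, equivalent to multiplying one-hot item vectors by E_input.

    `embedding_table` has one row of length d_in per item (N rows in total).
    """
    return [embedding_table[i] for i in item_ids]


def time_intervals(timestamps, prediction_time):
    """Compute T = [t_{L+1} - t_1, ..., t_{L+1} - t_L]."""
    return [prediction_time - t for t in timestamps]
```

For example, timestamps `[10, 25, 40]` with prediction time `50` give intervals `[40, 25, 10]`, so the most recent event has the smallest interval.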

The first stage, the $\alpha$ stage, focuses on content-based importance. It utilizes a stack of $d_l$ self-attentive encoder blocks with $d_h$ heads and $d_a$ hidden units to process the input sequence $X$. Each attention block computes multi-head attention via scaled dot-product, where queries, keys, and values are derived from the input state $H^j$ through learnable projections $W_i^Q, W_i^K, W_i^V$. The resulting attention heads are concatenated and projected to form $Z^j$, which is then combined with the input via a residual connection and layer normalization to produce the next layer's hidden state $H^{j+1}$. The final hidden state $H^{d_l}$ is used to compute content-based importance scores $\alpha$ by performing a scaled dot-product attention between the last layer's hidden states and the last item's embedding, resulting in a vector $\alpha \in \mathbb{R}^{L \times 1}$.
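The final scoring step of this stage can be sketched as follows. This is a simplified illustration under stated assumptions: the encoder stack is omitted, `hidden_states` stands in for $H^{d_l}$, `query` stands in for the last item's embedding, and the function returns raw scaled dot products because the text does not say whether a normalization is applied at this point.

```python
import math


def content_scores(hidden_states, query):
    """Scaled dot product between each hidden state and a single query vector.

    Returns one score per sequence position (a list of length L).
    """
    scale = math.sqrt(len(query))
    return [
        sum(h_k * q_k for h_k, q_k in zip(h, query)) / scale
        for h in hidden_states
    ]
```

Dividing by the square root of the vector dimension is the usual scaling in dot-product attention; it keeps the scores in a comparable range as the dimension grows.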

The second stage, the $\beta$ stage, models temporal dynamics. It applies a set of $K$ kernel functions to the time intervals $T$ to capture the influence of past events based on their temporal gaps. The kernels include exponential decay, logarithmic decay, linear decay, and constant functions, each parameterized by $a$ and $b$. These kernels transform the raw time intervals into $K$ sets of temporal importance scores, forming $\beta \in \mathbb{R}^{L \times K}$, which represent different possible temporal influence patterns.
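A rough Python sketch of this stage is given below. The text names the kernel families but not their formulas, so the shared template `a * shape(t) + b` and the individual shape functions are assumptions chosen only to illustrate the idea; in the model, $a$ and $b$ would be learned rather than fixed.

```python
import math

# Assumed shape functions; the source does not give the exact kernel formulas.
SHAPES = {
    "exponential": lambda t: math.exp(-t),
    "logarithmic": lambda t: -math.log1p(t),
    "linear": lambda t: -t,
    "constant": lambda t: 0.0,
}


def temporal_scores(intervals, kernels):
    """Apply K kernels to L time intervals and return an L x K score matrix.

    `kernels` is a list of (shape_name, a, b) triples, one per kernel.
    """
    return [
        [a * SHAPES[name](t) + b for name, a, b in kernels]
        for t in intervals
    ]
```

Each row of the result holds the $K$ candidate temporal scores for one past event, which is the shape the next stage needs in order to choose among kernels per position.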

The third stage, the $\gamma$ stage, fuses content and temporal information based on contextual cues. It first extracts context features $C \in \mathbb{R}^{L \times d_r}$ using a Bidirectional RNN (BiRNN) on the input item sequence $X$. This captures the surrounding event context from both past and future actions, reflecting the user's sensitivity and seriousness towards each interaction. Optionally, additional context features $C_{\text{attr}}$ can be concatenated with the BiRNN output. A feed-forward layer $F^{\mathcal{Y}}$ maps the context features to scores over the $K$ temporal kernels, which a softmax layer normalizes into a probability distribution $P(\cdot|C) \in \mathbb{R}^{L \times K}$. This distribution is used to mix the $K$ temporal scores from the $\beta$ stage at each position, producing a contextualized temporal influence $\beta^c = \beta \cdot P(\cdot|C)$. The final contextualized attention score $\gamma$ is obtained by element-wise multiplication of the content score $\alpha$ and the contextualized temporal score $\beta^c$, followed by a softmax normalization to ensure the weights sum to one. These weights are then used in a weighted sum that gives the predicted item representation $\hat{x}_{L+1}$, which is projected to the output embedding space and used to compute similarity scores with all items for recommendation.
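The mixing and normalization steps can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: `kernel_logits` stands in for the feed-forward output computed from the BiRNN context features (the BiRNN itself is omitted), and taking the weighted sum over the input embeddings $X$ is an assumption, since the text does not say which vectors are summed.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def contextualized_attention(alpha, beta, kernel_logits):
    """Combine content scores (L) with temporal scores (L x K) into gamma (L)."""
    # P(.|C): one distribution over the K kernels for each position.
    probs = [softmax(row) for row in kernel_logits]
    # beta^c: mix the K kernel scores at each position.
    beta_c = [
        sum(b * p for b, p in zip(b_row, p_row))
        for b_row, p_row in zip(beta, probs)
    ]
    # gamma: element-wise product, then softmax over the sequence positions.
    return softmax([a * bc for a, bc in zip(alpha, beta_c)])


def weighted_sum(gamma, X):
    """Weighted sum of the rows of X using the attention weights gamma."""
    return [sum(g * x[k] for g, x in zip(gamma, X)) for k in range(len(X[0]))]
```

Because the softmax over kernels is taken row by row, each past event can follow a different temporal pattern, for instance a fast decay for a casual click and a nearly constant weight for a deliberate action.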

Experiment

The proposed Contextualized Temporal Attention model was evaluated on two large-scale user behavior datasets against diverse baselines with matched parameter counts to establish a fair comparative foundation. Overall performance and dataset-specific analyses validate that the model effectively bridges the gap between first-order transition and sequence popularity patterns, outperforming specialized architectures by integrating a three-stage weighing mechanism. Ablation studies and architectural tests further confirm that shared item embeddings, multi-kernel temporal modeling, and dynamic content scoring are critical components for robust performance. Finally, attention visualizations demonstrate that the network learns meaningful, non-linear temporal and contextual reweighting, ultimately proving that explicitly modeling contextualized temporal dynamics yields a highly accurate, interpretable, and efficient solution for sequential recommendation.

The authors evaluate the proposed Contextualized Temporal Attention Mechanism against a range of baseline models on two large datasets, and it outperforms all baselines in Recall@5 on both. The model is particularly strong on the UserBehavior dataset, which exhibits sequence popularity patterns, and it captures long-term dependencies better than recurrent models, while it shows limitations in modeling first-order transition patterns. The improvements are attributed to the model's ability to weigh historical events by content, temporal influence, and context, and ablation studies confirm that each component of this three-stage weighing mechanism contributes to the gain over the baselines.
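For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@K in a next-item setting. The text does not define the metric, so the single-target-per-sequence assumption and the function name are illustrative only.

```python
def recall_at_k(ranked_lists, targets, k=5):
    """Fraction of test cases whose target item appears in the top-k ranking.

    `ranked_lists[i]` is the model's ranked item list for test case i and
    `targets[i]` is the single held-out item for that case.
    """
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)
```

Under this definition, a Recall@5 of 0.5 means the true next item was among the five highest-ranked candidates in half of the test cases.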

CTA achieves the highest Recall@5 on both datasets, outperforming all baselines. Its performance is stronger on the UserBehavior dataset, where sequential popularity patterns dominate, than on the XING dataset, where first-order transition patterns are more prevalent. The ablation study shows that the model's effectiveness relies on its three-stage weighing mechanism, with the temporal influence component being particularly crucial for capturing long-term dependencies.

The authors conduct an ablation study to analyze how different architectural choices affect performance across the two datasets. Performance varies significantly with window size, loss function, attention settings, and the choice of temporal kernels, which indicates that these design choices are task-dependent and require careful tuning. The optimal window size differs between datasets according to their underlying behavioral patterns, while a ranking-based loss function and specific temporal kernel combinations lead to significant improvements in ranking metrics. The model captures contextual and temporal influences effectively, with the combined importance score being a dominant factor in determining event relevance.

The proposed Contextualized Temporal Attention Mechanism was evaluated against multiple baselines on two datasets to assess its overall effectiveness in sequential recommendation. The experiments demonstrate that the model successfully captures long-term dependencies and sequential popularity patterns, though it shows limitations when first-order transition patterns dominate. Ablation studies validate that a three-stage weighing mechanism combining content, context, and temporal influence is essential, with temporal weighting proving particularly crucial for performance. Additionally, the results indicate that model effectiveness is highly sensitive to architectural choices, highlighting the importance of dataset-specific tuning for window sizes and loss functions.

