Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation
Jibang Wu, Renqin Cai, Hongning Wang
Abstract
Predicting users' preferences based on their sequential behaviors in history is challenging and crucial for modern recommender systems. Most existing sequential recommendation algorithms focus on the transitional structure among sequential actions, but largely ignore the temporal and context information when modeling the influence of a historical event on the current prediction. In this paper, we argue that the influence of past events on a user's current action should vary over the course of time and under different contexts. Thus, we propose a Contextualized Temporal Attention Mechanism that learns to weigh historical actions' influence on not only what action it is, but also when and how the action took place. More specifically, to dynamically calibrate the relative input dependence from the self-attention mechanism, we deploy multiple parameterized kernel functions to learn various temporal dynamics, and then use the context information to determine which of these reweighing kernels to follow for each input. In empirical evaluations on two large public recommendation datasets, our model consistently outperformed an extensive set of state-of-the-art sequential recommendation methods.
One-sentence Summary
Déjà vu introduces a contextualized temporal attention mechanism for sequential recommendation that dynamically weights historical interactions via parameterized kernel functions to capture time-varying and context-dependent influences, consistently outperforming state-of-the-art sequential recommendation methods on two large public datasets.
Key Contributions
- The paper introduces a Contextualized Temporal Attention Mechanism for sequential recommendation that explicitly models the time- and context-dependent influence of historical user actions. This approach replaces standard transition-focused modeling with a framework that dynamically weights past interactions based on their specific timing and contextual conditions.
- To calibrate self-attention dependencies, the model employs multiple parameterized kernel functions that capture diverse temporal dynamics. Contextual signals then select the appropriate reweighing kernel for each input, enabling adaptive adjustment of historical action impacts according to their temporal and contextual attributes.
- Empirical evaluations on two large public recommendation datasets demonstrate that the proposed framework consistently outperforms an extensive set of state-of-the-art sequential recommendation methods. These results validate the effectiveness of integrating adaptive temporal and contextual weighting into attention-based sequence modeling.
Introduction
Predicting user preferences from historical interaction sequences is a foundational challenge for modern recommender systems. While existing sequential recommendation models effectively capture item transitions, they largely overlook how temporal dynamics and contextual factors modulate the true influence of past actions. Prior approaches often rely on static time adjustments, struggle with sparse behavioral data, or fail to scale across diverse event types. To address these gaps, the authors propose a Contextualized Temporal Attention Mechanism that dynamically recalibrates historical influence using parameterized kernel functions. By selecting among these kernels based on contextual signals, the model weighs past interactions according to both timing and situational factors, and it delivers consistent performance gains over current baselines.
Dataset
- Dataset Composition and Sources: The authors utilize two public datasets to capture user behavior across distinct application domains: a professional networking platform and an e-commerce marketplace.
- XING Subset: Sourced from the RecSys Challenge 2016 collection, this dataset tracks job posting interactions. Each entry records user ID, item ID, timestamp, and interaction type. The authors exclude "delete" actions and ignore specific interaction categories. They apply popularity thresholds by removing items with fewer than 50 actions and keeping only users with between 10 and 1,000 actions. Repeated interactions with the same item and action type that have a dwell time under 10 seconds are also discarded.
- UserBehavior Subset: Provided by Alibaba, this dataset contains commercial product interactions with identical metadata fields. To ensure computational tractability, the authors randomly sub-sample 100,000 user sequences. They filter out items with fewer than 20 actions, limit user activity to between 20 and 300 actions, and remove any interactions falling outside the dataset's native 9-day temporal window.
- Processing and Model Usage: The authors clean both subsets and structure them into sequential interaction records for model training. The preprocessing applies temporal constraints, engagement thresholds, and dwell-time checks to remove noise and low-quality signals. Exact train-test splits are not specified here; the filtered sequences are used to train and evaluate the deep learning architecture, with an emphasis on data quality and manageable model complexity.
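The popularity and activity thresholds described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `filter_interactions`, the tuple layout, and the order of the two filters (items first, then users, in a single pass) are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

def filter_interactions(records, min_item_actions, min_user_actions, max_user_actions):
    """Apply item-popularity and user-activity thresholds to interaction records.

    `records` is a list of (user_id, item_id, timestamp) tuples.
    Assumption: items are filtered first, then users, in one pass each.
    """
    # Drop items that appear in too few interactions.
    item_counts = Counter(item for _, item, _ in records)
    kept = [r for r in records if item_counts[r[1]] >= min_item_actions]

    # Keep only users whose remaining activity lies within the bounds.
    user_counts = Counter(user for user, _, _ in kept)
    kept = [r for r in kept
            if min_user_actions <= user_counts[r[0]] <= max_user_actions]

    # Order by user and then by time so each user's history reads as a sequence.
    return sorted(kept, key=lambda r: (r[0], r[2]))
```

With the thresholds quoted for XING, a call would look like `filter_interactions(records, 50, 10, 1000)`; for UserBehavior, `filter_interactions(records, 20, 20, 300)`.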
Method
The proposed model, the Contextualized Temporal Attention Mechanism (CTA), employs a three-stage pipeline to capture content, temporal, and contextual information for sequential recommendation. The framework processes a user's historical interaction sequence of timestamp and item pairs, denoted $\{(t_i, s_i)\}_{i=1}^{L}$, along with the current prediction time $t_{L+1}$. The input sequence of items is mapped to an embedding space using $E_{\text{input}} \in \mathbb{R}^{N \times d_{\text{in}}}$, resulting in $X = [s_1, \dots, s_L] \cdot E_{\text{input}} \in \mathbb{R}^{L \times d_{\text{in}}}$. The timestamps are transformed into intervals relative to the prediction time, forming $T = [t_{L+1} - t_1, \dots, t_{L+1} - t_L] \in \mathbb{R}^{L \times 1}$. The model then processes these inputs through three sequential stages, $M^{\alpha}$, $M^{\beta}$, and $M^{\gamma}$, which respectively model content, temporal, and contextual dependencies.
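As a small illustration of the input preparation described above, the following sketch builds the embedded sequence and the time intervals in plain Python. The helper names `embed_items` and `time_intervals` are hypothetical, and the embedding table is a toy list of rows standing in for a learned matrix.

```python
def embed_items(item_ids, embedding_table):
    """Look up one row per item id.

    This is equivalent to multiplying one-hot item vectors by the embedding
    matrix; `embedding_table` is a toy stand-in for learned parameters.
    """
    return [embedding_table[i] for i in item_ids]

def time_intervals(timestamps, prediction_time):
    """Return the gap between the prediction time and each past event."""
    return [prediction_time - t for t in timestamps]

# Toy example: 3 items in the vocabulary, embedding size 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
X = embed_items([2, 0], table)            # one embedded row per past event
T = time_intervals([10, 40, 70], 100)     # older events get larger gaps
```

Measuring time relative to the prediction moment means the same history yields different intervals, and therefore different temporal weights, depending on when the recommendation is requested.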
The first stage, the $\alpha$ stage, focuses on content-based importance. It uses a stack of $d_l$ self-attentive encoder blocks with $d_h$ heads and $d_a$ hidden units to process the input sequence $X$. Each attention block computes multi-head attention via scaled dot-product, where queries, keys, and values are derived from the input state $H^{j}$ through learnable projections $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$. The resulting attention heads are concatenated and projected to form $Z^{j}$, which is then combined with the input via a residual connection and layer normalization to produce the next layer's hidden state $H^{j+1}$. The final hidden state $H^{d_l}$ is used to compute content-based importance scores $\alpha$ by performing a scaled dot-product attention between the last layer's hidden states and the last item's embedding, resulting in a vector $\alpha \in \mathbb{R}^{L \times 1}$.
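The final scoring step of this stage can be sketched as a scaled dot product between each hidden state and a query vector. This is a minimal sketch, not the authors' implementation: the function name `content_scores` is hypothetical, the encoder stack that produces the hidden states is omitted, and leaving the scores unnormalized at this point is an assumption.

```python
import math

def content_scores(hidden_states, query):
    """Scaled dot product between each hidden state and a query vector.

    `hidden_states` is a list of L vectors and `query` has the same dimension.
    Assumption: normalization is deferred to a later stage.
    """
    scale = math.sqrt(len(query))
    return [sum(h_k * q_k for h_k, q_k in zip(h, query)) / scale
            for h in hidden_states]

# Toy example with L = 2 events and dimension 4.
H = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
q = [2.0, 2.0, 0.0, 0.0]
alpha = content_scores(H, q)
```

Dividing by the square root of the dimension keeps the magnitude of the scores comparable as the embedding size grows, which is the usual motivation for the scaling in dot-product attention.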
The second stage, the $\beta$ stage, models temporal dynamics. It applies a set of $K$ kernel functions to the time intervals $T$ to capture the influence of past events based on their temporal gaps. The kernels include exponential decay, logarithmic decay, linear decay, and constant functions, each parameterized by $a$ and $b$. These kernels transform the raw time intervals into $K$ sets of temporal importance scores, forming $\beta \in \mathbb{R}^{L \times K}$, which represent different possible temporal influence patterns.
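To make the kernel idea concrete, here is a small sketch of four kernels applied to a list of time intervals. The exact functional forms are not given in the text above, so the formulas below are assumptions chosen only to match the kernel names; in a trained model the parameters `a` and `b` would be learned rather than fixed by hand.

```python
import math

# Hypothetical kernel forms; each takes an interval t and parameters (a, b).
def exponential_decay(t, a, b):
    return a * math.exp(-b * t)

def logarithmic_decay(t, a, b):
    return a - b * math.log(1.0 + t)

def linear_decay(t, a, b):
    return max(a - b * t, 0.0)   # clipped at zero (an assumption)

def constant(t, a, b):
    return a                     # ignores the interval entirely

def temporal_scores(intervals, kernels):
    """Return an L x K matrix: one row per event, one column per kernel.

    `kernels` is a list of (function, a, b) triples.
    """
    return [[fn(t, a, b) for fn, a, b in kernels] for t in intervals]

kernels = [(exponential_decay, 1.0, 0.0),
           (linear_decay, 1.0, 0.1),
           (constant, 0.5, 0.0)]
beta = temporal_scores([0.0, 5.0], kernels)
```

Keeping several kernels side by side lets a later stage decide, per event, whether influence should fade quickly, fade slowly, or not depend on elapsed time at all.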
The third stage, the $\gamma$ stage, fuses content and temporal information based on contextual cues. It first extracts context features $C \in \mathbb{R}^{L \times d_r}$ using a Bidirectional RNN (BiRNN) on the input item sequence $X$. This captures the surrounding event context from both past and future actions, reflecting the user's sensitivity and seriousness towards each interaction. Optionally, additional context features $C_{\text{attr}}$ can be concatenated with the BiRNN output. A feed-forward layer $F^{\gamma}$ maps the context features to scores over the $K$ temporal kernels, which a softmax layer normalizes into a probability distribution $P(\cdot \mid C) \in \mathbb{R}^{L \times K}$. This distribution is used to mix the temporal scores from the $\beta$ stage, producing a contextualized temporal influence $\beta^{c} = \beta \cdot P(\cdot \mid C)$, that is, one probability-weighted combination of the $K$ kernel scores per event. The final contextualized attention score $\gamma$ is obtained by element-wise multiplication of the content score $\alpha$ and the contextualized temporal score $\beta^{c}$, followed by a softmax normalization so that the weights sum to one. These weights define a weighted sum over the sequence that yields the predicted item representation $\hat{x}_{L+1}$, which is projected to the output embedding space and used to compute similarity scores with all items for recommendation.
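The mixing step in this stage can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, the BiRNN and feed-forward layer are replaced by a precomputed matrix of per-event kernel logits, and the product of beta and the kernel distribution is read as a row-wise weighted sum.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def contextualized_attention(alpha, beta, kernel_logits):
    """Combine content scores with context-selected temporal scores.

    alpha: L content scores; beta: L x K kernel scores;
    kernel_logits: L x K context-derived logits (stand-in for the
    feed-forward layer applied to BiRNN features).
    """
    # Per-event distribution over the K kernels.
    probs = [softmax(row) for row in kernel_logits]
    # Row-wise mixture of kernel scores: one temporal score per event.
    beta_c = [sum(b * p for b, p in zip(b_row, p_row))
              for b_row, p_row in zip(beta, probs)]
    # Element-wise product with the content scores, then normalize.
    return softmax([a * bc for a, bc in zip(alpha, beta_c)])

gamma = contextualized_attention(
    alpha=[1.0, 1.0],
    beta=[[1.0, 0.0], [0.0, 1.0]],
    kernel_logits=[[100.0, 0.0], [100.0, 0.0]],
)
```

In this toy call both events strongly prefer the first kernel, which scores the first event at 1 and the second at 0, so the first event ends up with the larger attention weight even though the content scores are equal.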
Experiment
The proposed Contextualized Temporal Attention model was evaluated on two large-scale user behavior datasets against diverse baselines with matched parameter counts, so that the comparison is fair. Overall and dataset-specific results indicate that the model handles both first-order transition and sequence popularity patterns, outperforming specialized architectures through its three-stage weighing mechanism. Ablation studies and architectural tests indicate that shared item embeddings, multi-kernel temporal modeling, and dynamic content scoring are critical components for robust performance. Finally, attention visualizations show that the network learns meaningful, non-linear temporal and contextual reweighting, which supports the claim that explicitly modeling contextualized temporal dynamics yields an accurate, interpretable, and efficient approach to sequential recommendation.
The authors compare the Contextualized Temporal Attention Mechanism (CTA) against a range of baseline models on both datasets, and CTA achieves the highest Recall@5 in each case. Performance is strongest on the UserBehavior dataset, where sequence popularity patterns dominate, and weaker on the XING dataset, where first-order transition patterns are more prevalent. The model also captures long-term dependencies better than recurrent models. The improvements are attributed to weighing historical events by content, temporal influence, and context. Ablation studies confirm the importance of each component of this three-stage mechanism, with the temporal influence component being particularly important for capturing long-term dependencies.
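For readers unfamiliar with the metric, Recall@5 in next-item prediction is commonly computed as the fraction of test cases whose true next item appears among the top five ranked candidates. The sketch below follows that common definition; the function name `recall_at_k` and the assumption of a single held-out target per test case are illustrative and may differ from the authors' exact evaluation protocol.

```python
def recall_at_k(ranked_lists, targets, k=5):
    """Fraction of test cases whose target item is in the top-k ranking.

    `ranked_lists[i]` is the model's ranking (best first) for test case i,
    and `targets[i]` is the single held-out next item (an assumption).
    """
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

# Toy example: the first target is ranked 6th (miss), the second 4th (hit).
rankings = [[3, 1, 4, 6, 5, 9], [2, 7, 1, 8, 0, 6]]
score = recall_at_k(rankings, [9, 8], k=5)
```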
The authors conduct an ablation study to analyze how architectural choices affect performance on the two datasets. Results vary with window size, loss function, attention settings, and the choice of temporal kernels, which indicates that these design choices are task-dependent and require careful tuning. The optimal window size differs between datasets according to their underlying behavioral patterns, while a ranking-based loss function and specific temporal kernel combinations lead to notable improvements in ranking metrics. The model captures contextual and temporal influences, and the combined importance score is a dominant factor in determining event relevance.
In summary, the model captures long-term dependencies and sequential popularity patterns, though it is less effective when first-order transition patterns dominate. The three-stage weighing mechanism combining content, context, and temporal influence is essential, with temporal weighting proving particularly important. Model effectiveness is also sensitive to architectural choices, so window sizes and loss functions need dataset-specific tuning.