HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
Lei Xin Yuhao Zheng Ke Cheng Changjiang Jiang Zifan Zhang Fanhu Zeng
Abstract
Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision because of their limited state capacity, while softmax attention incurs prohibitive computational overhead. To address this challenge, we propose HyTRec, a model built on a Hybrid Attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval capabilities in industrial-scale contexts involving thousands of interactions. To mitigate the lag in capturing rapid interest drifts within the linear layers, we further design the Temporal-Aware Delta Network (TADN), which dynamically upweights recent behavioral signals while effectively suppressing historical noise. Experimental results on industrial-scale datasets confirm the effectiveness of our model, which maintains linear inference speed and significantly outperforms strong baselines, notably delivering an improvement of more than 8% in Hit Rate for users with ultra-long sequences while remaining highly efficient.
One-sentence Summary
Researchers from Dewu, Wuhan University, USTC, and Beihang propose HyTRec, a hybrid attention model that decouples long-term preferences from short-term intent using linear and softmax branches, enhanced by TADN for dynamic signal weighting, achieving 8%+ Hit Rate gains on ultra-long sequences with linear inference speed.
Key Contributions
- HyTRec introduces a Hybrid Attention architecture that assigns long-term historical sequences to linear attention and recent interactions to softmax attention, resolving the efficiency-precision trade-off while handling sequences of up to ten thousand interactions.
- The Temporal-Aware Delta Network (TADN) dynamically amplifies fresh behavioral signals using an exponential gating mechanism, reducing lag in capturing rapid interest drifts and suppressing noise from outdated interactions.
- Evaluated on industrial-scale e-commerce datasets, HyTRec achieves over 8% higher Hit Rate for users with ultra-long sequences while preserving linear inference speed, outperforming strong baselines in both efficiency and accuracy.
Introduction
The authors address the growing need to model ultra-long user behavior sequences in generative recommendation systems, where capturing both long-term preferences and short-term intent shifts is critical for accurate next-item prediction. Prior approaches face a trade-off: linear attention models scale efficiently but lose retrieval precision, while softmax attention preserves fidelity at prohibitive computational cost, and neither handles rapid interest drift well. HyTRec addresses this with a hybrid attention architecture that routes long-term history through linear attention and recent interactions through softmax attention, preserving linear complexity while restoring precision. The authors further propose the Temporal-Aware Delta Network (TADN) to dynamically emphasize fresh signals and suppress stale noise, improving responsiveness to intent changes. Empirically, HyTRec outperforms baselines by over 8% in Hit Rate for users with ultra-long sequences, without sacrificing inference speed.
Dataset

- The authors use publicly available datasets with no sensitive information, focusing solely on research with no commercial intent.
- Data is merged across partitions by user ID to reconstruct full behavior histories from registration, enabling richer long-sequence modeling.
- User behavior sequences are structured using ad attribution IDs as numerical identifiers, tracking states across funnel stages (click, redirect, activation, wake-up, product click, stay, collection, add-to-cart, payment).
- Behaviors are sorted by correlation strength with payment to build hierarchical long-cycle sequences, prioritizing signals most predictive of conversion.
- The dataset excludes or handles problematic cases: cold-start users (sparse/no interactions), inactive long-term users (old/limited records), and scalper accounts (high interaction volume but low business value) to avoid misleading model training.
- These filtering and structuring strategies ensure the training data supports efficient, accurate modeling of long user sequences without being skewed by edge cases.
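The merging and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the event tuple layout `(user_id, timestamp, behavior)`, the function names, and every threshold (`min_events`, `max_idle`, `max_events`) are hypothetical, and interaction volume alone is only a crude proxy for the scalper case.

```python
from collections import defaultdict

def merge_partitions(partitions):
    """Merge per-partition event lists into one time-ordered history per user."""
    histories = defaultdict(list)
    for partition in partitions:
        for user_id, timestamp, behavior in partition:
            histories[user_id].append((timestamp, behavior))
    # Sorting the (timestamp, behavior) tuples orders each history by time.
    return {uid: sorted(events) for uid, events in histories.items()}

def keep_user(events, now, min_events=3, max_idle=365, max_events=10_000):
    """Hypothetical filter; all thresholds are placeholders, not values from the paper."""
    if len(events) < min_events:        # cold-start: too few interactions
        return False
    if now - events[-1][0] > max_idle:  # inactive: most recent event is too old
        return False
    if len(events) > max_events:        # scalper-like: abnormally high volume
        return False
    return True

partitions = [
    [("u1", 10, "click"), ("u2", 5, "click")],
    [("u1", 3, "add_to_cart"), ("u1", 20, "payment")],
]
histories = merge_partitions(partitions)
kept = {uid: ev for uid, ev in histories.items() if keep_user(ev, now=30)}
```

In this toy input, user `u1` is reassembled from two partitions and retained, while `u2` is dropped as a cold-start user.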
Method
The authors leverage a dual-branch architecture to model long user behavior sequences, explicitly decoupling short-term intent spikes from long-term stable preferences. This stratification begins with sequence decomposition: the full historical sequence $S_u$ is split into a short-term subsequence $S_u^{\text{short}}$ of fixed length $K$, capturing recent behaviors, and a long-term subsequence $S_u^{\text{long}}$ of length $n - K$, encoding stable consumption patterns. Each subsequence is processed independently through dedicated branches before fusion.
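As a small illustration of the decomposition step, the sketch below splits a time-ordered sequence into its long-term and short-term parts. The function name `split_sequence` is hypothetical, and returning an empty long-term part when the sequence has at most K items is an assumption; the source does not say how short sequences are handled.

```python
def split_sequence(sequence, k):
    """Split a time-ordered behavior sequence into (long_term, short_term).

    short_term holds the K most recent behaviors; long_term holds the
    remaining n - K older behaviors.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(sequence) <= k:
        # Assumption: sequences no longer than K go entirely to the short-term branch.
        return [], list(sequence)
    return list(sequence[:-k]), list(sequence[-k:])

long_term, short_term = split_sequence(list(range(10)), k=3)
```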
The short-term branch employs standard multi-head self-attention (MHSA) to preserve fine-grained temporal dynamics and ensure high precision for recent interactions. In contrast, the long-term branch is built upon a novel hybrid attention architecture designed to break the $O(n^2)$ complexity bottleneck while retaining global context awareness. This branch consists of $N$ encoder layers, predominantly using the proposed Temporal-Aware Delta Network (TADN) for linear complexity, with sparse interleaving of standard softmax attention layers (e.g., at a 7:1 ratio) to maintain retrieval fidelity.
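One way to picture the interleaving is as a per-layer schedule. The sketch below assumes that each group of TADN layers is followed by a single softmax attention layer; the source gives only the ratio, so this exact placement, the function name, and the layer labels are illustrative assumptions.

```python
def build_layer_schedule(num_layers, linear_per_softmax=7):
    """Return a list of layer types for the long-term branch.

    Assumption: every `linear_per_softmax` TADN layers are followed by
    one softmax attention layer.
    """
    period = linear_per_softmax + 1
    return ["softmax" if (i + 1) % period == 0 else "tadn"
            for i in range(num_layers)]

schedule = build_layer_schedule(16, linear_per_softmax=7)
```

With 16 layers and a 7:1 ratio, this yields 14 TADN layers and 2 softmax layers, the latter at positions 8 and 16.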
Refer to the framework diagram for a visual representation of this dual-path processing and fusion mechanism.

At the core of the long-term branch is TADN, which introduces a Temporal-Aware Gating Mechanism to dynamically weight historical behaviors based on their temporal proximity to the target action. The temporal decay factor $\tau_t$ quantifies relevance:
$$\tau_t = \exp\left(-\frac{t_{\text{current}} - t_{\text{behavior}}^{t}}{T}\right),$$
where $T$ is the decay period. This factor is fused with feature similarity to generate dynamic gating weights $g_t$:
$$g_t = \alpha \cdot \left[\sigma\left(W_g \cdot \text{Concat}(h_t, \Delta h_t) + b\right) \odot \tau_t\right] + (1 - \alpha) \cdot g_{\text{static}},$$
where $\Delta h_t = h_t - \bar{h}$ represents short-term deviations, and $g_{\text{static}}$ encodes long-term preferences. The fused representation $\tilde{h}_t$ is then computed as:
$$\tilde{h}_t = q_t \odot \Delta h_t + (1 - q_t) \odot h_t.$$
This gating mechanism is integrated into a linear attention recurrence via the state update rule:
$$S_t = S_{t-1}\left(I - g_t \beta_t k_t k_t^{\top}\right) + \beta_t v_t k_t^{\top},$$
which expands into a linear attention formulation with a temporal-aware decay mask $D(t, i)$:
$$o_t = \sum_{i=1}^{t} \beta_i \left(v_i k_i^{\top}\right) q_t \cdot D(t, i),$$
where $D(t, i) = \prod_{j=i+1}^{t} \left(I - g_j \beta_j k_j k_j^{\top}\right)$. The inclusion of $\tau_j$ in $g_j$ ensures that recent interactions are prioritized while long-term patterns are preserved.
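To make the relationship between the recurrence and its expanded form concrete, the following sketch computes the state both ways on random toy data and compares them. It rests on several simplifying assumptions that are not taken from the source: the gate is reduced to a scalar per step, the learned projection inside the sigmoid is replaced by a random scalar score, keys and values share one dimension, and the decay mask is applied to each key-value outer product from the right (which is what unrolling the state update gives) before any multiplication by the query. All function names are hypothetical.

```python
import math
import random

def temporal_decay(t_current, t_behavior, period):
    """tau_t = exp(-(t_current - t_behavior) / T)."""
    return math.exp(-(t_current - t_behavior) / period)

def gate(score, tau, g_static, alpha):
    """Scalar stand-in for g_t = alpha * sigmoid(score) * tau + (1 - alpha) * g_static."""
    sigmoid = 1.0 / (1.0 + math.exp(-score))
    return alpha * sigmoid * tau + (1.0 - alpha) * g_static

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def write_term(beta, v, k):
    """beta_t * v_t k_t^T."""
    return [[beta * vi * kj for kj in k] for vi in v]

def transition(g, beta, k):
    """I - g_t * beta_t * k_t k_t^T."""
    d = len(k)
    return [[float(i == j) - g * beta * k[i] * k[j] for j in range(d)]
            for i in range(d)]

def recurrent_state(gates, betas, keys, values):
    """Apply S_t = S_{t-1} (I - g_t beta_t k_t k_t^T) + beta_t v_t k_t^T step by step."""
    d = len(keys[0])
    state = [[0.0] * d for _ in range(d)]
    for g, beta, k, v in zip(gates, betas, keys, values):
        state = add(matmul(state, transition(g, beta, k)), write_term(beta, v, k))
    return state

def expanded_state(gates, betas, keys, values):
    """Sum over i of beta_i v_i k_i^T D(t, i), with D(t, i) the product of later transitions."""
    d, steps = len(keys[0]), len(keys)
    total = [[0.0] * d for _ in range(d)]
    for i in range(steps):
        term = write_term(betas[i], values[i], keys[i])
        for j in range(i + 1, steps):
            term = matmul(term, transition(gates[j], betas[j], keys[j]))
        total = add(total, term)
    return total

random.seed(0)
d, steps, period, now = 3, 6, 5.0, 10.0
timestamps = [1.0, 3.0, 4.0, 7.0, 9.0, 10.0]  # toy behavior times
keys = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(steps)]
values = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(steps)]
betas = [random.uniform(0.1, 0.9) for _ in range(steps)]
scores = [random.uniform(-1, 1) for _ in range(steps)]
taus = [temporal_decay(now, t, period) for t in timestamps]
gates = [gate(s, tau, g_static=0.2, alpha=0.7) for s, tau in zip(scores, taus)]

s_rec = recurrent_state(gates, betas, keys, values)
s_exp = expanded_state(gates, betas, keys, values)
max_gap = max(abs(x - y) for ra, rb in zip(s_rec, s_exp) for x, y in zip(ra, rb))
```

The recurrent form touches each step once, whereas the expanded form recomputes the decay product for every pair of positions; the two agree up to floating-point error, which is why the recurrence can serve as the efficient implementation. The decay factors in `taus` increase toward the most recent behavior, so recent steps receive the larger temporal weight.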
The outputs of both branches are fused and passed through a linear layer and softmax to generate the final recommendation prediction. This hybrid design enables efficient processing of long sequences without sacrificing the semantic richness required for accurate intent modeling.
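The fusion and prediction step might look like the sketch below. The source says only that the branch outputs are "fused", so concatenation is an assumed fusion operator here, and the function names, weights, and toy dimensions are purely illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_predict(h_long, h_short, weight, bias):
    """Concatenate branch outputs, apply a linear layer, then softmax over items."""
    fused = h_long + h_short  # list concatenation (assumed fusion operator)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)

h_long = [0.2, -0.1]   # toy long-term branch output
h_short = [0.5, 0.3]   # toy short-term branch output
weight = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
bias = [0.0, 0.0, 0.0]
probs = fuse_and_predict(h_long, h_short, weight, bias)
```

Each row of `weight` scores one candidate item, so `probs` is a distribution over three hypothetical items.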
Experiment
- HyTRec outperforms state-of-the-art baselines in modeling long user behavior sequences, achieving top or near-top performance across multiple datasets, particularly excelling in capturing user interests and adapting to sparse or scattered behavioral patterns.
- The model scales efficiently to ultra-long sequences due to its linear attention mechanism, maintaining high throughput even at sequence lengths up to 12k, while transformer-based models suffer sharp efficiency drops beyond 1k.
- Ablation studies confirm that both the TADN branch (for long-term dependencies) and the short-term attention branch (for immediate interest drift) are essential, with their combination yielding the best overall performance through complementary modeling.
- A 3:1 hybrid attention ratio between linear and short-term attention provides the optimal balance between recommendation accuracy and inference efficiency, outperforming other ratios in both performance and latency trade-offs.
- HyTRec demonstrates robustness in cold-start and sparse interaction scenarios by leveraging similar user behavior patterns, showing strong generalization and adaptability to challenging user cases.
- Additional parameter studies indicate that 2 attention heads and 4 experts deliver optimal performance-efficiency trade-offs, aligning with user group heterogeneity and computational constraints.
- The model also shows strong cross-domain transfer capability, suggesting structural advantages in handling domain shifts and motivating future work on broader generalization and noise resilience.
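The scaling gap between softmax and linear attention noted above can be illustrated with a back-of-the-envelope cost model. The sketch uses the standard asymptotic costs of the two attention families (roughly n²·d versus n·d² multiply-adds per layer) and ignores constants, projections, and memory traffic; the head dimension `d = 64` is a hypothetical value, not one reported by the authors, and the resulting ratios are not measured throughput figures.

```python
def softmax_attention_cost(n, d):
    """Rough multiply-add count for one softmax attention layer: ~ n^2 * d."""
    return n * n * d

def linear_attention_cost(n, d):
    """Rough multiply-add count for one linear attention layer: ~ n * d^2."""
    return n * d * d

d = 64  # hypothetical head dimension
ratios = {
    n: softmax_attention_cost(n, d) / linear_attention_cost(n, d)
    for n in (1_000, 12_000)
}
```

Under this model the ratio is simply n / d, so the relative cost of softmax attention grows linearly with sequence length, which is consistent with the sharp efficiency drop described for transformer-based models on longer sequences.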
The authors use HyTRec to address long user behavior modeling by combining linear and short-term attention mechanisms, achieving competitive or superior performance across multiple datasets compared to transformer and linear attention baselines. Results show that HyTRec maintains high efficiency at ultra-long sequence lengths while delivering strong recommendation accuracy, particularly in capturing both long-term patterns and short-term interest drift. Ablation and ratio studies confirm that the hybrid architecture’s dual-branch design and 3:1 attention ratio offer the best balance between performance and computational cost.

The authors use ablation experiments to isolate the contributions of HyTRec’s two key components: the TADN branch for long-term sequence modeling and the short-term attention branch for capturing immediate interest drift. Results show that each component independently improves performance over the baseline, but their combination yields the highest scores across all metrics, confirming their complementary roles. This design enables the model to effectively balance long-range dependency modeling with responsiveness to recent user behavior.

The authors evaluate the impact of varying the number of experts in their recommendation model and find that performance peaks at 4 experts, with both recommendation accuracy and inference efficiency declining as the number increases to 6 or 8. Results show that higher expert counts introduce unnecessary computational overhead without meaningful gains in key metrics like H@500, NDCG@500, or AUC. This supports the design choice of aligning expert count with user group heterogeneity for optimal balance between effectiveness and efficiency.
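For readers unfamiliar with the retrieval metrics named here, the sketch below shows how H@K and NDCG@K are commonly computed in next-item recommendation when there is a single ground-truth item per user. The source does not spell out its metric definitions, so this standard single-target form is an assumption, and the function names are illustrative.

```python
import math

def hit_at_k(ranked_items, target, k):
    """1.0 if the target item appears among the top-K ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@K with one relevant item: 1 / log2(rank + 1) if ranked within K."""
    top = ranked_items[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1  # 1-based rank of the target
    return 1.0 / math.log2(rank + 1)

ranked = ["a", "b", "c", "d"]
hit = hit_at_k(ranked, "c", k=3)
ndcg = ndcg_at_k(ranked, "c", k=3)
```

Here the target sits at rank 3, so it counts as a hit and its NDCG is 1 / log2(4). Dataset-level scores are then the averages of these per-user values.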

The authors evaluate how different hybrid attention ratios affect recommendation accuracy and inference latency, finding that a 3:1 ratio delivers the best balance between performance gains and computational cost. While higher ratios like 6:1 achieve marginally better metrics, they incur significantly increased latency, making them less practical. Results confirm that moderate hybrid configurations optimize both effectiveness and efficiency in long-sequence modeling.

The authors evaluate HyTRec on challenging user scenarios including cold-start new users and silent old users, showing that the model maintains strong retrieval and ranking performance across both cases. Results indicate consistent AUC scores above 0.86 and competitive H@500 values, demonstrating the model’s robustness in handling sparse interaction data through effective user similarity augmentation and generalization.
