Kuaishou Unveils OneRec: An End-to-End Generative Recommendation Model Inspired by Large Language Models
OneRec is a groundbreaking recommendation model developed by Kuaishou, designed to simplify and enhance the video recommendation process. Traditional recommendation algorithms rely on a multi-stage pipeline of recall, coarse ranking, fine ranking, and re-ranking; OneRec is an end-to-end model that folds these stages into a single, unified process. The design is inspired by the success of large language models (LLMs), which have shown that sufficient data and model size can produce exceptional results. OneRec takes a similar approach, generating recommendations directly rather than choosing among pre-selected items.

Tokenizer

The first step in OneRec's workflow is tokenization, which converts videos into semantic IDs. Because the number of items is vast (often in the hundreds of millions), modeling item IDs directly would result in sparse embeddings. To address this, OneRec uses a multi-step process. First, it feeds a video's captions, tags, speech transcripts (ASR), on-screen text (OCR), cover image, and five uniformly sampled frames into a large model called miniCPM-V-8B to obtain high-dimensional feature vectors. These vectors are then compressed by a lightweight model called QFormer, which retains the essential information while simplifying further processing. The compression results in a three-tiered "coarse-medium-fine" semantic code {s_m^1, s_m^2, s_m^3} for each video. This code captures both broad categories and fine-grained details, allowing a single server to efficiently handle hundreds of millions of videos. During the recommendation process, OneRec generates semantic IDs that match user interests and maps them back to specific videos.

Encoder

OneRec's encoder integrates four types of user-related features to build a comprehensive user profile:

1. Static User Features: These include user ID, age, and gender, each with its own embedding.
2. Short-Term Behavior Pathway: This pathway processes the most recent 20 user interactions, such as video views, likes, and comments, converting each interaction into an embedding.
3. Positive Feedback Behavior Pathway: This pathway handles sequences of user interactions indicating high engagement, with a maximum length of 256 interactions.
4. Lifecycle Pathway: This pathway processes extremely long historical behavior sequences, up to 100,000 entries. A QFormer is used to compress these sequences into a manageable form.

The encoder combines these features, adds positional encoding, and feeds them into Transformer encoder layers. This multi-layer processing helps OneRec capture both short-term trends and long-term preferences, creating a robust foundation for personalized recommendations.

Decoder

The decoder in OneRec is similar to the Transformer decoder but includes a Mixture of Experts (MoE) structure for faster inference. It generates a sequence of semantic IDs step by step, starting from a BOS (beginning-of-sequence) token. The generated sequence represents the user's click sequence and typically covers 5 to 10 videos. During the recommendation process, OneRec maps these semantic IDs to actual video IDs. If a semantic ID cannot be mapped to a video, it is considered invalid. To mitigate this, OneRec introduces a format reward mechanism that encourages the generation of valid semantic IDs.

Reinforcement Learning

To optimize recommendations for various business metrics, OneRec employs a reinforcement learning (RL) technique called ECPO (Early Clipped GRPO). This method combines multiple feedback signals, such as clicks, likes, and viewing duration, into a "P-Score" using a small neural network. ECPO then optimizes the model's recommendations against this score, so that the recommendations align with broader business goals. OneRec's reinforcement learning approach builds on DeepSeek's GRPO but addresses issues like gradient explosion by introducing early clipping.
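
The article does not spell out the ECPO formula, so the Python sketch below is only one plausible reading of "early clipping" on top of a GRPO-style objective, not Kuaishou's implementation. The function names, the `eps` and `delta` values, and the exact form of the cap are all assumptions made for illustration. The idea shown is that rewards are standardized within a group of sampled outputs, and for samples with a negative advantage the probability ratio is capped before the usual clipped surrogate is computed, so a single bad sample cannot produce an unbounded penalty.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def ecpo_surrogate(p_new, p_old, advantage, eps=0.2, delta=0.1, early_clip=True):
    """Per-sample clipped surrogate with an assumed early clip.

    p_new / p_old are the probabilities of the sampled output under the
    current and the old policy. eps and delta are hypothetical values.
    """
    if early_clip and advantage < 0:
        # Assumed early clip: raise the denominator so that the ratio
        # can never exceed 1 + eps + delta for negative-advantage samples.
        p_old = max(p_new / (1 + eps + delta), p_old)
    ratio = p_new / p_old
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    # Standard pessimistic (min) form of the clipped objective.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Hypothetical P-Scores for four recommendations sampled for one user.
    scores = [0.2, 0.9, 0.4, 0.5]
    print([round(a, 3) for a in group_advantages(scores)])

    # A disliked sample whose probability has grown from 0.1 to 0.9.
    print(round(ecpo_surrogate(0.9, 0.1, -1.0, early_clip=False), 3))
    print(round(ecpo_surrogate(0.9, 0.1, -1.0, early_clip=True), 3))
```

In this toy case the ratio is 9 without the cap, so the negative-advantage term grows with the ratio; with the assumed cap it is bounded at 1 + eps + delta = 1.3, which is the kind of stabilizing effect the article attributes to early clipping.
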
Early clipping prevents samples with negative advantages from excessively suppressing the generation probabilities of valid semantic IDs, which keeps the model stable and effective.

Training Process

OneRec's training process involves two phases:

1. Pre-training: The model is trained on user behavior representations, with 18 billion samples per day translating to 54 billion tokens in the decoder (three tokens per sample, consistent with the three-tiered semantic code).
2. Post-training: This includes online training with real-time data, rejection sampling that filters out the bottom 50% of samples by playback duration, and reinforcement learning to further refine recommendations.

A 0.935-billion-parameter OneRec model required about 100 billion samples to converge. Post-training techniques, such as rejection sampling and reinforcement learning, enhance the model's performance by filtering out less engaging content and optimizing for business metrics.

Performance

While the pure OneRec model shows modest improvements over traditional recommendation methods, integrating the Reward Model (RM) significantly enhances its performance. The RM, which resembles a fine-ranking model, plays a crucial role in this enhancement, suggesting that the generative model has not entirely eliminated the influence of traditional approaches. However, OneRec outperforms fine-ranking models in several respects. In Kuaishou's local lifestyle service scenario, OneRec achieved a 21.01% increase in GMV, a 17.89% growth in the number of orders, and an 18.58% rise in the number of purchasing users, with a notable 23.02% improvement in new customer acquisition efficiency.

Evaluation by Industry Insiders

Industry experts view OneRec as a significant advancement in recommendation systems, particularly for platforms with large user bases and complex recommendation requirements. The end-to-end nature of the model reduces the operational overhead and potential for errors associated with traditional multi-stage pipelines.
While challenges remain, such as managing invalid semantic IDs and ensuring the model's stability, OneRec's innovative approach demonstrates the potential of large language models in personalized recommendation tasks. Companies like Kuaishou are at the forefront of this shift, leveraging advanced techniques to improve user experience and business outcomes.

Company Profile

Kuaishou is a leading Chinese short video platform and social media company. Known for its user-centric approach and advanced AI technologies, Kuaishou has been investing heavily in research and development to enhance its recommendation algorithms and user engagement. The development and successful deployment of OneRec highlight Kuaishou's commitment to innovation and its competitive stance in the rapidly evolving AI landscape.