a year ago

Jieming Zhu Rui Zhang Chuhan Wu Zhenhua Dong

Abstract

Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item contents across diverse modalities, such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, particularly in the realm of multimedia services like news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges in the development of multimodal recommender systems. This tutorial seeks to provide a thorough exploration of the latest advancements and future trajectories in multimodal pretraining and generation techniques within the realm of recommender systems. The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in the field of recommendation. Our target audience encompasses scholars, practitioners, and other parties interested in this domain.

One-sentence Summary

This tutorial surveys the shift from ID-centric recommendation models to multimodal pretraining and generation, explaining how text, image, audio, and video content can offset the limits of ID and categorical features on news, music, and short-video platforms, and covering pretraining techniques, generation methods, industrial applications, and open research challenges.

Key Contributions

This tutorial systematically covers multimodal pretraining and generation techniques to overcome the limitations of conventional ID-based recommenders that fail to capture rich cross-modal item content. It establishes a structured framework that transitions from foundational pretraining methods to generation-based approaches for recommendation systems.
Unlike prior surveys that focus on general multimodal learning or introductory hands-on projects, this work specifically examines the practical adaptation and integration of pretrained multimodal models into recommendation pipelines. It details methodologies for the efficient and personalized adaptation of multimodal large language models to recommendation tasks.
The tutorial substantiates its framework with documented industrial deployment cases from platforms such as Alibaba, JD.com, Tencent, Baidu, Xiaohongshu, Pinterest, and Huawei. It also outlines critical open challenges in multimodal representation fusion, multi-domain pretraining, AIGC for recommendation, and standardized benchmarking.

Introduction

Personalized recommendation systems power content discovery across digital platforms, yet conventional architectures predominantly rely on user and item identifiers paired with categorical features. This ID-centric approach fails to capture the rich semantic information embedded in raw text, images, and audio, which severely limits performance in multimedia-driven applications like news and short-video platforms. The authors leverage recent advances in multimodal pretraining and generative AI to reframe how recommendation systems process cross-modal data. They systematically outline practical adaptation frameworks, detail emerging applications of AI-generated content for personalized recommendations, and distill real-world industrial deployments alongside critical research challenges.

Dataset

Dataset composition and sources: The authors do not provide dataset composition or source information in the submitted text, which only lists tutorial speakers and a session schedule.
Key details for each subset: No subset sizes, origins, or filtering rules are described in the material.
How the paper uses the data: The text does not specify training splits, mixture ratios, or data processing workflows. It instead outlines a tutorial agenda focused on multimodal pretraining and generation for recommendation.
Cropping strategy, metadata construction, or other processing details: The provided content contains no information regarding cropping strategies, metadata assembly, or any other preprocessing steps.

Multimodal Pretraining and Generation for Recommendation: A Tutorial
