One year ago

Jieming Zhu Rui Zhang Chuhan Wu Zhenhua Dong

Abstract

Title: Multimodal Pretraining and Generation for Recommendation: A Tutorial

Abstract: Personalized recommendation is a widely used channel through which users discover information or items that match their interests. However, mainstream recommendation models rely primarily on unique identifiers (IDs) and categorical features to match users with items. Although this ID-centric approach has been highly successful, it lacks a comprehensive understanding of raw item content across diverse modalities such as text, audio, image, and video. This underuse of multimodal data limits recommender systems, particularly in multimedia services such as news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges for the development of multimodal recommender systems. This tutorial aims to provide a comprehensive exploration of the latest advances and future directions in multimodal pretraining and generation techniques for recommender systems. The tutorial consists of three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in recommendation. Our target audience includes researchers, practitioners, and other parties interested in this field.

One-sentence Summary

This tutorial surveys the transition from ID-centric recommendation models to multimodal pretraining and generation frameworks, detailing how text, image, audio, and video data address categorical feature limitations on news, music, and short-video platforms while systematically covering multimodal pretraining techniques, generation methods, and industrial applications alongside open research challenges.

Key Contributions

This tutorial systematically covers multimodal pretraining and generation techniques to overcome the limitations of conventional ID-based recommenders that fail to capture rich cross-modal item content. It establishes a structured framework that transitions from foundational pretraining methods to generation-based approaches for recommendation systems.
Unlike prior surveys that focus on general multimodal learning or introductory hands-on projects, this work specifically examines the practical adaptation and integration of pretrained multimodal models into recommendation pipelines. It details methodologies for the efficient and personalized adaptation of multimodal large language models to recommendation tasks.
The tutorial substantiates its framework with documented industrial deployment cases from platforms such as Alibaba, JD.com, Tencent, Baidu, Xiaohongshu, Pinterest, and Huawei. It also outlines critical open challenges in multimodal representation fusion, multi-domain pretraining, AIGC for recommendation, and standardized benchmarking.

Introduction

Personalized recommendation systems power content discovery across digital platforms, yet conventional architectures predominantly rely on user and item identifiers paired with categorical features. This ID-centric approach fails to capture the rich semantic information embedded in raw text, images, and audio, which severely limits performance in multimedia-driven applications like news and short-video platforms. The authors leverage recent advances in multimodal pretraining and generative AI to reframe how recommendation systems process cross-modal data. They systematically outline practical adaptation frameworks, detail emerging applications of AI-generated content for personalized recommendations, and distill real-world industrial deployments alongside critical research challenges.

Dataset

Dataset composition and sources: The authors do not provide dataset composition or source information in the submitted text, which only lists tutorial speakers and a session schedule.
Key details for each subset: No subset sizes, origins, or filtering rules are described in the material.
How the paper uses the data: The text does not specify training splits, mixture ratios, or data processing workflows. It instead outlines a tutorial agenda focused on multimodal pretraining and generation for recommendation.
Cropping strategy, metadata construction, or other processing details: The provided content contains no information regarding cropping strategies, metadata assembly, or any other preprocessing steps.

Multimodal Pretraining and Generation for Recommendation: A Tutorial
