8 months ago

Diffusion Model

Image Generation

Method/Architecture

Computer Vision

Zhen Li¹,²∗, Mingdeng Cao²,³∗, Xintao Wang², Zhongang Qi², Ming-Ming Cheng¹†, Ying Shan²

Abstract

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Under the nourishment of the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/
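
The idea of encoding an arbitrary number of ID images into one stacked embedding can be illustrated with a toy sketch. This is not the PhotoMaker implementation: the encoder `toy_encode`, the helper `stack_id_embedding`, the embedding size `dim`, and the use of flat lists of pixel values as "images" are all hypothetical stand-ins chosen for illustration. The sketch only shows the shape-level idea that N input images yield an N-row stack, and that rows from different identities could sit in the same stack.

```python
from typing import List, Sequence

Embedding = List[float]


def toy_encode(image: Sequence[float], dim: int = 4) -> Embedding:
    """Hypothetical stand-in for an image encoder.

    Folds the pixel values into `dim` buckets (by index modulo `dim`)
    and averages each bucket. A real system would use a learned encoder.
    """
    sums = [0.0] * dim
    counts = [0] * dim
    for i, px in enumerate(image):
        sums[i % dim] += px
        counts[i % dim] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def stack_id_embedding(images: Sequence[Sequence[float]], dim: int = 4) -> List[Embedding]:
    """Encode each image separately and stack the results row by row.

    The output has one row per input image, so any number of images
    (at least one) produces a valid stack of shape (len(images), dim).
    """
    if not images:
        raise ValueError("at least one ID image is required")
    return [toy_encode(img, dim) for img in images]


# One identity, three toy "images" of different lengths.
person_a = [[1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0], [2, 4, 6, 8]]
stack_a = stack_id_embedding(person_a)

# A second identity with two images; joining the rows mixes identities
# in one stack (an assumed reading of "accommodate different IDs").
person_b = [[9, 9, 9, 9], [1, 1, 1, 1]]
mixed_stack = stack_a + stack_id_embedding(person_b)
```

Keeping one row per image, rather than averaging the images into a single vector, is what lets the stack grow with the number of inputs in this sketch.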

