PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Recent advances in text-to-image generation have enabled remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, high identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method that encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information. Such an embedding, serving as a unified ID representation, can not only comprehensively encapsulate the characteristics of the same input ID, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. In addition, to drive the training of PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through the proposed pipeline, PhotoMaker demonstrates better ID preservation than test-time fine-tuning based methods, while also providing significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/
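
The idea of a stacked ID embedding can be illustrated with a minimal sketch. It assumes that "stacking" means encoding each input image separately and keeping the per-image embeddings together as one variable-length sequence. The encoder `toy_encode` and the helper `stack_id_embeddings` are hypothetical placeholders introduced here for illustration; they are not PhotoMaker's actual encoder or API.

```python
from typing import List, Sequence

Embedding = List[float]


def toy_encode(image: Sequence[float], dim: int = 4) -> Embedding:
    """Hypothetical stand-in for an image encoder (assumption, not the paper's).

    Folds the pixel values into `dim` buckets so every image maps to a
    fixed-size vector regardless of its length.
    """
    out = [0.0] * dim
    for i, px in enumerate(image):
        out[i % dim] += px
    return out


def stack_id_embeddings(images: Sequence[Sequence[float]], dim: int = 4) -> List[Embedding]:
    """Encode each ID image and stack the results.

    The output has one row per input image, so its length grows with the
    number of images while each row keeps the same dimension.
    """
    if not images:
        raise ValueError("need at least one ID image")
    return [toy_encode(img, dim) for img in images]


# Three images of the same (toy) identity give a stack of three embeddings.
same_id = stack_id_embeddings([[1, 2, 3, 4], [5, 6, 7, 8], [0, 1, 0, 1]])

# Embeddings from different identities can share one stack, which is how
# this sketch models accommodating several IDs in a single representation.
mixed_ids = stack_id_embeddings([[1, 2, 3, 4]]) + stack_id_embeddings(
    [[9, 9, 9, 9], [2, 2, 2, 2]]
)
```

Because the stack is simply a sequence of fixed-size vectors, the same representation handles one image, many images of one person, or images of several people without changing the interface; how the stack is then consumed by the generator is outside the scope of this sketch.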