
PixelHacker: Image Inpainting with Structural and Semantic Consistency

Ziyang Xu, Kangsheng Duan, Xiaolei Shen, Zhifeng Ding, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
Publication date: 5/7/2025
Abstract

Image inpainting is a fundamental research area between image editing and image generation. Recent state-of-the-art (SOTA) methods have explored novel attention mechanisms, lightweight architectures, and context-aware modeling, demonstrating impressive performance. However, they often struggle with complex structure (e.g., texture, shape, spatial relations) and semantics (e.g., color consistency, object restoration, and logical correctness), leading to artifacts and inappropriate generation. To address this challenge, we design a simple yet effective inpainting paradigm called latent categories guidance, and further propose a diffusion-based model named PixelHacker. Specifically, we first construct a large dataset containing 14 million image-mask pairs by annotating foreground and background (potential 116 and 21 categories, respectively). Then, we encode potential foreground and background representations separately through two fixed-size embeddings, and intermittently inject these features into the denoising process via linear attention. Finally, by pre-training on our dataset and fine-tuning on open-source benchmarks, we obtain PixelHacker. Extensive experiments show that PixelHacker comprehensively outperforms the SOTA on a wide range of datasets (Places2, CelebA-HQ, and FFHQ) and exhibits remarkable consistency in both structure and semantics. Project page at https://hustvl.github.io/PixelHacker.
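The guidance mechanism described in the abstract can be pictured with a minimal PyTorch sketch. This is an illustrative approximation, not the authors' released code: the module and parameter names (LatentCategoriesGuidance, fg_embed, bg_embed) are hypothetical, and the elu-plus-one feature map is just one common choice for linear attention. Only the category counts (116 foreground, 21 background) and the overall idea of injecting fixed-size category embeddings into the denoising features come from the abstract itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentCategoriesGuidance(nn.Module):
    """Illustrative sketch of latent categories guidance (names hypothetical):
    two fixed-size embeddings encode potential foreground (116 categories)
    and background (21 categories), and their features are injected into
    the denoising features via linear attention."""

    def __init__(self, dim: int, num_fg: int = 116, num_bg: int = 21):
        super().__init__()
        # Fixed-size learnable embeddings for foreground/background categories.
        self.fg_embed = nn.Embedding(num_fg, dim)
        self.bg_embed = nn.Embedding(num_bg, dim)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x, fg_ids, bg_ids):
        # x: denoising features, shape (B, N, dim)
        # fg_ids: (B, Lf) foreground category ids; bg_ids: (B, Lb) background ids
        ctx = torch.cat([self.fg_embed(fg_ids), self.bg_embed(bg_ids)], dim=1)
        q, k, v = self.to_q(x), self.to_k(ctx), self.to_v(ctx)
        # Linear attention with feature map phi(.) = elu(.) + 1: the (K^T V)
        # summary is computed once, so cost scales linearly in sequence length.
        phi_q = F.elu(q) + 1
        phi_k = F.elu(k) + 1
        kv = torch.einsum("bld,ble->bde", phi_k, v)           # (B, d, d) summary
        z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6)
        out = torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)   # (B, N, d)
        # Residual injection into the denoising stream; per the abstract this
        # happens intermittently, i.e. only at selected denoising blocks.
        return x + out

# Shape check with random inputs.
mod = LatentCategoriesGuidance(dim=64)
x = torch.randn(2, 256, 64)         # e.g. 16x16 latent tokens per image
fg = torch.randint(0, 116, (2, 3))  # a few foreground category ids
bg = torch.randint(0, 21, (2, 2))   # a few background category ids
print(mod(x, fg, bg).shape)         # torch.Size([2, 256, 64])
```

Because the category embeddings have a fixed size independent of image resolution, this kind of injection adds a roughly constant conditioning cost per denoising block, which fits the abstract's emphasis on a simple yet effective paradigm.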