2 months ago

Shilong Zhang He Zhang Zhifei Zhang Chongjian Ge Shuchen Xue Shaoteng Liu Mengwei Ren Soo Ye Kim Yuqian Zhou Qing Liu

Abstract

Modern Latent Diffusion Models (LDMs) typically operate in low-level Variational Autoencoder (VAE) latent spaces that are primarily optimized for pixel-level reconstruction. To unify vision generation and understanding, a burgeoning trend is to adopt high-dimensional features from representation encoders as generative latents. However, we empirically identify two fundamental obstacles in this paradigm: (1) the discriminative feature space lacks compact regularization, making diffusion models prone to off-manifold latents that lead to inaccurate object structures; and (2) the encoder's inherently weak pixel-level reconstruction hinders the generator from learning accurate fine-grained geometry and texture. In this paper, we propose a systematic framework to adapt understanding-oriented encoder features for generative tasks. We introduce a semantic-pixel reconstruction objective to regularize the latent space, enabling the compression of both semantic information and fine-grained details into a highly compact representation (96 channels with 16x16 spatial downsampling). This design ensures that the latent space remains semantically rich and achieves state-of-the-art image reconstruction, while remaining compact enough for accurate generation. Leveraging this representation, we design a unified Text-to-Image (T2I) and image editing model. Benchmarking against various feature spaces, we demonstrate that our approach achieves state-of-the-art reconstruction, faster convergence, and substantial performance gains in both T2I and editing tasks, validating that representation encoders can be effectively adapted into robust generative components.
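
To make the stated compactness concrete, the sketch below works out the latent grid size and the resulting compression ratio for the representation described above (96 channels, 16x16 spatial downsampling). It assumes that "16x16" means a factor of 16 along each spatial axis and uses a 256x256 RGB input purely as an illustrative resolution; neither the function names nor the input size come from the paper.

```python
def latent_shape(height, width, channels=96, downsample=16):
    """Return (channels, h, w) of the latent grid for an image of the given size.

    Assumption: the downsampling factor applies to each spatial axis.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return channels, height // downsample, width // downsample


def compression_ratio(height, width, channels=96, downsample=16, image_channels=3):
    """Ratio of raw image values to latent values."""
    c, h, w = latent_shape(height, width, channels, downsample)
    return (height * width * image_channels) / (c * h * w)


# Hypothetical 256x256 RGB input, chosen only for illustration.
shape = latent_shape(256, 256)
ratio = compression_ratio(256, 256)
print(shape, ratio)  # (96, 16, 16) 8.0
```

Under these assumptions each 16x16 RGB patch (768 values) maps to a 96-dimensional latent vector, so the ratio is 8 regardless of the input resolution.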

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
