SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Abstract
Visual generation grounded in Visual Foundation Model (VFM) representations offers a promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale text-to-image diffusion models entirely within the VFM representation space remains largely unexplored. To bridge this gap, we scale up the SVG (Self-supervised representations for Visual Generation) framework and propose SVG-T2I, which supports high-quality text-to-image synthesis directly in the VFM feature domain. Using a standard text-to-image diffusion pipeline, SVG-T2I achieves competitive performance, reaching 0.75 on GenEval and 85.78 on DPG-Bench. These results validate the intrinsic representational power of VFMs for generative tasks. We fully open-source the project, including the autoencoder and generation model, together with their training, inference, and evaluation pipelines and pre-trained weights, to facilitate further research in representation-driven visual generation.
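To make the core idea concrete, the sketch below illustrates what a latent-diffusion training step looks like when the latent space is the feature map of a frozen vision foundation model rather than a VAE latent. This is a minimal illustration under assumptions, not the authors' released implementation: the module names (FrozenVFM, DiT), the rectified-flow style objective, and all dimensions are placeholders chosen for readability.

```python
# Minimal sketch (NOT the released SVG-T2I code) of text-to-image diffusion
# training directly in a frozen VFM feature space instead of a VAE latent space.
# All module names, dimensions, and the training objective are illustrative
# assumptions; a real feature decoder back to pixels is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenVFM(nn.Module):
    """Stand-in for a self-supervised vision foundation model (e.g. a ViT).
    It maps an image to a spatial feature map that serves as the 'latent'."""
    def __init__(self, dim=768, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        for p in self.parameters():
            p.requires_grad_(False)  # the VFM stays frozen during training

    def forward(self, x):
        return self.proj(x)  # (B, dim, H/patch, W/patch)


class DiT(nn.Module):
    """Stand-in denoiser over VFM features, conditioned on text and timestep."""
    def __init__(self, dim=768, text_dim=512):
        super().__init__()
        self.time_mlp = nn.Linear(1, dim)
        self.text_mlp = nn.Linear(text_dim, dim)
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, z_t, t, text_emb):
        cond = self.time_mlp(t[:, None]) + self.text_mlp(text_emb)
        return self.net(z_t + cond[:, :, None, None])


def training_step(vfm, denoiser, images, text_emb, optimizer):
    """One rectified-flow style step: noise the VFM features, predict velocity."""
    with torch.no_grad():
        z0 = vfm(images)                      # clean VFM features (the "latent")
    noise = torch.randn_like(z0)
    t = torch.rand(z0.size(0), device=z0.device)
    t_ = t[:, None, None, None]
    z_t = (1 - t_) * z0 + t_ * noise          # linear interpolation path
    target = noise - z0                       # velocity target
    loss = F.mse_loss(denoiser(z_t, t, text_emb), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    vfm, denoiser = FrozenVFM(), DiT()
    opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)
    imgs = torch.randn(2, 3, 256, 256)        # dummy image batch
    txt = torch.randn(2, 512)                 # placeholder text embeddings
    print("loss:", training_step(vfm, denoiser, imgs, txt, opt))
```

At sampling time, the same pipeline would integrate the learned velocity field from noise back to a clean VFM feature map and then decode it to pixels with the open-sourced autoencoder's decoder; only the latent space changes relative to a conventional VAE-based latent diffusion model.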