3 months ago

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

Abstract

We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly. We reexamine the design spaces of image tokenizers, the scalability properties of image generation models, and their training data quality. The outcome of this exploration consists of: (1) An image tokenizer with a downsample ratio of 16, reconstruction quality of 0.94 rFID, and codebook usage of 97% on the ImageNet benchmark. (2) A series of class-conditional image generation models ranging from 111M to 3.1B parameters, achieving 2.18 FID on the ImageNet 256x256 benchmark and outperforming popular diffusion models such as LDM and DiT. (3) A text-conditional image generation model with 775M parameters, obtained by two-stage training on LAION-COCO and high-aesthetic-quality images, demonstrating competitive visual quality and text alignment. (4) We verify the effectiveness of LLM serving frameworks in optimizing the inference speed of image generation models and achieve a 326% to 414% speedup. We release all models and code to support the open-source community of visual generation and multimodal foundation models.
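To make the "next-token prediction" setup concrete, the following is a minimal sketch, not the authors' implementation. It shows how a downsample ratio of 16 turns a 256x256 image into a 16x16 grid of discrete tokens, and how such a grid could be sampled one token at a time after a class-conditioning token. The names `token_grid`, `toy_next_token_logits`, and `sample_image_tokens` are hypothetical, the codebook size of 1024 is an illustrative assumption (the abstract does not state it), and the "model" is a uniform stand-in rather than a trained Llama.

```python
import math
import random


def token_grid(image_size: int, downsample_ratio: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image."""
    if image_size % downsample_ratio != 0:
        raise ValueError("image_size must be divisible by downsample_ratio")
    side = image_size // downsample_ratio
    return side, side * side


def toy_next_token_logits(prefix: list[int], codebook_size: int) -> list[float]:
    """Hypothetical stand-in for a trained model: uniform scores over the codebook."""
    return [0.0] * codebook_size


def sample_image_tokens(num_tokens: int, codebook_size: int,
                        class_id: int, seed: int = 0) -> list[int]:
    """Sample tokens one at a time, each conditioned on everything before it."""
    rng = random.Random(seed)
    sequence = [class_id]  # assumption: the class label is a prefix token
    for _ in range(num_tokens):
        logits = toy_next_token_logits(sequence, codebook_size)
        weights = [math.exp(x) for x in logits]  # unnormalized softmax
        token = rng.choices(range(codebook_size), weights=weights, k=1)[0]
        sequence.append(token)
    return sequence[1:]  # drop the conditioning token


side, total = token_grid(256, 16)
print(side, total)  # 16 256

# Assumed codebook size of 1024, for illustration only.
tokens = sample_image_tokens(total, codebook_size=1024, class_id=0)
```

In a real system the sampled token ids would be passed to the tokenizer's decoder to reconstruct pixels; that step is omitted here because it depends on the trained tokenizer.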

Code Repositories

foundationvision/llamagen (Official, PyTorch; mentioned in GitHub)
0606zt/panollama (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
image-generation-on-imagenet-256x256 | LlamaGen | FID: 2.18
