Randomized Autoregressive Visual Generation

Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen

Abstract

This paper presents Randomized AutoRegressive modeling (RAR) for visual generation, which sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks. The proposed RAR is simple: during a standard autoregressive training process with a next-token prediction objective, the input sequence, typically ordered in raster form, is randomly permuted into different factorization orders with a probability r, where r starts at 1 and linearly decays to 0 over the course of training. This annealing training strategy enables the model to learn to maximize the expected likelihood over all factorization orders and thus effectively improve the model's capability of modeling bidirectional contexts. Importantly, RAR preserves the integrity of the autoregressive modeling framework, ensuring full compatibility with language modeling while significantly improving performance in image generation. On the ImageNet-256 benchmark, RAR achieves an FID score of 1.48, not only surpassing prior state-of-the-art autoregressive image generators but also outperforming leading diffusion-based and masked transformer-based methods. Code and models will be made available at https://github.com/bytedance/1d-tokenizer
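The annealed permutation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`permutation_probability`, `sample_order`, `permute_tokens`), the per-step granularity of the schedule, and the use of a uniform random shuffle are all assumptions made for illustration. The abstract only states that the raster-ordered sequence is permuted with probability r, and that r decays linearly from 1 to 0 during training.

```python
import random


def permutation_probability(step: int, total_steps: int) -> float:
    """Hypothetical schedule: r decays linearly from 1.0 at step 0 to 0.0 at the last step."""
    if total_steps <= 1:
        return 0.0
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return 1.0 - frac


def sample_order(num_tokens: int, r: float, rng: random.Random) -> list:
    """With probability r return a random factorization order, else the raster order."""
    order = list(range(num_tokens))  # raster order: 0, 1, ..., num_tokens - 1
    if rng.random() < r:
        rng.shuffle(order)  # assumption: permutations are drawn uniformly at random
    return order


def permute_tokens(tokens: list, order: list) -> list:
    """Reorder the token sequence so that position k holds tokens[order[k]]."""
    return [tokens[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = list(range(100, 116))  # stand-in for 16 image token ids
    total_steps = 5
    for step in range(total_steps):
        r = permutation_probability(step, total_steps)
        order = sample_order(len(tokens), r, rng)
        # A next-token prediction loss would then be computed on the permuted sequence.
        print(step, round(r, 2), permute_tokens(tokens, order))
```

Under these assumptions, every sequence is permuted at the start of training (r = 1), and the final steps use only the raster order (r = 0), so the model ends training as a standard raster-order autoregressive generator.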

