
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

Abstract

Autoregressive (AR) modeling, known for its next-token prediction paradigm, underpins state-of-the-art language and visual generative models. Traditionally, a "token" is treated as the smallest prediction unit, often a discrete symbol in language or a quantized patch in vision. However, the optimal token definition for 2D image structures remains an open question. Moreover, AR models suffer from exposure bias, where teacher forcing during training leads to error accumulation at inference. In this paper, we propose xAR, a generalized AR framework that extends the notion of a token to an entity X, which can represent an individual patch token, a cell (a k × k grouping of neighboring patches), a subsample (a non-local grouping of distant patches), a scale (coarse-to-fine resolution), or even a whole image. Additionally, we reformulate discrete token classification as continuous entity regression, leveraging flow-matching methods at each AR step. This approach conditions training on noisy entities instead of ground truth tokens, leading to Noisy Context Learning, which effectively alleviates exposure bias. As a result, xAR offers two key advantages: (1) it enables flexible prediction units that capture different contextual granularity and spatial structures, and (2) it mitigates exposure bias by avoiding reliance on teacher forcing. On the ImageNet-256 generation benchmark, our base model, xAR-B (172M), outperforms DiT-XL/SiT-XL (675M) while achieving 20× faster inference. Meanwhile, xAR-H sets a new state-of-the-art with an FID of 1.24, running 2.2× faster than the previous best-performing model without relying on vision foundation modules (e.g., DINOv2) or advanced guidance interval sampling.
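To make the entity definitions concrete, the following is a minimal sketch of how a grid of patches might be grouped into cells and subsamples, and how a clean entity might be replaced by a noisy one for conditioning. It is an illustration only, not the authors' implementation: the function names, the row-major patch indexing, the stride-based subsample rule, and the linear noise-to-data interpolation (a common flow-matching convention) are all assumptions that the abstract does not specify.

```python
import random


def cell_entities(h, w, k):
    """Group an h x w grid of patch indices into k x k cells of neighboring patches.

    Patches are indexed row-major (an assumption for this sketch).
    """
    assert h % k == 0 and w % k == 0, "grid must be divisible by the cell size"
    cells = []
    for top in range(0, h, k):
        for left in range(0, w, k):
            cells.append([r * w + c
                          for r in range(top, top + k)
                          for c in range(left, left + k)])
    return cells


def subsample_entities(h, w, s):
    """Group patches with stride s, so each entity gathers spatially distant patches.

    This stride rule is one hypothetical reading of "non-local grouping".
    """
    assert h % s == 0 and w % s == 0, "grid must be divisible by the stride"
    return [[r * w + c for r in range(dr, h, s) for c in range(dc, w, s)]
            for dr in range(s) for dc in range(s)]


def noisy_entity(clean, t, rng):
    """Interpolate linearly between Gaussian noise (t = 0) and clean values (t = 1).

    Assumed convention; the abstract only states that training conditions on
    noisy entities rather than ground-truth tokens.
    """
    return [(1.0 - t) * rng.gauss(0.0, 1.0) + t * x for x in clean]


if __name__ == "__main__":
    print(cell_entities(4, 4, 2))       # four 2 x 2 cells over a 4 x 4 grid
    print(subsample_entities(4, 4, 2))  # four stride-2 subsamples
    print(noisy_entity([1.0, 2.0], 0.5, random.Random(0)))
```

Both groupings partition the same set of patches; they differ only in whether an entity covers a local neighborhood (cell) or a spread of distant positions (subsample). In an AR loop, earlier entities passed as context would go through something like `noisy_entity` during training, so the model sees imperfect context of the kind it also encounters at inference.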

