Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, Lu Jiang

Published: 6/12/2025

Abstract

Existing large-scale video generation models are computationally intensive, preventing adoption in real-time and interactive applications. In this work, we propose autoregressive adversarial post-training (AAPT) to transform a pre-trained latent video diffusion model into a real-time, interactive video generator. Our model autoregressively generates a latent frame at a time using a single neural function evaluation (1NFE). The model can stream the result to the user in real time and receive interactive responses as controls to generate the next latent frame. Unlike existing approaches, our method explores adversarial training as an effective paradigm for autoregressive generation. This not only allows us to design an architecture that is more efficient for one-step generation while fully utilizing the KV cache, but also enables training the model in a student-forcing manner that proves to be effective in reducing error accumulation during long video generation. Our experiments demonstrate that our 8B model achieves real-time, 24fps, streaming video generation at 736x416 resolution on a single H100, or 1280x720 on 8xH100 up to a minute long (1440 frames). Visit our research website at https://seaweed-apt.com/2
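
As a rough illustration of the streaming loop the abstract describes, the Python sketch below generates one frame per step with a single generator call, feeds the model's own output back as context, and accepts a control input at every step. It is a toy under stated assumptions, not the authors' implementation: `toy_generator`, `stream_frames`, and `total_frames` are hypothetical names, the "frames" are short lists of floats, and a plain Python list stands in for the KV cache.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]


def toy_generator(cache: List[Frame], control: float) -> Frame:
    """Hypothetical stand-in for the one-step generator network.

    It looks only at the most recent cached frame and the current control.
    A real model would attend over cached keys and values instead.
    """
    prev = cache[-1] if cache else [0.0, 0.0]
    return [0.5 * p + control for p in prev]


def stream_frames(
    generator: Callable[[List[Frame], float], Frame],
    controls: Iterable[float],
) -> Iterator[Frame]:
    """Yield one frame per control input, calling the generator once per frame."""
    cache: List[Frame] = []  # assumption: stands in for the KV cache
    for control in controls:
        frame = generator(cache, control)  # single evaluation for this frame
        cache.append(frame)  # the model's own output becomes future context
        yield frame  # the frame can be shown before the next control arrives


def total_frames(fps: int, seconds: int) -> int:
    """Number of video frames in a clip of the given length."""
    return fps * seconds


if __name__ == "__main__":
    for frame in stream_frames(toy_generator, [1.0, 0.0, 0.0]):
        print(frame)
    print(total_frames(24, 60))
```

The last call reproduces the arithmetic behind the abstract's figure: 24 frames per second over 60 seconds gives 1440 frames. Because each frame is conditioned on previously generated frames rather than ground truth, the loop also hints at why training in the same self-fed regime (student forcing) targets error accumulation over long rollouts.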
