SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Recent advances in diffusion-based video restoration (VR) demonstrate significant improvements in visual quality, yet incur a prohibitive computational cost during inference. While several distillation-based approaches have shown the potential of one-step image restoration, extending them to VR remains challenging and underexplored, particularly for high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle challenging high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, we propose an adaptive window attention mechanism, in which the window size is dynamically adjusted to fit the output resolution, avoiding the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 achieves comparable or even better performance than existing VR approaches in a single step.
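The adaptive-window idea mentioned above can be illustrated with a minimal sketch: rather than fixing the window size, one can fix the window *grid* and derive the window size from the output resolution, so the tiling stays consistent at any resolution. The function name, the grid-based partitioning scheme, and the default grid of 4x4 are all assumptions for illustration, not the authors' actual implementation.

```python
def adaptive_window_size(height: int, width: int,
                         num_windows_h: int = 4, num_windows_w: int = 4):
    """Derive a window size that tiles an H x W feature map into a fixed
    num_windows_h x num_windows_w grid, so the partition adapts to the
    output resolution instead of using a predefined window size.

    Hypothetical sketch of the adaptive window attention described in the
    abstract; names and the partitioning scheme are assumptions.
    """
    # Ceiling division so the windows always cover the full feature map,
    # even when the resolution is not an exact multiple of the grid.
    win_h = -(-height // num_windows_h)
    win_w = -(-width // num_windows_w)
    return win_h, win_w

# A 720x1280 map with a 4x4 grid uses 180x320 windows, while a 1080x1920
# map uses 270x480 windows: the grid, not the window size, is what stays
# fixed across resolutions, avoiding partial windows at the borders.
print(adaptive_window_size(720, 1280))   # -> (180, 320)
print(adaptive_window_size(1080, 1920))  # -> (270, 480)
```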