23 days ago

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Minghong Cai, Qiulin Wang, Zongli Ye, Wenze Liu, Quande Liu, Weicai Ye, Xintao Wang, Pengfei Wan, Kun Gai, Xiangyu Yue

Abstract

We introduce the task of arbitrary spatio-temporal video completion, where a video is generated from arbitrary, user-specified patches placed at any spatial location and timestamp, akin to painting on a video canvas. This flexible formulation naturally unifies many existing controllable video generation tasks -- including first-frame image-to-video, inpainting, extension, and interpolation -- under a single, cohesive paradigm. Realizing this vision, however, faces a fundamental obstacle in modern latent video diffusion models: the temporal ambiguity introduced by causal VAEs, where multiple pixel frames are compressed into a single latent representation, making precise frame-level conditioning structurally difficult. We address this challenge with VideoCanvas, a novel framework that adapts the In-Context Conditioning (ICC) paradigm to this fine-grained control task with zero new parameters. We propose a hybrid conditioning strategy that decouples spatial and temporal control: spatial placement is handled via zero-padding, while temporal alignment is achieved through Temporal RoPE Interpolation, which assigns each condition a continuous fractional position within the latent sequence. This resolves the VAE's temporal ambiguity and enables pixel-frame-aware control on a frozen backbone. To evaluate this new capability, we develop VideoCanvasBench, the first benchmark for arbitrary spatio-temporal video completion, covering both intra-scene fidelity and inter-scene creativity. Experiments demonstrate that VideoCanvas significantly outperforms existing conditioning paradigms, establishing a new state of the art in flexible and unified video generation.
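
The idea behind Temporal RoPE Interpolation can be illustrated with a small sketch. The abstract does not give the exact mapping, so everything below is an assumption made for illustration: the temporal stride of 4, the linear mapping `pixel_frame / temporal_stride`, and the function names are hypothetical and are not taken from the paper. The rotary embedding itself is the standard RoPE formulation, which accepts non-integer positions without modification.

```python
import math


def fractional_latent_position(pixel_frame: int, temporal_stride: int = 4) -> float:
    """Map a pixel-frame index to a continuous position on the latent time axis.

    Assumption: a plain linear mapping (frame / stride). The actual mapping
    used by VideoCanvas may differ, e.g. in how the first frame is treated.
    """
    if pixel_frame < 0 or temporal_stride <= 0:
        raise ValueError("pixel_frame must be >= 0 and temporal_stride > 0")
    return pixel_frame / temporal_stride


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE rotation angles, evaluated at a possibly fractional position."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(vec: list[float], position: float, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs of `vec` by the RoPE angles for `position`."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Pixel frames 4 and 6 would share one latent under a stride-4 VAE,
# but they receive distinct fractional positions under this mapping.
pos_a = fractional_latent_position(4)
pos_b = fractional_latent_position(6)
token = [1.0, 0.0, 1.0, 0.0]
emb_a = apply_rope(token, pos_a)
emb_b = apply_rope(token, pos_b)
```

Under these assumptions, two conditioning frames that a stride-4 causal VAE would fold into the same latent slot still get different positional encodings, which is the kind of disambiguation the abstract describes. Because only the positions fed to RoPE change, no new parameters are involved.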