
Towards Understanding Camera Motions in Any Video

Zhiqiu Lin, Siyuan Cen, Daniel Jiang, Jay Karhade, Hewei Wang, Chancharik Mitra, Tiffany Ling, Yuhan Huang, Sifan Liu, Mingyu Chen, Rushikesh Zawar, Xue Bai, Yilun Du, Chuang Gan, Deva Ramanan
Published: 4/28/2025
Abstract

We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding. CameraBench consists of ~3,000 diverse internet videos, annotated by experts through a rigorous multi-stage quality control process. One of our contributions is a taxonomy of camera motion primitives, designed in collaboration with cinematographers. We find, for example, that some motions like "follow" (or tracking) require understanding scene content like moving subjects. We conduct a large-scale human study to quantify human annotation performance, revealing that domain expertise and tutorial-based training can significantly enhance accuracy. For example, a novice may confuse zoom-in (a change of intrinsics) with translating forward (a change of extrinsics), but can be trained to differentiate the two. Using CameraBench, we evaluate Structure-from-Motion (SfM) and Video-Language Models (VLMs), finding that SfM models struggle to capture semantic primitives that depend on scene content, while VLMs struggle to capture geometric primitives that require precise estimation of trajectories. We then fine-tune a generative VLM on CameraBench to achieve the best of both worlds and showcase its applications, including motion-augmented captioning, video question answering, and video-text retrieval. We hope our taxonomy, benchmark, and tutorials will drive future efforts towards the ultimate goal of understanding camera motions in any video.
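The zoom-in versus translate-forward confusion mentioned in the abstract has a simple geometric basis in the standard pinhole camera model. The sketch below is not from the paper; it is a minimal NumPy illustration, assuming a camera at the origin looking down +z, of why the two motions differ: zooming (scaling the intrinsic focal length) magnifies all projected points uniformly, while dollying forward (an extrinsic translation) magnifies near points more than far ones, producing a visible parallax change.

```python
import numpy as np

# Two scene points at different depths (camera at origin, looking down +z).
points = np.array([
    [0.5, 0.5, 2.0],   # near point
    [0.5, 0.5, 10.0],  # far point
])

def project(pts, f=1.0, dolly=0.0):
    """Pinhole projection. Zoom changes the intrinsic focal length f;
    a dolly (translating forward) changes the extrinsic camera position."""
    z = pts[:, 2] - dolly                # extrinsics: camera moves toward the scene
    return f * pts[:, :2] / z[:, None]   # intrinsics: projection scaled by f

base = project(points)
zoomed = project(points, f=2.0)          # zoom-in: double the focal length
dollied = project(points, dolly=1.0)     # translate forward by one unit

# Zoom magnifies every point by the same factor (no parallax change)...
print(zoomed / base)    # [[2. 2.], [2. 2.]]
# ...while dollying magnifies near points more than far ones (parallax).
print(dollied / base)   # [[2. 2.], [~1.11 ~1.11]]
```

On a 2D image the two motions can look similar, since the main subject grows in both cases; the depth-dependent magnification is exactly the cue the paper's tutorial-based training teaches novice annotators to notice.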