HyperAI
15 days ago

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset

Zhuowei Chen, Bingchuan Li, Tianxiang Ma, Lijie Liu, Mingcong Liu, Yi Zhang, Gen Li, Xinghui Li, Siyu Zhou, Qian He, Xinglong Wu
Abstract

Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by sampling reference images from the same scene as the target video. To address this issue, we introduce Phantom-Data, the first general-purpose cross-pair subject-to-video consistency dataset, containing approximately one million identity-consistent pairs across diverse categories. Our dataset is constructed via a three-stage pipeline: (1) a general and input-aligned subject detection module, (2) large-scale cross-context subject retrieval from more than 53 million videos and 3 billion images, and (3) prior-guided identity verification to ensure visual consistency under contextual variation. Comprehensive experiments show that training with Phantom-Data significantly improves prompt alignment and visual quality while preserving identity consistency on par with in-pair baselines.
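
The three-stage pipeline can be illustrated with a toy sketch. This is not the authors' implementation: the `Crop` type, the function names, the hand-written embeddings, and the `min_sim` threshold are all hypothetical, and plain cosine similarity stands in for whatever detection, retrieval, and verification models the paper actually uses. The sketch only shows the shape of the idea, which is to pair a subject with a reference drawn from a different source rather than from the same scene.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Crop:
    """A detected subject crop. All fields are illustrative assumptions."""
    source_id: str    # video or image the crop came from
    label: str        # subject phrase matched in the caption
    embedding: tuple  # toy identity embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_subjects(source_id, annotations):
    """Stage 1 (stub): wrap pre-computed (label, embedding) annotations as crops."""
    return [Crop(source_id, label, tuple(emb)) for label, emb in annotations]


def retrieve_cross_context(query, pool, top_k=3):
    """Stage 2: rank crops from OTHER sources by similarity to the query."""
    candidates = [c for c in pool if c.source_id != query.source_id]
    candidates.sort(key=lambda c: cosine(query.embedding, c.embedding), reverse=True)
    return candidates[:top_k]


def verify_identity(query, candidate, min_sim=0.9):
    """Stage 3 (stub): accept only same-label, high-similarity candidates."""
    same_label = candidate.label == query.label
    return same_label and cosine(query.embedding, candidate.embedding) >= min_sim


def build_cross_pairs(pool):
    """Run retrieval and verification for every crop; return (query, reference) source ids."""
    pairs = []
    for query in pool:
        for cand in retrieve_cross_context(query, pool):
            if verify_identity(query, cand):
                pairs.append((query.source_id, cand.source_id))
    return pairs


# Toy data: the same dog in two different contexts, plus an unrelated dog.
pool = (
    detect_subjects("video_a", [("dog", (1.0, 0.0, 0.1))])
    + detect_subjects("image_b", [("dog", (0.98, 0.05, 0.12))])
    + detect_subjects("image_c", [("dog", (0.0, 1.0, 0.0))])
)
pairs = build_cross_pairs(pool)
```

In this toy setup, an in-pair scheme would take the reference from `video_a` itself. The cross-pair scheme instead pairs `video_a` with `image_b`, and the verification step rejects `image_c` because its embedding is dissimilar.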