PlayerOne: Egocentric World Simulator

We introduce PlayerOne, the first egocentric realistic world simulator, facilitating immersive and unrestricted exploration within vividly dynamic environments. Given an egocentric scene image from the user, PlayerOne can accurately construct the corresponding world and generate egocentric videos that are strictly aligned with the user's real-scene human motion as captured by an exocentric camera. PlayerOne is trained in a coarse-to-fine pipeline that first performs pretraining on large-scale egocentric text-video pairs for coarse-level egocentric understanding, followed by finetuning on synchronous motion-video data extracted from egocentric-exocentric video datasets with our automatic construction pipeline. Furthermore, considering the varying importance of different components, we design a part-disentangled motion injection scheme, enabling precise control of part-level movements. In addition, we devise a joint reconstruction framework that progressively models both the 4D scene and video frames, ensuring scene consistency in long-form video generation. Experimental results demonstrate strong generalization ability in precise control of varying human movements and world-consistent modeling of diverse scenarios. PlayerOne marks the first endeavor into egocentric real-world simulation and can pave the way for the community to explore new frontiers of world modeling and its diverse applications.