From One to More: Contextual Part Latents for 3D Generation

Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground-truth data. Despite this progress, three key limitations persist: (1) single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) holistic latent coding neglects the part independence and interrelationships that are critical for compositional design; and (3) global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart, a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: (i) it reduces encoding complexity through part decomposition; (ii) it enables explicit modeling of part relationships; and (iii) it supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring both geometric coherence and the preservation of foundation model priors. To enable large-scale training, we construct Partverse, a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
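The abstract does not specify how part latents exchange context, so the following is only a toy illustration of why "contextual" part latents differ from independently coded ones. It assumes, purely for illustration, that each part latent is an array of T tokens of width D and that context is shared through plain single-head self-attention over the tokens of all parts at once. The names `joint_part_attention` and `independent_part_attention` are hypothetical and do not describe CoPart's actual architecture or its mutual guidance strategy.

```python
import numpy as np


def joint_part_attention(part_latents: np.ndarray) -> np.ndarray:
    """Toy single-head self-attention over the tokens of all parts together.

    Hypothetical layout: part_latents has shape (P, T, D), i.e. P part
    latents, each with T tokens of width D. Every output token is a
    softmax-weighted mix of the tokens of *all* parts, so each part
    receives information from the others.
    """
    P, T, D = part_latents.shape
    tokens = part_latents.reshape(P * T, D)          # concatenate all parts' tokens
    scores = tokens @ tokens.T / np.sqrt(D)          # (P*T, P*T) scaled similarities
    scores -= scores.max(axis=1, keepdims=True)      # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    mixed = weights @ tokens                         # context-aware tokens
    return mixed.reshape(P, T, D)


def independent_part_attention(part_latents: np.ndarray) -> np.ndarray:
    """Baseline: attention restricted to each part's own tokens (no cross-part context)."""
    return np.stack([joint_part_attention(p[None])[0] for p in part_latents])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = rng.standard_normal((3, 4, 8))           # 3 parts, 4 tokens each, width 8
    perturbed = parts.copy()
    perturbed[2] += 1.0                              # edit only part 2

    # Under joint attention, editing part 2 also changes part 0's output.
    print(np.allclose(joint_part_attention(parts)[0],
                      joint_part_attention(perturbed)[0]))
    # Under per-part attention, part 0 is unaffected by the edit to part 2.
    print(np.allclose(independent_part_attention(parts)[0],
                      independent_part_attention(perturbed)[0]))
```

The contrast is the point of the sketch: with joint attention an edit to one part propagates to the others, which is the kind of interdependence a holistic-but-partless latent or a set of fully independent part latents cannot express. A real system would add learned query/key/value projections, multiple heads, and the diffusion timestep, none of which are modeled here.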