SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

3D content generation has recently attracted significant research interest due to its applications in VR/AR and embodied AI. In this work, we address the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architectural design enables improved generation performance with multi-image inputs; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robust generation abilities of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.
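The data flow described in contribution (ii) — per-object local features fused with a global scene feature, then passed through a position head to predict each asset's relative placement in one feedforward pass — can be sketched roughly as below. All names, shapes, and the concatenation-based aggregation are illustrative assumptions; the abstract does not specify the module internals (which would in practice be learned layers, not random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_features(local_feats, global_feat):
    """Hypothetical aggregation: pair each object's local feature with a
    broadcast copy of the global scene feature (stand-in for the paper's
    learned feature aggregation module)."""
    n = local_feats.shape[0]
    g = np.broadcast_to(global_feat, (n, global_feat.shape[-1]))
    return np.concatenate([local_feats, g], axis=-1)

def position_head(fused, weight, bias):
    """Toy linear head mapping each fused feature to a 3D relative position."""
    return fused @ weight + bias

# Toy dimensions (assumptions, not from the paper).
num_objects, d_local, d_global = 3, 8, 4
local_feats = rng.normal(size=(num_objects, d_local))  # per-object (masked) features
global_feat = rng.normal(size=(d_global,))             # whole-scene context feature

fused = aggregate_features(local_feats, global_feat)   # shape: (3, 12)
weight = rng.normal(size=(fused.shape[-1], 3))
bias = np.zeros(3)
positions = position_head(fused, weight, bias)         # shape: (3, 3), one xyz per asset
```

Because everything is a single forward computation over all objects at once, no per-scene optimization loop or asset-retrieval step is involved, matching contribution (i).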