A new way to create realistic 3D shapes using generative AI
**Abstract:** MIT researchers have developed an improved technique for generating realistic 3D shapes with generative artificial intelligence (AI), specifically by leveraging 2D image generation models. The core challenge in creating high-quality 3D models has been the lack of sufficient 3D data for training AI models, which leads to subpar 3D outputs that often appear blurry or cartoonish. To address this, the team focused on Score Distillation Sampling (SDS), a method that combines 2D images into a 3D representation. They identified a key issue in SDS: a mismatch between the noise addition and removal formula used in 3D shape generation and its counterpart in 2D models. By refining this formula and using an approximation technique that infers the missing noise term from the current 3D rendering, the researchers produced sharp, realistic 3D shapes without additional training or complex postprocessing. The advance not only improves the quality of 3D models but also deepens the mathematical understanding of SDS and related techniques, paving the way for more efficient and higher-quality 3D shape generation. The technique could serve as a valuable tool for designers, making the creation of realistic 3D shapes more accessible and streamlined. The research, funded by organizations including the Toyota–CSAIL Joint Research Center and the U.S. National Science Foundation, will be presented at the Conference on Neural Information Processing Systems.

**Key Events:**
- MIT researchers identify and solve a key issue in Score Distillation Sampling (SDS), a technique for generating 3D shapes using 2D image generation models.
- The team's method produces high-quality, realistic 3D shapes without the need for additional training or complex postprocessing.
- The research enhances the mathematical understanding of SDS and related techniques, opening avenues for further improvements in 3D shape generation.

**Key People:**
- Artem Lukoianov: Lead author and EECS graduate student at MIT.
- Haitz Sáez de Ocáriz Borde: Graduate student at Oxford University.
- Kristjan Greenewald: Research scientist in the MIT-IBM Watson AI Lab.
- Vitor Campagnolo Guizilini: Scientist at the Toyota Research Institute.
- Timur Bagautdinov: Research scientist at Meta.
- Vincent Sitzmann: Assistant professor of EECS at MIT and leader of the Scene Representation Group in CSAIL.
- Justin Solomon: Associate professor of EECS at MIT and leader of the CSAIL Geometric Data Processing Group.

**Key Locations:**
- Massachusetts Institute of Technology (MIT): Location of the research and primary affiliation of the lead authors.
- Oxford University: Affiliation of co-author Haitz Sáez de Ocáriz Borde.
- IBM and Meta: Companies with research scientists contributing to the study.
- Toyota Research Institute: Institution where one of the co-authors is a scientist.

**Time Elements:**
- The research was conducted recently, with the findings to be presented at the Conference on Neural Information Processing Systems.
- SDS was developed in 2022, and the MIT team's improvements were made subsequently.

**Summary:** Creating realistic 3D models for applications such as virtual reality, filmmaking, and engineering design has traditionally been a labor-intensive process involving significant manual effort.
While generative AI models have revolutionized 2D image creation by letting users generate lifelike images from text prompts, these models are not inherently designed to produce 3D shapes. To overcome this limitation, researchers at MIT developed a method to improve the quality of 3D shapes generated with a technique called Score Distillation Sampling (SDS). SDS, introduced in 2022, uses a pretrained 2D image diffusion model to combine multiple 2D views into a single 3D representation. However, the 3D shapes produced by SDS often suffer from blurriness or a cartoonish appearance because of a mismatch in the noise addition and removal formula used in the process.

MIT researchers, led by EECS graduate student Artem Lukoianov, identified this mismatch and proposed a simple yet effective fix: instead of randomly sampling the noise term, their method infers it from the current 3D rendering (see the sketch below), resulting in sharp and realistic 3D shapes. The approach also increases the resolution of the image rendering and adjusts model parameters to further improve the quality of the 3D outputs. Because it leverages an off-the-shelf, pretrained diffusion model, the method avoids costly retraining and complex postprocessing. Beyond improving the quality of 3D models, the work deepens the mathematical understanding of SDS and related techniques, which could lead to more efficient and higher-quality 3D shape generation in the future.

The potential applications of this technique are vast, particularly in fields that require realistic 3D models. By making the creation of such models more accessible and streamlined, the method could serve as a valuable co-pilot for designers, reducing the time and effort needed to produce high-quality 3D shapes. The research, funded by organizations including the Toyota–CSAIL Joint Research Center and the U.S. National Science Foundation, will be presented at the Conference on Neural Information Processing Systems, and it marks a significant step toward integrating AI into 3D modeling workflows.
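To make the noise-term idea concrete, here is a minimal PyTorch-style sketch of an SDS-like gradient for a single rendered view, under the assumptions stated in the comments. The helper names `predict_noise` and `infer_noise_from_render`, the omission of text conditioning, and the exact noising schedule are placeholders for illustration, not the authors' implementation.

```python
import torch

def sds_like_gradient(render, t, alphas_cumprod, predict_noise, infer_noise_from_render):
    """Hedged sketch of an SDS-style gradient for one rendered view.

    `predict_noise(x_t, t)` stands in for a pretrained 2D diffusion model's
    noise prediction (text conditioning omitted for brevity), and
    `infer_noise_from_render(render, t)` stands in for the article's
    approximation that infers the noise term from the current 3D rendering;
    both are assumed placeholder callables, not the authors' API.
    """
    ab = alphas_cumprod[t]

    # Plain SDS would draw the noise term at random:
    #   eps = torch.randn_like(render)
    # The fix described in the article replaces that random draw with a
    # noise term inferred from the current 3D rendering.
    eps = infer_noise_from_render(render, t)

    # Re-noise the current rendering to diffusion timestep t.
    x_t = ab.sqrt() * render + (1.0 - ab).sqrt() * eps

    # The 2D model's predicted noise points toward a plausible image; its
    # difference from the added noise is the signal that, backpropagated
    # through a differentiable renderer, updates the 3D representation.
    with torch.no_grad():
        eps_pred = predict_noise(x_t, t)
    return eps_pred - eps
```

In standard SDS the noise `eps` would simply be sampled with `torch.randn_like(render)`; the single change sketched here, replacing that random draw with a noise term consistent with the current rendering, is the refinement the article credits for sharper, less cartoonish results.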
