
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Kolodiazhnyi, Maksim; Tarasov, Denis; Zhemchuzhnikov, Dmitrii; Nikulin, Alexander; Zisman, Ilya; Vorontsova, Anna; Konushin, Anton; Kurenkov, Vladislav; Rukhovich, Danila
Publication date: 6/1/2025
Abstract

Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Preference Optimization (GRPO) outperform offline alternatives. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches on all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state-of-the-art on three challenging datasets, including a real-world one.
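
For illustration, the sketch below shows the core mechanics of a GRPO-style online RL objective as described in the abstract: rewards computed programmatically for a group of sampled completions, normalized within the group to form advantages, and fed into a clipped surrogate loss. This is a minimal toy example, not the authors' implementation; the reward values, helper names, and hyperparameters are placeholders, and the KL regularization term usually present in GRPO is omitted for brevity.

```python
# Minimal GRPO-style sketch (illustrative only, not the paper's code).
import torch


def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within one group of completions sampled for a prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)


def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate objective over the sampled completions
    (the KL penalty against a reference model is omitted here)."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()


# Toy usage: 8 sampled CAD programs for one prompt. In practice each reward
# would come from executing the generated program and comparing the resulting
# geometry to the target shape; here the rewards are made-up numbers.
rewards = torch.tensor([0.0, 0.7, 0.9, 0.0, 0.4, 0.8, 0.1, 0.6])
advantages = group_relative_advantages(rewards)
logp_old = torch.randn(8)
logp_new = logp_old + 0.05 * torch.randn(8)
print(float(grpo_loss(logp_new, logp_old, advantages)))
```

Because the reward is obtained by programmatically checking the generated CAD program rather than from a learned preference model, such online feedback can be gathered on the fly during training, which is the key difference from offline alternatives highlighted in the abstract.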