UniDet3D: Multi-dataset Indoor 3D Object Detection

Growing customer demand for smart solutions in robotics and augmented reality has attracted considerable attention to 3D object detection from point clouds. Yet existing indoor datasets, taken individually, are too small and insufficiently diverse to train a powerful and general 3D object detection model. Meanwhile, more general approaches utilizing foundation models are still inferior in quality to those based on supervised training for a specific task. In this work, we propose UniDet3D, a simple yet effective 3D object detection model, which is trained on a mixture of indoor datasets and is capable of working in various indoor environments. By unifying different label spaces, UniDet3D enables learning a strong representation across multiple datasets through a supervised joint training scheme. The proposed network architecture is built upon a vanilla transformer encoder, making it easy to run, customize, and extend the prediction pipeline for practical use. Extensive experiments demonstrate that UniDet3D obtains significant gains over existing 3D object detection methods on 6 indoor benchmarks: ScanNet (+1.1 mAP50), ARKitScenes (+19.4 mAP25), S3DIS (+9.1 mAP50), MultiScan (+9.3 mAP50), 3RScan (+3.2 mAP50), and ScanNet++ (+2.7 mAP50). Code is available at https://github.com/filapro/unidet3d .
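
To make the idea of unifying label spaces concrete, the following is a minimal sketch, not the procedure used in this work. It assumes that categories from different datasets are merged by case-insensitive name matching; the function `build_unified_label_space`, the dataset names, and the class lists are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def build_unified_label_space(
    dataset_classes: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Merge per-dataset class lists into one shared label space.

    Assumption: two categories are the same if their names match
    case-insensitively. A real unification scheme may be more involved.
    """
    # Union of all category names, sorted for a deterministic ordering.
    unified = sorted(
        {name.lower() for names in dataset_classes.values() for name in names}
    )
    name_to_id = {name: idx for idx, name in enumerate(unified)}

    # For each dataset, map its local class index to the unified class index.
    local_to_unified = {
        dataset: [name_to_id[name.lower()] for name in names]
        for dataset, names in dataset_classes.items()
    }
    return unified, local_to_unified


# Hypothetical toy label spaces for two datasets.
toy_classes = {
    "dataset_a": ["chair", "table", "sofa"],
    "dataset_b": ["Table", "bed", "chair"],
}
unified, local_to_unified = build_unified_label_space(toy_classes)
print(unified)
print(local_to_unified)
```

With such a mapping, ground-truth boxes from every dataset can be relabeled into the shared index space, so a single classification head can be supervised jointly on the mixture.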