IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks

Image segmentation is a vital task for providing human assistance and enhancing autonomy in our daily lives. In particular, RGB-D segmentation, which leverages both visual and depth cues, has attracted increasing attention because it promises richer scene understanding than RGB-only methods. However, most existing efforts have focused on semantic segmentation, leaving a critical gap: instance-level RGB-D segmentation datasets remain scarce, which restricts current methods to broad category distinctions rather than the fine-grained detail required to recognize individual objects. To bridge this gap, we introduce three RGB-D instance segmentation benchmarks annotated at the instance level. These datasets are versatile, supporting applications ranging from indoor navigation to robotic manipulation. In addition, we present an extensive evaluation of various baseline models on these benchmarks; this analysis identifies their strengths and shortcomings, guiding future work toward more robust, generalizable solutions. Finally, we propose a simple yet effective method for RGB-D data integration. Extensive evaluations confirm the effectiveness of our approach, offering a robust framework for advancing toward more nuanced scene understanding.
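
The abstract does not specify how the proposed RGB-D integration works. As a purely illustrative sketch of one common baseline, not the paper's actual method, the snippet below shows early fusion by normalizing a depth map and concatenating it with the RGB image as a fourth channel before a segmentation backbone; the module and parameter names (EarlyFusionStem, out_channels) are hypothetical.

```python
# Illustrative early-fusion baseline (assumption, not the paper's method):
# depth is min-max normalized and stacked as a fourth input channel.
import torch
import torch.nn as nn


class EarlyFusionStem(nn.Module):
    """Fuses an RGB image and a depth map by channel concatenation,
    then projects the 4-channel input to a feature map a standard
    segmentation backbone could consume."""

    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(4, out_channels, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W) in [0, 1]; depth: (B, 1, H, W), e.g. in metres.
        d_min = depth.amin(dim=(2, 3), keepdim=True)
        d_max = depth.amax(dim=(2, 3), keepdim=True)
        depth = (depth - d_min) / (d_max - d_min + 1e-6)  # per-image min-max scaling
        fused = torch.cat([rgb, depth], dim=1)            # (B, 4, H, W)
        return self.proj(fused)


if __name__ == "__main__":
    stem = EarlyFusionStem()
    rgb = torch.rand(2, 3, 480, 640)
    depth = torch.rand(2, 1, 480, 640) * 5.0
    print(stem(rgb, depth).shape)  # torch.Size([2, 64, 480, 640])
```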