SGAligner: 3D Scene Alignment with Scene Graphs

Building 3D scene graphs has recently emerged as a topic in scene representation for several embodied AI applications, as a way to represent the world in a structured and rich manner. With their increased use in solving downstream tasks (e.g., navigation and room rearrangement), can we leverage and recycle them for creating 3D maps of environments, a pivotal step in agent operation? We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial and which can contain arbitrary changes. We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios (i.e., unknown overlap, if any, and changes in the environment). Inspired by multi-modality knowledge graphs, we use contrastive learning to learn a joint, multi-modal embedding space. We evaluate on the 3RScan dataset and further show that our method can be used for estimating the transformation between pairs of 3D scenes. Since benchmarks for these tasks are missing, we create them on this dataset. The code, benchmark, and trained models are available on the project website.