8 months ago

3D Machine Vision

Multimodal Representation

Sayan Deb Sarkar Ondrej Miksik Marc Pollefeys Daniel Barath Iro Armeni

Abstract

Building 3D scene graphs has recently emerged as a topic in scene representation for several embodied AI applications to represent the world in a structured and rich manner. With their increased use in solving downstream tasks (e.g., navigation and room rearrangement), can we leverage and recycle them for creating 3D maps of environments, a pivotal step in agent operation? We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial and which can contain arbitrary changes. We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios (i.e., unknown overlap, if any, and changes in the environment). Inspired by multi-modality knowledge graphs, we use contrastive learning to learn a joint, multi-modal embedding space. We evaluate on the 3RScan dataset and further show that our method can be used for estimating the transformation between pairs of 3D scenes. Since benchmarks for these tasks are missing, we create them on this dataset. The code, benchmark, and trained models are available on the project website.
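To make the idea of aligning nodes in a joint embedding space concrete, the following is a minimal sketch, not the authors' implementation. It assumes each scene-graph node has already been mapped to a fixed-size embedding vector, and it matches nodes across two graphs by mutual nearest neighbour under cosine similarity. The function name `match_nodes` and the threshold `min_sim` are hypothetical; the threshold is one simple way to return no matches when two graphs do not overlap.

```python
import numpy as np


def match_nodes(emb_a, emb_b, min_sim=0.5):
    """Match nodes of two graphs by mutual nearest neighbour in embedding space.

    emb_a: (N, D) array of node embeddings for graph A (assumed non-zero rows).
    emb_b: (M, D) array of node embeddings for graph B (assumed non-zero rows).
    min_sim: hypothetical cosine-similarity threshold below which a pair is rejected.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # (N, M) cosine-similarity matrix

    best_b_for_a = sim.argmax(axis=1)  # closest node in B for each node in A
    best_a_for_b = sim.argmax(axis=0)  # closest node in A for each node in B

    matches = []
    for i, j in enumerate(best_b_for_a):
        # Keep the pair only if the choice is mutual and similar enough.
        if best_a_for_b[j] == i and sim[i, j] >= min_sim:
            matches.append((i, int(j)))
    return matches


# Toy embeddings: graph A has three nodes, graph B has two.
emb_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
emb_b = np.array([[0.0, 0.9, 0.1], [0.9, 0.1, 0.0]])
matches = match_nodes(emb_a, emb_b)

# Two graphs with orthogonal embeddings: nothing should be matched.
no_overlap = match_nodes(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```

In this toy case the third node of graph A has no counterpart in graph B and is left unmatched, which mirrors the partial-overlap setting described above.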

