
Towards Learning a Generalist Model for Embodied Navigation

Duo Zheng Shijia Huang Lin Zhao Yiwu Zhong Liwei Wang

Abstract

Building a generalist agent that can interact with the world is an intriguing target of AI systems, thus spurring research on embodied navigation, where an agent is required to navigate according to instructions or respond to queries. Despite the major progress attained, previous works primarily focus on task-specific agents and lack generalizability to unseen scenarios. Recently, LLMs have presented remarkable capabilities across various fields and provided a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. The schema-based instruction flexibly casts various tasks into generation problems, thereby unifying a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into the training, equipping NaviLLM with a wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model also demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.
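The core idea of schema-based instruction is that heterogeneous tasks (instruction following, dialog navigation, question answering) can all be phrased as filling slots in a shared prompt template, so a single LLM treats every task as text generation. The sketch below illustrates this pattern; the field names and template wording are illustrative assumptions, not the actual schema from the NaviLLM paper.

```python
# Minimal sketch of schema-based instruction: every embodied task is cast
# into one shared prompt template, turning navigation, dialog, and QA into
# the same next-token-generation problem. Slot names are hypothetical.

def build_prompt(task: str, observation: str, history: str, query: str) -> str:
    """Fill a shared schema so any task becomes a generation problem."""
    schema = (
        "Task: {task}\n"
        "Observation: {observation}\n"
        "History: {history}\n"
        "Instruction: {query}\n"
        "Output:"
    )
    return schema.format(
        task=task, observation=observation, history=history, query=query
    )

# The same schema serves instruction-following navigation (the model
# generates the next action/viewpoint) ...
nav_prompt = build_prompt(
    task="navigation",
    observation="<candidate viewpoint features>",
    history="<previously visited viewpoints>",
    query="Walk past the sofa and stop at the door.",
)

# ... and 3D question answering (the model generates an answer), so data
# from both kinds of datasets can be mixed into one training run.
qa_prompt = build_prompt(
    task="3d-qa",
    observation="<scene features>",
    history="",
    query="What color is the chair next to the desk?",
)
```

Because every example reduces to the same (prompt, target-text) form, datasets as different as CVDN dialogs and ScanQA question-answer pairs can be concatenated into one training corpus, which is what gives the unified model its breadth.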
