7 months ago

He Wang Zhizheng Zhang Qiyu Dai Songlin Wei Jiazhao Zhang Xiaomeng Fang Chaoyi Xu Haoran Geng Yufei Ding

Abstract

In this work, we pioneer a benchmark and an approach for table-top Open-instruction 6-DoF Object Rearrangement (Open6DOR). Specifically, we collect a synthetic dataset of 200+ objects and carefully design 2400+ Open6DOR tasks. These tasks are divided into a Position-track, a Rotation-track, and a 6-DoF-track, which evaluate how well different embodied agents predict the positions and rotations of target objects. We also propose a VLM-based approach for Open6DOR, named Open6DOR-GPT, which equips GPT-4V with 3D awareness and simulation assistance while exploiting its strengths in generalizability and instruction following. We compare existing embodied agents with Open6DOR-GPT on the proposed Open6DOR benchmark and find that Open6DOR-GPT achieves state-of-the-art performance. We further show the strong performance of Open6DOR-GPT in diverse real-world experiments. We plan to release the final version of the benchmark, along with our refined method, in early September, and we recommend waiting until then to download the dataset.

Robotics

Multimodal

Multimodal Representation

Research Field

Multimodality

Task/Problem

Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
