Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Abstract

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video. Guided by ImageBind, we construct a joint embedding space between 3D and multi-modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this, we further present Point-LLM, the first 3D large language model (LLM) following 3D multi-modal instructions. By parameter-efficient fine-tuning techniques, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, which requires no 3D instruction data, but exhibits superior 3D and multi-modal question-answering capacity. We hope our work may cast a light on the community for extending 3D point clouds to multi-modality applications. Code is available at https://github.com/ZiyuGuo99/Point-Bind_Point-LLM.
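
As an illustration of what "3D embedding arithmetic" in a joint embedding space can look like, the following is a minimal sketch, not the authors' implementation. It assumes that embeddings from different modalities already live in one shared space and that composition is done by adding normalized vectors and ranking candidates by cosine similarity. The function names (`l2_normalize`, `compose_query`, `retrieve`) and the three-dimensional toy vectors are hypothetical stand-ins; real embeddings would come from trained encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def compose_query(point_emb, other_emb):
    """Add two normalized embeddings and renormalize the sum.

    Assumption: both inputs already share one embedding space.
    """
    return l2_normalize(l2_normalize(point_emb) + l2_normalize(other_emb))


def retrieve(query, gallery, top_k=1):
    """Rank gallery rows by cosine similarity to the query."""
    scores = l2_normalize(gallery) @ l2_normalize(query)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


# Toy stand-ins for encoder outputs (hypothetical values).
point_emb = np.array([1.0, 0.0, 0.0])   # e.g. a point cloud embedding
audio_emb = np.array([0.0, 1.0, 0.0])   # e.g. an audio embedding
gallery = np.array([
    [1.0, 0.0, 0.0],   # matches the point cloud only
    [0.0, 1.0, 0.0],   # matches the audio only
    [1.0, 1.0, 0.0],   # mixes both concepts
    [0.0, 0.0, 1.0],   # unrelated
])

query = compose_query(point_emb, audio_emb)
indices, scores = retrieve(query, gallery, top_k=1)
print(indices, scores)
```

In this toy setup the gallery item that mixes both concepts ranks first, which is the behavior embedding arithmetic relies on: the composed query lands closer to items sharing the semantics of both inputs than to items sharing only one.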
