OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Abstract
Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between vision and audio embeddings in a shared omni-modal latent space; (ii) Temporal Embedding Grouping for capturing relative temporal alignment between vision and audio signals; and (iii) Constrained Rotary Time Embedding for encoding absolute temporal information in omni-modal embeddings. We introduce a curation and synthesis pipeline that generates 24M single-modal and omni-modal conversations. We find that modalities reinforce one another in both perception and reasoning. Our model, OmniVinci, outperforms Qwen2.5-Omni by +19.05 on DailyOmni (cross-modal understanding), +1.7 on MMAR (audio), and +3.9 on Video-MME (vision), while using just 0.2T training tokens, a 6x reduction compared to Qwen2.5-Omni's 1.2T. Finally, we demonstrate omni-modal advantages in downstream applications spanning robotics, medical AI, and smart factories.
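To make the vision-audio alignment idea behind OmniAlignNet concrete, below is a minimal, hypothetical Python sketch of one common way such alignment is done: project both modalities into a shared latent space and train with a symmetric contrastive objective. The class name, projection layers, pooling, and loss here are illustrative assumptions and are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OmniAlignSketch(nn.Module):
    """Hypothetical sketch: project vision and audio embeddings into a shared
    latent space and pull matching pairs together with a symmetric contrastive
    loss. Dimensions and objective are assumptions, not the paper's design."""

    def __init__(self, vision_dim: int, audio_dim: int, shared_dim: int = 512):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, shared_dim)
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.log_temp = nn.Parameter(torch.zeros(()))  # learnable temperature

    def forward(self, vision_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # Pool per-clip token embeddings into one vector per sample, then project
        # and L2-normalize so similarities are cosine similarities.
        v = F.normalize(self.vision_proj(vision_emb.mean(dim=1)), dim=-1)
        a = F.normalize(self.audio_proj(audio_emb.mean(dim=1)), dim=-1)
        logits = v @ a.t() * self.log_temp.exp()  # (B, B) similarity matrix
        targets = torch.arange(v.size(0), device=v.device)
        # Symmetric InfoNCE: matching vision/audio pairs lie on the diagonal.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))


# Example usage with random features for a batch of 4 clips.
vision = torch.randn(4, 16, 1024)  # (batch, vision tokens, vision dim)
audio = torch.randn(4, 32, 768)    # (batch, audio tokens, audio dim)
loss = OmniAlignSketch(vision_dim=1024, audio_dim=768)(vision, audio)
```

This sketch covers only the shared-latent-space alignment; the temporal mechanisms named in the abstract (Temporal Embedding Grouping and Constrained Rotary Time Embedding) are separate components described in the full paper.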