
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut

Abstract

Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. However, generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks. Recent research suggests that Mixture of Experts (MoE) architectures are useful for instruction tuning, but for LMMs of parameter size around O(50-100B), the prohibitive cost of replicating and storing the expert models severely limits the number of experts we can use. We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to (softly) mix many multimodal low-rank experts, and avoids introducing a significant number of new parameters compared to conventional MoE models. The core intuition here is that the large model provides a foundational backbone, while different lightweight experts residually learn specialized knowledge, either per-modality or multimodally. Extensive experiments demonstrate that the SMoLA approach helps improve the generalist performance across a broad range of generative vision-and-language tasks, achieving new SoTA generalist performance that often matches or outperforms single specialized LMM baselines, as well as new SoTA specialist performance.
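To make the core intuition concrete, the sketch below shows one way a layer of this kind could look: a frozen backbone projection whose output is residually corrected by a soft, per-token mixture of low-rank (LoRA-style) experts. This is not the authors' implementation; the class, parameter names, and the simplified per-token soft routing (rather than the slot-based Soft MoE of the paper) are illustrative assumptions.

```python
# Illustrative sketch: soft mixture of low-rank experts added residually to a
# frozen backbone projection. Names and routing scheme are assumptions, not the
# paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SMoLALayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 16, rank: int = 4):
        super().__init__()
        # Frozen backbone projection that the experts specialize around.
        self.backbone = nn.Linear(d_model, d_model)
        self.backbone.requires_grad_(False)
        # Low-rank expert factors: expert e applies x @ A[e] @ B[e].
        self.A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # Per-expert routing embeddings used to softly mix expert outputs per token.
        self.router = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model)
        base = self.backbone(x)
        # Soft mixing weights per token over all experts: (batch, tokens, experts).
        mix = F.softmax(torch.einsum("btd,ed->bte", x, self.router), dim=-1)
        # Each expert's low-rank update for every token: (batch, tokens, experts, d_model).
        expert_out = torch.einsum("btd,edr,erk->btek", x, self.A, self.B)
        # Residual correction: softly mixed expert outputs on top of the backbone.
        return base + torch.einsum("bte,btek->btk", mix, expert_out)
```

Under this framing, only the low-rank factors and routing embeddings are trained, which keeps the added parameter count small relative to replicating full expert models as conventional MoE designs would.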

