Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Abstract

We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by using fully discrete diffusion modeling to handle inputs and outputs across various modalities. This approach allows Lumina-DiMOO to achieve higher sampling efficiency than previous autoregressive (AR) or hybrid AR-Diffusion paradigms and to support a broad spectrum of multi-modal tasks, including text-to-image generation, image-to-image generation (e.g., image editing, subject-driven generation, and image inpainting), and image understanding. Lumina-DiMOO achieves state-of-the-art performance on multiple benchmarks, surpassing existing open-source unified multi-modal models. To foster further advances in multi-modal and discrete diffusion model research, we release our code and checkpoints to the community. Project page: https://synbol.github.io/Lumina-DiMOO.
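The abstract attributes the model's sampling efficiency to discrete diffusion rather than token-by-token autoregressive decoding. The sketch below illustrates that general idea only, assuming a generic masked (absorbing-state) discrete diffusion sampler with confidence-based parallel unmasking, in the style popularized by MaskGIT. It is not Lumina-DiMOO's actual sampler: the names `masked_diffusion_sample` and `toy_predict`, the linear unmasking schedule, and the random stand-in "model" are all hypothetical and chosen for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical placeholder marking a still-masked position

# Hypothetical denoiser interface: given the current sequence (with MASK entries),
# return a (token, confidence) guess for every position in one forward pass.
Predictor = Callable[[List[Optional[int]]], List[Tuple[int, float]]]


def masked_diffusion_sample(length: int, steps: int, predict: Predictor) -> Tuple[List[int], int]:
    """Fill a fully masked sequence in at most `steps` parallel refinement rounds.

    Returns the decoded tokens and the number of model calls used.
    """
    seq: List[Optional[int]] = [MASK] * length
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = predict(seq)  # one call scores every position at once
        calls += 1
        # Linear schedule: spread the remaining masked positions over the remaining steps,
        # so the final step always commits everything that is left.
        k = math.ceil(len(masked) / (steps - step))
        # Commit the k most confident predictions; the rest stay masked for later rounds.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = guesses[i][0]
    return list(seq), calls


def toy_predict(seq: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Stand-in for a real model: deterministic pseudo-random tokens (vocab of 1000) and scores."""
    rng = random.Random(sum(1 for t in seq if t is MASK))
    return [(rng.randrange(1000), rng.random()) for _ in seq]


tokens, calls = masked_diffusion_sample(length=256, steps=16, predict=toy_predict)
print(len(tokens), calls)  # 256 16
```

Under these assumptions, 256 tokens are produced with 16 model calls, whereas a plain autoregressive decoder would need one call per token (256). Real samplers differ in schedule, remasking rules, and how confidence is computed, so this only shows why parallel unmasking can reduce the number of forward passes.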
