Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Abstract
We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by using fully discrete diffusion modeling to handle inputs and outputs across modalities. This approach allows Lumina-DiMOO to achieve higher sampling efficiency than previous autoregressive (AR) or hybrid AR-Diffusion paradigms and to support a broad spectrum of multi-modal tasks, including text-to-image generation, image-to-image generation (e.g., image editing, subject-driven generation, and image inpainting), and image understanding. Lumina-DiMOO achieves state-of-the-art performance on multiple benchmarks, surpassing existing open-source unified multi-modal models. To foster further advances in multi-modal and discrete diffusion model research, we release our code and checkpoints to the community. Project page: https://synbol.github.io/Lumina-DiMOO
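To give a rough intuition for the sampling-efficiency claim, here is a minimal toy sketch of a generic masked discrete diffusion sampler that fills in several token positions per step, so a sequence of N tokens can be produced in far fewer than N model calls. This is an illustration under stated assumptions, not Lumina-DiMOO's actual sampler: the names `MASK`, `toy_predict`, and `parallel_unmask` are hypothetical, the "network" is a random stand-in, and the confidence-based reveal schedule is one common choice among several.

```python
import random

MASK = -1  # hypothetical id for a masked (not yet generated) token


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for a denoising network (assumption, not a real model).

    Returns one (token_id, confidence) guess for every position.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_unmask(seq_len, num_steps, vocab_size=16, seed=0):
    """Generate seq_len tokens in at most num_steps prediction passes."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len  # start from a fully masked sequence
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predict(tokens, vocab_size, rng)
        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        # Reveal the k masked positions with the highest confidence.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    out = parallel_unmask(seq_len=64, num_steps=8)
    # 64 tokens filled in 8 passes; token-by-token AR decoding would need 64.
    print(len(out), MASK in out)
```

In this toy setting, 64 tokens are filled in 8 prediction passes, whereas token-by-token autoregressive decoding would take 64. Real gains depend on the model, the schedule, and the quality trade-off at low step counts.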