Moebius Image Inpainting Framework Rivals 10B Models With 0.2B Parameters
Researchers from Huazhong University of Science and Technology and the VIVO AI Lab have introduced Moebius, a lightweight image inpainting framework that delivers generation quality comparable to industrial-scale foundation models while requiring far fewer computational resources. The project addresses the deployment barriers of large generative AI systems, where high-fidelity tools such as the 11.9B-parameter FLUX.1-Fill-Dev are often hindered by high inference costs and memory demands. By decoupling quality from massive parameter counts, Moebius offers a pathway to scalable, high-performance image editing.
Built upon a Latent Diffusion Model with Latent Categories Guidance, Moebius features a redesigned denoising U-Net centered on the Local-λ Mix Interaction block. This architecture employs Local-λ and Interactive-λ modules to compress spatial contexts and global semantic priors into fixed-size linear matrices. Because complex latent interactions are preserved within this condensed structure, the framework mitigates the representation bottlenecks that typically degrade performance in compressed models. The design allows a significant reduction in parameters without sacrificing the network's ability to reconstruct detailed imagery.
To further optimize the compact architecture, Moebius uses an adaptive multi-granularity distillation strategy. This training approach operates within the latent space, eliminating the overhead of pixel-space decoding. The strategy dynamically balances gradient-based losses, enabling the lightweight model to align closely with larger teacher networks. Combining the efficient architecture with this distillation strategy lets the framework maintain high fidelity despite its reduced size.
Extensive testing on natural and portrait benchmarks confirms Moebius's efficacy. On standard datasets including Places2, CelebA-HQ, and FFHQ, the model rivals or exceeds the output quality of FLUX.1-Fill-Dev.
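The idea of compressing context into a fixed-size linear matrix, mentioned in the architecture description above, can be illustrated with a generic lambda-layer-style interaction. The sketch below is not the Moebius Local-λ Mix Interaction block, whose internals are not given here; it is a minimal NumPy illustration under the assumption that "λ" refers to a small linear map summarizing the context. All function names, weight names, and dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def context_to_lambda(context, w_k, w_v):
    """Summarize a context of m positions into a (k, v) matrix.

    context: (m, d) features; w_k: (d, k); w_v: (d, v).
    The result has the same shape no matter how large m is.
    """
    keys = softmax(context @ w_k, axis=0)  # (m, k), normalized over positions
    values = context @ w_v                 # (m, v)
    return keys.T @ values                 # (k, v)


def apply_lambda(inputs, w_q, lam):
    """Apply the summarized context to n query positions: (n, d) -> (n, v)."""
    return (inputs @ w_q) @ lam


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d, k, v = 16, 8, 16
w_q = rng.normal(size=(d, k))
w_k = rng.normal(size=(d, k))
w_v = rng.normal(size=(d, v))

small_context = rng.normal(size=(64, d))    # e.g. an 8x8 latent grid
large_context = rng.normal(size=(4096, d))  # e.g. a 64x64 latent grid

lam_small = context_to_lambda(small_context, w_k, w_v)
lam_large = context_to_lambda(large_context, w_k, w_v)
out = apply_lambda(small_context, w_q, lam_large)

print(lam_small.shape, lam_large.shape, out.shape)
```

In this sketch the summary matrix stays at k × v entries whether the context has 64 or 4,096 positions, which is the property that makes such a design attractive for compact models.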
Moebius reaches this benchmark performance with only 0.22 billion parameters, less than two percent of the 11.9 billion parameters used by FLUX.1-Fill-Dev. Its optimized inference pipeline also delivers a speedup of more than 15 times. These results set a new efficiency standard for image inpainting and show that highly specialized, parameter-efficient models can match the capabilities of generalist giants.
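The parameter comparison can be checked with a few lines of arithmetic. The snippet below uses only the two parameter counts quoted in this article; the variable names are illustrative.

```python
# Parameter counts as quoted in the article.
moebius_params = 0.22e9  # 0.22 billion
flux_params = 11.9e9     # 11.9 billion

ratio = moebius_params / flux_params      # fraction of the larger model's size
reduction = flux_params / moebius_params  # how many times smaller Moebius is

print(f"Moebius is {ratio:.2%} of the larger model's size")  # 1.85%
print(f"That is roughly {reduction:.1f}x fewer parameters")  # 54.1x
```

The ratio of about 1.85 percent is consistent with the "less than two percent" figure.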
