OmniGen2: Exploration to Advanced Multimodal Generation

In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
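To make the dual-pathway idea concrete, the minimal PyTorch sketch below shows one way such a design could be wired: an understanding backbone whose inputs are never re-adapted to VAE latents, a separate image decoder with unshared parameters conditioned on the backbone's hidden states, and a standalone latent (VAE-style) representation as the decoupled image tokenizer. All module names, dimensions, and the conditioning scheme are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class ToyDualPathwayModel(nn.Module):
    """Illustrative sketch (not OmniGen2's code): two decoding pathways with unshared parameters."""

    def __init__(self, vocab_size=32000, d_model=512, d_image=256, latent_ch=4):
        super().__init__()
        # Text pathway: the understanding transformer keeps its own embedding and LM head,
        # so its text-generation behavior is untouched by the image pathway.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.text_backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

        # Image pathway: separate (unshared) decoder that cross-attends to the backbone's
        # hidden states and operates on VAE-style latents, which the backbone never sees.
        self.image_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_image, nhead=8, batch_first=True), num_layers=2
        )
        self.cond_proj = nn.Linear(d_model, d_image)      # bridge: understanding -> image decoder
        self.latent_proj_in = nn.Linear(latent_ch, d_image)
        self.latent_proj_out = nn.Linear(d_image, latent_ch)

    def forward(self, text_ids, noisy_latents):
        # Text decoding pathway: hidden states and next-token logits.
        h = self.text_backbone(self.token_embed(text_ids))
        text_logits = self.lm_head(h)

        # Image decoding pathway: predict denoised latents from noisy latents,
        # conditioned on the projected hidden states of the understanding backbone.
        z = self.latent_proj_in(noisy_latents)
        z = self.image_decoder(tgt=z, memory=self.cond_proj(h))
        pred_latents = self.latent_proj_out(z)
        return text_logits, pred_latents


if __name__ == "__main__":
    model = ToyDualPathwayModel()
    text_ids = torch.randint(0, 32000, (1, 16))   # dummy prompt tokens
    noisy_latents = torch.randn(1, 64, 4)          # dummy flattened image latents
    logits, latents = model(text_ids, noisy_latents)
    print(logits.shape, latents.shape)             # (1, 16, 32000) (1, 64, 4)
```

The point of the sketch is the decoupling: the text pathway's parameters and inputs are independent of the image latents, so an existing multimodal understanding model can be reused as the backbone without modification.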