13 days ago

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai Yu, Liang Zhao, Yisong Wang, Jiaying Liu, Chong Ruan
Abstract

We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be straightforwardly trained within the large language model framework, eliminating the need for complex architectural modifications. To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training. Extensive experiments show that JanusFlow achieves comparable or superior performance to specialized models in their respective domains, while significantly outperforming existing unified approaches across standard benchmarks. This work represents a step toward more efficient and versatile vision-language models.
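
For readers unfamiliar with rectified flow, the following is a minimal sketch of the generic objective and sampler, not the JanusFlow implementation. It assumes the common convention that t = 0 is noise and t = 1 is data, uses plain Python lists in place of tensors, and treats `velocity_fn` as a hypothetical stand-in for whatever network predicts the velocity (in the setting the abstract describes, that role would be played by the language model, but nothing below reflects its actual architecture).

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature for a velocity predictor: (x_t, t) -> velocity.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(noise: Vector, data: Vector, t: float) -> Vector:
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * z + t * x for z, x in zip(noise, data)]


def target_velocity(noise: Vector, data: Vector) -> Vector:
    """Velocity of the straight path, constant in t: data - noise."""
    return [x - z for z, x in zip(noise, data)]


def rectified_flow_loss(
    velocity_fn: VelocityFn, noise: Vector, data: Vector, t: float
) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, data, t)
    predicted = velocity_fn(x_t, t)
    target = target_velocity(noise, data)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


def euler_sample(velocity_fn: VelocityFn, noise: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy check with an oracle that already knows the true velocity.
    noise, data = [0.5, 0.25], [1.0, -2.0]
    oracle = lambda x, t: target_velocity(noise, data)
    print(rectified_flow_loss(oracle, noise, data, t=0.3))
    print(euler_sample(oracle, noise, steps=10))
```

In practice the loss would be averaged over randomly drawn noise samples and time steps, and the predictor would be a trained network rather than an oracle; the sketch only shows the shape of the objective and the sampling loop.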
