
Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang
Release Date: 5/13/2025
Abstract

We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original inference timestep number, significantly improving sampling efficiency without performance degradation. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For complex compositions, RL-tuned SD3.5 generates nearly perfect object counts, spatial relations, and fine-grained attributes, boosting GenEval accuracy from 63% to 95%. In visual text rendering, its accuracy improves from 59% to 92%, significantly enhancing text generation. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, little to no reward hacking occurred, meaning rewards did not increase at the cost of image quality or diversity, and both remained stable in our experiments.
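
To make the ODE-to-SDE idea concrete, the sketch below contrasts a deterministic Euler step of a flow matching ODE with a noisy, Euler-Maruyama-style variant that gives RL rollouts distinct trajectories to explore. This is a minimal illustration under stated assumptions, not the paper's exact update rule: Flow-GRPO's actual SDE additionally corrects the drift so that the marginal distribution at every timestep matches the original ODE's, and the names `velocity_fn` and `sigma_t` are placeholders rather than APIs from the paper's code.

```python
# Illustrative sketch only: deterministic ODE step vs. noise-injecting step.
# Flow-GRPO's real SDE also adjusts the drift to preserve the ODE's marginals;
# that exact correction is omitted here.
import torch


def ode_step(x: torch.Tensor, t: float, dt: float, velocity_fn) -> torch.Tensor:
    """Deterministic Euler step: x <- x + v_theta(x, t) * dt."""
    return x + velocity_fn(x, t) * dt


def noisy_step(x: torch.Tensor, t: float, dt: float, velocity_fn,
               sigma_t: float) -> torch.Tensor:
    """Euler-Maruyama-style step: same drift plus Gaussian noise, so repeated
    rollouts from one prompt produce different samples for RL exploration."""
    noise = torch.randn_like(x)
    return x + velocity_fn(x, t) * dt + sigma_t * abs(dt) ** 0.5 * noise


if __name__ == "__main__":
    velocity_fn = lambda x, t: -x      # toy stand-in for the learned model
    x = torch.randn(2, 3, 8, 8)        # dummy latent batch
    x_det = ode_step(x, t=0.9, dt=-0.05, velocity_fn=velocity_fn)
    x_sto = noisy_step(x, t=0.9, dt=-0.05, velocity_fn=velocity_fn, sigma_t=0.3)
    print(x_det.shape, x_sto.shape)
```

The Denoising Reduction strategy, by contrast, needs no new machinery: it amounts to collecting RL training rollouts with a much shorter denoising schedule than the one used at inference time, while evaluation keeps the model's full timestep count.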