
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Zijian Wu, Jinjie Ni, Xiangyan Liu, Zichen Liu, Hang Yan, Michael Qizhe Shieh
Published: 6/4/2025
초록

Vision-language models (VLMs) trained via reinforcement learning with verifiable reward (RLVR) have shown notable progress in scaling test-time compute effectively. In this work, we investigate how synthesized RL data can further improve RLVR. To this end, we propose SynthRL, a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. SynthRL comprises three key stages: (1) selecting seed questions with an appropriate distribution, (2) augmenting them into more challenging variants while preserving the original answers, and (3) a guaranteed verification stage that ensures near-perfect correctness and difficulty enhancement. Our empirical experiments demonstrate SynthRL's scalability and effectiveness. When applied to the MMK12 dataset, SynthRL synthesizes over 3.3K additional verifiable, challenging questions from approximately 8K seed samples. Models trained with our synthesized data achieve consistent gains across five out-of-domain visual math reasoning benchmarks, with a significant improvement over baseline models trained on seed data alone. Notably, detailed analysis reveals that the gains are more pronounced on the most challenging evaluation samples, highlighting SynthRL's effectiveness in eliciting deeper and more complex reasoning patterns.
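The three stages in the abstract can be sketched as a simple filter-augment-verify loop. This is a minimal illustrative sketch, not the authors' implementation: every function name, the `pass_rate` difficulty proxy, and the `solver` interface are assumptions introduced here for clarity.

```python
# Hypothetical sketch of the SynthRL pipeline (all names are assumptions).

def select_seeds(dataset, difficulty_band=(0.2, 0.8)):
    """Stage 1: keep seed questions whose solve rate falls in a target band
    (a stand-in for 'appropriate distribution')."""
    lo, hi = difficulty_band
    return [q for q in dataset if lo <= q["pass_rate"] <= hi]

def augment(question):
    """Stage 2: rewrite the question into a harder variant that preserves the
    original answer. In the paper a generator model does this; here it is a stub."""
    return {"text": "harder: " + question["text"], "answer": question["answer"]}

def verify(original, variant, solver):
    """Stage 3: accept the variant only if the solver still reaches the
    preserved answer AND the variant is empirically harder than the seed."""
    answer_ok = solver.solve(variant["text"]) == variant["answer"]
    harder = solver.pass_rate(variant["text"]) < solver.pass_rate(original["text"])
    return answer_ok and harder

def synthrl(dataset, solver):
    """Run the full pipeline and return only verified, harder variants."""
    synthesized = []
    for seed in select_seeds(dataset):
        variant = augment(seed)
        if verify(seed, variant, solver):
            synthesized.append(variant)
    return synthesized
```

The key design point the abstract emphasizes is stage 3: because the answer is preserved by construction and checked again after augmentation, every synthesized question remains automatically verifiable for RLVR-style reward computation.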