
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu, Xu Cao, Xiaofeng Zhang, Yixiao He, Wenming Ye, James Matthew Rehg, Ismini Lourentzou
Abstract

Current Vision-Language Models (VLMs) struggle with fine-grained spatial reasoning, particularly when multi-step logic and precise spatial alignment are required. In this work, we introduce SpatialReasoner-R1, a vision-language reasoning model designed to address these limitations. To construct high-quality supervision for spatial reasoning, we design a Multi-Model Monte Carlo Tree Search (M3CTS) method that generates diverse, logically consistent Long Chain-of-Thought (LongCoT) reasoning trajectories. In addition, we propose fine-grained Direct Preference Optimization (fDPO), which introduces segment-specific preference granularity for descriptive grounding and logical reasoning, guided by a spatial reward mechanism that evaluates candidate responses based on visual consistency, spatial grounding, and logical coherence. Experimental results demonstrate that fDPO achieves an average improvement of 4.1% over standard DPO across spatial quality tasks, and a 9.0% gain in spatial quantity tasks. SpatialReasoner-R1, trained with fDPO, sets a new SoTA on SPATIALRGPT-Bench, outperforming the strongest baseline by 9.8% in average accuracy, while maintaining competitive performance on general vision-language tasks.
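
For intuition, the sketch below shows one way a segment-weighted DPO objective could be implemented: the usual DPO log-ratio margin is decomposed over segment types (here, descriptive grounding vs. logical reasoning) with a separate temperature per segment. This is a minimal illustration under assumed conventions, not the paper's exact formulation; the function name `fdpo_loss`, the two-way segment taxonomy, and the per-segment betas are all hypothetical.

```python
import torch
import torch.nn.functional as F

def fdpo_loss(policy_logps_w, policy_logps_l,
              ref_logps_w, ref_logps_l,
              segment_ids_w, segment_ids_l,
              beta_per_segment=(0.1, 0.3)):
    """Hypothetical sketch of a segment-weighted DPO objective.

    *_logps:       per-token log-probs, shape (batch, seq_len),
                   for chosen (w) and rejected (l) responses.
    segment_ids:   per-token segment labels, same shape;
                   0 = descriptive grounding, 1 = logical reasoning.
    beta_per_segment: assumed per-segment temperatures.
    """
    def segment_log_ratio(pol, ref, seg_ids):
        # Sum the policy/reference log-ratio within each segment type,
        # scaled by that segment's beta, then combine across segments.
        ratio = pol - ref
        total = torch.zeros(pol.shape[0], device=pol.device)
        for seg, beta in enumerate(beta_per_segment):
            mask = (seg_ids == seg).float()
            total = total + beta * (ratio * mask).sum(dim=-1)
        return total

    # Standard DPO margin, but built from segment-weighted log-ratios.
    margin = (segment_log_ratio(policy_logps_w, ref_logps_w, segment_ids_w)
              - segment_log_ratio(policy_logps_l, ref_logps_l, segment_ids_l))
    return -F.logsigmoid(margin).mean()
```

The design intent captured here is that assigning a distinct temperature to each segment type lets the preference signal weight descriptive-grounding tokens and logical-reasoning tokens differently, rather than applying a single sequence-level contrast as in standard DPO.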