MIA-DPO: A Multi-Image Augmented Preference Alignment Method

MIA-DPO (Multi-Image Augmented Direct Preference Optimization) is a multi-image augmented preference alignment method for large vision-language models (LVLMs). It was jointly proposed in 2024 by Shanghai Jiao Tong University, the Shanghai Artificial Intelligence Laboratory, the Chinese University of Hong Kong, and other institutions, and is described in the paper "MIA-DPO: Multi-Image Augmented Direct Preference Optimization for Large Vision-Language Models". The core of the method is to expand single-image data into multi-image data using three data formats, namely sequence, grid collage, and picture-in-picture, which effectively reduces the cost of data collection and annotation while remaining highly scalable.
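
The three formats can be produced from ordinary single-image instruction data without any new labels. The following sketch (Python with Pillow) illustrates one possible way to build them; the sample structure, image paths, grid size, and scaling factor are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of the three multi-image formats described above.
# "sample" is assumed to be {"image": path, "question": str}; distractor_paths
# is a pool of unrelated image paths. All names and sizes are assumptions.
import random
from PIL import Image

def to_sequence(sample, distractor_paths, n_extra=2):
    """Sequence format: keep the original image and append unrelated images."""
    images = [sample["image"]] + random.sample(distractor_paths, n_extra)
    return {"images": images, "question": sample["question"]}

def to_grid_collage(sample, distractor_paths, cell=336):
    """Grid collage format: paste the original and three distractors into a 2x2 grid."""
    paths = [sample["image"]] + random.sample(distractor_paths, 3)
    canvas = Image.new("RGB", (cell * 2, cell * 2))
    for i, p in enumerate(paths):
        img = Image.open(p).convert("RGB").resize((cell, cell))
        canvas.paste(img, ((i % 2) * cell, (i // 2) * cell))
    return {"images": [canvas], "question": sample["question"]}

def to_pic_in_pic(sample, distractor_paths, scale=0.35):
    """Picture-in-picture format: overlay a shrunken original onto a distractor background."""
    bg = Image.open(random.choice(distractor_paths)).convert("RGB")
    fg = Image.open(sample["image"]).convert("RGB")
    w, h = int(bg.width * scale), int(bg.height * scale)
    bg.paste(fg.resize((w, h)), (bg.width - w, bg.height - h))
    return {"images": [bg], "question": sample["question"]}
```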

The key to MIA-DPO is using the attention mechanism to identify and filter out rejected responses in which the model attends to the wrong image, thereby constructing chosen/rejected pairs without relying on manual annotation or additional data. Experimental results show that MIA-DPO performs well on five multi-image benchmarks, with average performance improvements of 3.0% (on LLaVA-v1.5) and 4.3% (on InternLM-XC2.5), while having little impact on single-image understanding capability.
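
As a rough illustration of this attention-based filtering, the sketch below assumes we already have, for each sampled response, the attention mass the model placed on each image and the index of the image the question actually refers to; the function names and the attention-share threshold are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: use per-image attention to pick "rejected" answers,
# then pair them with the reference answer for standard DPO training.
def select_rejected(responses, per_image_attention, target_idx, ratio_threshold=0.4):
    """Keep responses whose attention share on the target image falls below the
    threshold, i.e. answers likely driven by the wrong image."""
    rejected = []
    for resp, attn in zip(responses, per_image_attention):
        target_share = attn[target_idx] / sum(attn)
        if target_share < ratio_threshold:
            rejected.append(resp)
    return rejected

def build_dpo_pairs(question, chosen, rejected_candidates):
    """Combine the chosen answer with each attention-filtered rejected answer
    into (prompt, chosen, rejected) triples consumable by a DPO trainer."""
    return [{"prompt": question, "chosen": chosen, "rejected": r}
            for r in rejected_candidates]
```

In this view, no human ever labels which answer is worse: the attention pattern itself flags responses that looked at the wrong image, and those become the rejected side of each preference pair.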