
MM-RLHF Multimodal Preference Alignment Dataset

Date: 3 months ago
Size: 55.33 GB
Organization:
License: Apache 2.0

MM-RLHF (Multimodal Reinforcement Learning from Human Feedback) is a high-quality, fine-grained multimodal preference dataset introduced in the paper "MM-RLHF: The Next Step Forward in Multimodal LLM Alignment", first published on arXiv in 2025 by the Institute of Automation, Chinese Academy of Sciences (CASIA). The dataset aims to advance alignment research for multimodal large language models (MLLMs) and to address problems of truthfulness, safety, and alignment with human preferences in practical applications.

The dataset contains 120,000 fine-grained, manually annotated preference comparison pairs covering three areas: image understanding, video analysis, and multimodal safety. This volume far exceeds existing resources, spanning more than 100,000 multimodal task instances. Each example was carefully scored and explained by a pool of more than 50 annotators to ensure high quality and fine granularity.
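As a rough sketch of how such preference-comparison records could be inspected once the archive is extracted, the snippet below reads one record from a JSON Lines file. The file path (data/annotations.jsonl) and the field names (question, chosen, rejected, score, explanation) are illustrative assumptions, not the dataset's documented schema; consult the bundled README for the actual layout.

import json

# Hypothetical sketch: after extracting MM-RLHF.zip, read one
# preference-comparison record. The path and field names below are
# assumptions for illustration; the real schema is described in the
# dataset's README.
path = "MM-RLHF/data/annotations.jsonl"  # assumed location inside the archive

with open(path, "r", encoding="utf-8") as f:
    first_line = f.readline()

record = json.loads(first_line)
# A preference pair would typically hold the prompt, the candidate
# responses, and the human scores/explanations attached to them.
for key in ("question", "chosen", "rejected", "score", "explanation"):
    print(key, "->", record.get(key))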

Dataset Example

MM-RLHF.torrent
  • MM-RLHF/
    • README.md (1.55 KB)
    • README.txt (3.09 KB)
    • data/
      • MM-RLHF.zip (55.33 GB)