RLAIF-V-Dataset: Large-scale Multimodal Preference Dataset
The RLAIF-V dataset is an AI-generated multimodal preference dataset that covers a variety of tasks and domains. It contains more than 44,757 high-quality comparison pairs for training and evaluating multimodal large language models (MLLMs). The dataset introduces a novel approach that uses open-source large models to de-confound model responses and provide high-quality feedback on them, with the goal of reducing hallucination in different MLLMs.
In addition, the RLAIF-V dataset was used to train the MiniCPM-Llama3-V 2.5 model, which represents the first end-side GPT-4V-level MLLM. The RLAIF-V project has open-sourced its code, model weights (7B and 12B), and data for use and further research by the research community.
The main features of the RLAIF-V dataset include:
- High-quality feedback data: The feedback data in the dataset effectively reduces hallucinations in different MLLMs.
- Open Source: The dataset is completely open source, allowing researchers to access and use it freely.
- Multi-task and multi-domain: The dataset covers a wide range of tasks and domains, providing diverse preference data.
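To make the idea of a comparison pair concrete, here is a minimal Python sketch of how one preference record could be represented and prepared for preference-based training. The field names (`image_path`, `question`, `chosen`, `rejected`) and the helper functions are illustrative assumptions, not the dataset's confirmed schema or the project's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One comparison pair: two responses to the same image and question.

    All field names here are hypothetical, chosen for illustration only.
    """
    image_path: str  # where the image is stored
    question: str    # prompt asked about the image
    chosen: str      # response the AI feedback preferred
    rejected: str    # response the AI feedback ranked lower


def is_usable(pair: PreferencePair) -> bool:
    """Keep pairs whose two responses are non-empty and actually differ."""
    chosen, rejected = pair.chosen.strip(), pair.rejected.strip()
    return bool(chosen) and bool(rejected) and chosen != rejected


def to_training_example(pair: PreferencePair) -> dict:
    """Flatten a pair into a prompt/chosen/rejected layout,
    a shape commonly used for preference optimization."""
    return {
        "image": pair.image_path,
        "prompt": pair.question,
        "chosen": pair.chosen,
        "rejected": pair.rejected,
    }


# Made-up records, not taken from the dataset.
pairs = [
    PreferencePair("img/0.jpg", "What is on the table?",
                   "A red mug.", "A red mug and a laptop."),
    PreferencePair("img/1.jpg", "What color is the car?",
                   "Blue.", "Blue."),  # identical responses carry no signal
]
examples = [to_training_example(p) for p in pairs if is_usable(p)]
```

In this sketch, the rejected response in the first record mentions an object that is not in the image, which is the kind of hallucination contrast a preference pair is meant to capture. The second record is filtered out because both responses are identical.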
The RLAIF-V dataset is licensed under CC BY-NC 4.0, which permits non-commercial use only. Models trained on this dataset should be used for research purposes only.