DreamBench++ Image Automatic Evaluation Benchmark Dataset

DreamBench++ is a benchmark launched in 2024 by researchers from Tsinghua University, Xi'an Jiaotong University, University of Illinois at Urbana-Champaign, Chinese Academy of Sciences, and Megvii. It addresses shortcomings in how personalized image generation is evaluated. The benchmark uses the multimodal GPT-4o model as an automated evaluator that is closely aligned with human preferences, and it introduces a more comprehensive and diverse dataset.
Key features of DreamBench++ include:
- Automated evaluation: GPT-4o scores the generated images automatically, reducing the time and cost of manual evaluation.
- Human preference alignment: Carefully designed prompts guide GPT-4o to reason like a human evaluator, so that the evaluation results stay consistent with human intuition and preferences.
- Comprehensive dataset: A personalized dataset containing 200 keywords was constructed, covering three types of images (objects, living things, and stylized images). Images were sourced from Unsplash, Rawpixel, and Google Image Search, and those with clean backgrounds and a large subject proportion were selected so that the subject is clear and easy to recognize.
- Experimental results: Seven image generation methods were evaluated. The DreamBench++ scores for image similarity and text adherence were highly consistent with human evaluations, with consistency reaching 79.64% and 93.18% respectively, more than 50% higher than the existing DINO and CLIP scores.
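
To make the idea of consistency with human evaluations concrete, the sketch below computes a simple pairwise ranking agreement between human scores and automated scores. This is an illustrative assumption only: the description above does not state how DreamBench++ measures consistency, and the function name `pairwise_agreement` and the sample scores are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(human, auto):
    """Fraction of item pairs that both score lists rank in the same order.

    Illustrative metric only; not necessarily the one used by DreamBench++.
    """
    if len(human) != len(auto):
        raise ValueError("score lists must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(human)), 2):
        h_diff = human[i] - human[j]
        a_diff = auto[i] - auto[j]
        if h_diff == 0:
            continue  # skip pairs that the human raters scored as tied
        total += 1
        if h_diff * a_diff > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else 0.0


# Hypothetical scores for four generated images (not real benchmark data).
human_scores = [4, 2, 3, 1]
auto_scores = [3, 2, 4, 1]
print(pairwise_agreement(human_scores, auto_scores))
```

A metric of this kind rewards an automated evaluator for ordering images the way people do, regardless of the absolute scale of its scores.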
The launch of DreamBench++ provides new tools and methods for evaluating personalized image generation technology, which will help promote further development in this field. Related papers and datasets have been made public for researchers and developers to use and refer to.