COREVQA Visual Question Answering Benchmark Dataset
Date
2025
Size
5,608 image/statement pairs
Publish URL
Paper URL
License
Apache 2.0
COREVQA is a visual question answering benchmark dataset released by the Algoverse Artificial Intelligence Research Center in 2025. It accompanies the paper "COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark" and evaluates the entailment reasoning ability of vision-language models (VLMs) on crowd scenes.
The dataset contains 5,608 pairs of images and true/false statements. The images are drawn from the CrowdHuman dataset and primarily depict real-world crowded scenes, emphasizing challenges such as occlusion, viewpoint changes, and background clutter. The benchmark aims to advance the fine-grained perception and reasoning capabilities of VLMs in complex social scenarios.
Each record includes the following fields (a minimal loading sketch follows the list):
- Scene image identifier (image_id)
- Natural language statement (question)
- Binary label (answer: TRUE / FALSE)
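The card does not specify how the annotations are packaged, so the sketch below assumes a single JSON file of records using the field names listed above; the file name, the JSON format, and the accuracy helper are illustrative assumptions, not part of the official release.

```python
import json

# Hypothetical file name; the card does not specify the distribution
# format, so a flat JSON list of records is assumed here.
ANNOTATIONS_PATH = "corevqa_annotations.json"

def load_corevqa(path: str) -> list[dict]:
    """Load records carrying the image_id / question / answer fields from the card."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def accuracy(records: list[dict], predict) -> float:
    """Score a predictor that maps (image_id, statement) to "TRUE" or "FALSE"."""
    correct = sum(
        predict(r["image_id"], r["question"]) == r["answer"] for r in records
    )
    return correct / len(records)

if __name__ == "__main__":
    records = load_corevqa(ANNOTATIONS_PATH)
    # Trivial always-TRUE baseline, for illustration only; a real VLM
    # would read the image and judge whether the statement is entailed.
    print(f"Accuracy: {accuracy(records, lambda image_id, q: 'TRUE'):.3f}")
```

Because every statement is labeled TRUE or FALSE, plain accuracy against the answer field is the natural headline metric for this benchmark.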