INFINITY-CHAT: Real-World Open-Ended Question Answering Dataset
INFINITY-CHAT, released in 2025 by the University of Washington in collaboration with Carnegie Mellon University, the Allen Institute for Artificial Intelligence, and other institutions, is the first large-scale dataset of open-ended questions from real-world users. The accompanying paper, "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)", received a NeurIPS 2025 Best Paper award (DB track). It systematically studies the diversity of language model outputs in open-ended generation, differences in human preferences, and the "Artificial Hivemind" effect, in which models converge on similar responses.
The dataset contains over 26,000 real-world open-ended user questions, organized under a query taxonomy with 6 top-level categories and 17 subcategories. It also includes answers from over 70 language models and 31,250 human annotations (both absolute scores and pairwise preferences), with an average of 25 annotators evaluating each sample. It consists of four main parts: the open-ended question corpus, multi-level classification labels, model-generated results, and large-scale human feedback.
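To make the figures above concrete, the following Python sketch works through the arithmetic they imply and shows one way the four parts might be held together in a single record. Everything here is an assumption for illustration: the class name `OpenEndedRecord`, its field names, and the helper `mean_score` are hypothetical and do not reflect the dataset's actual schema. The sample count is an estimate derived from the stated average, not a published number.

```python
from dataclasses import dataclass, field

# Figures quoted in the description above.
TOTAL_ANNOTATIONS = 31_250
ANNOTATORS_PER_SAMPLE = 25  # stated as an average

# Implied number of annotated samples (an estimate, not a published figure).
implied_samples = TOTAL_ANNOTATIONS // ANNOTATORS_PER_SAMPLE


@dataclass
class OpenEndedRecord:
    """Hypothetical record layout mirroring the four parts of the dataset."""
    question: str        # part 1: open-ended question corpus
    top_category: str    # part 2: one of the 6 top-level categories
    subcategory: str     # part 2: one of the 17 subcategories
    # part 3: model-generated results, keyed by model name
    model_responses: dict = field(default_factory=dict)
    # part 4: human feedback, here only the absolute scores
    absolute_scores: list = field(default_factory=list)


def mean_score(record):
    """Average the absolute scores of a record, or None if it has none."""
    if not record.absolute_scores:
        return None
    return sum(record.absolute_scores) / len(record.absolute_scores)


if __name__ == "__main__":
    print(implied_samples)
```

Under these assumptions, dividing the annotation total by the average number of annotators suggests roughly 1,250 annotated samples, a small subset of the 26,000-question corpus.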