UQ Unsolved Questions Dataset
License: CC BY-SA 4.0
The UQ dataset is an evaluation benchmark released in 2025 by Stanford University, the University of Washington, the University of North Carolina, and other institutions. The accompanying paper, "UQ: Assessing Language Models on Unsolved Questions", uses real, difficult questions that humans have not yet answered to evaluate the reasoning, factuality, and browsing capabilities of frontier large language models.
The dataset consists of 500 long-standing unanswered questions from the Stack Exchange platform, covering topics such as theoretical computer science, mathematics, science fiction, and history. Questions are collected through a three-stage pipeline (rule-based filtering, LLM review, then manual review), and the dataset comes with UQ-Validators for automatic pre-screening and community review of candidate answers. Its defining characteristics are questions that are difficult yet realistic, asynchronous evaluation, and the separation of generation from verification. It is suited to evaluating the reasoning and retrieval abilities of frontier models, tracking progress over the long term, and maintaining public leaderboards.
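The three-stage collection pipeline can be pictured as a funnel in which each stage only sees what the previous one kept. The sketch below is illustrative only: the `Question` fields, the thresholds in `rule_filter`, and the stand-in bodies of `llm_review` and `manual_review` are all assumptions, not the criteria used by the UQ authors.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Question:
    # Hypothetical fields; the real dataset schema may differ.
    title: str
    age_days: int
    answer_count: int
    score: int


Stage = Callable[[Question], bool]


def rule_filter(q: Question) -> bool:
    # Assumed rules: unanswered, at least a year old, reasonably upvoted.
    return q.answer_count == 0 and q.age_days >= 365 and q.score >= 5


def llm_review(q: Question) -> bool:
    # Placeholder for an LLM judgement; here just a trivial length check.
    return len(q.title.split()) >= 4


def manual_review(q: Question) -> bool:
    # Placeholder for a human decision; accepts everything in this sketch.
    return True


def run_pipeline(questions: Iterable[Question], stages: List[Stage]) -> List[Question]:
    # Apply the stages in order, keeping only questions that pass each one.
    kept = list(questions)
    for stage in stages:
        kept = [q for q in kept if stage(q)]
    return kept


candidates = [
    Question("Is this graph problem NP-hard in general", 900, 0, 12),
    Question("Quick question", 900, 0, 12),
    Question("Does this series converge for all real x", 30, 0, 40),
    Question("Who wrote this obscure short story from 1950", 2000, 1, 8),
]
selected = run_pipeline(candidates, [rule_filter, llm_review, manual_review])
print([q.title for q in selected])
```

Ordering the cheap rule-based stage first means the more expensive LLM and human stages see far fewer candidates.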
Data distribution:
- Science: 395
- Technology: 52
- Culture & Recreation: 16
- Life & Arts: 35
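
As a quick check on the distribution, the counts can be turned into percentage shares. Note that the four listed categories sum to 498, slightly short of the 500 questions stated above, so this sketch uses the listed total as the denominator; the variable names are illustrative.

```python
# Category counts exactly as listed in the data distribution.
counts = {
    "Science": 395,
    "Technology": 52,
    "Culture & Recreation": 16,
    "Life & Arts": 35,
}

# Total over the listed categories only.
listed_total = sum(counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100 * n / listed_total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} ({pct}%)")
print(f"Listed total: {listed_total}")
```

Science accounts for roughly four fifths of the listed questions, so aggregate scores on the benchmark will be dominated by that category.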
