UQ Unsolved Questions Dataset

Date

10 months ago

Organization

Paper URL

2508.17580

License

CC BY-SA 4.0

Tags

Mathematics

Theoretical Computer Science

UQ is an evaluation benchmark released in 2025 by researchers at Stanford University, the University of Washington, the University of North Carolina, and other institutions. It accompanies the paper "UQ: Assessing Language Models on Unsolved Questions" and evaluates the reasoning, factuality, and browsing capabilities of frontier large language models on real, difficult questions that humans have not yet answered. The dataset consists of 500 long-standing unanswered questions from the Stack Exchange platform, covering topics such as theoretical computer science, mathematics, science fiction, and history. Questions are collected through a three-stage pipeline (rule-based filtering, LLM review, then manual review), and the benchmark ships with UQ-Validators, which automatically pre-screen candidate answers ahead of community review. Its defining characteristics are questions that are difficult yet realistic, asynchronous evaluation, and the separation of answer generation from answer verification. It is suited to evaluating reasoning and retrieval in frontier models, tracking long-term progress, and maintaining public leaderboards.
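The three-stage collection pipeline described above (rule-based filtering, LLM review, manual review) can be sketched as a chain of filters. This is only an illustration of the staging idea, not the authors' implementation: the `Question` fields, the thresholds in `rule_filter`, the `toy_llm_judge` stand-in, and the `human_approved` set are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one Stack Exchange question."""
    qid: str
    title: str
    age_days: int
    score: int
    answer_count: int


def rule_filter(q: Question, min_age_days: int = 730, min_score: int = 5) -> bool:
    """Stage 1: cheap heuristics. The thresholds are illustrative assumptions."""
    return q.answer_count == 0 and q.age_days >= min_age_days and q.score >= min_score


def run_pipeline(
    questions: Iterable[Question],
    llm_judge: Callable[[Question], bool],
    human_approved: Set[str],
) -> List[Question]:
    """Apply the three stages in order; a question must pass all of them."""
    after_rules = [q for q in questions if rule_filter(q)]
    after_llm = [q for q in after_rules if llm_judge(q)]
    return [q for q in after_llm if q.qid in human_approved]


def toy_llm_judge(q: Question) -> bool:
    """Stand-in for a model call: accept titles phrased as a question."""
    return q.title.strip().endswith("?")


candidates = [
    Question("a1", "Is this graph parameter NP-hard to compute?", 1500, 12, 0),
    Question("a2", "How do I install Python?", 1500, 40, 3),    # already answered
    Question("a3", "Unclear request", 900, 8, 0),               # rejected by toy judge
    Question("a4", "Does this series converge?", 100, 9, 0),    # too recent
    Question("a5", "Who wrote this anonymous pamphlet?", 2000, 6, 0),
]

# Pretend a human reviewer signed off on a1 and a3 only.
selected = run_pipeline(candidates, toy_llm_judge, human_approved={"a1", "a3"})
print([q.qid for q in selected])  # ['a1']
```

Ordering the stages from cheapest to most expensive means the LLM and the human reviewer only see questions that already passed the earlier checks, which is the usual motivation for a staged filter like this.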

Data distribution:

Science: 395
Technology: 52
Culture & Recreation: 16
Life & Arts: 35
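As a quick check on the distribution above, the short sketch below totals the four listed counts and converts them to percentage shares. The counts are copied from the list; note that they sum to 498, slightly below the 500 questions mentioned in the description, and the page does not explain the difference.

```python
# Category counts exactly as listed above.
counts = {
    "Science": 395,
    "Technology": 52,
    "Culture & Recreation": 16,
    "Life & Arts": 35,
}

total = sum(counts.values())
# Share of each category, in percent of the listed total.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<22} {n:>4}  {shares[name]:>5.1f}%")
print(f"{'Total':<22} {total:>4}")
```

Science accounts for roughly four fifths of the listed questions, so aggregate scores on this benchmark will mostly reflect performance on science questions.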

Dataset construction process

Citation

@misc{nie2025uqassessinglanguagemodels,
  title={UQ: Assessing Language Models on Unsolved Questions},
  author={Fan Nie and Ken Ziyu Liu and Zihao Wang and Rui Sun and Wei Liu and Weijia Shi and Huaxiu Yao and Linjun Zhang and Andrew Y. Ng and James Zou and Sanmi Koyejo and Yejin Choi and Percy Liang and Niklas Muennighoff},
  year={2025},
  eprint={2508.17580},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.17580}
}




