
Date: 2 years ago

Size: 1.92 MB


ProcessBench is a benchmark dataset for identifying errors in mathematical reasoning. It measures how well language models can locate incorrect steps in step-by-step math solutions. The Qwen team at Alibaba Group released it in 2024, and the accompanying paper is "ProcessBench: Identifying Process Errors in Mathematical Reasoning". The dataset contains about 3.4k test cases, focused mainly on competition- and Olympiad-level math problems. Each test case comes with a step-by-step solution in which human experts have annotated the location of the error, if there is one. To build the dataset, the research team drew problems from multiple public data sources and generated solutions with a variety of open-source language models. Experts then reviewed and annotated the solutions to ensure data quality.

Example of data from ProcessBench: a label of 2 means that the earliest error occurs at step 2 (steps are indexed from 0). For test cases without errors, the label is -1.
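The label convention can be made concrete with a short Python sketch. It derives a label from per-step correctness judgments and scores predicted labels against gold labels. The function names are hypothetical, and so is the scoring scheme: exact-match accuracy on erroneous cases (gold label not -1) and on error-free cases (gold label -1), combined by their harmonic mean. This is an illustrative assumption, not code shipped with the dataset.

```python
from typing import Dict, Sequence


def earliest_error_label(step_is_correct: Sequence[bool]) -> int:
    """Return the 0-based index of the first incorrect step, or -1 if every step is correct.

    Hypothetical helper illustrating the label convention described above.
    """
    for index, ok in enumerate(step_is_correct):
        if not ok:
            return index
    return -1


def score(predicted: Sequence[int], gold: Sequence[int]) -> Dict[str, float]:
    """Assumed scoring scheme: exact-match accuracy on erroneous and error-free
    cases separately, plus their harmonic mean (reported as "f1")."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # Cases whose gold label points at a step (the solution contains an error).
    erroneous = [p == g for p, g in zip(predicted, gold) if g != -1]
    # Cases whose gold label is -1 (the solution has no error).
    error_free = [p == g for p, g in zip(predicted, gold) if g == -1]
    acc_err = sum(erroneous) / len(erroneous) if erroneous else 0.0
    acc_cor = sum(error_free) / len(error_free) if error_free else 0.0
    total = acc_err + acc_cor
    f1 = 2 * acc_err * acc_cor / total if total else 0.0
    return {"acc_erroneous": acc_err, "acc_correct": acc_cor, "f1": f1}


if __name__ == "__main__":
    print(earliest_error_label([True, True, False, False]))  # first wrong step is index 2
    print(earliest_error_label([True, True, True]))          # no error, so -1
    print(score([2, -1, 1, -1], [2, -1, 3, 0]))
```

Requiring an exact index match means a prediction counts as correct only if it points at the earliest erroneous step. A later wrong step does not count. Reporting the two subsets separately also keeps a model that always answers -1 from scoring well.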

Citation

@article{processbench,
  title={ProcessBench: Identifying Process Errors in Mathematical Reasoning},
  author={Chujie Zheng and Zhenru Zhang and Beichen Zhang and Runji Lin and Keming Lu and Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin},
  journal={arXiv preprint arXiv:2412.06559},
  year={2024}
}


ProcessBench/
- README.md
  1.58 KB
- README.txt
  3.15 KB

