
SWE-bench Verified Code Generation Evaluation Benchmark Dataset

Dataset Introduction

SWE-bench Verified is an improved subset of the existing SWE-bench benchmark, designed to more reliably evaluate the ability of AI models to solve real-world software problems.

To improve the robustness and reliability of SWE-bench, OpenAI ran a manual annotation campaign in which professional software developers screened each sample in the SWE-bench test set, confirming that the unit tests are appropriately scoped and that the problem description is clear and unambiguous.

Together with the authors of SWE-bench, they released SWE-bench Verified: a 500-sample subset of the original SWE-bench test set in which every sample has been verified by human annotators. It replaces the original SWE-bench and SWE-bench Lite test sets.

On SWE-bench Verified, GPT-4o with the best-performing open-source scaffold, Agentless, resolves 33.2% of samples, doubling its score of 16% on the original SWE-bench.
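For quick experimentation, the verified split can also be loaded programmatically. A minimal sketch, assuming the dataset's public Hugging Face mirror at princeton-nlp/SWE-bench_Verified and the usual SWE-bench field names (instance_id, repo, problem_statement), which come from the upstream release rather than this torrent:

```python
# Minimal sketch: load the 500 verified samples via the datasets library.
# The Hub path and field names below are assumptions based on the public
# SWE-bench release; verify them against the dataset card before relying on them.
from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

print(len(ds))                             # expected: 500 verified samples
sample = ds[0]
print(sample["instance_id"])               # unique task identifier
print(sample["repo"])                      # source GitHub repository
print(sample["problem_statement"][:200])   # issue text the model must solve
```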

SWE-bench_Verified.torrent
Seeding: 1 · Downloading: 0 · Completed: 232 · Total Downloads: 302
  • SWE-bench_Verified/
    • README.md
      1.68 KB
    • README.txt
      3.37 KB
    • data/
      • SWE-bench_Verified.zip
        1.65 MB
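Once the torrent finishes, the archive can be inspected locally. A minimal sketch, assuming the zip contains the task instances as a JSON Lines member; the member name inside the archive is not documented here, so the code discovers it at runtime:

```python
# Minimal sketch: list the archive contents and parse the first JSONL member.
# The assumption that records are stored as JSON Lines is unverified; check
# the printed name list and adjust the member selection if needed.
import json
import zipfile

ARCHIVE = "SWE-bench_Verified/data/SWE-bench_Verified.zip"

with zipfile.ZipFile(ARCHIVE) as zf:
    print(zf.namelist())  # confirm what the archive actually contains
    member = next(n for n in zf.namelist() if n.endswith(".jsonl"))
    with zf.open(member) as fh:
        records = [json.loads(line) for line in fh]

print(len(records))               # expected: 500 human-verified samples
print(sorted(records[0].keys()))  # inspect the per-task fields
```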

