HyperAI

HLE Human Question Reasoning Benchmark Dataset

HLE is a multimodal benchmark dataset of human-written questions, jointly released by the Center for AI Safety and Scale AI. The accompanying paper, "Humanity's Last Exam", aims to build a final closed-ended evaluation covering the frontiers of human knowledge.

The dataset contains 2,500 questions spanning dozens of subjects, including mathematics, the humanities, and the natural sciences. Questions are either multiple-choice or short-answer, both formats suited to automatic scoring.

Subject distribution:

  • Mathematics (41%): Abstract problems such as advanced mathematics, probability theory, and algorithm design.
  • Computer Science/Artificial Intelligence (10%): Machine learning theory, computational complexity, natural language processing.
  • Natural Sciences (27%): Physics (9%), Chemistry (7%), Biology/Medicine (11%), involving quantum physics, organic synthesis, pathological mechanisms, etc.
  • Humanities/Social Sciences (9%): Critical analysis questions in philosophy, history, economics, and sociology.
  • Engineering (4%) and other disciplines (9%): Covers engineering design, art history, and interdisciplinary cutting-edge issues.
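
As a rough illustration, the percentages above can be turned into approximate question counts using the stated total of 2,500. This is only a sketch: the listed shares are rounded, so the real per-subject counts may differ slightly, and the dictionary keys and function name below are chosen for this example rather than taken from the dataset.

```python
# Approximate question counts per subject, derived from the rounded
# percentages listed above and the stated total of 2,500 questions.
# Names here are illustrative; they are not fields of the dataset itself.
TOTAL_QUESTIONS = 2500

SUBJECT_SHARE = {
    "Mathematics": 41,
    "Computer Science/Artificial Intelligence": 10,
    "Physics": 9,
    "Chemistry": 7,
    "Biology/Medicine": 11,
    "Humanities/Social Sciences": 9,
    "Engineering": 4,
    "Other": 9,
}


def approximate_counts(total, shares):
    """Return {subject: estimated question count} from percentage shares."""
    return {subject: round(total * pct / 100) for subject, pct in shares.items()}


counts = approximate_counts(TOTAL_QUESTIONS, SUBJECT_SHARE)
for subject, n in counts.items():
    print(f"{subject}: ~{n}")
```

Physics, Chemistry, and Biology/Medicine are listed separately here; together they make up the 27% Natural Sciences share.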

Figure: Discipline distribution

hle.torrent
  • hle/
    • README.md (1.69 KB)
    • README.txt (3.37 KB)
    • data/
      • hle.zip (227.35 MB)
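
The internal layout of hle.zip is not described here, so a reasonable first step after downloading is to list what the archive contains. The following is a minimal sketch using Python's standard zipfile module; the local path hle/data/hle.zip is an assumption based on the file tree above, and the helper name list_archive is hypothetical.

```python
from pathlib import Path
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


# Assumed download location, mirroring the torrent's file tree.
ARCHIVE_PATH = Path("hle/data/hle.zip")

if ARCHIVE_PATH.exists():
    for name, size in list_archive(ARCHIVE_PATH):
        print(f"{name}\t{size / 1_000_000:.2f} MB")
else:
    print(f"{ARCHIVE_PATH} not found; download the torrent contents first.")
```

Listing the members before extracting avoids unpacking the full archive just to find out which file formats it holds.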