HyperAI

BRIGHT Text Retrieval Benchmark Dataset

Date

9 months ago

Size

481.53 MB

Organization

Princeton University
The University of Hong Kong
University of Washington


This dataset is a new text retrieval benchmark launched in 2024 by the University of Hong Kong, Princeton University, the University of Washington, and Google Cloud AI Research. The accompanying paper is "BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval".

BRIGHT is the first text retrieval benchmark that requires deep reasoning to retrieve relevant documents. The research team collected 1,385 real-world queries from diverse sources (StackExchange, LeetCode, and math competitions), all drawn from real, human-written data. The team paired these queries with web pages linked in StackExchange answers and with theorems tagged in Mathematical Olympiad problems.

It is specifically designed to evaluate and challenge retrieval systems on complex queries. These queries require not only keyword matching but also deep reasoning to identify relevant documents. Simply put, BRIGHT tests whether a retrieval system can "understand" the logic and context behind a query, not just its surface text. For example, suppose an economist wants to find documents about "how human activities affect the climate system." Answering this query takes more than keyword matching: the system must understand the relationship between human activities (such as deforestation and urbanization) and climate change, because a relevant document may never use the query's own words.
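The gap between keyword matching and reasoning can be illustrated with a minimal sketch. The code below is not part of BRIGHT or its evaluation code. It is a toy keyword-overlap scorer, and the two documents, their names, and the `keyword_overlap` function are hypothetical, invented only to mirror the climate example above.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(query: str, document: str) -> float:
    """Return the fraction of query tokens that also appear in the document."""
    query_tokens = tokenize(query)
    document_tokens = tokenize(document)
    if not query_tokens:
        return 0.0
    return len(query_tokens & document_tokens) / len(query_tokens)


query = "how human activities affect the climate system"

# Hypothetical documents, invented for illustration only.
documents = {
    # Shares many words with the query but is off-topic.
    "surface_match": (
        "The climate control system in the office can affect how human staff feel."
    ),
    # On-topic, but shares almost no words with the query.
    "needs_reasoning": (
        "Deforestation and urbanization raise carbon dioxide levels and warm the planet."
    ),
}

scores = {name: keyword_overlap(query, text) for name, text in documents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

Under this toy scorer the off-topic document ranks first, because it repeats the query's words, while the document that actually discusses deforestation and urbanization ranks last. This is the kind of failure a reasoning-intensive benchmark is meant to expose.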

BRIGHT.torrent
Seeding: 1 | Downloading: 1 | Completed: 57 | Total Downloads: 140
  • BRIGHT/
    • README.md
      2.15 KB
    • README.txt
      4.3 KB
    • data/
      • BRIGHT.zip
        481.53 MB