
FRAMES-benchmark: Retrieval-Augmented Generation Test Set

Date: a year ago

Organization: Google

Paper URL: arxiv.org


* This dataset supports online use.

FRAMES-benchmark is a comprehensive evaluation dataset released by Google in 2024 to test the factuality, retrieval accuracy, and reasoning ability of retrieval-augmented generation (RAG) systems. It was introduced in the paper "Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation". The dataset contains 824 challenging multi-hop questions, each requiring information from 2 to 15 Wikipedia articles. The questions cover topics such as history, sports, science, animals, and health, and each is labeled with its reasoning type, such as numerical, tabular, multiple constraints, temporal, or post-processing. The dataset also provides the gold answer and the relevant Wikipedia articles for each question.
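For reference, the dataset can be loaded programmatically. The sketch below is a minimal example and assumes the benchmark is published on Hugging Face as google/frames-benchmark with a single test split and the field names Prompt, Answer, reasoning_types, and wiki_links; check the dataset card for the exact schema.

```python
# Minimal loading sketch. Assumes the dataset is hosted on Hugging Face as
# "google/frames-benchmark" and that the field names below match its schema.
from datasets import load_dataset

frames = load_dataset("google/frames-benchmark", split="test")
print(len(frames))  # expected: 824 multi-hop questions

example = frames[0]
print(example["Prompt"])           # the multi-hop question
print(example["Answer"])           # the gold answer
print(example["reasoning_types"])  # e.g. numerical, tabular, temporal, ...
print(example["wiki_links"])       # the 2-15 supporting Wikipedia articles
```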

The main features of the FRAMES dataset are that it tests end-to-end RAG capabilities, requires integrating information from multiple sources, involves complex reasoning and temporal disambiguation, and is designed to be challenging for state-of-the-art language models. The dataset can be used to evaluate the performance of RAG systems, benchmark the factuality and reasoning capabilities of language models, and develop and test multi-hop retrieval strategies, as sketched below.
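As an illustration of how the benchmark can be used to evaluate a RAG system, here is a rough sketch of an evaluation loop. The answer_with_rag function is a hypothetical placeholder for your own retrieval-and-generation pipeline, and the exact-match scoring is a simplification; more robust setups compare predictions against the gold answers with a model-based judge.

```python
# Evaluation-loop sketch. `answer_with_rag` is a hypothetical placeholder for
# a real RAG pipeline; exact-match scoring is a simplification of how the
# benchmark is typically judged.
from datasets import load_dataset

def answer_with_rag(question: str) -> str:
    # Hypothetical: retrieve relevant Wikipedia passages, then generate an answer.
    raise NotImplementedError

frames = load_dataset("google/frames-benchmark", split="test")

correct = 0
for row in frames:
    prediction = answer_with_rag(row["Prompt"])  # assumed field name
    if prediction.strip().lower() == row["Answer"].strip().lower():
        correct += 1

print(f"Exact-match accuracy: {correct / len(frames):.2%}")
```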

