
FRAMES-benchmark Retrieval-Augmented Generation Test Set

Date

9 months ago

Organization

Google

Publish URL

huggingface.co

Download Help

* This dataset supports online use via the publish URL above.

FRAMES-benchmark is a comprehensive evaluation dataset released by Google in 2024 to test the ability of retrieval-augmented generation (RAG) systems in terms of factuality, retrieval accuracy, and reasoning. It was introduced in the paper "Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation". The dataset contains 824 challenging multi-hop questions, each requiring information from 2 to 15 Wikipedia articles. The questions cover multiple topics such as history, sports, science, animals, and health, and each question is labeled with its reasoning type, such as numerical, tabular, multiple constraints, temporal, and post-processing reasoning. The dataset also provides the gold answer and the relevant Wikipedia articles for each question.
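
For readers who want to inspect the data, the following is a minimal sketch of loading it with the Hugging Face `datasets` library. The repository ID `google/frames-benchmark`, the split name, and the column layout are assumptions based on the description above rather than details confirmed on this page; check the dataset card on huggingface.co before relying on them.

```python
# Minimal sketch: load FRAMES with the Hugging Face `datasets` library.
# The repo ID and split name are assumptions; verify them on huggingface.co.
from datasets import load_dataset

frames = load_dataset("google/frames-benchmark", split="test")  # assumed repo ID and split

print(len(frames))      # expected to be 824 questions per the description above
example = frames[0]
print(example.keys())   # inspect the actual column names (question text, gold answer,
                        # reasoning-type labels, linked Wikipedia articles, ...)
```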

The main features of the FRAMES dataset include testing end-to-end RAG capabilities, requiring the integration of information from multiple sources, covering complex reasoning and temporal disambiguation, and being designed to challenge state-of-the-art language models. The dataset can be used to evaluate the performance of RAG systems (a minimal evaluation sketch follows below), to benchmark the factuality and reasoning capabilities of language models, and to develop and test multi-hop retrieval strategies.
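
As a rough illustration of how the dataset can drive an end-to-end evaluation, the sketch below loops over the questions and compares a system's answers to the provided gold answers. The `answer_question` function is a hypothetical stand-in for the RAG pipeline under test, the column names `Prompt` and `Answer` are assumptions about the schema, and exact-match scoring is only a crude proxy for more careful answer grading.

```python
# Minimal sketch of an end-to-end RAG evaluation loop over FRAMES.
# `answer_question` is a hypothetical placeholder, and the column names
# "Prompt" / "Answer" are assumptions; adapt them to the actual schema.
from datasets import load_dataset


def answer_question(question: str) -> str:
    """Placeholder for a RAG pipeline: retrieve Wikipedia passages, then generate an answer."""
    return ""  # replace with a real retrieve-then-generate system


frames = load_dataset("google/frames-benchmark", split="test")  # assumed repo ID and split

correct = 0
for row in frames:
    prediction = answer_question(row["Prompt"])
    # Exact-match scoring is only a rough proxy for answer correctness.
    if prediction.strip().lower() == row["Answer"].strip().lower():
        correct += 1

print(f"Exact-match accuracy: {correct / len(frames):.3f}")
```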