QAngaroo Multi-step Reasoning Reading Comprehension Dataset
Date
Size
Publish URL
Categories
The QAngaroo dataset is a reading comprehension dataset created by University College London (UCL) in 2018 that focuses on multi-hop reasoning.Constructing Datasets for Multi-hop Reading Comprehension Across DocumentsThis dataset consists of two parts: WikiHop and MedHop, which aims to build a reading comprehension method that can perform multi-hop reasoning, that is, facts scattered in different documents require multiple steps of reasoning to derive new facts.
WikiHop is an open domain dataset focusing on Wikipedia articles, containing 43,738 samples in the training set and 5,129 samples in the validation set.

MedHop is a dataset based on PubMed paper abstracts, which contains 1,620 samples in the training set and 342 samples in the validation set.

Each sample contains a query, supporting facts, candidate answers, the correct answer, and a unique identifier. These datasets provide researchers with training and evaluation resources to develop reading comprehension models that can handle complex reasoning tasks.