HyperAI

MSMARCO Machine Reading Comprehension Dataset

MSMARCO is a machine reading comprehension dataset containing 1,010,916 anonymized questions sampled from Bing's search query logs, each with a human-generated answer, plus 182,669 answers that were completely rewritten by humans. The dataset also contains 8,841,823 passages extracted from 3,563,535 web documents retrieved by Bing.

Microsoft released the MSMARCO dataset in 2016 and updated it in 2018. An associated ranking competition (leaderboard) is also hosted for the dataset.

MSMARCO.torrent
  • MSMARCO/
    • README.md (1.03 KB)
    • README.txt (2.06 KB)
    • data/
      • dev_v2.1.json.gz (131.9 MB)
      • eval_v2.1_public.json.gz (259.55 MB)
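The files under data/ are gzip-compressed JSON. Below is a minimal Python sketch for reading one of them. It assumes (this page does not confirm it) that each v2.1 file is a single JSON object laid out column-wise, i.e. {field_name: {row_id: value}}, so check README.txt for the actual schema before relying on it. The helper names load_msmarco_json_gz and iter_rows are hypothetical, and the loader reads the whole file into memory, which can take substantial RAM for files of this size.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def load_msmarco_json_gz(path: Union[str, Path]) -> Dict[str, Any]:
    """Decompress and parse a .json.gz file (e.g. dev_v2.1.json.gz) in one pass.

    Hypothetical helper: the entire JSON document is loaded into memory.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def iter_rows(columns: Dict[str, Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per example from a column-oriented object.

    ASSUMPTION: the file maps each field name to {row_id: value};
    verify this against README.txt before use.
    """
    fields = list(columns)
    if not fields:
        return
    # Use the row ids of the first field as the canonical row order.
    for row_id in columns[fields[0]]:
        # Missing values for a field become None rather than raising KeyError.
        yield {"row_id": row_id, **{f: columns[f].get(row_id) for f in fields}}
```

For example, `data = load_msmarco_json_gz("MSMARCO/data/dev_v2.1.json.gz")` followed by `sorted(data)` lists the top-level keys, which is a quick way to check whether the column-oriented assumption holds before calling `iter_rows(data)`.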