WikiLinks Wikipedia Link Dataset
Date: 2 years ago
Size: 1.71 GB
Publish URL:
License: CC BY-NC-SA 3.0

WikiLinks is a large-scale dataset of text mentions linked to Wikipedia entities. It treats each Wikipedia page as representing an entity (or concept or idea), collects hyperlinks to Wikipedia pages found on web pages, and uses each link's anchor text as a mention of the linked entity. This yields large-scale labeled data without manual annotation.
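A minimal sketch of the labeling idea: each hyperlink to a Wikipedia article becomes a (mention, entity) pair, with the anchor text as the mention and the article title as the entity. The function name `mention_from_link` and the URL-to-title mapping (percent-decoding, underscores to spaces) are illustrative assumptions, not the dataset's actual tooling.

```python
from urllib.parse import unquote, urlparse


def mention_from_link(anchor_text: str, href: str):
    """Turn one hyperlink into a (mention, entity) pair, or return None.

    Assumption (illustrative only): an entity is identified by the title in a
    Wikipedia article URL of the form http(s)://<lang>.wikipedia.org/wiki/<Title>.
    """
    parsed = urlparse(href)
    host = parsed.hostname or ""
    # Only links that point at Wikipedia article pages count as entity labels.
    is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if not is_wikipedia or not parsed.path.startswith("/wiki/"):
        return None
    # Decode percent-escapes and map underscores back to spaces to get the title.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    mention = anchor_text.strip()
    if not title or not mention:
        return None
    return mention, title


links = [
    ("Bill Clinton", "http://en.wikipedia.org/wiki/Bill_Clinton"),
    ("the 42nd president", "https://en.wikipedia.org/wiki/Bill_Clinton"),
    ("click here", "https://example.com/page"),
]
# Keep only links that resolve to a Wikipedia entity.
pairs = [p for p in (mention_from_link(a, h) for a, h in links) if p is not None]
print(pairs)
```

Two different anchor texts resolve to the same entity, which is what makes this kind of data useful for entity linking: the non-Wikipedia link is simply dropped.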
The dataset includes:
- Nearly 1.9 billion words from more than 4 million articles
- 40 million mentions of 3 million entities
- 10 compressed text files, data-0000[0-9]-of-00010.gz
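The ten shards can be streamed without decompressing them to disk. This Python sketch expands the file-name pattern and reads each shard with `gzip`. The helper names `iter_lines` and `iter_mentions` are hypothetical, and the tab-separated `MENTION` record layout parsed by `iter_mentions` is an assumption about the file format that should be checked against the actual data.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple

# The ten shard names follow the pattern data-0000[0-9]-of-00010.gz.
SHARDS = [f"data-{i:05d}-of-00010.gz" for i in range(10)]


def iter_lines(data_dir: str) -> Iterator[str]:
    """Stream decoded lines from every shard present in data_dir."""
    for name in SHARDS:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # tolerate a partial download
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                yield line.rstrip("\n")


def iter_mentions(data_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (anchor text, Wikipedia URL) pairs.

    ASSUMPTION: mention records are tab-separated lines of the form
    MENTION<TAB>anchor<TAB>offset<TAB>wikipedia_url. Verify this against
    the real files before relying on it.
    """
    for line in iter_lines(data_dir):
        fields = line.split("\t")
        if len(fields) >= 4 and fields[0] == "MENTION":
            yield fields[1], fields[3]
```

Streaming line by line keeps memory use flat regardless of shard size, and skipping missing shards lets the same code run on a subset of the files.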
The dataset was created on September 29, 2012.