
WikiText Long Term Dependency Language Modeling Dataset

The WikiText long-term dependency language modeling dataset contains over 100 million English words, drawn from Wikipedia's verified Good and Featured articles.

The dataset is released in two versions: WikiText-2 and WikiText-103. Compared with the Penn Treebank (PTB), it has a much larger vocabulary, and it retains the original full articles rather than isolated sentences, which makes it well suited to models that need to capture long-term dependencies in natural language.
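For reference, below is a minimal loading sketch using the Hugging Face `datasets` library, assuming its hub mirror of WikiText is acceptable instead of the torrent listed further down; the configuration names match the archive names in this page's file list.

```python
# Minimal sketch: load WikiText-2 (or WikiText-103) from the Hugging Face hub mirror.
# Assumes `pip install datasets`; this is not the HyperAI torrent distribution itself.
from datasets import load_dataset

# Swap in "wikitext-103-v1" (or the *-raw-v1 variants) for the larger / raw versions.
wikitext2 = load_dataset("wikitext", "wikitext-2-v1")

print(wikitext2)                       # train / validation / test splits
print(wikitext2["train"][10]["text"])  # one pre-tokenized line of text
```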

The dataset was released by Salesforce Research in 2016; its main authors are Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. The related paper is "Pointer Sentinel Mixture Models".

WikiText Long Term Dependency Language Modeling Dataset.torrent
  • WikiText Long Term Dependency Language Modeling Dataset/
    • README.md (1.46 KB)
    • README.txt (2.92 KB)
    • data/
      • wikitext-103-raw-v1.zip (183.09 MB)
      • wikitext-103-v1.zip (364.51 MB)
      • wikitext-2-raw-v1.zip (369.01 MB)
      • wikitext-2-v1.zip (373.28 MB)
      • 新建文本文档.txt (373.28 MB)
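If the torrent archives above are used directly, the raw text can be read straight from a downloaded zip without unpacking it first. The sketch below assumes the member path `wikitext-2-raw-v1/wiki.train.raw` inside the archive; that layout is not confirmed by this page, so inspect `namelist()` before relying on it.

```python
# Sketch: stream a few lines from the downloaded wikitext-2-raw-v1.zip archive.
# The internal member path is an assumption about the archive layout.
import zipfile

with zipfile.ZipFile("data/wikitext-2-raw-v1.zip") as zf:
    print(zf.namelist())  # check the actual layout of the archive first
    with zf.open("wikitext-2-raw-v1/wiki.train.raw") as f:
        for i, raw_line in enumerate(f):
            print(raw_line.decode("utf-8").rstrip())
            if i >= 4:  # show only the first few lines
                break
```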