HyperAI

Seq-monkey Sequence Monkey Open Source Dataset 1.0

Date: a year ago
Size: 10.73 GB
Organization: Mobvoi
Publish URL: github.com

Sequence Monkey is a large-scale language model provided by Mobvoi. The Sequence Monkey dataset is the dataset used to train that model, and part of it is now open to the public.

Version 1.0 of the dataset covers three areas: a general Chinese text corpus, a corpus of modern translations of ancient poetry, and a text generation corpus. The general Chinese text corpus consists of 13 million entries extracted from the Sequence Monkey training set. The ancient poetry corpus pairs ancient texts with their modern translations, with 680,000 poems released. The text generation fine-tuning dataset releases 5,000 question-and-answer entries, which can be used for word error detection, word error correction, and text polishing tasks.

seq-monkey.torrent
Seeding: 3, Downloading: 1, Completed: 259, Total Downloads: 590
  • seq-monkey/
    • README.md (1.36 KB)
    • README.txt (2.72 KB)
    • data/
      • seq-monkey-data-main 2.zip (10.73 GB)