HyperAI

ChID Large-Scale Chinese Idioms Dataset

Date: a year ago

Size: 328.62 MB

Organization: Tsinghua University

Publish URL: github.com

Research on Chinese cloze-style reading comprehension remains limited because rich corpora are scarce. ChID (Chinese IDiom Dataset) is a large-scale Chinese cloze test dataset for studying the comprehension of idioms, a distinctive linguistic phenomenon in Chinese. In this corpus, the idioms in each passage are replaced with blank symbols, and the correct answer must be selected from a set of carefully designed candidate idioms.

The dataset contains 581K passages and 729K blanks and covers multiple domains. For each blank, a list of candidate idioms that includes the golden (correct) idiom is provided.
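The cloze format described above can be illustrated with a minimal Python sketch. The field names (`passage`, `candidates`, `answers`), the `#idiom#` blank marker, and the sample sentence are all assumptions made up for illustration; they are not taken from the dataset files, so check the README in the archive for the real schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical blank marker; the real dataset may use a different symbol.
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    """One passage with blanks, per-blank candidate lists, and golden idioms."""
    passage: str
    candidates: List[List[str]]  # one candidate list per blank
    answers: List[str]           # the golden idiom for each blank


def fill_blanks(item: ClozeItem, choices: List[str]) -> str:
    """Substitute one chosen idiom into each blank, in order of appearance."""
    parts = item.passage.split(BLANK)
    if len(parts) - 1 != len(choices):
        raise ValueError("number of choices must match number of blanks")
    pieces = [parts[0]]
    for choice, tail in zip(choices, parts[1:]):
        pieces.append(choice)
        pieces.append(tail)
    return "".join(pieces)


def accuracy(item: ClozeItem, predictions: List[str]) -> float:
    """Fraction of blanks where the prediction equals the golden idiom."""
    correct = sum(p == a for p, a in zip(predictions, item.answers))
    return correct / len(item.answers)


# Invented example, not a record from ChID.
example = ClozeItem(
    passage="他做事总是#idiom#，从不拖延。",
    candidates=[["雷厉风行", "画蛇添足", "守株待兔"]],
    answers=["雷厉风行"],
)

filled = fill_blanks(example, ["雷厉风行"])
score = accuracy(example, ["雷厉风行"])
```

A model evaluated on this kind of data would pick one idiom per blank from `candidates`, and `accuracy` would compare those picks with the golden idioms.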

ChID.torrent
Seeding: 2 | Downloading: 0 | Completed: 129 | Total Downloads: 151
  • ChID/
    • README.md (1.34 KB)
    • README.txt (2.68 KB)
    • data/
      • chid.zip (328.62 MB)