ChID Large-Scale Chinese Idioms Dataset
Date
a year ago
Size
328.62 MB
Research on Chinese cloze-style reading comprehension remains limited by the lack of rich corpora. ChID (Chinese IDiom Dataset) is a large-scale Chinese cloze test dataset for studying the comprehension of idioms, a linguistic phenomenon characteristic of Chinese. In this corpus, each idiom in a passage is replaced by a blank symbol, and the correct answer must be selected from a set of carefully designed candidate idioms.
The dataset contains 581K passages and 729K blanks and covers multiple domains. For each blank, a list of candidate idioms that includes the gold (correct) idiom is provided.
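To make the task format concrete, here is a minimal Python sketch of how one cloze item could be represented and scored. The placeholder token `#idiom#`, the `ClozeItem` class, its field names, and the example sentence are all assumptions made for illustration; they are not taken from the ChID files, whose actual schema should be checked after download.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical blank marker; the real dataset may use a different token.
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    """One passage with its blanks (illustrative structure, not the ChID schema)."""
    passage: str                  # text containing one BLANK per removed idiom
    candidates: List[List[str]]   # one candidate list per blank, in passage order
    answers: List[str]            # gold idiom for each blank, in passage order


def fill_blanks(item: ClozeItem, choices: List[str]) -> str:
    """Return the passage with each blank replaced, left to right, by a choice."""
    text = item.passage
    for choice in choices:
        text = text.replace(BLANK, choice, 1)
    return text


def accuracy(item: ClozeItem, choices: List[str]) -> float:
    """Fraction of blanks for which the chosen idiom equals the gold idiom."""
    correct = sum(c == a for c, a in zip(choices, item.answers))
    return correct / len(item.answers)


# Invented example sentence, not drawn from the dataset.
example = ClozeItem(
    passage="他做事总是#idiom#，从不马虎。",
    candidates=[["一丝不苟", "画蛇添足", "守株待兔"]],
    answers=["一丝不苟"],
)

filled = fill_blanks(example, ["一丝不苟"])
score = accuracy(example, ["一丝不苟"])
```

A model for this task would pick one idiom from each candidate list; `accuracy` then compares those picks against the gold idioms, blank by blank.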
ChID.torrent
Seeding 2, Downloading 0, Completed 129, Total Downloads 151