ChID Large-Scale Chinese Idioms Dataset
Research on Chinese cloze-style reading comprehension remains limited because rich corpora are scarce. ChID (Chinese IDiom Dataset) is a large-scale Chinese cloze test dataset for studying the comprehension of idioms, a distinctive linguistic phenomenon in Chinese. In this corpus, the idioms in each passage are replaced by blank symbols, and the correct answer must be selected from a set of carefully designed candidate idioms.
The dataset contains 581K passages and 729K blanks, and covers multiple domains. For each blank, a list of candidate idioms that includes the gold (correct) idiom is provided as the set of choices.
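To make the task format concrete, the following is a minimal Python sketch of one cloze item and how a prediction could be scored. The blank token `#idiom#`, the class and field names, and the example passage are all illustrative assumptions, not the dataset's documented schema or real data from ChID.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical blank marker; the actual symbol used in ChID files may differ.
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    """One passage with blanks (field names are assumptions for illustration)."""
    passage: str                  # text with each idiom replaced by BLANK
    candidates: List[List[str]]   # one candidate list per blank
    answers: List[str]            # gold idiom for each blank, in order


def fill_blanks(item: ClozeItem, choices: List[str]) -> str:
    """Substitute the chosen idioms into the blanks, left to right."""
    text = item.passage
    for idiom in choices:
        text = text.replace(BLANK, idiom, 1)
    return text


def accuracy(item: ClozeItem, predictions: List[str]) -> float:
    """Fraction of blanks where the predicted idiom equals the gold idiom."""
    correct = sum(p == a for p, a in zip(predictions, item.answers))
    return correct / len(item.answers)


# Made-up example item, not taken from the dataset.
example = ClozeItem(
    passage="他做事总是" + BLANK + ",从不马虎。",
    candidates=[["一丝不苟", "画蛇添足", "守株待兔"]],
    answers=["一丝不苟"],
)

if __name__ == "__main__":
    print(fill_blanks(example, example.answers))
    print(accuracy(example, ["画蛇添足"]))
```

Keeping one candidate list per blank mirrors the description above, where each blank has its own set of choices that includes the gold idiom.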