ChID Large-Scale Chinese Idioms Dataset

Date: a year ago
Size: 328.62 MB
Organization: Tsinghua University
Publish URL: github.com
Paper URL: arxiv.org

Because rich corpora are scarce, research on Chinese cloze-style reading comprehension remains limited. ChID (Chinese IDiom Dataset) is a large-scale Chinese cloze test dataset for studying the comprehension of idioms, a language phenomenon distinctive to Chinese. In this corpus, the idioms in each passage are replaced by blank symbols, and the correct idiom must be selected from a set of carefully designed candidates.

The dataset contains 581K passages and 729K blanks and covers multiple domains. For each blank, a list of candidate idioms is provided, one of which is the golden (correct) idiom.
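
The cloze structure described above can be sketched in Python as follows. This is a minimal illustration, not the dataset's actual schema: the blank marker `BLANK`, the `ClozeItem` fields, and the example passage are all assumptions made up for this sketch, so check the README shipped with the download for the real file format.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical blank marker; the page above does not say which symbol ChID uses.
BLANK = "#blank#"


@dataclass
class ClozeItem:
    """One passage with its blanks (hypothetical layout, for illustration only)."""
    passage: str                 # text in which each idiom is replaced by BLANK
    candidates: list[list[str]]  # one candidate list per blank, in passage order
    answers: list[str]           # the golden idiom for each blank


def fill_blanks(item: ClozeItem, choices: list[str]) -> str:
    """Return the passage with each blank replaced by the chosen idiom."""
    parts = item.passage.split(BLANK)
    if len(parts) - 1 != len(choices):
        raise ValueError("need exactly one choice per blank")
    pieces = [parts[0]]
    for choice, rest in zip(choices, parts[1:]):
        pieces.append(choice)
        pieces.append(rest)
    return "".join(pieces)


def accuracy(item: ClozeItem, choices: list[str]) -> float:
    """Fraction of blanks for which the chosen idiom equals the golden idiom."""
    correct = sum(c == a for c, a in zip(choices, item.answers))
    return correct / len(item.answers)


# Toy item invented for this sketch; it is not taken from the dataset.
example = ClozeItem(
    passage="他做事总是" + BLANK + "，从不拖延。",
    candidates=[["雷厉风行", "拖泥带水", "画蛇添足"]],
    answers=["雷厉风行"],
)

print(fill_blanks(example, ["雷厉风行"]))
print(accuracy(example, ["拖泥带水"]))
```

Keeping one candidate list per blank mirrors the description that every blank comes with its own set of choices containing the golden idiom.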

ChID.torrent
Seeding: 1 | Downloading: 0 | Completed: 171 | Total Downloads: 256
  • ChID/
    • README.md (1.34 KB)
    • README.txt (2.68 KB)
    • data/
      • chid.zip (328.62 MB)
