DuIE Large-Scale Chinese Information Extraction Dataset

Date: 3 years ago
Size: 242.66 MB
Organization: Baidu
Publish URL: ai.baidu.com
License: Non-Commercial

DuIE is a large-scale, manually annotated dataset for evaluating schema-based knowledge extraction algorithms.

The dataset contains more than 210,000 real-world Chinese sentences annotated with more than 450,000 SPO (Subject-Predicate-Object) triples, all of which conform to a pre-specified schema covering 49 predicates.
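As a rough illustration of what "conforming to a pre-specified schema" can mean, the sketch below models a schema entry as a (subject type, predicate, object type) combination and checks a triple against a set of such entries. The class names, the `conforms` helper, and the sample values are hypothetical and chosen for illustration; they are not taken from the dataset or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """One allowed (subject type, predicate, object type) combination."""
    subject_type: str
    predicate: str
    object_type: str


@dataclass(frozen=True)
class Triple:
    """One SPO triple together with the types of its subject and object."""
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str


def conforms(triple: Triple, schemas: set) -> bool:
    """Return True if the triple's type signature is one of the allowed schemas."""
    signature = Schema(triple.subject_type, triple.predicate, triple.object_type)
    return signature in schemas


# Hypothetical schema set and triples (not real DuIE content).
SCHEMAS = {Schema("Person", "birthplace", "Place")}
ok = Triple("Alice", "Person", "birthplace", "Paris", "Place")
bad = Triple("Alice", "Person", "employer", "Acme", "Organization")
```

Under these assumptions, `conforms(ok, SCHEMAS)` accepts the first triple and `conforms(bad, SCHEMAS)` rejects the second, because its predicate is not part of the schema set.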

All sentences in this dataset are extracted from Baidu Baike and Baidu News Search. The texts in this dataset cover various fields in real-world applications, such as news, entertainment, and user-generated content.

The dataset consists of the following data:

  • 214,590 sentences, of which:
    • 172,983 sentences are in the training set;
    • 21,626 sentences are in the development set;
    • 19,981 sentences are in the test set.
  • 457,866 instances, of which:
    • 363,960 instances are in the training set;
    • 45,558 instances are in the development set;
    • 48,348 instances are in the test set.
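The per-split counts above could be reproduced with a short script along the following lines. This is a minimal sketch that assumes each data file stores one JSON object per line and that each object carries its triples in a list under a `spo_list` key; both the file layout and the field names are assumptions, so check them against the README shipped with the dataset. The sample records are made up for illustration.

```python
import io
import json


def count_split(lines):
    """Count sentences and SPO instances in an iterable of JSON lines.

    Assumes one JSON object per line, with triples stored in a list
    under the (assumed) key "spo_list".
    """
    sentences = 0
    instances = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        sentences += 1
        instances += len(record.get("spo_list", []))
    return sentences, instances


# Made-up sample standing in for a data file; not real DuIE content.
sample = io.StringIO(
    '{"text": "A was born in B.", "spo_list": '
    '[{"subject": "A", "predicate": "birthplace", "object": "B"}]}\n'
    '{"text": "C founded D and E.", "spo_list": '
    '[{"subject": "C", "predicate": "founder", "object": "D"}, '
    '{"subject": "C", "predicate": "founder", "object": "E"}]}\n'
)
sentence_count, instance_count = count_split(sample)
```

To run it on a real split, pass an open file handle instead of the in-memory sample, for example `count_split(open("train_data.json", encoding="utf-8"))`, again assuming the one-object-per-line layout.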

Files in DuIE.torrent:

  • DuIE/
    • README.md (1.53 KB)
    • README.txt (3.07 KB)
    • data/
      • all_50_schemas (6.94 KB)
      • dev_data.json (27.1 MB)
      • train_data.json (242.66 MB)
