HyperAI

PKU Simplified Chinese Word Segmentation Dataset

The SIGHAN 2005 International Chinese Automatic Word Segmentation Evaluation (SIGHAN Evaluation) combines word segmentation datasets from multiple institutions. The datasets were jointly released by Microsoft Research China, Peking University, City University of Hong Kong, and Academia Sinica in Taiwan, and are used for training and evaluating Chinese word segmentation models. PKU is the Simplified Chinese word segmentation dataset in this collection.
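As a rough illustration of how such a dataset is typically prepared for training, the sketch below converts one segmented sentence into per-character B/M/E/S labels (begin, middle, end, single). It assumes each line of the corpus is a sentence whose words are separated by whitespace; this page does not describe the file layout, so check the README in the archive first. The function name `segmented_line_to_bmes` and the sample sentence are hypothetical and not taken from the dataset.

```python
from typing import List, Tuple


def segmented_line_to_bmes(line: str) -> List[Tuple[str, str]]:
    """Map a whitespace-segmented sentence to (character, tag) pairs.

    Assumption: words are delimited by whitespace, one sentence per line.
    Tags: B = first character of a word, M = middle, E = last, S = single.
    """
    pairs: List[Tuple[str, str]] = []
    for word in line.split():  # split() also collapses repeated spaces
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


# Hypothetical sample sentence, not drawn from the corpus.
sample = "共同 创造 美好 的 新 世纪"
for char, tag in segmented_line_to_bmes(sample):
    print(char, tag)
```

Character-level tagging is only one common way to frame segmentation; the same whitespace split also yields the gold word list needed to score a segmenter's output.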

中文分词pku.torrent
Seeding: 2 | Downloading: 0 | Completed: 118 | Total downloads: 401
  • 中文分词pku/
    • README.md
      1.06 KB
    • README.txt
      2.12 KB
    • data/
      • chinese_word_pku.zip
        3.54 MB