FCGEC Chinese Grammar Error Detection and Correction Dataset

Date

a year ago

Size

15.51 MB

Organization

Zhejiang University

Publish URL

github.com

License

Non-commercial use

FCGEC stands for Fine-Grained Corpus for Chinese Grammatical Error Correction. It is a large-scale, multi-reference corpus of grammatical errors in text written by native speakers, used to train and evaluate error correction systems. The data comes mainly from erroneous-sentence test questions for primary, middle, and high school students and from news aggregation websites.

To provide multiple reference corrections per sentence and so support diverse annotation goals, each sentence is randomly assigned to 2 to 4 annotators. We collected 54,026 original sentences from the two data sources. After removing duplicate sentences and filtering out problematic ones (such as truncated text), FCGEC contains a total of 41,340 sentences.
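
The deduplication and filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `clean_sentences` and the truncation heuristic (a sentence counts as truncated if it does not end with closing punctuation) are assumptions made for the example.

```python
from typing import Iterable, List

# Hypothetical heuristic: a sentence that does not end with one of these
# marks is treated as truncated. The real FCGEC filter is not documented here.
TERMINAL_PUNCTUATION = ("。", "！", "？", "”", "…")


def clean_sentences(raw: Iterable[str]) -> List[str]:
    """Drop empty, duplicate, and apparently truncated sentences, keeping order."""
    seen = set()
    kept = []
    for sentence in raw:
        text = sentence.strip()
        # Skip blank lines and exact duplicates of an earlier sentence.
        if not text or text in seen:
            continue
        seen.add(text)
        # Skip sentences that look cut off (assumed heuristic, see above).
        if not text.endswith(TERMINAL_PUNCTUATION):
            continue
        kept.append(text)
    return kept
```

For scale, the figures above mean that 54,026 − 41,340 = 12,686 sentences (about 23.5% of the raw collection) were removed by deduplication and filtering combined.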

FCGEC.torrent
Seeding: 1, Downloading: 1, Completed: 125, Total Downloads: 405
  • FCGEC/
    • README.md
      1.33 KB
    • README.txt
      2.65 KB
    • data/
      • FCGEC_test.json
        815.18 KB
      • FCGEC_train.json
        14.73 MB
      • FCGEC_valid.json
        15.51 MB
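
Once the archive is downloaded, a quick way to inspect the three splits is to load each JSON file and report its top-level container type and record count. The sketch below deliberately assumes nothing about the record schema, which is not described on this page; it only assumes each file holds a single JSON list or object. The helper name `summarize_split` and the local path `FCGEC/data` are illustrative.

```python
import json
from pathlib import Path


def summarize_split(path):
    """Return (top-level type name, number of entries) for one JSON split file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumes the top level is a list or dict, so len() is meaningful.
    return type(data).__name__, len(data)


if __name__ == "__main__":
    # Assumed extraction location; adjust to wherever the torrent was unpacked.
    data_dir = Path("FCGEC/data")
    for name in ("FCGEC_train.json", "FCGEC_valid.json", "FCGEC_test.json"):
        split_path = data_dir / name
        if split_path.exists():
            kind, count = summarize_split(split_path)
            print(f"{name}: {kind} with {count} entries")
```

Checking the counts this way is also a simple sanity test that the download completed and the files parse as UTF-8 JSON.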