
FCGEC Chinese Grammar Error Detection and Correction Dataset

Date

2 years ago

Size

15.51 MB

Organization

Zhejiang University

Publish URL

github.com

Paper URL

arxiv.org

License

Non-commercial use

FCGEC stands for Fine-Grained Corpus for Chinese Grammatical Error Correction. It is a large-scale, multi-reference corpus of grammatical errors in text written by native Chinese speakers, intended for training and evaluating error correction systems. The data comes mainly from erroneous-sentence exam questions for primary, middle, and high school students and from news aggregation websites.

To provide multiple reference corrections per sentence and support diverse annotation, each sentence was randomly assigned to 2-4 annotators. A total of 54,026 raw sentences were collected from the two data sources. After removing duplicates and filtering out problematic sentences (such as truncated text), FCGEC contains 41,340 sentences.

FCGEC.torrent
Seeding: 2 | Downloading: 0 | Completed: 194 | Total Downloads: 502
  • FCGEC/
    • README.md (1.33 KB)
    • README.txt (2.65 KB)
    • data/
      • FCGEC_test.json (815.18 KB)
      • FCGEC_train.json (14.73 MB)
      • FCGEC_valid.json (15.51 MB)
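
After downloading, a quick way to check the three splits is to load each JSON file and report its top-level container and entry count. The following is a minimal sketch, not part of the dataset's own tooling: the helper name `summarize_json` is hypothetical, the paths assume the archive was extracted into a local `FCGEC/` folder matching the file listing above, and nothing is assumed about the fields inside each record (consult the bundled README for the actual schema).

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return (top-level type name, entry count) for a JSON file.

    Hypothetical helper for illustration; it does not assume any
    particular record schema inside the file.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Only the outer container is inspected; scalars count as one entry.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    # Assumed layout: FCGEC/data/FCGEC_{train,valid,test}.json
    for split in ("train", "valid", "test"):
        path = Path("FCGEC") / "data" / f"FCGEC_{split}.json"
        if path.exists():
            print(split, *summarize_json(path))
        else:
            print(split, "not found at", path)
```

Comparing the reported entry counts across the three files gives a quick sanity check that the download is complete before training or evaluation.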