FCGEC Chinese Grammar Error Detection and Correction Dataset
Date
Size
Publish URL
License
Non-commercial use
Categories
FCGEC stands for Fine-Grained Corpus for Chinese Grammatical Error Correction. It is a large-scale, multi-reference, native-speaker corpus for text error correction, intended for training and evaluating error correction systems. The data comes mainly from erroneous-sentence exam questions for primary, middle, and high school students and from news aggregation websites.
To provide several reference corrections per sentence and so support diverse annotation, each sentence was randomly assigned to 2-4 annotators. We collected 54,026 raw sentences from the two data sources. After removing duplicate sentences and filtering out problematic ones (such as truncated text), FCGEC contains a total of 41,340 sentences.
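The cleaning step described above (deduplication followed by filtering of truncated text) could look roughly like the sketch below. It is only an illustration: the function name `clean_sentences` and the truncation heuristic (a sentence counts as truncated if it does not end in one of an assumed set of terminal punctuation marks) are assumptions, not the procedure the FCGEC authors actually used.

```python
# Assumed set of characters that may legitimately end a Chinese sentence.
# This heuristic is hypothetical; the source does not say how truncation was detected.
TERMINAL_MARKS = "。！？…”」』）"


def clean_sentences(sentences):
    """Drop exact duplicates and sentences that look truncated, keeping input order."""
    seen = set()
    kept = []
    for sentence in sentences:
        text = sentence.strip()
        # Skip empty lines and exact duplicates of an earlier sentence.
        if not text or text in seen:
            continue
        seen.add(text)
        # Skip sentences whose last character is not terminal punctuation.
        if text[-1] not in TERMINAL_MARKS:
            continue
        kept.append(text)
    return kept


raw = [
    "他去了学校。",
    "他去了学校。",          # exact duplicate
    "因为天气的原因，我们",  # looks truncated: no terminal punctuation
    " 你好吗？ ",            # surrounding whitespace is stripped
]
cleaned = clean_sentences(raw)
```

Exact string matching only catches verbatim duplicates; near-duplicates that differ in whitespace inside the sentence or in punctuation width would need extra normalization before comparison.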