FCGEC Chinese Grammar Error Detection and Correction Dataset

Date: 3 years ago
Size: 15.51 MB
Organization:
Publish URL: github.com
Paper URL: arxiv.org
License: Non-Commercial
Tags: Natural Language Processing

FCGEC (Fine-Grained Corpus for Chinese Grammatical Error Correction) is a large-scale, multi-reference corpus of grammatical errors made by native speakers, intended for training and evaluating error correction systems. The data comes mainly from erroneous-sentence questions in primary, middle, and high school examinations and from news aggregation websites. To provide multiple reference corrections per sentence, each sentence is randomly assigned to 2-4 annotators. A total of 54,026 raw sentences were collected from the two data sources. After removing duplicates and filtering out problematic sentences (such as truncated text), FCGEC contains 41,340 sentences.
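The cleaning step described above (54,026 raw sentences reduced to 41,340, a drop of 12,686) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `clean_sentences` and `looks_truncated`, the punctuation-based truncation heuristic, and the sample sentences are all assumptions made for illustration.

```python
def looks_truncated(text):
    # Hypothetical heuristic (an assumption, not from the FCGEC paper):
    # treat a sentence as truncated if it lacks terminal punctuation.
    return not text.endswith(("。", "！", "？", "”", "…"))


def clean_sentences(raw_sentences, is_problematic):
    """Drop empty lines, exact duplicates (keeping the first copy),
    and sentences flagged by the is_problematic callback."""
    seen = set()
    kept = []
    for sentence in raw_sentences:
        text = sentence.strip()
        if not text or text in seen or is_problematic(text):
            continue
        seen.add(text)
        kept.append(text)
    return kept


# Illustrative sample only; these sentences are not taken from the corpus.
sample = [
    "他的成绩提高了。",
    "他的成绩提高了。",          # exact duplicate, dropped
    "通过这次活动，使我",        # no terminal punctuation, dropped
    "我们要防止不再发生事故。",
]
cleaned = clean_sentences(sample, looks_truncated)
```

Passing the filter as a callback keeps deduplication separate from whatever definition of "problematic" a given project settles on.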

Citation

@inproceedings{xu2022fcgec,
    title = "{FCGEC}: Fine-Grained Corpus for {C}hinese Grammatical Error Correction",
    author = "Xu, Lvxiaowei and Wu, Jianwang and Peng, Jiawei and Fu, Jiayu and Cai, Ming",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.137",
    pages = "1900--1918"
}

FCGEC.torrent

Seeding: 1, Downloading: 0, Completed: 272, Total Downloads: 611

FCGEC/
- README.md
  1.33 KB
- README.txt
  2.65 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.


Related Datasets

MAKIEVAL Multilingual Cultural Knowledge Assessment Dataset

2 days ago

SMOL Multilingual Translation Parallel Dataset

a month ago

OmniParsingBench Multimodal Parsing Capability Evaluation Dataset

9 days ago
