Date

3 years ago

Size

9.09 MB

Publish URL

www.kaggle.com

The Brown Corpus is the first million-word electronic corpus of American English. Its texts are drawn from newspapers, books, government documents, and other sources covering a wide range of topics. It contains 1,014,312 words and is commonly used for language modeling.

The original corpus is manually annotated with sentence boundaries, token boundaries, and word classes (part-of-speech tags). The converted corpus contains the full text reconstructed from the TEI/XML version of the Brown Corpus, with its word class annotations linked to the OLiA ontologies so that aggregate queries can be run over word classes. The corpus was compiled in 1963-1964 by W. Nelson Francis and Henry Kučera of the Department of Linguistics at Brown University and is described in their book "Computational Analysis of Present-Day American English".
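The word class annotations are what make aggregate queries possible: once each token carries a tag, you can count tokens per class. The sketch below assumes the classic whitespace-separated `word/tag` layout with lowercase Brown tags (for example `said/vbd`). The files in this dataset may use a different layout, and the helper names `parse_tagged`, `coarse_tag`, and `count_word_classes` are hypothetical. It uses a simple string rule to collapse modifiers such as `-tl` (title) and does not implement the OLiA ontology mapping.

```python
from collections import Counter


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split whitespace-separated word/tag tokens into (word, tag) pairs.

    Assumes the tag follows the LAST '/' so that words like '1/2' survive.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            continue  # skip tokens that have no word/tag separator
        pairs.append((word, tag.lower()))
    return pairs


def coarse_tag(tag: str) -> str:
    """Drop hyphenated modifiers such as '-tl' (title) or '-hl' (headline)."""
    base = tag.split("-")[0]
    return base or tag  # keep tags made only of hyphens, e.g. '--'


def count_word_classes(lines) -> Counter:
    """Aggregate token counts per coarse word class across all lines."""
    counts = Counter()
    for line in lines:
        for _, tag in parse_tagged(line):
            counts[coarse_tag(tag)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample in the word/tag style; not read from this dataset.
    sample = "The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd Friday/nr"
    print(count_word_classes([sample]).most_common(3))
```

Collapsing `nn-tl` into `nn` trades detail for broader classes. Skip `coarse_tag` if the title and headline distinctions matter to your query.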

Brown Corpus/
- README.md
  1.49 KB
- README.txt
  2.97 KB