
Brown Corpus

The Brown Corpus is the first million-word electronic corpus of American English, drawn from newspaper texts, books, and government documents on a range of topics. It contains 1,014,312 words and is primarily used for language modeling.

The original corpus contains manually annotated sentence boundaries, token boundaries, and word class annotations. The converted corpus contains the full text, reconstructed from the TEI/XML version of the Brown Corpus and linked via OLiA to an ontology of word classes, which supports aggregate queries.

The corpus was compiled by W. Nelson Francis and Henry Kučera of the Department of Linguistics at Brown University in 1963-1964 and is described in their book "Computational Analysis of Present-Day American English" (1967).

Brown Corpus.torrent
Seeding: 5, Downloading: 0, Completed: 1,398, Total Downloads: 3,453
  • Brown Corpus/
    • README.md (1.49 KB)
    • README.txt (2.97 KB)
    • data/
      • Brown Corpus.zip (9.09 MB)
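
The internal layout of Brown Corpus.zip is not described here, so a first step after downloading is usually to inspect it. The following is a minimal sketch using only the Python standard library. The function name `summarize_archive` is hypothetical, and the code assumes that every archive member can be decoded as UTF-8 text, which may not hold for this distribution.

```python
import zipfile
from collections import Counter


def summarize_archive(zip_path):
    """Return (member_sizes, token_counts) for a zip archive.

    member_sizes maps each file name to its uncompressed size in bytes.
    token_counts maps each file name to its number of whitespace-separated
    tokens. Every member is decoded as UTF-8 text (an assumption); bytes
    that cannot be decoded are replaced rather than raising an error.
    """
    member_sizes = {}
    token_counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            member_sizes[info.filename] = info.file_size
            text = archive.read(info).decode("utf-8", errors="replace")
            token_counts[info.filename] = len(text.split())
    return member_sizes, token_counts
```

Calling `summarize_archive("Brown Corpus/data/Brown Corpus.zip")` and summing the token counts gives a rough size check. The total will not necessarily match 1,014,312, since whitespace splitting also counts any markup or part-of-speech tags stored alongside the words.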