HyperAI

European Parliament Proceedings Parallel Corpus 1996-2011 Statistical Machine Translation Corpus

Date: 6 years ago
Size: 3.75 GB
Organization: University of Edinburgh
Publish URL: www.statmt.org

The European Parliament Proceedings Parallel Corpus 1996-2011 (Europarl) is a parallel corpus for statistical machine translation. It is derived from the proceedings of the European Parliament and includes versions in 21 European languages:

  • Romance (French, Italian, Spanish, Portuguese, Romanian)
  • Germanic (English, Dutch, German, Danish, Swedish)
  • Slavic (Bulgarian, Czech, Polish, Slovak, Slovenian)
  • Finno-Ugric (Finnish, Hungarian, Estonian)
  • Baltic (Latvian, Lithuanian)
  • Greek
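
As a quick consistency check, the six groups listed above can be written down as a small lookup table. The sketch below is illustrative only: the two-letter codes are taken from the archive names in the file listing (for example `fr-en.tgz`), and the names `FAMILIES`, `all_codes`, and `pair_archives` are hypothetical helpers, not part of the dataset.

```python
# Language groups as listed above. The two-letter codes are assumed to match
# the archive names in the file listing (e.g. "fr" for fr-en.tgz).
FAMILIES = {
    "Romance": {"fr": "French", "it": "Italian", "es": "Spanish",
                "pt": "Portuguese", "ro": "Romanian"},
    "Germanic": {"en": "English", "nl": "Dutch", "de": "German",
                 "da": "Danish", "sv": "Swedish"},
    "Slavic": {"bg": "Bulgarian", "cs": "Czech", "pl": "Polish",
               "sk": "Slovak", "sl": "Slovenian"},
    "Finno-Ugric": {"fi": "Finnish", "hu": "Hungarian", "et": "Estonian"},
    "Baltic": {"lv": "Latvian", "lt": "Lithuanian"},
    "Greek": {"el": "Greek"},
}


def all_codes(families=FAMILIES):
    """Return every language code across all groups, sorted alphabetically."""
    return sorted(code for langs in families.values() for code in langs)


def pair_archives(families=FAMILIES, pivot="en"):
    """Return the expected per-pair archive names, one per non-pivot language."""
    return [f"{code}-{pivot}.tgz" for code in all_codes(families) if code != pivot]


if __name__ == "__main__":
    print(len(all_codes()))      # number of languages in the table
    print(len(pair_archives()))  # number of language pairs with English
```

The table yields 21 languages and 20 archive names, one for each language paired with English, which agrees with the per-pair archives in the file listing.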

The European Parliament Proceedings Parallel Corpus 1996-2011 dataset was first published in 2005 by the School of Informatics at the University of Edinburgh, Scotland, with Philipp Koehn as its main author.

The 7th edition of this dataset was released in 2012. The related paper is "Europarl: A Parallel Corpus for Statistical Machine Translation".

European_Parliament_Proceedings_Parallel_Corpus_1996-2011.torrent
The sizes in this listing appear to be running totals in listing order rather than per-file sizes: they increase monotonically and the last entry equals the 3.75 GB total.

  • European_Parliament_Proceedings_Parallel_Corpus_1996-2011/
    • README.md (1.55 KB)
    • README.txt (3.11 KB)
    • data/
      • bg-en.tgz (40.62 MB)
      • cs-en.tgz (99.8 MB)
      • da-en.tgz (278.8 MB)
      • de-en.tgz (467.42 MB)
      • el-en.tgz (611.8 MB)
      • es-en.tgz (797.83 MB)
      • et-en.tgz (854.43 MB)
      • europarl.tgz (2.3 GB)
      • fi-en.tgz (2.47 GB)
      • fr-en.tgz (2.66 GB)
      • hu-en.tgz (2.72 GB)
      • it-en.tgz (2.9 GB)
      • lt-en.tgz (2.95 GB)
      • lv-en.tgz (3.01 GB)
      • nl-en.tgz (3.2 GB)
      • pl-en.tgz (3.25 GB)
      • pt-en.tgz (3.44 GB)
      • ro-en.tgz (3.47 GB)
      • sk-en.tgz (3.53 GB)
      • sl-en.tgz (3.58 GB)
      • sv-en.tgz (3.75 GB)
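
To illustrate how one of the per-pair archives might be consumed once extracted, here is a minimal sketch. It assumes (this page does not state it) that each archive unpacks to two UTF-8 plain-text files, one per language, aligned line by line so that line n in one file translates line n in the other. The function name `read_parallel` is hypothetical, and skipping pairs with an empty side is a preprocessing choice rather than anything the corpus requires.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple, Union

PathLike = Union[str, Path]


def read_parallel(src_path: PathLike, tgt_path: PathLike) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Assumption: both files are UTF-8 and have the same number of lines.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for n, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                # One file ended before the other, so the alignment is broken.
                raise ValueError(f"files differ in length at line {n}")
            s, t = s.rstrip("\n"), t.rstrip("\n")
            if s and t:  # skip pairs where either side is empty
                yield s, t
```

Calling `read_parallel` with the two extracted files of a pair (the exact file names depend on the archive contents and are not given on this page) streams sentence pairs without loading the whole corpus into memory.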