European Parliament Proceedings Parallel Corpus 1996-2011 Statistical Machine Translation Corpus
Date
6 years ago
Size
3.75 GB
Publish URL
Categories
The European Parliament Proceedings Parallel Corpus 1996-2011 dataset is a corpus for statistical machine translation. The Europarl parallel corpus is derived from the proceedings of the European Parliament and includes versions in 21 European languages:
- Romance (French, Italian, Spanish, Portuguese, Romanian)
- Germanic (English, Dutch, German, Danish, Swedish)
- Slavic (Bulgarian, Czech, Polish, Slovak, Slovenian)
- Finno-Ugric (Finnish, Hungarian, Estonian)
- Baltic (Latvian, Lithuanian)
- Greek
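For scripts that need to select or validate languages, the grouping listed here can be written down as a small lookup table. The following is only a minimal sketch: the names LANGUAGE_GROUPS, group_of, and all_languages are hypothetical helpers introduced for illustration, the group names use their conventional spellings, and nothing here reflects the actual file layout of the corpus.

```python
# Language groups as listed in this description (21 languages in total).
# All names below are hypothetical helpers, not part of the corpus itself.
LANGUAGE_GROUPS = {
    "Romance": ["French", "Italian", "Spanish", "Portuguese", "Romanian"],
    "Germanic": ["English", "Dutch", "German", "Danish", "Swedish"],
    "Slavic": ["Bulgarian", "Czech", "Polish", "Slovak", "Slovenian"],
    "Finno-Ugric": ["Finnish", "Hungarian", "Estonian"],
    "Baltic": ["Latvian", "Lithuanian"],
    "Greek": ["Greek"],
}


def all_languages():
    """Return every language in the listing as a flat list."""
    return [lang for langs in LANGUAGE_GROUPS.values() for lang in langs]


def group_of(language):
    """Return the group a language belongs to, or None if it is not listed."""
    for group, languages in LANGUAGE_GROUPS.items():
        if language in languages:
            return group
    return None


if __name__ == "__main__":
    print(len(all_languages()))
    print(group_of("Estonian"))
```

A flat dictionary keeps the check against the stated total of 21 languages to a single call, and group_of returns None rather than raising, so callers can decide how to handle a language that is not in the listing.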
The European Parliament Proceedings Parallel Corpus 1996-2011 dataset was originally published in 2005 by the School of Informatics at the University of Edinburgh, Scotland, with Philipp Koehn as the main publisher.
The 7th edition of this dataset was released in 2012. The related paper is "Europarl: A Parallel Corpus for Statistical Machine Translation".
European_Parliament_Proceedings_Parallel_Corpus_1996-2011.torrent
Seeding: 3, Downloading: 0, Completed: 912, Total Downloads: 1,464