OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Ivana Lučić, Sowmya Vajjala

Abstract
This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on readability assessment and text simplification.