LinCE Code-Switching Evaluation Dataset
Date
3 years ago
Publish URL

LinCE stands for Linguistic Code-switching Evaluation, a benchmark for evaluating models on code-switched text. The benchmark combines ten corpora covering four code-switching language pairs (Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic). It spans four tasks: language identification, named entity recognition, part-of-speech tagging, and sentiment analysis. It also reports scores for several popular models, including LSTM, ELMo, and multilingual BERT, so that the NLP community can compare new results against state-of-the-art systems.
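
To make the language identification task concrete, the sketch below shows what token-level language tags on a code-switched sentence might look like and how switch points could be counted. The sentence, the tag names (`en`, `es`, `other`), and the helper `count_switch_points` are hypothetical illustrations. They are not taken from the LinCE corpora or its tooling, and the actual label set in the dataset may differ.

```python
from collections import Counter

# Hypothetical Spanish-English sentence with one language tag per token.
# Both the tokens and the tag names are illustrative assumptions.
tokens = ["I", "told", "her", "que", "no", "puedo", "ir", "tomorrow", "."]
tags = ["en", "en", "en", "es", "es", "es", "es", "en", "other"]


def count_switch_points(tags, neutral=("other",)):
    """Count positions where the language tag changes, skipping neutral tokens."""
    switches = 0
    previous = None
    for tag in tags:
        if tag in neutral:
            continue  # punctuation and similar tokens do not count as a switch
        if previous is not None and tag != previous:
            switches += 1
        previous = tag
    return switches


tag_counts = Counter(tags)
switch_points = count_switch_points(tags)

print(dict(tag_counts))
print("switch points:", switch_points)
```

Treating punctuation as neutral is one possible design choice; a different convention would change the count.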