RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

Mikhail Gronas, Anna Rumshisky, Anna Rogers, Alex Gribov, Alexey Romanov, Svitlana Volkova
Abstract

This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss' kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.
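
For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Fleiss' kappa can be computed when every post receives the same number of annotations (3 in RuSentiment). The function name `fleiss_kappa`, the label names, and the toy ratings are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each have the same number of ratings.

    ratings: list of lists, one inner list of labels per item (hypothetical input format).
    categories: all labels an annotator may assign.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item must have the same number of ratings")

    # For each item, count how many annotators chose each category.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing annotator pairs.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    # Note: undefined (division by zero) if all ratings fall in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 4 posts, 3 annotations each (illustrative labels only).
toy_ratings = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "pos"],
    ["neu", "neu", "neu"],
    ["pos", "neg", "neu"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["pos", "neg", "neu"])
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, so the reported 0.58 falls between the two.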