A study of N-gram and Embedding Representations for Native Language Identification

Sowmya Vajjala, Sagnik Banerjee
Abstract

We report on our experiments with n-gram and embedding-based feature representations for Native Language Identification (NLI) as part of the NLI Shared Task 2017 (team name: NLI-ISU). Our best-performing system on the test set for written essays achieved a macro F1 of 0.8264 and was based on word unigram, bigram, and trigram features. We explored n-grams covering word, character, POS, and mixed word-POS representations for this task. For embedding-based feature representations, we employed both word and document embeddings. All embedding representations performed relatively poorly compared to n-grams, possibly because embeddings capture semantic similarities whereas L1 differences are more stylistic in nature.
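
To make the feature types and the metric named in the abstract concrete, here is a minimal Python sketch of word n-gram counting (unigrams through trigrams), character n-gram counting, and macro-averaged F1. It is an illustration only, not the authors' system: the function names `word_ngrams`, `char_ngrams`, and `macro_f1` are hypothetical, the whitespace tokenisation and lowercasing are assumptions, and the abstract does not state which tokenizer, feature weighting, or classifier was used.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Count word n-grams for every n from n_min to n_max (inclusive)."""
    # Assumption: lowercase and split on whitespace; the paper's
    # actual preprocessing is not described in the abstract.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text, n):
    """Count character n-grams of a single length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    features = word_ngrams("I am agree with you")
    print(features.most_common(3))
    print(macro_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"]))
```

Because macro F1 averages per-class scores without weighting by class size, every L1 class counts equally towards a score such as the 0.8264 reported above. In this sketch the label set is taken from the labels observed in the data, which is itself an assumption about how the score would be computed.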
