
Abstract

Part-of-speech tagging is a fundamental task that provides basic structural and content information for downstream natural language processing. Although part-of-speech tagging has traditionally been formulated as a sequence labeling task, none of the proposed ensemble approaches has focused on sequence alignment during post-processing. Herein, we present a weighted ensemble technique for part-of-speech taggers based on sequence alignment. With this technique, we introduce a simple but powerful post-processor: a sub-sequence selector that uses a similarity score calculated through sequence alignment methods. These methods adapt an existing DNA alignment approach to natural language. Experiments were conducted using an ensemble of sequence alignment methods with three different sub-sequence units: the sequence, the word, and the character span. Experiments on English and Korean datasets show that our sequence alignment ensemble technique outperforms a basic hard voting method. Most configurations of the ensemble sequence alignment approach across the sub-sequence units showed an increase in F1-score over hard voting, with gains of up to 0.36 over the general hard voting method on the test dataset.
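To make the contrast between hard voting and alignment-based selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Needleman-Wunsch global alignment score as a stand-in for the DNA alignment approach, which the abstract does not name, and it works only at the whole-sequence unit. The function names (`hard_vote`, `alignment_score`, `select_by_alignment`), the match, mismatch, and gap values, and the per-tagger weights are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Baseline: majority vote per token position across taggers."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]


def alignment_score(a: Sequence[str], b: Sequence[str],
                    match: float = 1.0, mismatch: float = -1.0,
                    gap: float = -1.0) -> float:
    """Needleman-Wunsch global alignment score between two tag sequences.

    The scoring constants are illustrative assumptions, not values from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def select_by_alignment(predictions: Sequence[Sequence[str]],
                        weights: Optional[Sequence[float]] = None) -> List[str]:
    """Pick the candidate whose weighted similarity to the other candidates is highest.

    weights[j] scales how much tagger j's output counts as a reference
    (hypothetical weighting scheme; uniform if omitted).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    best_index, best_total = 0, float("-inf")
    for i, candidate in enumerate(predictions):
        total = sum(weights[j] * alignment_score(candidate, other)
                    for j, other in enumerate(predictions) if j != i)
        if total > best_total:
            best_index, best_total = i, total
    return list(predictions[best_index])


# Three hypothetical taggers labeling the same three-token sentence.
predictions = [
    ["DT", "NN", "VBZ"],
    ["DT", "NN", "VBZ"],
    ["DT", "JJ", "NN"],
]
voted = hard_vote(predictions)
selected = select_by_alignment(predictions)
```

Hard voting assembles its output token by token, so the result may be a tag sequence that no single tagger produced. The selector instead returns one complete candidate, which is the kind of sub-sequence selection the abstract describes. Applying the same idea to word or character-span units would require splitting each prediction into those units first, which this sketch does not attempt.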
