Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features

Date

2019-01

Publisher

Addis Ababa University

Abstract

Parts-of-speech (POS) tagging for the Amharic language is not yet mature enough to serve as a component in other natural language processing (NLP) applications. Previous studies on Amharic POS tagging used hand-crafted features to develop tagging models. In Amharic, prepositions and conjunctions are usually attached to words of other parts of speech. This forces tags to represent more than one basic piece of information and also decreases the total number of instances in the training corpus. In addition, the manual design of features requires more time, more labor, and a linguistic background. In this study, automatically generated neural word embeddings are used as features for the development of an Amharic POS tagger. Neural word embeddings are multi-dimensional vector representations of words that capture syntactic and semantic information. In addition, prepositions and conjunctions attached to other parts of speech are segmented using the HornMorpho morphological analyzer. Among the possible state-of-the-art deep learning algorithms, Long Short-Term Memory (LSTM) recurrent neural networks and their bidirectional versions (Bi-LSTM RNNs) are used to develop the tagging models. The best evaluation result observed is an F-measure of 93.67%, obtained from the model developed using a Bi-LSTM recurrent neural network. The results show that word embeddings generated by neural networks can replace manually designed features, which is an important advantage. Segmenting prepositions and conjunctions attached to other parts of speech also improved the accuracy of the POS tagger by more than 5%. This improvement results from the increased total number of instances and the decreased number of tags due to segmentation.
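
The effect of segmentation on corpus size and tagset size described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the placeholder tokens (such as `p1-n1`), the compound tag names (such as `PREP+N`), and the `segment` helper are hypothetical and are not taken from the thesis, its corpus, or HornMorpho. Real segmentation would require a morphological analyzer rather than splitting on a hyphen.

```python
from collections import Counter

# Hypothetical toy corpus of (token, tag) pairs. Tokens are placeholders,
# not real Amharic words. A hyphen marks where an attached preposition or
# conjunction joins its host word, and "+" joins the parts of a compound tag.
UNSEGMENTED = [
    ("p1-n1", "PREP+N"),
    ("n1", "N"),
    ("c1-n2", "CONJ+N"),
    ("p2-v1", "PREP+V"),
    ("v1", "V"),
    ("c1-v2", "CONJ+V"),
]


def segment(corpus):
    """Split each compound (token, tag) pair into one pair per morpheme.

    This stands in for a real morphological analyzer: it only splits on the
    hyphen and plus markers used in the toy corpus above.
    """
    segmented = []
    for token, tag in corpus:
        pieces = token.split("-")
        tags = tag.split("+")
        if len(pieces) != len(tags):
            raise ValueError(f"token/tag mismatch: {token!r} vs {tag!r}")
        segmented.extend(zip(pieces, tags))
    return segmented


def summarize(corpus):
    """Return (number of instances, number of distinct tags, tag counts)."""
    tag_counts = Counter(tag for _, tag in corpus)
    return len(corpus), len(tag_counts), tag_counts


if __name__ == "__main__":
    before = summarize(UNSEGMENTED)
    after = summarize(segment(UNSEGMENTED))
    print("unsegmented:", before[0], "instances,", before[1], "tags")
    print("segmented:  ", after[0], "instances,", after[1], "tags")
```

On this toy corpus, segmentation raises the number of training instances from 6 to 10 and lowers the number of distinct tags from 6 to 4, which mirrors the direction of the effect reported in the abstract, though not its magnitude.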

Keywords

natural language processing, POS tagger, neural word embeddings, segmentation, deep learning, recurrent neural networks, Amharic