Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features
Date
2019-01
Publisher
Addis Ababa University
Abstract
Parts-of-speech (POS) tagging for the Amharic language is not yet mature enough to serve as a
component in other natural language processing (NLP) applications. Previous studies on Amharic
POS tagging used hand-crafted features to develop tagging models. In Amharic, prepositions and
conjunctions are usually attached to words of other parts of speech. This forces each tag to
represent more than one piece of basic information and also decreases the total number of
instances in the training corpus. In addition, designing features manually requires more time,
labor, and linguistic expertise.
In this study, automatically generated neural word embeddings are used as features for the
development of an Amharic POS tagger. Neural word embeddings are multi-dimensional vector
representations of words that capture syntactic and semantic information about them. In
addition, prepositions and conjunctions attached to other parts of speech are segmented using
the HornMorpho morphological analyzer. The tagging models are developed with state-of-the-art
deep learning algorithms, specifically Long Short-Term Memory (LSTM) recurrent neural networks
and their bidirectional versions (Bi-LSTM RNNs).
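
As a rough illustration of what "embeddings as features" means in practice, the sketch below maps each token of a sentence to a fixed-length vector, producing the kind of sequence a recurrent tagger would consume. The token names, the 4-dimensional vectors, and the zero-vector fallback for unknown words are all hypothetical choices made for this example; they are not taken from the study.

```python
from typing import Dict, List

# Hypothetical 4-dimensional embeddings, for illustration only. Real
# embeddings are learned from a large corpus and have many more dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "w1": [0.1, 0.3, -0.2, 0.5],
    "w2": [0.0, -0.4, 0.7, 0.1],
    "w3": [0.6, 0.2, 0.1, -0.3],
}
DIM = 4
# Assumed fallback for out-of-vocabulary tokens: an all-zero vector.
UNKNOWN: List[float] = [0.0] * DIM


def sentence_to_features(tokens: List[str]) -> List[List[float]]:
    """Return one embedding vector per token, in sentence order."""
    return [list(EMBEDDINGS.get(tok, UNKNOWN)) for tok in tokens]


# "w9" is not in the table, so it receives the fallback vector.
features = sentence_to_features(["w1", "w9", "w3"])
```

Each row of `features` replaces the hand-crafted feature set that earlier taggers would have computed for that token; no linguistic rules are involved in producing it.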
The best result observed is an F-measure of 93.67%, obtained from the model developed using a
Bi-LSTM recurrent neural network. The results show that word embeddings generated by neural
networks can replace manually designed features, which is an important advantage. Segmenting
prepositions and conjunctions attached to other parts of speech also improved the accuracy of
the POS tagger by more than 5%. This improvement comes from the increased total number of
instances and the decreased number of tags that result from segmentation.
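
The effect of segmentation on the tag inventory and on the number of training instances can be seen in a toy example. The corpus, the placeholder tokens, and the tag labels below (`NP`, `NC`, `VP`, `VC` for compound tags; `PREP`, `CONJ`, `N`, `V` for basic tags) are hypothetical and chosen only to illustrate the idea. The sketch splits tokens on a `+` marker, whereas the study relies on the HornMorpho morphological analyzer for the actual segmentation.

```python
from collections import Counter
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical unsegmented corpus: attached prepositions and conjunctions
# force compound tags such as "NP" (preposition + noun).
unsegmented: Tagged = [
    ("noun_a", "N"),
    ("prep+noun_b", "NP"),
    ("conj+noun_c", "NC"),
    ("verb_a", "V"),
    ("prep+verb_b", "VP"),
    ("conj+verb_c", "VC"),
]

# Assumed mapping from each compound tag to the basic tags of its parts.
COMPOUND_TAGS: Dict[str, List[str]] = {
    "NP": ["PREP", "N"],
    "NC": ["CONJ", "N"],
    "VP": ["PREP", "V"],
    "VC": ["CONJ", "V"],
}


def segment(corpus: Tagged) -> Tagged:
    """Split attached morphemes into separate tokens with basic tags."""
    result: Tagged = []
    for token, tag in corpus:
        parts = token.split("+")
        part_tags = COMPOUND_TAGS.get(tag, [tag])
        result.extend(zip(parts, part_tags))
    return result


segmented = segment(unsegmented)
tags_before = {tag for _, tag in unsegmented}
tags_after = {tag for _, tag in segmented}
counts_before = Counter(tag for _, tag in unsegmented)
counts_after = Counter(tag for _, tag in segmented)
```

In this toy corpus the tag set shrinks from six tags to four, the number of tagged instances grows from six to ten, and the basic noun tag `N` goes from one training instance to three. This is the same mechanism the abstract credits for the accuracy gain, shown here at a much smaller scale.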
Keywords
natural language processing, POS tagger, neural word embeddings, segmentation, deep learning, recurrent neural networks, Amharic