Morphological Segmentation for Amharic Verb Class Using Recurrent Neural Network (RNN)
Date
2019-09
Authors
Publisher
Addis Ababa University
Abstract
Because higher-level NLP tasks depend on morphological analysis, the lack of an appropriate
morphological analysis tool is a major bottleneck for research on high-level NLP applications
such as machine translation, speech processing, text summarization, and many more. Currently,
most research on morphological analysis for morphologically rich languages (MRLs) like Amharic
relies either on techniques that require heavy supervision or on rule-based techniques in which
the rules of the language must be enumerated in detail and crafted manually. Both approaches
require high-quality data that captures the rules of the language, and they also require a
significant quantity of training data for good generalization. Both requirements are difficult
to meet because many MRLs, including Amharic, are under-resourced. The lack of training data,
in both quality and quantity, is a major obstacle for research on low-resource, morphologically
rich languages.
The low-resource status and morphological complexity of the language demand techniques that can
learn well from relatively few examples while still capturing the complexity of the language. In
this paper, we propose an RNN-based sequence-to-sequence model, built on the state-of-the-art
encoder-decoder architecture, that achieves encouraging performance in learning complex
segmentation from a small number of examples and with no linguistic annotation.
We approach morphological segmentation as a transformation task: the surface word is the input,
and segmentation is the transformation that produces a list of segmented morphemes. We prepared
training data by selecting verbs from different classes. We explored different encoder units in
terms of directionality, window size, encoder type and size, along with different
data-representation paradigms. The experiments showed that our model can learn complex
segmentation with no linguistic annotation and a limited number of examples. The model achieved
74.2% segmentation accuracy and 98.7% morpheme-boundary accuracy.
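As a hedged illustration of the two reported metrics, the sketch below scores hyphen-separated segmentations: word-level accuracy counts fully correct outputs, while boundary accuracy scores each gold morpheme boundary independently, which is why it can be much higher. The format, function names, and toy strings are assumptions for illustration, not taken from the thesis.

```python
# Illustrative sketch (assumed hyphen-separated format; names are not
# from the thesis). Word-level segmentation accuracy counts fully
# correct segmentations; boundary accuracy scores each gold morpheme
# boundary independently.

def boundary_positions(segmented):
    """Character offsets in the surface form where morpheme boundaries
    fall, e.g. 'sebr-ku' -> {4}."""
    positions, offset = set(), 0
    for morpheme in segmented.split("-")[:-1]:
        offset += len(morpheme)
        positions.add(offset)
    return positions

def evaluate(gold, predicted):
    """Return (segmentation accuracy, boundary accuracy)."""
    exact = sum(g == p for g, p in zip(gold, predicted))
    correct = total = 0
    for g, p in zip(gold, predicted):
        gb, pb = boundary_positions(g), boundary_positions(p)
        correct += len(gb & pb)  # predicted boundaries that match gold
        total += len(gb)         # all gold boundaries
    return exact / len(gold), correct / total

# Toy example: one word segmented wrongly, but most boundaries correct.
gold = ["sebr-ku", "sebr-ech", "y-sebr-al"]
pred = ["sebr-ku", "sebrech", "y-sebr-al"]
seg_acc, bound_acc = evaluate(gold, pred)
```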
We have shown that it is possible to learn morphological segmentation without relying on
linguistic annotation. This contributes toward a general solution that can work on languages
other than Amharic. Our work can be extended to other common POS classes such as nouns and
adjectives. Extending the work to include full morphological analysis would make it more
relevant for higher-level NLP applications such as machine translation, speech recognition,
and spell checking.
Key words: Recurrent Neural Network, Amharic Morphology, Encoder-Decoder Architecture.
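The transformation framing described above (surface word in, segmented morphemes out) can be sketched as character-level sequence pairs for a sequence-to-sequence model; the boundary token and function name below are assumptions for illustration, not the thesis's actual representation.

```python
# Hedged sketch of the seq2seq framing: the surface word becomes a
# character sequence, and the target is the same characters with an
# assumed boundary marker '+' inserted between morphemes, turning
# segmentation into sequence transduction.

BOUNDARY = "+"  # assumed marker; the thesis's actual symbol may differ

def make_pair(surface, morphemes):
    """Build a (source, target) character-sequence training pair."""
    assert "".join(morphemes) == surface, "morphemes must spell the word"
    return list(surface), list(BOUNDARY.join(morphemes))

# Toy Amharic-like example (romanized for illustration only).
src, tgt = make_pair("sebrku", ["sebr", "ku"])
```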