Morphological Segmentation for Amharic Verb Class Using Recurrent Neural Network (RNN)
Date
2019-09
Authors
Publisher
Addis Ababa University
Abstract
Because higher-level NLP tasks depend on morphological analysis, the lack of an appropriate
morphological analysis tool is a major bottleneck for research on high-level NLP applications
such as machine translation, speech processing, text summarization, and many more. Currently,
most research on morphological analysis for morphologically rich languages (MRLs) like Amharic
relies either on techniques that require heavy supervision or on rule-based techniques in which
the rules of the language must be enumerated in detail and crafted manually. Both approaches
require high-quality data that captures the rules of the language, and they also require a
significant quantity of training data for good generalization. Both requirements are difficult
to meet because many MRLs, including Amharic, are under-resourced. The lack of training data,
in both quality and quantity, is a major obstacle for research on low-resource, morphologically
rich languages.
The low-resource status and morphological complexity of the language demand techniques that can
learn well from relatively few examples while still capturing the complexity of the language. In
this paper, we propose an RNN-based sequence-to-sequence model, built on the state-of-the-art
encoder-decoder architecture, that achieves encouraging performance in learning complex
segmentation from a small number of examples and with no linguistic annotation.
We approach morphological segmentation as a transformation task: the surface word is the input,
and segmentation is the transformation that produces a list of segmented morphemes. We prepared
training data by selecting verbs from different classes. We explored different encoder units in
terms of directionality, window size, encoder type and size, along with different
data-representation paradigms. The experiments showed that our model can learn complex
segmentation with no linguistic annotation and a limited number of examples. The model achieved
74.2% segmentation accuracy and 98.7% morpheme-boundary accuracy.
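As a hedged illustration of the two reported metrics, the sketch below scores hyphen-separated segmentations: word-level accuracy counts fully correct outputs, while boundary accuracy scores each gold morpheme boundary independently, which is why it can be much higher. The format, function names, and toy strings are assumptions for illustration, not taken from the thesis.

```python
# Illustrative sketch (assumed hyphen-separated format; names are not
# from the thesis). Word-level segmentation accuracy counts fully
# correct segmentations; boundary accuracy scores each gold morpheme
# boundary independently.

def boundary_positions(segmented):
    """Character offsets in the surface form where morpheme boundaries
    fall, e.g. 'sebr-ku' -> {4}."""
    positions, offset = set(), 0
    for morpheme in segmented.split("-")[:-1]:
        offset += len(morpheme)
        positions.add(offset)
    return positions

def evaluate(gold, predicted):
    """Return (segmentation accuracy, boundary accuracy)."""
    exact = sum(g == p for g, p in zip(gold, predicted))
    correct = total = 0
    for g, p in zip(gold, predicted):
        gb, pb = boundary_positions(g), boundary_positions(p)
        correct += len(gb & pb)  # predicted boundaries that match gold
        total += len(gb)         # all gold boundaries
    return exact / len(gold), correct / total

# Toy example: one word segmented wrongly, but most boundaries correct.
gold = ["sebr-ku", "sebr-ech", "y-sebr-al"]
pred = ["sebr-ku", "sebrech", "y-sebr-al"]
seg_acc, bound_acc = evaluate(gold, pred)
```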
We have shown that it is possible to learn morphological segmentation without relying on
linguistic annotation. This contributes toward a general solution that can work on languages
other than Amharic. Our work can be extended to other common POS classes such as nouns and
adjectives. Extending the work to include full morphological analysis would make it more
relevant for higher-level NLP applications such as machine translation, speech recognition,
and spell checking.
Key words: Recurrent Neural Network, Amharic Morphology, Encoder-Decoder Architecture.
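The transformation framing described above (surface word in, segmented morphemes out) can be sketched as character-level sequence pairs for a sequence-to-sequence model; the boundary token and function name below are assumptions for illustration, not the thesis's actual representation.

```python
# Hedged sketch of the seq2seq framing: the surface word becomes a
# character sequence, and the target is the same characters with an
# assumed boundary marker '+' inserted between morphemes, turning
# segmentation into sequence transduction.

BOUNDARY = "+"  # assumed marker; the thesis's actual symbol may differ

def make_pair(surface, morphemes):
    """Build a (source, target) character-sequence training pair."""
    assert "".join(morphemes) == surface, "morphemes must spell the word"
    return list(surface), list(BOUNDARY.join(morphemes))

# Toy Amharic-like example (romanized for illustration only).
src, tgt = make_pair("sebrku", ["sebr", "ku"])
```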