Afaan Oromo Morphological Analysis: A Hybrid Approach
Date
2021-12-13
Publisher
Addis Ababa University
Abstract
This study provides relatively detailed information on developing an Afaan Oromo morphological
analysis system. A morphological analyzer decomposes words into their components, called
morphemes, and annotates those morphemes with grammatical information. Although the analyzer
uses a machine-learning approach for morphological analysis, it uses a rule-based approach to
segment words into their smallest components, the morphemes. The prototype focuses on the
inflectional forms of nominals (nouns and adjectives) and verbs, since these two word classes
undergo most of the inflection and therefore determine the inflectional characteristics of the
language. The prototype was developed in Python using a Hidden Markov Model (HMM), and the
Viterbi algorithm is used to decode the HMM (i.e., to find the most probable sequence of hidden states).
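The hybrid pipeline described above can be sketched roughly as follows: a rule-based step splits a word into candidate morphemes, and Viterbi decoding over an HMM then assigns each morpheme its most probable grammatical tag. This is a minimal illustration under stated assumptions, not the thesis implementation: the suffix list, the tag set (`STEM`, `PL`, `OTHER`), and all probabilities are hypothetical toy values, and the functions `segment`, `viterbi`, and `analyze` are names introduced here for illustration.

```python
import math

# Hypothetical, illustrative suffix list (not a complete inventory of
# Afaan Oromo inflectional suffixes).
SUFFIXES = ["oota", "ota", "een", "ni"]

# Hypothetical toy HMM: hidden states are morpheme tags, observations are morphemes.
STATES = ["STEM", "PL", "OTHER"]
START_P = {"STEM": 0.9, "PL": 0.05, "OTHER": 0.05}
TRANS_P = {
    "STEM": {"STEM": 0.1, "PL": 0.6, "OTHER": 0.3},
    "PL": {"STEM": 0.05, "PL": 0.05, "OTHER": 0.9},
    "OTHER": {"STEM": 0.3, "PL": 0.1, "OTHER": 0.6},
}
EMIT_P = {
    "STEM": {"nam": 0.5},
    "PL": {"oota": 0.5, "ota": 0.3, "een": 0.2},
    "OTHER": {"ni": 0.5},
}
UNK = 1e-6  # floor probability for a morpheme never emitted by a state


def segment(word, suffixes=SUFFIXES, min_stem=2):
    """Rule-based step: repeatedly strip the longest matching suffix from the right."""
    morphs = []
    stripped = True
    while stripped:
        stripped = False
        for suf in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                morphs.insert(0, suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + morphs


def viterbi(obs, states, start_p, trans_p, emit_p, unk=UNK):
    """Return the most probable tag sequence for obs, computed in log space."""
    def log_emit(s, o):
        return math.log(emit_p[s].get(o, unk))

    scores = [{s: math.log(start_p[s]) + log_emit(s, obs[0]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state for reaching s at position t.
            prev = max(states, key=lambda p: scores[t - 1][p] + math.log(trans_p[p][s]))
            scores[t][s] = scores[t - 1][prev] + math.log(trans_p[prev][s]) + log_emit(s, obs[t])
            back[t][s] = prev
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path


def analyze(word):
    """Segment with rules, then tag the morphemes with the HMM."""
    morphs = segment(word)
    tags = viterbi(morphs, STATES, START_P, TRANS_P, EMIT_P)
    return list(zip(morphs, tags))


print(analyze("namoota"))  # [('nam', 'STEM'), ('oota', 'PL')]
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; the `UNK` floor is a crude stand-in for whatever smoothing a real system would use.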
The prototype was then trained and tested on representative data: a corpus of 4,320 nouns and
3,780 verbs was used to train the HMM, and the analyzer's performance was tested on 480 nouns
and 420 verbs.
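Training an HMM from an annotated corpus of this kind is commonly done by relative-frequency (maximum-likelihood) counting of start tags, tag-to-tag transitions, and tag-to-morpheme emissions. The sketch below assumes each training word is already segmented and tagged as a list of `(morpheme, tag)` pairs; that data format, the function name `train_hmm`, and the toy corpus are illustrative assumptions, not the thesis's data or code.

```python
from collections import Counter, defaultdict


def train_hmm(corpus):
    """Estimate start, transition, and emission probabilities by counting.

    corpus: list of words, each a list of (morpheme, tag) pairs.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for word in corpus:
        tags = [t for _, t in word]
        start[tags[0]] += 1
        for morpheme, tag in word:
            emit[tag][morpheme] += 1
        for a, b in zip(tags, tags[1:]):
            trans[a][b] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    return (
        normalize(start),
        {a: normalize(c) for a, c in trans.items()},
        {t: normalize(c) for t, c in emit.items()},
    )


# Hypothetical toy corpus of pre-segmented, tagged words.
corpus = [
    [("nam", "STEM"), ("oota", "PL")],
    [("barsiis", "STEM"), ("ota", "PL")],
    [("mana", "STEM")],
]
start, trans, emit = train_hmm(corpus)
print(start)  # {'STEM': 1.0}
print(trans)  # {'STEM': {'PL': 1.0}}
```

The reported sizes correspond to a 90:10 train/test split for both word classes (4,320 + 480 = 4,800 nouns; 3,780 + 420 = 4,200 verbs). With unsmoothed counts, any transition or emission absent from the training data gets probability zero (here, `PL` has no outgoing transitions at all), so a practical system would add smoothing before decoding.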
Overall, the analyzer achieves an accuracy of 84.6% for nouns and 82.9% for verbs.
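The abstract does not state whether accuracy is measured per word or per morpheme. As a hedged sketch of one common choice, word-level accuracy counts a test word as correct only if its full predicted analysis matches the gold analysis; the function `word_accuracy` and the toy data below are hypothetical.

```python
def word_accuracy(predicted, gold):
    """Fraction of words whose predicted (morpheme, tag) analysis exactly matches gold."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical gold and predicted analyses for two test words.
gold = [[("nam", "STEM"), ("oota", "PL")], [("nam", "STEM"), ("ni", "OTHER")]]
pred = [[("nam", "STEM"), ("oota", "PL")], [("nam", "STEM"), ("ni", "PL")]]
print(f"{word_accuracy(pred, gold):.1%}")  # 50.0%
```

A morpheme-level variant would instead divide the number of correctly tagged morphemes by the total number of morphemes, which generally yields a different figure for the same predictions.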
These results are quite satisfactory and could be improved by incorporating simple grammatical
constraints and contextual information (including information encoded in the tonal system) to
reduce ambiguity, a database of word roots to reduce errors during morpheme identification, and
additional data to better estimate the model's initial probabilities.
The key limitations of this work are limited funding, the scarcity of gold-standard and balanced
annotated data sets, and the multiple sources of ambiguity inherent in the language at different
levels.
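One improvement proposed above is additional data for estimating the model's initial probabilities. The sketch below illustrates why sparse data is a problem: with few observations, maximum-likelihood estimates assign zero probability to any tag never seen in word-initial position. Add-one (Laplace) smoothing is shown as one common remedy; it is named here for illustration only and is not a technique the abstract attributes to the thesis. The function `initial_probs`, the tag names, and the counts are hypothetical.

```python
from collections import Counter


def initial_probs(first_tags, states, alpha=0.0):
    """Estimate P(first tag) with additive smoothing (alpha=0 gives plain MLE)."""
    counts = Counter(first_tags)
    total = len(first_tags) + alpha * len(states)
    return {s: (counts[s] + alpha) / total for s in states}


states = ["STEM", "PL", "OTHER"]
few = ["STEM"] * 4  # tiny hypothetical sample of word-initial tags

print(initial_probs(few, states))  # {'STEM': 1.0, 'PL': 0.0, 'OTHER': 0.0}
smoothed = initial_probs(few, states, alpha=1.0)
print({s: round(p, 3) for s, p in smoothed.items()})  # {'STEM': 0.714, 'PL': 0.143, 'OTHER': 0.143}
```

More training data shrinks the influence of `alpha` and lets the estimates reflect the language rather than the smoothing constant.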
Keywords
Afaan Oromo, Hidden Markov Model, Machine Learning, Morphological Analysis, NLP