Afaan Oromo Morphological Analysis: A Hybrid Approach

Date

2021-12-13

Publisher

Addis Ababa University

Abstract

This study describes in some detail the development of an Afaan Oromo morphological analysis system. A morphological analyzer decomposes words into their components, called morphemes, and annotates those morphemes with grammatical information. Although the system uses a machine-learning approach for morphological analysis, it uses a rule-based approach to segment words into their smallest components, the morphemes. The prototype focuses on inflectional forms of nominals (nouns and adjectives) and verbs: these two word classes undergo most of the inflection and therefore determine the inflectional characteristics of the language. The prototype was developed in Python using a Hidden Markov Model (HMM), and the Viterbi algorithm is used to decode the HMM. The prototype was then trained and tested on representative data. A corpus of 4,320 nouns and 3,780 verbs was used to train the HMM, and the performance of the analyzer was tested on 480 nouns and 420 verbs. Overall, the accuracy of the analyzer is 84.6% for nouns and 82.9% for verbs. The experimental results were quite satisfactory and could be improved by incorporating simple grammatical constraints and contextual information (including information encoded in the tonal system) to minimize ambiguities, a database of word roots to reduce errors during morpheme identification, and additional data to improve the model's initial probability estimates. The key limitations of this work are limited funding opportunities, the scarcity of gold-standard and balanced annotated data sets, and the language's inherent multiple sources of ambiguity at different levels.
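To make the HMM step concrete, the following is a minimal sketch of Viterbi decoding over morpheme segments that a rule-based segmenter has already produced. It is not the thesis implementation: the tag set (`ROOT`, `PL`), the toy segmentation `nam` + `oota`, and every probability below are hypothetical values chosen only for illustration, and the function name `viterbi` and its `floor` parameter are assumptions of this sketch.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for a list of morpheme segments.

    Probabilities are combined in log space; unseen events fall back to a
    small floor value (a crude smoothing assumption of this sketch).
    """
    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (lp(start_p.get(s, 0)) + lp(emit_p[s].get(first, 0)), None)
             for s in states}]

    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes path score plus transition.
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(s, 0))) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + lp(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: two tags and made-up probabilities.
states = ["ROOT", "PL"]
start_p = {"ROOT": 0.9, "PL": 0.1}
trans_p = {"ROOT": {"ROOT": 0.1, "PL": 0.9},
           "PL": {"ROOT": 0.5, "PL": 0.5}}
emit_p = {"ROOT": {"nam": 0.8}, "PL": {"oota": 0.8}}

segments = ["nam", "oota"]  # illustrative output of a rule-based segmenter
tags = viterbi(segments, states, start_p, trans_p, emit_p)
print(list(zip(segments, tags)))
```

In a trained system the three probability tables would be estimated from the annotated corpus instead of being written by hand; the decoding step itself would stay the same.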

Keywords

Afaan Oromo, Hidden Markov Model, Machine Learning, Morphological Analysis, NLP
