Development of Morphological Analyzer for Afaan Oromoo Text
Date
2005-07
Publisher
Addis Ababa University
Abstract
Afaan Oromoo, which belongs to a branch of the Afro-Asiatic language family, is spoken
by more than 30 million people in Ethiopia and neighboring countries. The language needs
solid work on its computational aspects, especially text storage, processing and retrieval.
This study is an attempt at the development and implementation of a morphological
analyzer for Afaan Oromoo text.
Afaan Oromoo morphology and existing work on its morphological analysis were reviewed.
Sample corpora of different sizes, ranging from 6,977 to 48,497 words, were gathered from
three institutions. Documents were reviewed and discussions were held with experts in the
field.
Focusing on the morphology of the language, a system that performs automatic
morphological analysis was developed. The system uses neither a stem dictionary nor
morphological rules specific to the language. Instead, it is corpus-based: it learns the
morphology from the corpus itself, using heuristic rules to guess the analyses.
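The idea of learning word segmentations from the corpus alone, with no stem dictionary or hand-written rules, can be illustrated with a much simpler heuristic than the one used in this study. The sketch below is a minimal illustration assuming Harris-style successor variety: count how many distinct letters follow each prefix across the corpus, and cut a word at the first prefix where that count reaches a threshold. It is not the thesis's method; the functions `successor_variety` and `segment`, the `#` end-of-word marker, the `min_stem`/`threshold` values and the toy word list are all hypothetical.

```python
from collections import defaultdict


def successor_variety(words):
    """Map every prefix in the corpus to the set of symbols that follow it."""
    followers = defaultdict(set)
    for w in words:
        for i in range(1, len(w)):
            followers[w[:i]].add(w[i])
        followers[w].add("#")  # '#' marks that the prefix is also a complete word
    return followers


def segment(word, followers, min_stem=2, threshold=2):
    """Cut `word` at the first prefix of at least `min_stem` letters whose
    successor variety reaches `threshold`; return (stem, suffix)."""
    for i in range(min_stem, len(word)):
        if len(followers.get(word[:i], ())) >= threshold:
            return word[:i], word[i:]
    return word, ""  # no boundary found: treat the whole word as a stem


corpus = ["mana", "manni", "manoota", "manaa"]  # toy word list, illustrative only
f = successor_variety(corpus)
print(segment("manoota", f))  # ('man', 'oota')
print(segment("manni", f))    # ('man', 'ni')
```

On a real corpus a fixed threshold cuts too eagerly or too late, which is one reason systems like the one described here layer further heuristics on top of the raw corpus statistics.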
The developed analyzer uses Linguistica beta2 as its main tool to decompose the words
in the text into stem + affix, and then analyzes them by applying a series of heuristics.
Various modifications and improvements were made to Linguistica beta2 so that it analyzes
Afaan Oromoo words correctly.
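Linguistica organizes its stem + affix decompositions around signatures: groups of stems that combine with exactly the same set of suffixes. A toy version of that grouping, assuming the (stem, suffix) pairs have already been proposed by some earlier segmentation step, might look like the following. The function `build_signatures`, the `min_stems` filter and the sample pairs are hypothetical placeholders, not verified Afaan Oromoo analyses, and the sketch does not reproduce Linguistica beta2's data structures or the study's modifications.

```python
from collections import defaultdict


def build_signatures(analyses, min_stems=2):
    """Group stems by the exact set of suffixes they occur with.

    analyses: iterable of (stem, suffix) pairs; an empty suffix is recorded as "NULL".
    Returns {signature: sorted stems}, where a signature is a sorted tuple of
    suffixes, keeping only signatures shared by at least `min_stems` stems.
    """
    suffixes_of = defaultdict(set)
    for stem, suffix in analyses:
        suffixes_of[stem].add(suffix or "NULL")

    stems_by_signature = defaultdict(list)
    for stem, suffixes in suffixes_of.items():
        stems_by_signature[tuple(sorted(suffixes))].append(stem)

    # Heuristic filter: a signature seen with only one stem is weak evidence.
    return {sig: sorted(stems) for sig, stems in stems_by_signature.items()
            if len(stems) >= min_stems}


pairs = [("man", "a"), ("man", "oota"), ("kit", "a"), ("kit", "oota"), ("dub", "e")]
print(build_signatures(pairs))  # {('a', 'oota'): ['kit', 'man']}
```

Requiring several stems per signature is one simple way to keep suffix patterns that recur across the corpus while discarding one-off splits.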
Using Alchemist, a small gold standard (1,600 words) was developed to evaluate the
performance of the system. In experiments with different corpus sizes, the system analyzed
92.8% of words correctly with the 48,497-word corpus, which is encouraging and satisfactory.
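The evaluation compares the system's output with the gold standard word by word. A minimal sketch of an exact-match accuracy measure, assuming both the gold standard and the system output are available as word-to-analysis mappings, is shown below. The `accuracy` function, the `stem+suffix` string format and the sample data are hypothetical; they do not describe Alchemist's export format or the exact scoring used in the study.

```python
def accuracy(gold, predicted):
    """Share of gold-standard words whose predicted analysis matches exactly.

    gold, predicted: dicts mapping word -> analysis string (e.g. "man+oota").
    Words the system did not analyze count as errors.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for word, analysis in gold.items()
                  if predicted.get(word) == analysis)
    return correct / len(gold)


gold = {"manni": "man+ni", "manoota": "man+oota", "mana": "man+a", "kita": "kit+a"}
predicted = {"manni": "man+ni", "manoota": "man+oota", "mana": "ma+na"}
print(f"{accuracy(gold, predicted):.1%}")  # 50.0%
```

Counting unanalyzed words as errors keeps the score from rewarding a system that simply skips hard cases.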
Keywords
Morphological Analyzer