Development of Morphological Analyzer for Afaan Oromoo Text

Date

2005-07

Publisher

Addis Ababa University

Abstract

Afaan Oromoo, which belongs to a branch of the Afro-Asiatic language family, is spoken by more than 30 million people in Ethiopia and neighboring countries. The language needs solid computational work, especially in text storage, processing and retrieval. This study is an attempt to develop and implement a morphological analyzer for Afaan Oromoo text. Afaan Oromoo morphology and its morphological analysis were reviewed. Sample corpora of different sizes, ranging from 6,977 to 48,497 words, were gathered from three institutions. Relevant documents were reviewed and discussions were held with experts in the field. Focusing on the morphology of the language, a system for automatic morphological analysis was developed. The system uses neither a stem dictionary nor morphological rules specific to the language. Rather, it is corpus-based: it learns the morphology from the corpus itself, using heuristic rules to guess the analysis. The analyzer uses Linguistica beta2 as its main tool to decompose the words in the text into stem + affix, and analyzes them by applying a series of heuristics. Several modifications and improvements were made to Linguistica beta2 so that it analyzes Afaan Oromoo words correctly. Using Alchemist, a small gold standard (1,600 words) was developed to evaluate the performance of the system. In experiments with different corpus sizes, the system correctly analyzed 92.8% of 48,497 words, which is very encouraging and satisfactory.
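The abstract describes an analyzer that uses no stem dictionary and no language-specific rules, and instead guesses stem + affix splits from corpus evidence. The sketch below illustrates that general idea with a toy heuristic. It is NOT Linguistica's algorithm, which relies on minimum description length and "signatures". The function names, thresholds and scoring rule are hypothetical. A split is accepted only if its stem is seen with at least two different suffixes, where the empty suffix counts, and its suffix follows at least two different stems. Among accepted splits, the stem with the richest set of suffixes wins. The word list is made up purely to exercise the code.

```python
from collections import defaultdict


def build_paradigms(vocab, min_stem=2, max_suffix=5):
    """Record every stem+suffix split of every word type (empty suffix included).

    Hypothetical toy heuristic, not Linguistica's MDL/signature method.
    """
    stem_sufs = defaultdict(set)  # stem   -> suffixes observed after it
    suf_stems = defaultdict(set)  # suffix -> stems it attaches to
    for word in set(vocab):
        for i in range(min_stem, len(word) + 1):
            stem, suf = word[:i], word[i:]
            if len(suf) <= max_suffix:
                stem_sufs[stem].add(suf)
                suf_stems[suf].add(stem)
    return stem_sufs, suf_stems


def analyze(word, stem_sufs, suf_stems, min_stem=2, max_suffix=5, min_stems=2):
    """Return (stem, suffix) for `word`, or (word, "") if no split is supported."""
    best, best_score = (word, ""), 0
    # Walk from the longest stem to the shortest, so that ties go to longer stems.
    for i in range(len(word), min_stem - 1, -1):
        stem, suf = word[:i], word[i:]
        if len(suf) > max_suffix:
            break  # shorter stems would only make the suffix longer
        paradigm = len(stem_sufs.get(stem, ()))  # distinct suffixes seen with this stem
        if paradigm >= 2 and len(suf_stems.get(suf, ())) >= min_stems and paradigm > best_score:
            best, best_score = (stem, suf), paradigm
    return best


def accuracy(gold, stem_sufs, suf_stems):
    """Share of gold-standard words whose (stem, suffix) analysis matches exactly."""
    if not gold:
        return 0.0
    correct = sum(analyze(w, stem_sufs, suf_stems) == ans for w, ans in gold.items())
    return correct / len(gold)


# Made-up word list and gold entries, used only to exercise the functions above.
corpus = ["mana", "manaa", "manatti", "bara", "baraa", "baratti"]
stem_sufs, suf_stems = build_paradigms(corpus)
print(analyze("manatti", stem_sufs, suf_stems))  # ('mana', 'tti')
gold = {"mana": ("mana", ""), "manatti": ("mana", "tti")}
print(accuracy(gold, stem_sufs, suf_stems))  # 1.0
```

The accuracy function uses the same measure the abstract reports: the share of gold-standard words whose analysis is exactly right. A real system would need more careful scoring, prefix handling and larger corpora than this toy heuristic provides.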

Keywords

Morphological Analyzer