Development of a Stemming Algorithm for Afaan Oromoo Text
Date
2000-06
Publisher
Addis Ababa University
Abstract
This paper reports the design and development of a stemming algorithm for the Afaan Oromoo
language. Afaan Oromoo morphology, stemming algorithms, and other relevant materials were
reviewed. In Afaan Oromoo, inflectional and derivational affixation are the major
word-formation processes. The initial design of the stemming algorithm was based on
context-free conflation procedures following the longest-match suffix removal approach, and
this first attempt achieved an accuracy rate of 71%. The improved algorithm incorporated
suffix, context-sensitive, and recoding rules. Before stemming, function words and frequently
occurring words, compiled into a stop list, were excluded from the input terms to increase
the efficiency of the stemmer. The modified algorithm also includes procedures for prefix
removal and for conflating words formed by reduplication of the first syllable. The modified
stemmer achieved an accuracy rate of 92% in a test on a sample of 1061 words. Understemming
and overstemming errors were reduced to 4.58% and 2.5%, respectively, from 10.5% and 17.5% in
the first version. The stemmer also substantially reduced the size of the sample text. The
morphological complexity of the language is the main source of the remaining inaccuracies of
the stemming algorithm, so a more detailed study of Afaan Oromoo morphology would help to
improve the stemmer further. In general, the results show that a stemming algorithm can be
employed to conflate Afaan Oromoo words.
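As a minimal sketch of the rule types the abstract names, the Python below implements longest-match suffix removal in the style of Lovins' stemmer, with a context-sensitive condition (a minimum remaining stem length) and one recoding rule (undoubling a stem-final geminate consonant). The suffix list, the conditions, the recoding rule, and the names `SUFFIX_RULES`, `min_len`, `recode`, and `stem` are hypothetical illustrations chosen for this example, not the thesis's actual rule set. Passing `improved=False` roughly approximates the initial context-free version by ignoring the conditions and skipping recoding.

```python
from typing import Callable, List, Tuple

# A context condition inspects the stem that would remain after a suffix is removed.
Condition = Callable[[str], bool]

VOWELS = set("aeiou")


def min_len(n: int) -> Condition:
    """Context-sensitive condition: the remaining stem must keep at least n letters."""
    return lambda remaining: len(remaining) >= n


# Hypothetical, illustrative rules (suffix, condition); NOT the thesis's rule set.
SUFFIX_RULES: List[Tuple[str, Condition]] = [
    ("oota", min_len(2)),  # plural ending, e.g. namoota 'people'
    ("aaf", min_len(2)),   # e.g. namaaf 'for a person'
    ("aa", min_len(3)),
    ("a", min_len(3)),
]


def recode(candidate: str) -> str:
    """Hypothetical recoding rule: undouble a stem-final geminate consonant (tt -> t)."""
    if len(candidate) >= 2 and candidate[-1] == candidate[-2] and candidate[-1] not in VOWELS:
        return candidate[:-1]
    return candidate


def stem(word: str, improved: bool = True) -> str:
    """Remove the single longest matching suffix whose condition holds.

    improved=False approximates a context-free first version: any matching
    suffix is removed (as long as something remains) and no recoding is done.
    """
    word = word.lower()
    # Longest match first: try longer suffixes before shorter ones.
    for suffix, condition in sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True):
        if not word.endswith(suffix):
            continue
        remaining = word[: -len(suffix)]
        if not improved:
            if remaining:
                return remaining
            continue
        # Context-sensitive check; if it fails, fall through to shorter suffixes.
        if condition(remaining):
            return recode(remaining)
    return word


if __name__ == "__main__":
    for w in ["namoota", "nama", "namaaf", "barattoota", "barataa", "haa"]:
        print(w, stem(w), stem(w, improved=False))
```

With these illustrative rules, `namoota`, `nama`, and `namaaf` all reduce to `nam`, and `barattoota` ('students') and `barataa` ('student') conflate to `barat` only because the recoding rule undoubles `tt`; without recoding they stay apart, an understemming error. In the context-free mode, a short word such as `haa` is cut down to `h`, the kind of overstemming error the minimum-length condition is meant to block.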
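The pre-processing steps described in the abstract can be sketched in the same hedged way: a stop-list filter applied before stemming, and conflation of words formed by reduplicating the first syllable, assumed here to cover forms such as `guguddaa` beside `guddaa` ('big'). The stop words, the example words, the `min_rest` condition, and the function names `remove_stop_words`, `undo_reduplication`, and `preprocess` are illustrative assumptions. Only single-letter consonant onsets are handled, and prefix removal is omitted because the abstract does not list the prefixes involved.

```python
from typing import Iterable, List

VOWELS = set("aeiou")

# Illustrative sample only; the thesis compiled its stop list from function
# words and frequently occurring words.
STOP_WORDS = {"fi", "garuu", "akka", "kana", "ni"}


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Drop tokens that appear in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def undo_reduplication(word: str, min_rest: int = 4) -> str:
    """Strip a reduplicated first CV syllable, e.g. guguddaa -> guddaa.

    Assumptions: the onset is a single consonant letter (digraphs such as
    'dh' are left alone), and at least `min_rest` letters must remain.
    """
    if (
        len(word) - 2 >= min_rest
        and word[0] not in VOWELS
        and word[1] in VOWELS
        and word[:2] == word[2:4]
    ):
        return word[2:]
    return word


def preprocess(text: str) -> List[str]:
    """Lowercase, split on whitespace, drop stop words, then undo reduplication."""
    tokens = text.lower().split()
    return [undo_reduplication(t) for t in remove_stop_words(tokens)]


if __name__ == "__main__":
    # An arbitrary token sequence, not a grammatical sentence.
    print(preprocess("namoota fi guguddaa"))
```

Running reduplication conflation before suffix removal lets `guguddaa` and `guddaa` reach the suffix rules as the same string. The minimum-length and onset checks act as crude context conditions: a word like `dhadhaa`, whose first letters only look like a repeated syllable at the digraph level, is left unchanged rather than overstemmed.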
Keywords
Information Science