Development of Stemming Algorithm for Afaan Oromo Language Text

dc.contributor.advisor: Alemayehu, Nega
dc.contributor.author: Mekonnen, Wakshum
dc.date.accessioned: 2022-06-23T06:44:35Z
dc.date.accessioned: 2023-11-18T12:49:17Z
dc.date.available: 2022-06-23T06:44:35Z
dc.date.available: 2023-11-18T12:49:17Z
dc.date.issued: 2000-06
dc.description.abstract: This paper reports the design and development of a stemming algorithm for the Afaan Oromo language. Afaan Oromo morphology, stemming algorithms, and other relevant materials were reviewed. In Afaan Oromo, inflectional and derivational affixation are the major word formation processes. The initial design of the stemming algorithm was based on context-free conflation procedures following the longest-match suffix removal approach. This initial attempt achieved an accuracy rate of 71%. The improved algorithm incorporated suffix, context-sensitive, and recoding rules in the procedures. Before stemming, functional and frequently occurring words, compiled as a stoplist, are excluded from the input term(s) to increase the efficiency of the stemmer. Procedures for prefix removal and for conflation of words formed by reduplication of the first syllable are also components of the modified algorithm. The modified stemmer achieved an accuracy rate of 92% in a test based on a sample of 1061 words. Understemming and overstemming errors were reduced to 4.58% and 2.5% respectively, from 10.5% and 17.5% for the first version. The stemmer also achieves a substantial decrease in the size of the sample text. The morphological complexity of the language is the main source of the stemming algorithm's remaining errors. A detailed study of Afaan Oromo morphology would therefore help improve the stemmer further. Overall, the results of this study show that a stemming algorithm can be employed for conflating Afaan Oromo words. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/32125
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Development of Stemming Algorithm [en_US]
dc.title: Development of Stemming Algorithm for Afaan Oromo Language Text [en_US]
dc.type: Thesis [en_US]
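
The abstract describes stoplist filtering followed by longest-match suffix removal. A minimal sketch of that general technique is given here for illustration. It is not the thesis's implementation: the suffix list, the stoplist, the minimum stem length, and the function names are all hypothetical placeholders, and the context-sensitive rules, recoding rules, prefix removal, and reduplication handling mentioned in the abstract are not modelled.

```python
# Illustrative placeholders only; NOT the suffix list or stoplist from the thesis.
SUFFIXES = ["oota", "wwan", "tti", "n"]
STOPLIST = {"fi", "kan", "akka"}
MIN_STEM_LEN = 3  # assumed guard so very short words are left untouched


def stem(word, suffixes=SUFFIXES, min_stem_len=MIN_STEM_LEN):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied


def stem_terms(terms, stoplist=STOPLIST):
    """Drop stoplist words, then stem the remaining terms."""
    return [stem(term) for term in terms if term.lower() not in stoplist]
```

Calling stem_terms on a list of tokens returns the stems of the non-stoplist tokens in their original order. Sorting suffixes by length on each call keeps the sketch short; a real stemmer would presumably precompute that order.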

Files

Original bundle
Name: Wakshum Mekonnen.pdf
Size: 16 MB
Format: Adobe Portable Document Format