Development of Stemming Algorithm for Wolaytta Text

dc.contributor.advisor: Getachew, Mesfin
dc.contributor.advisor: Alemu, Atelach
dc.contributor.advisor: Engdashet, Haile Eyesus (PhD)
dc.contributor.author: Lessa, Lemma
dc.date.accessioned: 2018-11-27T06:47:36Z
dc.date.accessioned: 2023-11-18T12:44:09Z
dc.date.available: 2018-11-27T06:47:36Z
dc.date.available: 2023-11-18T12:44:09Z
dc.date.issued: 2003-06
dc.description.abstract: This study describes the design of a stemming algorithm for the Wolaytta language. To provide a solid background for the thesis, the literature on conflation in general and stemming algorithms in particular was reviewed. Because the nature and characteristics of affixation guide the development of a stemmer, Wolaytta morphology was studied and described in order to model the language and develop an automatic procedure for conflation. The inflectional and derivational morphologies of the language are discussed. The study indicates that suffixation is the main word-formation process in Wolaytta. It also attempts to show that the language is morphologically complex and makes extensive use of suffix concatenation. The result of the study is a prototype context-sensitive iterative stemmer for Wolaytta. An error-counting technique was used to evaluate the stemmer's performance. The stemmer was trained on 3537 words (80% of the sample text), and the improved version achieved an accuracy of 90.6% on the training set. Overstemmed and understemmed words on the training set accounted for 8.6% (304 words) and 0.8% (28 words), respectively. When run on the unseen sample of 884 words (20% of the sample text), the stemmer achieved an accuracy of 86.9%. The errors recorded as understemmed and overstemmed on this unseen test set were 9% and 4.1%, respectively. Moreover, a dictionary reduction of 38.92% was attained on the test set. The major sources of errors are also reported, with recommendations for improving the stemmer's performance and for further research. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14533
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Stemming Algorithm (en_US)
dc.title: Development of Stemming Algorithm for Wolaytta Text (en_US)
dc.type: Thesis (en_US)
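
The error-counting figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the thesis: the function names are hypothetical, and the dictionary-reduction formula (the relative drop from distinct words to distinct stems) is an assumed, commonly used definition that the abstract does not spell out. Only the training-set counts (3537 words, 304 overstemmed, 28 understemmed) come from the abstract; the token list at the end is made-up placeholder data.

```python
def error_counting_summary(total, overstemmed, understemmed):
    """Return (accuracy %, overstemmed %, understemmed %), one decimal each.

    Hypothetical helper: a word counts as correct when it is neither
    overstemmed nor understemmed.
    """
    correct = total - overstemmed - understemmed

    def pct(n):
        return round(100.0 * n / total, 1)

    return pct(correct), pct(overstemmed), pct(understemmed)


def dictionary_reduction(words, stems):
    """Percentage drop from distinct words to distinct stems.

    Assumed definition; the abstract reports the figure without a formula.
    """
    unique_words = len(set(words))
    unique_stems = len(set(stems))
    return round(100.0 * (unique_words - unique_stems) / unique_words, 2)


# Training-set counts quoted in the abstract.
train_summary = error_counting_summary(3537, 304, 28)
print(train_summary)  # (90.6, 8.6, 0.8)

# Placeholder tokens (not Wolaytta words): five distinct words, three stems.
toy_words = ["aa1", "aa2", "bb1", "bb2", "cc1"]
toy_stems = ["aa", "aa", "bb", "bb", "cc"]
print(dictionary_reduction(toy_words, toy_stems))  # 40.0
```

With the training counts, the computed percentages match the 90.6%, 8.6% and 0.8% reported in the abstract.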

Files

Original bundle
Name: Lemma Lessa.pdf
Size: 495.78 KB
Format: Adobe Portable Document Format