Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach

dc.contributor.advisor: Abebe, Ermias (PhD)
dc.contributor.author: Tesfaye, Debela
dc.date.accessioned: 2018-11-26T13:12:02Z
dc.date.accessioned: 2023-11-29T04:56:49Z
dc.date.available: 2018-11-26T13:12:02Z
dc.date.available: 2023-11-29T04:56:49Z
dc.date.issued: 2010-06
dc.description.abstract: Most natural language processing systems include a stemmer as a separate module in their architecture, and stemming is especially important for developing machine translators, speech recognizers, and search engines. In linguistic morphology, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form. This thesis presents a stemming system for Afan Oromo. The system takes a word as input and removes its affixes according to a rule-based algorithm. Because rules alone cannot capture every process of Afan Oromo word formation, the hybrid version of the stemmer integrates an N-gram method with the rules to handle cases the rules do not cover. The algorithm follows the well-known Porter algorithm for English and is adapted to the grammatical rules of Afan Oromo as described in A Grammatical Sketch of Written Oromo (Mewis, 2001) and Caasluga Afaan Oromoo, Jildii-1 (Oromo, 1995). Afan Oromo morphology, both inflectional and derivational, was studied and described in order to model the language and develop an automatic conflation procedure. The result is a prototype context-sensitive, iterative stemmer for Afan Oromo. Its performance was evaluated with an error-counting technique on a test set of 198 sentences (2,458 words in total) collected from various public Afaan Oromo newspapers and bulletins, so that the test set covers a variety of topics. The evaluation shows that the stemmer achieves 95.73 percent accuracy, outperforming previous stemming algorithms for Afan Oromo. Finally, possible extensions of the proposed system and further evaluation methods are briefly reviewed.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/14520
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: processing systems use stemmer as a separate module
dc.title: Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach
dc.type: Thesis
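The abstract describes the stemmer only at a high level: iterative, rule-based affix removal in the style of Porter, with an N-gram component for words the rules miss, evaluated by error counting. The Python sketch below illustrates that general shape under stated assumptions. It is not the thesis's implementation: the suffix list, the minimum-stem-length condition, the stem lexicon, the 0.5 threshold, and the use of character-bigram Dice similarity as the N-gram fallback are all hypothetical choices made for illustration.

```python
from collections.abc import Iterable

# HYPOTHETICAL suffix rules for illustration only (not the thesis's rule set).
# Sorted longest first so that a longer suffix is tried before a shorter one.
SUFFIXES = sorted(
    ["oota", "oolee", "wwan", "icha", "ittii", "tti", "rraa", "dhaa", "uu"],
    key=len,
    reverse=True,
)
MIN_STEM = 3  # assumed context condition: a removal must leave >= 3 characters


def bigrams(s: str) -> set[str]:
    """Character bigrams of s, padded with '#' to mark word boundaries."""
    padded = f"#{s}#"
    return {padded[i : i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient between the bigram sets of a and b (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def rule_stem(word: str) -> tuple[str, bool]:
    """Iteratively strip suffixes; return (stem, whether any rule fired)."""
    stem, fired = word, False
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                fired = changed = True
                break  # restart the scan on the shortened stem
    return stem, fired


def hybrid_stem(word: str, lexicon: Iterable[str], threshold: float = 0.5) -> str:
    """Apply the rules first; if none fires, fall back to the most
    bigram-similar stem in an assumed lexicon of known stems."""
    word = word.lower()
    stem, fired = rule_stem(word)
    if fired:
        return stem
    best = max(lexicon, key=lambda s: dice(word, s), default=None)
    if best is not None and dice(word, best) >= threshold:
        return best
    return word  # no rule and no sufficiently similar stem: leave unchanged


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Error counting: 1 - (number of wrong stems / number of words)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    errors = sum(p != g for p, g in zip(predicted, gold))
    return 1 - errors / len(gold)


print(hybrid_stem("manoota", ["man"]))       # -> man (suffix rule fires)
print(hybrid_stem("mana", ["man", "bar"]))   # -> man (bigram fallback)
```

Trying the longest matching suffix first and restarting the scan after each removal loosely mirrors Porter's ordered, iterative rule application, and the minimum-stem-length check is a crude stand-in for Porter's measure condition. The `accuracy` helper shows one simple reading of error counting: the fraction of words whose predicted stem matches a hand-checked reference stem.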

Files

Original bundle
Name: Debela Tesfaye.pdf
Size: 642.96 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text