Designing a Stemmer for Ge’ez Text Using a Rule-Based Approach

Addis Ababa University


In this study, a stemmer for Ge’ez text was developed. The design process drew on the background of the thesis, the literature on conflation and stemming algorithms, the morphology of the Ge’ez language, stemming techniques and other related topics in order to model and develop an automatic conflation procedure. The review of the inflectional and derivational morphology of the language showed that affixation (prefixing, infixing and suffixing) is the main word-formation process in Ge’ez. The language is morphologically complex because many different word forms can be produced by concatenating affixes. Two techniques were used in the experiment: affix removal and morphological analysis. The stemmer was evaluated by manual error counting. Three types of errors were observed: over-stemming (6%), under-stemming (4.27%) and structural problems (7.31%). When run on the sample texts, the stemmer achieved an accuracy of 82.42%. The dictionary reduction was 29.9% for stemmed words and 62.8% for root words. Finally, recommendations for future work and for improving this stemmer are reported.
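The reported figures can be checked with a little arithmetic. The three error rates sum to 17.58%, which leaves the stated accuracy of 82.42%. The sketch below reproduces that calculation. It also shows one common way to compute dictionary reduction, as the percentage drop from distinct word forms to distinct stems. That definition is an assumption, since the abstract does not state the formula used. The function names are hypothetical, and the toy word list is English and purely illustrative, not data from the thesis.

```python
from typing import Iterable, Sequence


def accuracy_from_error_rates(error_rates_percent: Iterable[float]) -> float:
    """Accuracy as 100% minus the summed error percentages."""
    return 100.0 - sum(error_rates_percent)


def dictionary_reduction(words: Sequence[str], stems: Sequence[str]) -> float:
    """Percentage drop from distinct word forms to distinct stems.

    Assumed definition (not stated in the abstract):
        100 * (|unique words| - |unique stems|) / |unique words|
    """
    unique_words = set(words)
    unique_stems = set(stems)
    if not unique_words:
        raise ValueError("words must contain at least one item")
    return 100.0 * (len(unique_words) - len(unique_stems)) / len(unique_words)


# Error rates as reported in the abstract (percent of sample words).
reported_errors = {
    "over_stemmed": 6.0,
    "under_stemmed": 4.27,
    "structural": 7.31,
}
accuracy = accuracy_from_error_rates(reported_errors.values())
print(f"accuracy: {accuracy:.2f}%")

# Toy illustration only: English words with hand-assigned stems.
toy_words = ["walk", "walks", "walked", "talk", "talks"]
toy_stems = ["walk", "walk", "walk", "talk", "talk"]
toy_reduction = dictionary_reduction(toy_words, toy_stems)
print(f"toy dictionary reduction: {toy_reduction:.1f}%")
```

Under this definition, a larger reduction means more surface forms were conflated into a single entry, which is consistent with the root-level figure (62.8%) being higher than the stem-level figure (29.9%).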



