Development of Stemming Algorithm for Tigrigna Text

dc.contributor.advisor: Abebe Ermias (Ato)
dc.contributor.author: Fisseha Yonas
dc.date.accessioned: 2018-11-30T12:42:15Z
dc.date.accessioned: 2023-11-18T12:44:08Z
dc.date.available: 2018-11-30T12:42:15Z
dc.date.available: 2023-11-18T12:44:08Z
dc.date.issued: 2011-06
dc.description.abstract: This paper presents the development of a rule-based stemming algorithm for Tigrigna. The algorithm is simple yet highly effective; it is organized as a sequence of steps, each composed of a collection of rules. Each rule specifies the affixes to be removed, the minimum length allowed for the stem, and a list of exceptions. Stemming rules for the Tigrigna language have many exceptions, and the researcher considered these exceptions in designing the stemmer. This thesis work required a deep study of Tigrigna grammar as well as an analysis of the inflectional and derivational affixes of the language. The stemmer was designed around a new classification of words according to their affixes, and stemming is performed by a rule-based algorithm that removes those affixes. Existing research on the Tigrigna language and on Tigrigna stemming was taken into consideration; the research was necessary because past work on Tigrigna stemming is limited. By analyzing the Tigrigna grammatical rules, the researcher decided to remove both inflectional and derivational affixes and designed a new rule-set for the Tigrigna stemmer. The goal of the research was to develop and document a new rule-based stemmer for the Tigrigna language. The stemmer was developed in the Python programming language. The researcher tried to follow a simple structure in the algorithm, creating x small rule-sets for similar affixes, which are applied to the input words. The stemmer was evaluated using the error counting method: it was tested by counting actual understemming and overstemming errors on a total of 5437 word variants derived from two data sets. Results show that the stemmer has 85.8% accuracy on the first dataset and 86.3% accuracy on the second, for an average accuracy of 86.1%. The average error rate is therefore about 13.9%. These errors were analyzed and classified into two categories, overstemming and understemming; most of the errors were due to overstemming. [en_US]
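
The rule structure and the error-counting evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the names `Rule`, `stem`, and `count_errors` are hypothetical, the affixes are English toy placeholders rather than actual Tigrigna affixes, and classifying an error by comparing stem lengths is a simplifying assumption standing in for the manual error counting reported in the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One affix-removal rule (hypothetical structure, toy values)."""
    affix: str                            # affix to strip
    kind: str                             # "prefix" or "suffix"
    min_stem_len: int                     # shortest stem the rule may leave
    exceptions: frozenset = frozenset()   # words the rule must not touch

    def apply(self, word):
        if word in self.exceptions:
            return word
        if self.kind == "prefix" and word.startswith(self.affix):
            candidate = word[len(self.affix):]
        elif self.kind == "suffix" and word.endswith(self.affix):
            candidate = word[:-len(self.affix)]
        else:
            return word
        # Reject the removal if it would leave too short a stem.
        return candidate if len(candidate) >= self.min_stem_len else word


def stem(word, rule_sets):
    """Apply rule-sets in order; in each set the first rule that fires wins."""
    for rules in rule_sets:
        for rule in rules:
            stripped = rule.apply(word)
            if stripped != word:
                word = stripped
                break
    return word


def count_errors(pairs, rule_sets):
    """Tally results against expected stems (assumed gold standard)."""
    counts = {"correct": 0, "overstemmed": 0, "understemmed": 0}
    for word, expected in pairs:
        got = stem(word, rule_sets)
        if got == expected:
            counts["correct"] += 1
        elif len(got) < len(expected):
            counts["overstemmed"] += 1    # too much was removed
        else:
            counts["understemmed"] += 1   # too little was removed
    return counts


# Toy rule-sets: one for prefixes, one for suffixes (placeholders only).
RULE_SETS = [
    [Rule("un", "prefix", 3)],
    [Rule("ing", "suffix", 3, frozenset({"bring"})), Rule("s", "suffix", 3)],
]

# Hypothetical (word, expected stem) pairs for the tally.
PAIRS = [("unfolding", "fold"), ("books", "book"),
         ("uncle", "uncle"), ("running", "run")]

if __name__ == "__main__":
    counts = count_errors(PAIRS, RULE_SETS)
    print(counts, "accuracy:", counts["correct"] / len(PAIRS))
```

In this toy run, "uncle" loses its first two letters and is tallied as an overstemming error, while "running" keeps a doubled consonant and is tallied as an understemming error. Accuracy is the share of correct stems, and the error rate is its complement, which is the same relationship as the 86.1% and 13.9% figures reported above.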
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14768
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Stemming Algorithm [en_US]
dc.title: Development of Stemming Algorithm for Tigrigna Text [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Yonas Fisseha.pdf
Size: 481.88 KB
Format: Adobe Portable Document Format