Development of Stemming Algorithm for Tigrigna Text

Fisseha Yonas

dc.contributor.advisor	Abebe Ermias (Ato)
dc.contributor.author	Fisseha Yonas
dc.date.accessioned	2018-11-30T12:42:15Z
dc.date.accessioned	2023-11-18T12:44:08Z
dc.date.available	2018-11-30T12:42:15Z
dc.date.available	2023-11-18T12:44:08Z
dc.date.issued	2011-06
dc.description.abstract	This paper presents the development of a rule-based stemming algorithm for Tigrigna. The algorithm is simple yet highly effective; it consists of a sequence of steps, each made up of a collection of rules. Each rule specifies the affixes to be removed, the minimum length allowed for the stem, and a list of exceptions. Tigrigna has many exceptions to any stemming rule, and the researcher took these exceptions into account in designing the stemmer. The work required a detailed study of Tigrigna grammar and an analysis of the inflectional and derivational affixes of the language. The stemmer was designed around a new classification of words according to their affixes, and stemming is performed by a rule-based algorithm that removes those affixes. Earlier research on the Tigrigna language and on Tigrigna stemming was taken into consideration; because past research on Tigrigna stemming is limited, this research was necessary. By analyzing the grammatical rules of Tigrigna, the researcher decided to remove both inflectional and derivational affixes and designed a new rule-set for the Tigrigna stemmer. The goal of the research was to develop and document a new rule-based stemmer for the Tigrigna language. The stemmer was developed in the Python programming language. The researcher tried to follow a simple structure in the algorithm, creating x small rule-sets for similar affixes, which are applied to the input words. The stemmer was evaluated using the error-counting method: actual understemming and overstemming errors were counted on a total of 5437 word variants derived from two data sets. Results show that the stemmer has 85.8% accuracy on the first data set and 86.3% accuracy on the second, for an average accuracy of 86.1%. The average error rate is therefore about 13.9%. These errors were analyzed and classified into two categories (overstemming and understemming). Most of the errors were due to overstemming.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/14768
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Stemming Algorithm	en_US
dc.title	Development of Stemming Algorithm for Tigrigna Text	en_US
dc.type	Thesis	en_US
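
The abstract describes each rule as an affix to remove, a minimum stem length, and a list of exceptions, with rules grouped into small rule-sets and the output scored by counting overstemming and understemming errors. The following Python sketch illustrates that general shape only. It is not the thesis code: the names Rule, stem, evaluate and RULE_SETS are hypothetical, the affixes are English placeholders rather than Tigrigna affixes, and classifying a wrong stem as over- or understemmed by comparing its length with the expected stem is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One affix-removal rule (hypothetical structure, not the thesis code)."""
    affix: str
    kind: str                 # "prefix" or "suffix"
    min_stem_len: int         # shortest stem the rule may leave behind
    exceptions: frozenset = field(default_factory=frozenset)

    def apply(self, word: str) -> str:
        # Exception words are returned untouched.
        if word in self.exceptions:
            return word
        if self.kind == "suffix" and word.endswith(self.affix):
            candidate = word[: len(word) - len(self.affix)]
        elif self.kind == "prefix" and word.startswith(self.affix):
            candidate = word[len(self.affix):]
        else:
            return word
        # Reject the removal if the remaining stem would be too short.
        return candidate if len(candidate) >= self.min_stem_len else word


def stem(word: str, rule_sets) -> str:
    """Run the rule-sets in order; at most one rule fires per rule-set."""
    for rule_set in rule_sets:
        for rule in rule_set:
            result = rule.apply(word)
            if result != word:
                word = result
                break
    return word


def evaluate(pairs, rule_sets):
    """Count correct, overstemmed and understemmed results.

    Assumption: a wrong stem shorter than the expected one is counted as
    overstemming; any other wrong stem is counted as understemming.
    """
    counts = {"correct": 0, "over": 0, "under": 0}
    for word, expected in pairs:
        got = stem(word, rule_sets)
        if got == expected:
            counts["correct"] += 1
        elif len(got) < len(expected):
            counts["over"] += 1
        else:
            counts["under"] += 1
    counts["accuracy"] = counts["correct"] / len(pairs) if pairs else 0.0
    return counts


# Placeholder rule-sets using English affixes, for illustration only.
RULE_SETS = [
    [Rule("un", "prefix", 3)],
    [
        Rule("ing", "suffix", 3, frozenset({"sing", "thing"})),
        Rule("s", "suffix", 3, frozenset({"glass"})),
    ],
]

if __name__ == "__main__":
    sample = [("unlocking", "lock"), ("books", "book"), ("running", "run"),
              ("bus", "bus"), ("class", "class")]
    for word, _ in sample:
        print(word, "->", stem(word, RULE_SETS))
    print(evaluate(sample, RULE_SETS))
```

In this sketch the exception list and the minimum stem length both act as guards against overstemming, which the abstract reports as the most common error type. Grouping rules for similar affixes into one rule-set and letting only one rule fire per set keeps the behaviour easy to trace.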

Files

Original bundle

Name: Yonas Fisseha.pdf
Size: 481.88 KB
Format: Adobe Portable Document Format

Collections

Information Sciences