Developing a Stemming Algorithm for Awngi Text: A Longest Match Approach

Date

2013-06

Publisher

Addis Ababa University

Abstract

Stemming is the process of removing affixes from surface words without performing a complete morphological analysis, so that all words sharing a stem are reduced to a common form. It is useful in many areas of computational linguistics and information retrieval. In this paper we present the development of a stemmer for the Awngi language that reduces words to their stems. The main objective of this research is to investigate whether automatic term conflation can be applied to Awngi and to find an effective way of developing a stemming algorithm for the language. We apply a longest-match approach supplemented by context-sensitive matching and recoding rules. The stemmer is evaluated on Awngi text from three domains: news articles, textbooks, and a dictionary. In this evaluation the stemmer achieves an overall accuracy of 91.41%, which is a very good result for a first attempt at such an algorithm. Because the stemmer is the first of its kind for the Awngi language, the remaining 8.59% error rate can likely be reduced by introducing more rules and exception rules. Further research is required not only on the algorithm but also on the morphological structure of the Awngi language.
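The abstract does not list the rules themselves, so the following is only a minimal sketch of how a longest-match stemmer with a context-sensitive condition and a recoding step might be organised. Everything in it is an assumption made for illustration: the suffix list and recoding pairs are English-like placeholders rather than Awngi affixes, and the names `SUFFIXES`, `RECODING`, `MIN_STEM_LEN`, `stem`, and `accuracy` are hypothetical. It is not the authors' implementation.

```python
from typing import Iterable, Tuple

# Assumed context-sensitive condition: a suffix is stripped only if the
# remaining stem keeps at least this many characters.
MIN_STEM_LEN = 3

# Placeholder suffix inventory (English-like, NOT real Awngi affixes).
SUFFIXES = ["ations", "ation", "ings", "ing", "es", "ed", "s"]

# Placeholder recoding rules applied after stripping: (stem ending, replacement).
RECODING = [("tt", "t"), ("nn", "n")]


def strip_longest_suffix(word: str) -> str:
    """Remove the longest suffix whose removal satisfies the context condition."""
    # Trying candidates from longest to shortest makes the longest match win.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)]
            if len(candidate) >= MIN_STEM_LEN:
                return candidate
            # Condition failed: fall through and try a shorter suffix.
    return word


def recode(stem_text: str) -> str:
    """Apply the first matching recoding rule to repair the stem ending."""
    for ending, replacement in RECODING:
        if stem_text.endswith(ending):
            return stem_text[: len(stem_text) - len(ending)] + replacement
    return stem_text


def stem(word: str) -> str:
    """Longest-match stripping followed by recoding."""
    return recode(strip_longest_suffix(word.lower()))


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (word, expected_stem) pairs that the stemmer gets right."""
    pairs = list(pairs)
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return 100.0 * correct / len(pairs)
```

In this sketch a suffix that fails the length condition is skipped in favour of a shorter one, which is one possible design choice; a real rule set might instead attach a separate condition to each suffix. The `accuracy` helper mirrors how a figure such as 91.41% would be computed from a hand-annotated list of words and expected stems, with the error rate being 100 minus that value.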

Keywords

Developing a Stemming Algorithm
