Developing a Stemming Algorithm for Awngi Text: A Longest Match Approach
Date
2013-06
Publisher
Addis Ababa University
Abstract
Stemming is the process of removing affixes from surface words without
performing a complete morphological analysis; it reduces all words that share
a stem to a common form. It is useful in many areas of computational
linguistics and information retrieval. In this paper we present the
development of a stemmer that reduces Awngi words to their stems. The main
objective of this research is to investigate whether automatic term conflation
can be applied to Awngi and to find the optimal way of developing a stemming
algorithm for the language.
We apply a longest-match approach supplemented by context-sensitive and
recoding matching principles. The stemmer is evaluated on Awngi text from
three domains: news articles, textbooks and a dictionary. The evaluation
shows an overall accuracy of 91.41%, a very good result for a first attempt
at developing such an algorithm.
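The longest-match idea can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: the suffix rules, the minimum-stem-length condition (standing in for a context-sensitive rule), the recoding string, the exception table and the sample words are all hypothetical, made-up strings and are not real Awngi morphology. Each word is checked against the longest candidate suffix first; if that suffix's condition fails, shorter suffixes are tried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    suffix: str        # ending to strip (hypothetical example data)
    min_stem: int = 2  # context-sensitive condition: minimum stem length left after stripping
    recode: str = ""   # recoding: string appended to the stem after stripping


# HYPOTHETICAL rules for illustration only; these are NOT real Awngi suffixes.
RULES = [
    Rule("ata"),
    Rule("at"),
    Rule("a", min_stem=3),
    Rule("iti", recode="i"),
]

# HYPOTHETICAL exception table: words mapped to a fixed stem instead of being stripped.
EXCEPTIONS = {"bata": "bata"}


def stem(word: str, rules=RULES, exceptions=EXCEPTIONS) -> str:
    """Strip one suffix using longest match, honouring conditions and recoding."""
    if word in exceptions:
        return exceptions[word]
    # Try candidate suffixes from longest to shortest.
    for rule in sorted(rules, key=lambda r: len(r.suffix), reverse=True):
        if word.endswith(rule.suffix):
            base = word[: -len(rule.suffix)]
            # Accept the match only if its context condition holds;
            # otherwise fall through to the next (shorter) suffix.
            if len(base) >= rule.min_stem:
                return base + rule.recode
    return word  # no applicable rule: leave the word unchanged


if __name__ == "__main__":
    for w in ["kelata", "kelat", "kela", "doa", "bata", "selliti"]:
        print(w, "->", stem(w))
```

With these made-up rules, `kelata`, `kelat` and `kela` all conflate to `kel`; `doa` is left unchanged because stripping `a` would leave a stem shorter than the rule allows; `bata` is protected by the exception table; and `selliti` is recoded to `selli`. The sketch strips a single suffix in one pass; whether the original stemmer iterates or also handles prefixes is not stated in the abstract.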
As this is the first stemmer for the Awngi language, the remaining 8.59% error
can be reduced by introducing more rules and exception rules.
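The accuracy and error figures are complementary (91.41% + 8.59% = 100%). A minimal sketch of how such figures could be computed, assuming accuracy is the percentage of words whose stemmer output matches a manually assigned gold stem (the abstract does not spell out the exact evaluation protocol). The `evaluate` function, the toy stemmer and the gold pairs are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def evaluate(stem_fn: Callable[[str], str],
             gold: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (accuracy %, error %) over (word, expected_stem) pairs."""
    pairs = list(gold)
    if not pairs:
        raise ValueError("gold set is empty")
    correct = sum(1 for word, expected in pairs if stem_fn(word) == expected)
    accuracy = 100.0 * correct / len(pairs)
    return accuracy, 100.0 - accuracy  # error rate is the complement of accuracy


# HYPOTHETICAL toy stemmer and gold data, for illustration only.
def toy_stem(word: str) -> str:
    return word[:-1] if word.endswith("a") else word


gold = [("kela", "kel"), ("bota", "bot"), ("doa", "doa"), ("kel", "kel")]
acc, err = evaluate(toy_stem, gold)
print(f"accuracy={acc:.2f}% error={err:.2f}%")
```

Here three of the four toy words are stemmed as expected, giving 75.00% accuracy and 25.00% error. Per-domain figures (news articles, textbooks, dictionary) could be obtained by calling `evaluate` once per domain's gold set.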
Further research is required not only on the algorithm but also on the
morphological structure of the Awngi language.
Keywords
Developing a Stemming Algorithm