Automatic Stemming For Amharic Text: An Experiment Using Successor Variety Approach
Date
2009-01
Publisher
Addis Ababa University
Abstract
The extensive use of the World Wide Web and the increasing digital availability of
information and documents have accelerated the demand for technologies and tools
for online data retrieval and extraction. Natural language research, which aims
at quick and reliable online information searching and access, is one major
component of current information technology development. In this research, an
indexing system was developed using the Successor Variety Stemming Algorithm to
find stems for Amharic words. The research set out to discover whether the
Successor Variety Stemming Algorithm, with the peak and plateau, entropy, and
complete word methods, can be used for the Amharic language, and what its
limitations would be. In addition, the peak and plateau method was compared with
the entropy and complete word methods. Stemming is typically used in the hope of
improving the accuracy of search and reducing the size of the index. A corpus of
6270 words was obtained from the Ethiopian News Agency (ENA) and Walta Information
Center and used to train and test the methods.
The experimental results showed that the peak and plateau method achieved an
accuracy of 71.8%, while the entropy and complete word methods achieved 63.95%
and 57.99% respectively. Based on these results, the successor variety algorithm
with the peak and plateau method performed better than the successor variety
algorithm with the entropy method.
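
To make the successor variety idea concrete, the following is a minimal
sketch, not the implementation used in this research. It assumes a small
English toy vocabulary rather than the Amharic corpus, counts only letters
that follow a prefix (end-of-word is not counted as a successor), and cuts a
word after any prefix whose score is strictly greater than the scores of its
two neighbours, which is one common reading of the peak and plateau rule. All
function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_successor_counts(vocabulary):
    """For each proper prefix, count how many distinct words continue it with each letter."""
    counts = defaultdict(Counter)
    for word in set(vocabulary):
        for i in range(1, len(word)):
            counts[word[:i]][word[i]] += 1
    return counts


def successor_varieties(word, counts):
    """Number of distinct letters seen after each proper prefix of `word`."""
    return [len(counts.get(word[:i], ())) for i in range(1, len(word))]


def successor_entropies(word, counts):
    """Shannon entropy (base 2) of the next-letter distribution after each proper prefix."""
    entropies = []
    for i in range(1, len(word)):
        followers = counts.get(word[:i], Counter())
        total = sum(followers.values())
        if total == 0:
            entropies.append(0.0)
            continue
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in followers.values())
        )
    return entropies


def segment_at_peaks(word, scores):
    """Cut after every prefix whose score exceeds both neighbouring scores (assumed rule)."""
    segments, start = [], 0
    for k in range(1, len(scores) - 1):
        if scores[k] > scores[k - 1] and scores[k] > scores[k + 1]:
            cut = k + 1  # scores[k] belongs to the prefix of length k + 1
            segments.append(word[start:cut])
            start = cut
    segments.append(word[start:])
    return segments


# Toy vocabulary (illustrative only, not the ENA/Walta corpus).
vocabulary = ["able", "ape", "beatable", "fixable", "read", "readable",
              "reading", "reads", "red", "rope", "ripe"]
counts = build_successor_counts(vocabulary)

print(successor_varieties("readable", counts))   # -> [3, 2, 1, 3, 1, 1, 1]
print(segment_at_peaks("readable", successor_varieties("readable", counts)))
# -> ['read', 'able']
```

The same `segment_at_peaks` helper can be applied to the output of
`successor_entropies` to compare the two scoring methods on the same word.
Choosing which segment to keep as the stem is a separate decision that this
sketch leaves open.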
Keywords
An Experiment Using Successor Variety Approach