Development of A Stemmer for Afaraf Text Retrieval
Date
2015-11
Publisher
Addis Ababa University
Abstract
This study describes the design of a stemming algorithm for an Afaraf text retrieval system. A considerable amount of electronic information is now produced in Afaraf. An information retrieval system is a mechanism that enables users to retrieve relevant unstructured information from a large collection.
The literature on Afaraf morphology was reviewed in order to develop a rule-based stemmer. Each natural language has its own word structure, with its own prefixes and suffixes, so affixes need to be handled with language-specific rules. The proposed rule-based stemmer is based on the grammar and dictionary of Afaraf. Its rules cover number (singular and plural), personal pronouns, adjectives, adverbs, verbal nouns, strong and weak verbs, indefinite pronouns, conditional and subjunctive moods, and linkage; they remove suffixes and prefixes from a word to produce its stem.
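The general idea of rule-based affix removal can be sketched as follows. This is only an illustration under stated assumptions: the affix lists are English stand-ins (the abstract does not list the actual Afaraf rules), and the names `strip_affixes`, `PREFIXES`, `SUFFIXES` and `MIN_STEM_LEN` are hypothetical, not taken from the thesis.

```python
# Hypothetical affix lists: English stand-ins, NOT the Afaraf rules of the thesis.
PREFIXES = ("un", "re")
SUFFIXES = ("ings", "ing", "ed", "s")
MIN_STEM_LEN = 3  # assumed guard so that short words are not over-stemmed


def strip_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_len=MIN_STEM_LEN):
    """Remove at most one prefix and one suffix, trying the longest affix first."""
    stem = word.lower()
    # Try longer prefixes before shorter ones so the most specific rule wins.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_len:
            stem = stem[len(prefix):]
            break
    # Same strategy for suffixes.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_len:
            stem = stem[:-len(suffix)]
            break
    return stem


print(strip_affixes("replayed"))
print(strip_affixes("cats"))
```

A real stemmer for Afaraf would replace the two tuples with rules derived from the grammar categories listed above, and would probably need context conditions per rule rather than a single minimum-length guard.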
For this study, the researcher prepared a text corpus of 300 Afaraf text files, collected from different school textbooks, Samara University modules, Qusebaa Maca magazines and other online sources. The experiment was carried out using eight different queries. Vector space model (VSM) data pre-processing techniques were applied to both document indexing and query text.
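A vector space model ranks documents by the similarity between a query vector and each document vector. The sketch below shows one common variant (TF-IDF weights with cosine similarity) on a toy English corpus. The abstract does not state which weighting scheme or pre-processing steps the thesis used, so the functions `tokenize`, `build_index`, `cosine` and `rank` and the sample documents are assumptions for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stemmer would be applied here."""
    return [t for t in text.lower().split() if t.isalpha()]


def build_index(docs):
    """Return one TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return indices of documents with non-zero similarity, best first."""
    vectors, idf = build_index(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for score, i in sorted(scores, reverse=True) if score > 0]


docs = ["apple banana", "banana cherry", "cherry date"]  # toy corpus
print(rank("banana", docs))
```

The same `tokenize` function is applied to documents and queries, which mirrors the point above that pre-processing has to be identical on both sides for terms to match.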
The evaluation of the stemmer shows an accuracy of 65.65%, with error rates of 4.50% for over-stemming and 29.85% for under-stemming. The information retrieval system registered a performance of 0.785 precision and 0.233 recall.
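The three stemmer percentages sum to 100%, which suggests that each test word was counted as exactly one of correct, over-stemmed or under-stemmed. Under that assumption, and using the standard set-based definitions of precision and recall, the measures can be computed as in the following sketch. The function names and the sample counts are hypothetical and are not the data of the thesis.

```python
def stemmer_rates(correct, over, under):
    """Percentage of words stemmed correctly, over-stemmed and under-stemmed."""
    total = correct + over + under
    return {
        "accuracy": 100 * correct / total,
        "over_stemming": 100 * over / total,
        "under_stemming": 100 * under / total,
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up counts for illustration only.
print(stemmer_rates(correct=6, over=1, under=3))
print(precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10]))
```

A high precision combined with a low recall, as reported above, means that most retrieved documents were relevant but many relevant documents were not retrieved. This is consistent with a high under-stemming rate, since variants of the same word that are not reduced to a common stem fail to match the query.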
The study found that the main challenge in developing a full-fledged Afaraf text retrieval system is handling morphological word variation. The performance of the system may improve if the stemming algorithm is improved and if a standard test corpus is used.
Keywords
Development of A Stemmer