Development of A Stemmer for Afaraf Text Retrieval

Taha, Osman

Files

Osman Taha.pdf (1.64 MB)

Date

2015-11

Authors

Taha, Osman

Publisher

Addis Ababa University

Abstract

This study describes the design of a stemming algorithm for an Afaraf text retrieval system. A considerable amount of electronic information is now being produced in Afaraf. An information retrieval system is a mechanism that enables users to retrieve relevant unstructured information from a large collection. The literature on Afaraf morphology was reviewed in order to develop a rule-based stemmer. Each natural language structures its words in its own way, with its own prefixes and suffixes, so affixes need special handling through language-specific rules. The proposed rule-based stemmer is based on the grammar and dictionary of Afaraf and covers number (singular and plural), personal pronouns, adjectives, adverbs, verbal nouns, strong and weak verbs, indefinite pronouns, conditional and subjunctive mood, and linkage; its rules remove suffixes and prefixes from a word to produce its stem. For this study, the researcher prepared a corpus of 300 Afaraf text documents collected from school textbooks, Samara University modules, Qusebaa Maca magazines, and other online sources, and experiments were run with eight different queries. Vector space model (VSM) data pre-processing techniques were applied to both document indexing and query text. Evaluation of the stemmer shows an accuracy of 65.65%, with error rates of 4.50% for over-stemming and 29.85% for under-stemming. The information retrieval system achieved a precision of 0.785 and a recall of 0.233. The study found that handling morphological word variation is the most challenging task in developing a full-fledged Afaraf text retrieval system. The performance of the system may improve if the stemming algorithm is improved and a standard test corpus is used.
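
The abstract describes a rule-based stemmer that strips prefixes and suffixes and is scored by accuracy, over-stemming rate and under-stemming rate, which sum to 100% (65.65% + 4.50% + 29.85%). The Python sketch below illustrates that general technique under stated assumptions: the names `PREFIXES`, `SUFFIXES`, `MIN_STEM`, `strip_affix`, `stem` and `evaluate` are hypothetical, the affix lists are placeholders rather than real Afaraf morphology, and the length-based over/under classification is a simplification, not necessarily the evaluation procedure used in the thesis.

```python
# Minimal sketch of a rule-based affix-stripping stemmer plus an
# over-/under-stemming evaluation. The affix lists are HYPOTHETICAL
# placeholders, not the Afaraf rules developed in the thesis.

PREFIXES = ("ma",)              # hypothetical prefix list
SUFFIXES = ("itte", "ta", "e")  # hypothetical suffix list
MIN_STEM = 3                    # hypothetical minimum stem length


def strip_affix(word, affixes, at_start):
    """Remove the longest matching affix, keeping at least MIN_STEM characters."""
    for affix in sorted(affixes, key=len, reverse=True):
        matches = word.startswith(affix) if at_start else word.endswith(affix)
        if matches and len(word) - len(affix) >= MIN_STEM:
            return word[len(affix):] if at_start else word[:-len(affix)]
    return word


def stem(word):
    """Strip at most one prefix, then at most one suffix, from a lower-cased word."""
    word = word.lower()
    word = strip_affix(word, PREFIXES, at_start=True)
    return strip_affix(word, SUFFIXES, at_start=False)


def evaluate(pairs):
    """Score (word, gold_stem) pairs and return percentages.

    Simplification: a wrong stem shorter than the gold stem counts as
    over-stemming; any other wrong stem counts as under-stemming.
    """
    if not pairs:
        raise ValueError("need at least one (word, gold_stem) pair")
    correct = over = under = 0
    for word, gold in pairs:
        result = stem(word)
        if result == gold:
            correct += 1
        elif len(result) < len(gold):
            over += 1
        else:
            under += 1
    n = len(pairs)
    return {
        "accuracy": 100 * correct / n,
        "over_stemming": 100 * over / n,
        "under_stemming": 100 * under / n,
    }
```

Trying the longest affix first and enforcing a minimum stem length are common guards against over-stemming; raising `MIN_STEM` in this sketch trades over-stemming errors for under-stemming errors.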

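The abstract mentions VSM pre-processing for document indexing and queries and reports precision and recall. As a hedged illustration, the sketch below indexes already-stemmed documents with one common TF-IDF weighting (tf × log(N/df)), ranks them by cosine similarity, and computes precision (relevant retrieved / retrieved) and recall (relevant retrieved / relevant) for a single query. The weighting scheme and the names `build_index`, `cosine`, `search` and `precision_recall` are assumptions, not the thesis's implementation.

```python
import math
from collections import Counter

# Minimal VSM sketch. ASSUMPTIONS: documents and queries are already
# tokenised and stemmed; weights use tf * log(N / df), one common TF-IDF variant.


def build_index(docs):
    """docs: dict doc_id -> list of stems. Returns (tf-idf vectors, idf table)."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per doc
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        vectors[doc_id] = {t: c * idf[t] for t, c in tf.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query_terms, vectors, idf, top_k=10):
    """Rank documents by cosine similarity; drop documents scoring zero."""
    tf = Counter(query_terms)
    qvec = {t: c * idf[t] for t, c in tf.items() if t in idf}  # ignore unseen terms
    scores = [(cosine(qvec, v), doc_id) for doc_id, v in vectors.items()]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked[:top_k] if score > 0]


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging per-query precision and recall over a query set (the study used eight queries) would give system-level figures; how the thesis aggregated its reported 0.785 and 0.233 is not stated on this page.
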
Keywords

Development of A Stemmer

URI

http://etd.aau.edu.et/handle/123456789/14704

Collections

Health Informatics
