Automatic Thesaurus Construction for Amharic Text Retrieval

Date

2009-07

Publisher

Addis Ababa University

Abstract

Thesauri have been used for literary composition since their inception in 1852, but nowadays their primary use is in information retrieval. They are among the crucial components of retrieval systems, where they are typically used to enhance indexing and to expand queries during searching. Even though Amharic has been a written language for a couple of centuries and huge volumes of Amharic electronic documents have accumulated, not much has been done towards the development of effective and efficient Amharic retrieval systems. This research generates a thesaurus automatically for text retrieval in order to support the development of such a system. The automatic thesaurus generation system is based on the WORDSPACE model, which is derived from the inverted file index by applying the Random Projection algorithm for dimensionality reduction. A Nearest Neighbor clustering algorithm is then employed to generate the thesaurus automatically from the constructed WORDSPACE model. Experiments with the system on Amharic Bible documents gave an encouraging result. During experimentation the accuracy of the automatically generated thesaurus was evaluated; the result on a random sample of ten terms shows that the system has an accuracy of 58%. To further investigate its applicability to Amharic information retrieval, the thesaurus was integrated into an IR system for query expansion. The retrieval system was tested with and without the thesaurus in order to show the improvement made in retrieval effectiveness. Performance analysis shows that the recall of the system when using the thesaurus is superior to its recall without it: the average recall values are 73.34% and 37.29% after and before using the thesaurus for query expansion, respectively.
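
The pipeline outlined in the abstract (inverted index, random projection, nearest-neighbour lookup, query expansion, recall) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Gaussian projection, the cosine similarity measure, the dimensionality of 64, and the toy English tokens standing in for Amharic terms are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def build_term_vectors(docs, dim=64, seed=0):
    """Project the term-by-document counts into `dim` dimensions.

    Each document gets one random Gaussian row (assumed projection);
    a term's vector is the sum of the rows of the documents it occurs
    in, once per occurrence. This equals (term-document matrix) x R.
    """
    rng = random.Random(seed)
    term_vecs = defaultdict(lambda: [0.0] * dim)
    for doc in docs:
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for term in doc.split():
            vec = term_vecs[term]
            for i in range(dim):
                vec[i] += row[i]
    return dict(term_vecs)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(term, term_vecs, k=3):
    """Return the k terms whose vectors are most similar to `term`."""
    target = term_vecs[term]
    scored = [
        (cosine(target, vec), other)
        for other, vec in term_vecs.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


def expand_query(query_terms, term_vecs, k=1):
    """Append each known query term's k nearest neighbours to the query."""
    expanded = list(query_terms)
    for term in query_terms:
        if term in term_vecs:
            for neighbour in nearest_neighbours(term, term_vecs, k):
                if neighbour not in expanded:
                    expanded.append(neighbour)
    return expanded


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)
```

In this sketch, terms that occur in exactly the same documents receive identical vectors and therefore become each other's nearest neighbours, which is the property a co-occurrence thesaurus relies on. Comparing `recall` for the original and the expanded query mirrors the with and without evaluation reported above.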

Keywords

Amharic Thesaurus, WORDSPACE, Information Retrieval (IR)