Automatic Thesaurus Construction for Amharic Text Retrieval
Date
2009-07
Publisher
Addis Ababa University
Abstract
Thesauri have been used for literary composition since their inception in 1852, but
nowadays their primary use is in information retrieval. Indeed, they are among the
crucial components of retrieval systems, where they are typically used to enhance
indexing operations and to expand queries during searching.
Even though Amharic has been a written language for a couple of centuries and
huge volumes of Amharic electronic documents have accumulated, little has been
done toward developing effective and efficient Amharic retrieval systems. This
research generates a thesaurus automatically for text retrieval in order to support
the development of an effective and efficient Amharic retrieval system.
The automatic thesaurus generation system is based on the WORDSPACE model,
which is derived from the inverted file index by applying the Random Projection
algorithm for dimensionality reduction.
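The step from inverted file to WORDSPACE can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis implementation: the inverted file is assumed to be a dict mapping each term to {doc_id: term frequency}, raw term frequencies are assumed as weights, and a dense Gaussian random matrix is assumed as the projection (sparse ±1 matrices are another common random-projection variant). The function name `random_projection` and the target dimension `k` are hypothetical.

```python
import math
import random


def random_projection(inverted_index, num_docs, k, seed=0):
    """Project each term's document vector down to k dimensions.

    Hypothetical input layout: {term: {doc_id: term_frequency}},
    with every doc_id an integer in range(num_docs).
    Returns {term: list of k floats}, a reduced term space.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    # Rows of the projection matrix R (num_docs x k), one random
    # k-dimensional vector per document, entries drawn from N(0, 1/k).
    doc_vectors = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)]
                   for _ in range(num_docs)]
    space = {}
    for term, postings in inverted_index.items():
        vec = [0.0] * k
        # term vector (1 x num_docs) times R (num_docs x k), using only
        # the non-zero entries stored in the posting list.
        for doc_id, tf in postings.items():
            row = doc_vectors[doc_id]
            for j in range(k):
                vec[j] += tf * row[j]
        space[term] = vec
    return space


# Toy usage: made-up transliterated terms over three documents.
toy_index = {
    "iyesus": {0: 3, 1: 1},
    "kristos": {0: 2, 1: 1},
    "muse": {2: 4},
}
space = random_projection(toy_index, num_docs=3, k=2, seed=42)
for term, vec in space.items():
    print(term, [round(x, 3) for x in vec])
```

Working from the posting lists keeps the cost proportional to the number of non-zero entries, which is why an inverted file is a natural starting point for the projection.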
A nearest-neighbor clustering algorithm is then employed to generate the
thesaurus automatically from the constructed WORDSPACE model.
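Nearest-neighbor thesaurus generation over such a term space might look like the sketch below. It reads "nearest-neighbor clustering" as grouping each term with its most similar terms by cosine similarity; the thesis may use a different variant, and the names `nearest_neighbor_thesaurus`, `n` and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_thesaurus(space, n=5, threshold=0.5):
    """Map each term to its n most similar terms scoring >= threshold.

    space: {term: vector}, e.g. reduced term vectors of a WORDSPACE.
    n and threshold are hypothetical tuning parameters.
    """
    thesaurus = {}
    for term, vec in space.items():
        scored = [(cosine(vec, other_vec), other)
                  for other, other_vec in space.items() if other != term]
        # Highest similarity first; ties broken alphabetically for stability.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        thesaurus[term] = [other for sim, other in scored[:n] if sim >= threshold]
    return thesaurus


toy_space = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(nearest_neighbor_thesaurus(toy_space, n=2))
# → {'a': ['b'], 'b': ['a'], 'c': []}
```

The threshold keeps weakly related terms out of an entry, so a term with no close neighbors (like `c` here) gets an empty entry rather than a misleading one.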
Encouraging results are obtained when the system is tested on Amharic Bible
documents. During experimentation, the accuracy of the automatically generated
thesaurus is evaluated: on a random sample of ten terms, the system achieves an
accuracy of 58%. To further investigate its applicability to Amharic information
retrieval, the thesaurus is integrated into an IR system for query expansion. The
retrieval system is tested with and without the thesaurus in order to measure the
improvement in retrieval effectiveness.
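Query expansion with the generated thesaurus can be illustrated by this minimal sketch, which assumes the thesaurus is a dict from a term to its ranked related terms and simply appends those related terms to the query. The function `expand_query` and the `max_per_term` cap are hypothetical; the thesis does not state here how many related terms were added or whether they were weighted.

```python
def expand_query(query_terms, thesaurus, max_per_term=2):
    """Append up to max_per_term thesaurus entries for each query term.

    Original terms keep their order; duplicate terms are skipped.
    """
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, [])[:max_per_term]:
            if related not in seen:
                expanded.append(related)
                seen.add(related)
    return expanded


# Hypothetical thesaurus entries with transliterated terms.
toy_thesaurus = {"iyesus": ["kristos", "geta"], "muse": ["aron"]}
print(expand_query(["iyesus", "muse"], toy_thesaurus))
# → ['iyesus', 'muse', 'kristos', 'geta', 'aron']
```

Adding related terms lets the system match documents that use a synonym instead of the query word, which is the mechanism by which expansion tends to raise recall.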
Performance analysis shows that recall is higher with the thesaurus than without
it: the average recall values are 73.34% with thesaurus-based query expansion and
37.29% without it.
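Average recall is the mean of per-query recall, where recall is the fraction of the relevant documents that are retrieved. The sketch below shows that arithmetic on made-up relevance judgments; the document IDs and numbers are illustrative only and are not the thesis data, and the helper names are hypothetical.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


def average_recall(runs):
    """Mean recall over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(recall(retrieved, relevant) for retrieved, relevant in runs) / len(runs)


# Hypothetical judgments for two queries (not data from the thesis).
without_expansion = [(["d1"], ["d1", "d2"]),
                     (["d3"], ["d3", "d4", "d5", "d6"])]
with_expansion = [(["d1", "d2"], ["d1", "d2"]),
                  (["d3", "d4", "d7"], ["d3", "d4", "d5", "d6"])]
print(average_recall(without_expansion), average_recall(with_expansion))
# → 0.375 0.75
```

Recall alone ignores the extra non-relevant documents that expansion may pull in (such as `d7` above), so it is usually reported alongside precision.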
Keywords: Amharic Thesaurus, WORDSPACE, Information Retrieval (IR)