Title: Automatic Thesaurus Construction for Amharic Text Retrieval
Contributors: Dr. Million Meshesha; Mekonnen, Andargachew
Keywords: Amharic Thesaurus;Information Retrieval;Wordspace
Issue Date: Jul-2009
Publisher: Addis Abeba University
Abstract: Thesauri have been used for literary composition since their inception in 1852, but today their primary use is in information retrieval. They are among the crucial components of retrieval systems, where they are typically used to enhance indexing and to expand queries during searching. Even though Amharic has been a written language for a couple of centuries and huge volumes of Amharic electronic documents have accumulated, not much has been done towards the development of effective and efficient Amharic retrieval systems. This research generates a thesaurus automatically for text retrieval in order to support the development of such a system. The automatic thesaurus generation system is based on the WORDSPACE model, which is derived from the inverted file index by applying the Random Projection algorithm for dimensionality reduction. A nearest-neighbor clustering algorithm is then used to generate the thesaurus automatically from the constructed WORDSPACE model. Experiments on Amharic Bible documents gave encouraging results. The accuracy of the automatically generated thesaurus, evaluated on a random sample of ten terms, is 58%. To further investigate its applicability to Amharic information retrieval, the thesaurus was integrated into an IR system for query expansion. The retrieval system was tested with and without the thesaurus to measure the improvement in retrieval effectiveness. Performance analysis shows that recall is higher with the thesaurus than without it: average recall is 73.34% with thesaurus-based query expansion and 37.29% without it.
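The pipeline described in the abstract (inverted index, random projection into a reduced word space, nearest-neighbor lookup, query expansion) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy English corpus, the choice of a random +1/-1 projection matrix, the cosine similarity measure, and the dimension are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def build_inverted_index(docs):
    """Map each term to {doc_id: count}; whitespace tokenization is assumed."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in text.split():
            postings = index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


def random_projection(index, n_docs, dim, seed=0):
    """Project each term's document-count row into `dim` dimensions.

    Assumption: one random +1/-1 vector per document; the thesis may use
    a different projection matrix.
    """
    rng = random.Random(seed)
    doc_vectors = [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
                   for _ in range(n_docs)]
    space = {}
    for term, postings in index.items():
        vec = [0.0] * dim
        for doc_id, count in postings.items():
            row = doc_vectors[doc_id]
            for k in range(dim):
                vec[k] += count * row[k]
        space[term] = vec
    return space


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_terms(space, term, k=3):
    """Return the k terms whose reduced vectors are closest to `term`."""
    target = space[term]
    scored = [(cosine(target, vec), other)
              for other, vec in space.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def expand_query(space, query_terms, k=2):
    """Append each query term's nearest neighbors to the query."""
    expanded = list(query_terms)
    for term in query_terms:
        if term in space:
            for neighbor in nearest_terms(space, term, k):
                if neighbor not in expanded:
                    expanded.append(neighbor)
    return expanded


# Toy corpus (hypothetical; the thesis used Amharic Bible documents).
docs = ["king throne crown", "king throne", "river water", "river water fish"]
index = build_inverted_index(docs)
space = random_projection(index, n_docs=len(docs), dim=64)
print(expand_query(space, ["king"], k=1))
```

In this sketch, terms that occur in the same documents end up with similar reduced vectors, so a query term's nearest neighbors serve as its thesaurus entries. Expanding a query with those neighbors lets it match documents that do not contain the original term, which is the mechanism behind the recall improvement the abstract reports.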
Appears in Collections:Thesis - Information Science

Files in This Item:
File: AndargachewMekonnenGezmu.pdf
Size: 1.03 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.