Automatic Thesaurus Construction From Wolaytta Text
Date
2013-06
Publisher
Addis Ababa University
Abstract
A thesaurus is a set of terms used to classify documents during indexing and to expand queries during
searching, with the aim of enhancing retrieval effectiveness. The major problem with information
retrieval systems is twofold: on the one hand, users are required to describe their information need to
the system explicitly; on the other, the system itself often retrieves irrelevant documents because of
vocabulary mismatch between query terms and index terms. Because an information retrieval system
compares query terms and index terms at the lexical level, the mismatch is pronounced enough to
degrade retrieval performance. A thesaurus addresses this problem by providing a precise, controlled
vocabulary of terms for indexing and searching, thereby resolving the vocabulary mismatch.
Wolaytta is an official language of literacy in Ethiopia. Since the introduction of the Latin script into
its writing system in 1993, the language has evolved significantly, from a purely spoken language to a
medium of instruction and then to a source of information. To use the language as a source of
information, a retrieval system should be designed with an enhanced capability to resolve whatever
mismatches arise between query terms and index terms.
This thesis develops an automatic association thesaurus from Wolaytta text, either as a starting point
for an enhanced retrieval system or as a framework for the development of a cross-language retrieval
system. The developed system constructs the association thesaurus automatically from a document
corpus on the basis of term-to-term co-occurrence. To obtain reasonably good performance, the system
incorporated manually compiled stop word and suffix lists and achieved better results in generating
related concepts.
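
The approach described above can be illustrated with a minimal sketch. It is not the thesis implementation: the stop word list, the suffix list, and the toy English documents are hypothetical placeholders standing in for the manually compiled Wolaytta resources, and the Dice coefficient over document-level co-occurrence is an assumed association measure, since the abstract does not name the one actually used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical placeholder lists; NOT the manually compiled Wolaytta lists.
STOP_WORDS = {"the", "of"}
SUFFIXES = ("s",)


def normalize(token):
    """Lowercase a token and strip the longest matching suffix, if any."""
    token = token.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Keep at least three characters of stem to avoid over-stripping.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def index_terms(document):
    """Return the set of normalized, non-stop-word terms in a document."""
    return {
        normalize(raw)
        for raw in document.split()
        if raw.lower() not in STOP_WORDS
    }


def build_thesaurus(documents, top_k=3):
    """Map each term to its top_k co-occurring terms with Dice scores."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for document in documents:
        terms = sorted(index_terms(document))
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))

    related = {term: [] for term in doc_freq}
    for (a, b), both in pair_freq.items():
        # Dice coefficient (assumed measure): 2 * |A and B| / (|A| + |B|).
        score = 2 * both / (doc_freq[a] + doc_freq[b])
        related[a].append((b, score))
        related[b].append((a, score))

    # Highest score first; ties broken alphabetically for determinism.
    return {
        term: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
        for term, pairs in related.items()
    }


def expand_query(query, thesaurus, per_term=1):
    """Append up to per_term related terms after each query term."""
    expanded = []
    for raw in query.split():
        if raw.lower() in STOP_WORDS:
            continue
        term = normalize(raw)
        neighbours = [t for t, _ in thesaurus.get(term, [])[:per_term]]
        for candidate in [term] + neighbours:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    # Toy English corpus used purely for illustration.
    docs = [
        "the price of coffee beans",
        "coffee beans market",
        "market price of maize",
    ]
    thesaurus = build_thesaurus(docs)
    print(thesaurus["coffee"])
    print(expand_query("coffee", thesaurus))
```

In this sketch, two terms count as co-occurring when they appear in the same document; a window-based or sentence-based definition would be an equally plausible choice and would only change how `pair_freq` is populated.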
Keywords
Construction From Wolaytta Text