Automatic Thesaurus Construction For Tigrigna Text Retrieval
Date
2011
Publisher
Addis Ababa University
Abstract
A thesaurus is a list of related terms that helps to solve the vocabulary problem in
information retrieval, which arises because authors and indexers use different terms for
the same concept. Searchers may lack skill in selecting good search terms, and the
vocabulary they use in a query may differ from the vocabulary indexed in the
system. As a result, they may not get good results even though related
documents exist in the collection. It is therefore reasonable to expand query terms with
additional related terms drawn from a thesaurus.
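The query expansion idea described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `expand_query`, the `max_related` cutoff, and the toy English thesaurus are assumptions made for the example and are not taken from the thesis.

```python
def expand_query(query_terms, thesaurus, max_related=2):
    """Return the query terms followed by up to max_related related terms each.

    `thesaurus` maps a term to a list of related terms, best match first.
    (Hypothetical structure chosen for this sketch.)
    """
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        # Terms missing from the thesaurus are simply left unexpanded.
        for related in thesaurus.get(term, [])[:max_related]:
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


# Toy thesaurus with English placeholders standing in for Tigrigna terms.
toy_thesaurus = {
    "car": ["automobile", "vehicle", "truck"],
    "doctor": ["physician"],
}
expanded = expand_query(["car", "doctor"], toy_thesaurus)
print(expanded)
```

Capping the number of related terms per query term is one simple way to limit the noise that expansion can introduce; the cutoff value here is arbitrary.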
Tigrigna is a language in the Ethio-Semitic family spoken mainly in the Tigray region of
Ethiopia and in Eritrea. The volume of electronic documents in Tigrigna is currently
increasing significantly, so robust retrieval tools are needed in order to
use these documents. Because a thesaurus is an important component of information retrieval,
studies have been conducted on automatic thesaurus construction for Tigrigna
information retrieval.
Even though automatic thesaurus construction has its own drawbacks, it is better than the
alternative, manual construction. This thesis introduces an automatic approach to
constructing a Tigrigna thesaurus from a document collection, based on a term-to-term
co-occurrence matrix. Experiments with the system on Tigrigna documents gave
encouraging results: on a random sample of terms, the
system achieved an accuracy of 75.28%.
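A term-to-term co-occurrence approach of the kind named in the abstract could look roughly like the Python sketch below. The abstract does not say how co-occurrence is counted or how related terms are scored, so the choices here are assumptions: co-occurrence is counted at the document level, and candidates are ranked with the Dice coefficient. The function names and the toy English tokens, which stand in for preprocessed Tigrigna terms, are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each pair of distinct terms, the documents containing both.

    Returns (doc_freq, cooc): per-term document frequency and pair counts
    keyed by alphabetically ordered (term_a, term_b) tuples.
    """
    doc_freq = Counter()
    cooc = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))  # each term counts once per document
        doc_freq.update(terms)
        for a, b in combinations(terms, 2):
            cooc[(a, b)] += 1
    return doc_freq, cooc


def related_terms(term, doc_freq, cooc, top_k=3):
    """Rank terms co-occurring with `term` by Dice coefficient (an assumed measure)."""
    scores = {}
    for (a, b), n in cooc.items():
        if term == a:
            other = b
        elif term == b:
            other = a
        else:
            continue
        scores[other] = 2 * n / (doc_freq[term] + doc_freq[other])
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy collection: each document is a list of already tokenized terms.
docs = [
    ["school", "teacher", "student"],
    ["school", "student", "exam"],
    ["market", "price", "trade"],
    ["market", "price"],
]
doc_freq, cooc = build_cooccurrence(docs)
school_related = related_terms("school", doc_freq, cooc)
print(school_related)
```

Storing only the pairs that actually co-occur keeps the matrix sparse, which matters once the vocabulary grows beyond a toy collection.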
Keywords
Automatic Thesaurus Construction