Automatic Thesaurus Construction For Tigrigna Text Retrieval

Date

2011

Publisher

Addis Ababa University

Abstract

A thesaurus is a list of related terms that helps solve the vocabulary problem in information retrieval, which arises because authors and indexers use different terms for the same concept. Searchers may lack skill in selecting good search terms, and the vocabulary they use in a query may differ from the one indexed in the system. As a result, they may not get good results even though related documents exist in the collection. It is therefore reasonable to expand query terms with additional related terms drawn from a thesaurus. Tigrigna is a language of the Ethio-Semitic family spoken mainly in the Tigray region of Ethiopia and in Eritrea. The volume of electronic documents in Tigrigna is currently increasing significantly, so robust retrieval tools are needed in order to use these documents. As a thesaurus is an important component of information retrieval, automatic thesaurus construction for Tigrigna information retrieval has been studied. Although automatic thesaurus construction has its own drawbacks, it is preferable to the alternative of manual construction. In this thesis, an automatic approach to constructing a Tigrigna thesaurus from a document collection, based on a term-to-term co-occurrence matrix, is introduced. Experimentation with the system on Tigrigna documents gave an encouraging result: on a random sample of terms, the system achieved an accuracy of 75.28%.
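
The idea of deriving related terms from a term-to-term co-occurrence matrix and using them to expand a query can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the whitespace tokenization, document-level co-occurrence counting, the Dice coefficient as the association measure, the function names, and the English placeholder documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count document frequencies and, for each term pair, the documents containing both."""
    doc_freq = Counter()
    cooc = defaultdict(Counter)
    for doc in docs:
        # Assumption: plain whitespace tokenization, no stemming or stop-word removal.
        terms = set(doc.split())
        doc_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return doc_freq, cooc


def related_terms(term, doc_freq, cooc, k=3):
    """Return the k terms most associated with `term` (Dice coefficient, an assumed measure)."""
    scores = {
        other: 2 * n / (doc_freq[term] + doc_freq[other])
        for other, n in cooc.get(term, {}).items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def expand_query(query, doc_freq, cooc, k=2):
    """Append up to k related terms per query term, skipping duplicates."""
    terms = query.split()
    expanded = list(terms)
    for t in terms:
        for r in related_terms(t, doc_freq, cooc, k):
            if r not in expanded:
                expanded.append(r)
    return expanded


# Placeholder English documents standing in for a tokenized Tigrigna collection.
DOCS = [
    "car engine fuel",
    "car engine wheel",
    "bank loan interest",
    "bank loan money",
]
DOC_FREQ, COOC = build_cooccurrence(DOCS)

if __name__ == "__main__":
    print(related_terms("car", DOC_FREQ, COOC, k=2))
    print(expand_query("car loan", DOC_FREQ, COOC, k=1))
```

In this sketch the thesaurus entry for a term is simply its highest-scoring co-occurring terms; a real system for Tigrigna would likely also need language-specific preprocessing such as stemming, which is omitted here.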

Keywords

Automatic Thesaurus Construction
