Amharic Text Retrieval: An Experiment Using Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD)
Date
2003-07
Abstract
The increase in the amount of electronic information has created a growing need
for efficient information retrieval techniques. Most techniques for retrieving textual
materials from databases depend on an exact match between the terms in a user’s
query and the terms by which documents are indexed. However, since there are
usually many ways to express the same concept, the terms in a user’s query
may not appear in a relevant document. Moreover, many words have more than
one meaning, so a query term may also match documents that use it in an unrelated
sense. As a result, term-matching methods are likely both to miss relevant documents
and to retrieve irrelevant ones (Dumais, 1992; Berry, Dumais & Letsche, 1995).
The Latent Semantic Indexing (LSI) technique of information retrieval can partially
address these problems by organizing terms and documents into a “semantic”
structure more appropriate for information retrieval. This is done by modeling the
underlying higher-order patterns in the association of terms with documents.
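The two failure modes of exact term matching can be made concrete with a minimal sketch. The documents below are hypothetical English stand-ins (not from the thesis's Amharic collection), and `exact_match` is an illustrative helper that treats a query as a Boolean OR of its terms.

```python
# Hypothetical toy documents (English stand-ins, not from the thesis).
DOCS = {
    "d1": "the car needs a new engine",
    "d2": "the automobile engine failed",   # about cars, but never uses "car"
    "d3": "apply for a loan at the bank",
    "d4": "fishing from the river bank",    # "bank" in an unrelated sense
}


def exact_match(query, docs):
    """Return ids of documents sharing at least one term with the query."""
    q_terms = set(query.lower().split())
    return sorted(doc_id for doc_id, text in docs.items()
                  if q_terms & set(text.lower().split()))


print(exact_match("car", DOCS))        # ['d1']: relevant d2 is missed (synonymy)
print(exact_match("bank loan", DOCS))  # ['d3', 'd4']: irrelevant d4 retrieved (polysemy)
```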
In this thesis, the potential of the LSI approach for Amharic text retrieval is
investigated. The approach was tested on 206 Amharic documents and 25 queries.
Automatic indexing of the documents yielded 9256 unique terms that are not in the
stop-word list used for the research. A 110-factor SVD of the term-by-document
matrix is used for indexing and retrieval. Finally, the performance of the LSI
approach is compared with that of the standard vector space model.
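The indexing and retrieval steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy English documents and stop list are hypothetical stand-ins for the Amharic collection, term weights are raw counts (the weighting scheme is not given here), only k = 2 factors are kept instead of 110, and the query is folded in as q̂ = Σ_k⁻¹ U_kᵀ q and compared with the rows of V_k by cosine, which is one common LSI convention among several. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy collection: English stand-ins for the Amharic documents.
DOCS = [
    "the car engine",
    "an automobile engine",
    "car and automobile",
    "the river water",
    "a river",
]
STOP_WORDS = {"the", "an", "a", "and"}  # illustrative stop list, not the thesis's


def build_term_doc_matrix(docs, stop_words):
    """Tokenize on whitespace, drop stop words, and count term occurrences.

    Returns (terms, index, A) where A[i, j] is the raw count of term i in doc j.
    """
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    terms = sorted({w for toks in tokenized for w in toks})
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1
    return terms, index, A


def cosine(x, y):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    return 0.0 if nx == 0 or ny == 0 else float(x @ y / (nx * ny))


def lsi_index(A, k):
    """Truncated SVD A ~= U_k S_k V_k^T keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # V_k: one row per document


def lsi_scores(q, Uk, sk, Vk):
    """Fold the query in as q_hat = S_k^-1 U_k^T q and compare with each row of V_k."""
    q_hat = (Uk.T @ q) / sk
    return [cosine(q_hat, v) for v in Vk]


def vsm_scores(q, A):
    """Standard vector space model: cosine between q and each document column."""
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


terms, index, A = build_term_doc_matrix(DOCS, STOP_WORDS)
q = np.zeros(len(terms))
q[index["car"]] = 1.0  # query: "car"

Uk, sk, Vk = lsi_index(A, k=2)  # the thesis used k = 110 on its real matrix
vsm = vsm_scores(q, A)
lsi = lsi_scores(q, Uk, sk, Vk)
print("VSM:", [round(x, 3) for x in vsm])
print("LSI:", [round(x, 3) for x in lsi])
```

In this toy run the query "car" has zero vector-space similarity with the second document ("automobile engine"), but in the two-factor LSI space that document scores about as high as the ones containing "car", because "car" and "automobile" co-occur in the third document. The river documents stay near zero in both models.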
Except at one standard recall level (0.80), the precision of the LSI approach was
higher than that of the standard vector space model.
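A comparison at standard recall levels can be illustrated with a small sketch of interpolated precision. The abstract does not state which interpolation rule was used; the sketch assumes the common one, in which precision at recall level r is the highest precision reached at any recall of at least r, with per-query curves averaged level by level across queries. The rankings and relevance judgments below are hypothetical, and the helper names are illustrative.

```python
# Hypothetical rankings and relevance judgments, for illustration only.
STANDARD_LEVELS = tuple(i / 10 for i in range(11))  # recall 0.0, 0.1, ..., 1.0


def interpolated_precision(ranking, relevant, levels=STANDARD_LEVELS):
    """Interpolated precision at each recall level for one query.

    P(r) is the maximum precision observed at any rank whose recall is >= r,
    or 0.0 if that recall is never reached.
    """
    points = []  # (recall, precision) measured at each relevant document
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]


def average_curve(runs):
    """Average the per-query curves of (ranking, relevant) pairs level by level."""
    curves = [interpolated_precision(ranking, relevant) for ranking, relevant in runs]
    return [sum(col) / len(curves) for col in zip(*curves)]


# One hypothetical query: 4 relevant documents, 6 retrieved.
curve = interpolated_precision(["d3", "d7", "d1", "d9", "d4", "d2"], {"d1", "d2", "d3", "d4"})
for level, p in zip(STANDARD_LEVELS, curve):
    print(f"recall {level:.1f}: precision {p:.3f}")
```

Comparing two systems, as in the LSI versus vector space evaluation, would amount to computing `average_curve` for each system over the same queries and checking, level by level, which curve is higher.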
Keywords
Text Retrieval