The Application of Information Retrieval Techniques to Amharic Documents on the Web
Date
2001-07
Publisher
Addis Ababa University
Abstract
The World Wide Web is an escalating mass of interconnected data that stretches from computer
to computer across the world. Information retrieval systems on the Web provide users with
relevant information without human intervention, saving time, labor and money.
The Web contains documents of diverse content in different languages. Making those documents
accessible to users has become a difficult task with the fast growth of the Web. Hence,
developing information retrieval systems that cope with the inherent features of Web data has
become a timely research area in information science.
In this study an attempt is made to explore the possibilities of applying some information retrieval
techniques to Amharic documents on the Web. To support the research, a literature review of related
work was carried out. Different information retrieval techniques and algorithms used on other
languages have been reviewed to determine the possibilities of applying them to Amharic
documents on the Web.
A database that stores Amharic Web page data, suffix list and index files has been designed.
A Web page submission form was developed to allow the submission of Web page data into the
database. Designing an Amharic query input interface was also part of the research.
Automatic indexing and searching techniques have been applied on a collection of 313 Web
pages of Amharic documents taken from Walta Information Center news publications.
Word and stem inverted index options were explored. An Amharic search interface was then
created to handle Amharic data on the Web using ColdFusion Studio and ColdFusion Server 4.0
on the Windows NT 4.0 operating system and Internet Information Server (IIS).
The search algorithm implemented is the Extended Boolean model, which is a Boolean
model with a vector functionality that allows retrieved documents to be ranked.
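To illustrate how a Boolean query can produce a ranking, the following is a minimal sketch of one common formulation of the extended Boolean model, the p-norm model. The abstract does not state which formulation, term weights, or value of p were used in the prototype, so the function names, the default p = 2, and the toy weights below are assumptions made for illustration only.

```python
def or_score(weights, p=2.0):
    """p-norm similarity for an OR query over term weights in [0, 1]."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def and_score(weights, p=2.0):
    """p-norm similarity for an AND query over term weights in [0, 1]."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


def rank(doc_weights, query_terms, op="and", p=2.0):
    """Score every document against the query and sort by descending score.

    doc_weights maps a document id to a {term: weight} dict (hypothetical
    layout); a term missing from a document gets weight 0.0.
    """
    score = and_score if op == "and" else or_score
    scored = []
    for doc_id, weights in doc_weights.items():
        ws = [weights.get(term, 0.0) for term in query_terms]
        scored.append((doc_id, score(ws, p)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection with made-up weights (not data from the study).
docs = {
    "d1": {"a": 1.0, "b": 1.0},
    "d2": {"a": 1.0},
    "d3": {},
}
ranking = rank(docs, ["a", "b"], op="and")
```

Unlike a strict Boolean AND, a document that matches only one of the two query terms (d2 here) still receives a non-zero score, which is what makes ranking possible.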
To measure the performance of the prototype system, retrieval experiments were
conducted for twenty-two queries and an average recall-precision graph was drawn. Using terms
with suffixes and prefixes removed resulted in better performance than using words.
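As an illustration of how an average recall-precision curve can be obtained from per-query results, the sketch below uses interpolated precision at eleven standard recall levels. The abstract does not say how the graph was computed, so the interpolation scheme, the function names, and the toy rankings are assumptions for illustration and not the results of the study.

```python
def interpolated_precision(ranked, relevant, levels):
    """Interpolated precision of one ranked result list at each recall level.

    ranked is a list of document ids in retrieval order; relevant is the set
    of ids judged relevant for the query.
    """
    points = []  # (recall, precision) at each rank where a relevant doc appears
    hits = 0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))
    # Interpolated precision at a level is the best precision at any recall >= level.
    return [
        max((prec for rec, prec in points if rec >= level), default=0.0)
        for level in levels
    ]


def average_curve(runs, levels):
    """Average the interpolated curves of several (ranked, relevant) query runs."""
    curves = [interpolated_precision(ranked, relevant, levels)
              for ranked, relevant in runs]
    return [sum(column) / len(curves) for column in zip(*curves)]


levels = [i / 10 for i in range(11)]  # recall levels 0.0, 0.1, ..., 1.0

# Two toy queries with made-up judgments.
runs = [
    (["a", "x", "b"], {"a", "b"}),
    (["y", "c"], {"c"}),
]
curve = average_curve(runs, levels)
```

Computing one such curve for a word index and another for a stem index over the same queries gives a direct way to compare the two indexing options.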
Finally, conclusions are drawn based on the test results obtained and recommendations are
made as to what further research could be done for the development of Amharic information
retrieval systems on the Web.
Keywords
Information Science