The Application of Information Retrieval Techniques to Amharic Documents on the Web

Date

2001-07

Publisher

Addis Ababa University

Abstract

The World Wide Web is a rapidly growing mass of interconnected data that stretches from computer to computer across the world. Information retrieval systems on the Web provide users with relevant information without human intervention, saving time, labor and money. The Web contains documents of diverse content in different languages, and its fast growth has made it difficult to keep those documents accessible to users. Hence, developing information retrieval systems that cope with the inherent features of Web data has been a research area of the time in information science. This study explores the possibilities of applying some information retrieval techniques to Amharic documents on the Web. To back the research, a literature review of related work was carried out. Different information retrieval techniques and algorithms used for other languages were reviewed to determine whether they could be applied to Amharic documents on the Web. A database that stores Amharic Web page data, a suffix list and index files was designed. A Web page submission form was developed to allow the submission of Web page data into the database. Designing an Amharic query input interface was also part of the research. Automatic indexing and searching techniques were applied to a collection of 313 Web pages of Amharic documents taken from Walta Information Center news publications. Both word and stem inverted index options were explored. An Amharic search interface was then created to handle Amharic data on the Web, using ColdFusion Studio and ColdFusion Server 4.0 on the Windows NT 4.0 operating system with Internet Information Server (IIS). The searching algorithm implemented is the Extended Boolean model, a Boolean model with vector functionality that allows retrieved documents to be ranked.
To measure the performance of the prototype system, retrieval experiments were conducted for twenty-two queries and an average recall-precision graph was drawn. Using terms with suffixes and prefixes removed resulted in better performance than using whole words. Finally, conclusions are drawn based on the test results obtained, and recommendations are made as to what further research could be done for the development of Amharic information retrieval systems on the Web.
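
The indexing and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, which was written in ColdFusion and is not reproduced in this record. It assumes the p-norm formulation of the Extended Boolean model with p = 2, a simple normalized tf-idf term weight, and a made-up suffix list over toy English tokens standing in for the Amharic suffix list, which the abstract does not give. All function names (`stem`, `build_index`, `search`) are hypothetical, and prefix removal is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical suffix list: a stand-in for the Amharic suffix list
# stored in the thesis database, which the abstract does not list.
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip the longest matching suffix, keeping at least three characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, use_stems=True):
    """Build an inverted index {term: {doc_id: weight}}, weights in [0, 1].

    The weighting (tf / max tf) * (idf / max idf) is an assumption made
    for this sketch, not a scheme stated in the abstract.
    """
    norm = stem if use_stems else (lambda w: w)
    counts = {doc_id: Counter(norm(w) for w in text.lower().split())
              for doc_id, text in docs.items()}
    df = Counter(term for c in counts.values() for term in c)
    n_docs = len(docs)
    max_idf = math.log(n_docs + 1)  # idf of a term found in exactly one document
    index = defaultdict(dict)
    for doc_id, c in counts.items():
        if not c:
            continue  # skip empty documents
        max_tf = max(c.values())
        for term, tf in c.items():
            idf = math.log((n_docs + 1) / df[term])
            index[term][doc_id] = (tf / max_tf) * (idf / max_idf)
    return index


def p_norm_or(weights, p=2.0):
    """Extended Boolean OR: p-norm distance from the all-zero point."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)


def p_norm_and(weights, p=2.0):
    """Extended Boolean AND: one minus p-norm distance from the all-one point."""
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)


def search(index, terms, operator="or", p=2.0, use_stems=True):
    """Rank every document containing at least one query term."""
    norm = stem if use_stems else (lambda w: w)
    query = [norm(t.lower()) for t in terms]
    candidates = set()
    for term in query:
        candidates.update(index.get(term, {}))
    combine = p_norm_and if operator == "and" else p_norm_or
    scored = []
    for doc_id in candidates:
        weights = [index.get(term, {}).get(doc_id, 0.0) for term in query]
        scored.append((doc_id, combine(weights, p)))
    # Highest score first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Calling `build_index` once with `use_stems=True` and once with `use_stems=False` gives the stem and word index options side by side, so the same queries can be run against both and the rankings compared.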

Keywords

Information Science