Probabilistic Information Retrieval System for Amharic Language






Addis Ababa University


Nowadays, a considerable amount of information is being produced in Ethiopia. This accumulation makes archiving and searching challenging, particularly for the large body of text written in Amharic. Developing an information retrieval (IR) system for Amharic therefore allows users to search for and retrieve relevant documents that satisfy their information needs. A few such IR systems have been developed. However, they have not achieved promising performance because they are based on the vector space model, which has no mechanism for refining the user's information need through relevance feedback and query reformulation unless other modules are integrated. Furthermore, that model does not account for the uncertainty that exists in IR systems. To address these issues, a probabilistic retrieval model, which can reweight query terms based on relevance feedback, can be used. In this research, a probabilistic IR system is developed for the Amharic language. Both indexing and searching modules were constructed. These modules include text operations such as tokenization, normalization, stemming, and stop word removal. The retrieval system was then tested, and the experimental results show that the probabilistic IR system returned encouraging results even without handling the synonymous and polysemous terms that exist in Amharic text. The system registered an average F-measure of 73%. Nevertheless, the performance of the system is greatly affected by synonymous and polysemous terms in the language, in addition to its rich morphology (variant word forms). Keywords: Information Retrieval, Probabilistic Model, Amharic Language.
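
The abstract does not say which probabilistic model or weighting formula the system uses. As an illustration only, the sketch below assumes the classic binary independence model with the Robertson-Sparck Jones relevance weight (0.5 smoothing), one common way to reweight query terms from relevance feedback. The function names, the toy document sets, and the placeholder tokens are hypothetical, and the documents are assumed to be already tokenized, normalized, stemmed, and stripped of stop words.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """Robertson-Sparck Jones relevance weight with 0.5 smoothing (assumed formula).

    N: documents in the collection, n: documents containing the term,
    R: documents judged relevant, r: judged-relevant documents containing the term.
    With R = r = 0 (no feedback yet) this reduces to an IDF-like weight.
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def rank(query_terms, docs, relevant_ids=()):
    """Rank documents by the sum of the weights of the query terms they contain.

    docs maps a document id to the set of its preprocessed terms.
    relevant_ids holds the ids the user marked relevant in a feedback round.
    """
    N, R = len(docs), len(relevant_ids)
    weights = {}
    for term in set(query_terms):
        n = sum(1 for terms in docs.values() if term in terms)
        r = sum(1 for doc_id in relevant_ids if term in docs[doc_id])
        weights[term] = rsj_weight(N, n, R, r)
    scores = {
        doc_id: sum(w for term, w in weights.items() if term in terms)
        for doc_id, terms in docs.items()
    }
    # Highest score first; ties broken by document id for a stable order.
    ordering = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordering, weights


def f_measure(retrieved, relevant):
    """Balanced F-measure: harmonic mean of precision and recall."""
    hits = len(set(retrieved) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(set(retrieved))
    recall = hits / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy collection with placeholder tokens.
    docs = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"c"}, "d4": {"d"}}
    first_pass, _ = rank(["a", "b"], docs)
    second_pass, _ = rank(["a", "b"], docs, relevant_ids=["d1"])
    print(first_pass, second_pass)
    print(f_measure(retrieved=[1, 2, 3, 4], relevant=[1, 2, 5]))
```

In this sketch, calling `rank` a second time with the ids the user judged relevant raises the weight of terms that occur mostly in those documents, which is the kind of feedback-driven reweighting the abstract attributes to the probabilistic model. `f_measure` shows how a per-query score such as the reported figure could be computed from the retrieved and relevant sets.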


