Afaan Oromo–Amharic Cross Lingual Information Retrieval: A Corpus Based Approach

Date

2013-06

Publisher

Addis Ababa University

Abstract

Ethiopia is a multilingual country with over 80 distinct languages and a population of more than 73.9 million, as authorities estimated on the basis of the 2007 census (Bloor, 1995). In multilingual countries like Ethiopia, it is not uncommon to encounter language barriers when seeking information in a language other than one's mother tongue. Afaan Oromo (also known as "Oromiffa") is one of the widely used and spoken languages in Ethiopia; its speakers, the Oromo people, account for up to 36.7% of the total population (Commission, 2008). Afaan Oromo is currently the official language of Oromia regional state, whereas the official language of the Federal Democratic Republic of Ethiopia is Amharic. However, there are people who are not fluent enough to formulate Amharic query terms but who need Amharic documents for various reasons. An IR system capable of breaking the language barrier in information retrieval would clearly help such users. This study therefore aims to design and develop a corpus-based Afaan Oromo–Amharic cross-lingual information retrieval system that enables Afaan Oromo speakers to retrieve Amharic information using Afaan Oromo queries. The study follows a corpus-based approach, specifically one built on a parallel corpus. Parallel documents including news articles, the Bible, legal documents, and proclamations from the customs authority were used. The system was tested with 50 queries and 50 randomly selected documents. Two experiments were conducted: the first allowed only one possible translation for each Afaan Oromo query term, and the second allowed all possible translations. Retrieval effectiveness was measured using recall and precision for both monolingual and bilingual runs. The first experiment returned a maximum average precision of 0.81 for the monolingual run (Afaan Oromo queries) and 0.45 for the bilingual run (translated Amharic queries). The second experiment showed better recall and precision than the first: the maximum average precision was 0.60 for the bilingual run, while the result for the monolingual run remained the same. From these results, it can be concluded that cross-lingual information retrieval for the two local languages, Afaan Oromo and Amharic, can be developed, and that the performance of the retrieval system could be improved with larger and cleaner corpora.

Key Words: Afaan Oromo-Amharic Cross-Lingual Information Retrieval, Information Retrieval, Afaan Oromo, Amharic
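
The difference between the two experiments, and the measures used to score them, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the translation table, its scores, the token and document names, and the functions `translate_query` and `precision_recall` are all hypothetical, and the choice to drop untranslatable terms is an assumption. In the study, the translation candidates would come from the parallel corpus; here they are placeholder values.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical translation table: each source-language term maps to candidate
# target-language terms with a score. Placeholder tokens, not real corpus data.
TranslationTable = Dict[str, List[Tuple[str, float]]]


def translate_query(
    terms: List[str], table: TranslationTable, all_translations: bool
) -> List[str]:
    """Translate query terms, keeping one or all candidates per term."""
    translated: List[str] = []
    for term in terms:
        # Rank candidates from highest to lowest score.
        candidates = sorted(table.get(term, []), key=lambda c: c[1], reverse=True)
        if not candidates:
            continue  # Assumption: terms without a translation are dropped.
        chosen = candidates if all_translations else candidates[:1]
        translated.extend(target for target, _ in chosen)
    return translated


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single query."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder table and query for illustration only.
table: TranslationTable = {
    "src1": [("tgtB", 0.3), ("tgtA", 0.7)],
    "src2": [("tgtC", 1.0)],
}
query = ["src1", "src2", "src3"]

one_best = translate_query(query, table, all_translations=False)
all_cands = translate_query(query, table, all_translations=True)
p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Keeping all candidates tends to widen the translated query, which is one plausible reason a bilingual run could recover more relevant documents than a one-translation run, although the sketch itself does not demonstrate this.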

Keywords

Afaan Oromo-Amharic, Cross-Lingual Information Retrieval, Information Retrieval, Afaan Oromo, Amharic

Citation