Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/3515

Title: AFAAN OROMO-ENGLISH CROSS-LINGUAL INFORMATION RETRIEVAL (CLIR): A CORPUS BASED APPROACH
Authors: DANIEL, BEKELE
Advisors: Dr. Dereje Teferi
Keywords: Information science
Copyright: Jun-2011
Date Added: 30-Jul-2012
Publisher: AAU
Abstract: The goal of Cross-Language Information Retrieval (CLIR) is to give users access to information written in a language different from that of their queries. A CLIR system lets a user issue a query in one language and retrieve documents in another, which is achieved by designing the system so that a query in one language can be compared with documents in another. Afaan Oromo is one of the major languages widely spoken and used in Ethiopia. Although Afaan Oromo has a large number of speakers, little research effort has gone into making English documents available to them. This study is, therefore, an attempt to develop an Afaan Oromo-English CLIR system that enables native Afaan Oromo speakers to access and retrieve the vast online information sources available in English by writing queries in their own language. The study describes the development of a corpus-based CLIR system that uses word-based query translation for the Afaan Oromo-English language pair, and the evaluation of that system on a corpus of test documents and queries prepared for this purpose. The approach requires parallel documents, so such documents were collected from Bible chapters, legal documents, and some available religious documents. The system was evaluated with both monolingual and bilingual retrieval runs. In the monolingual run, Afaan Oromo queries are given to the system and Afaan Oromo documents are retrieved. In the bilingual run, the Afaan Oromo queries are first translated into English and then given to the system to retrieve English documents. For the bilingual run, Afaan Oromo queries are translated into their English equivalents using a bilingual dictionary constructed from the collected parallel corpora. The performance of the system was measured by recall and precision.
In the first phase of experimentation, maximum average precision values of 0.421 and 0.304 were obtained for the Afaan Oromo and English documents, respectively. The second phase performed slightly better than the first, with maximum average precision values of 0.468 and 0.316 for the Afaan Oromo and English documents, respectively. The study therefore concludes that, with large and cleaned parallel Afaan Oromo-English document collections, it is possible to develop a CLIR system for this language pair.
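The pipeline outlined in the abstract (a bilingual dictionary built from parallel text, word-based query translation, and evaluation by average precision) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not say how the dictionary was constructed, so the Dice co-occurrence score used here is an assumed stand-in, and the function names, the toy sentence pairs, and the document identifiers are all hypothetical.

```python
from collections import Counter
from itertools import product

# Toy parallel sentence pairs (illustrative glosses only, NOT from the thesis corpus).
TOY_PAIRS = [
    ("mana guddaa", "big house"),
    ("mana qulqulluu", "clean house"),
    ("bishaan qulqulluu", "clean water"),
]


def build_dictionary(pairs):
    """Map each source word to the target word with the highest Dice score.

    Assumption: sentence-level co-occurrence with the Dice coefficient,
    2 * cooc(s, t) / (count(s) + count(t)), stands in for whatever
    dictionary-construction method the thesis actually used.
    """
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in pairs:
        src_words = set(src_sent.lower().split())
        tgt_words = set(tgt_sent.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        cooc.update(product(src_words, tgt_words))

    best = {}
    for (s, t), n in cooc.items():
        score = 2 * n / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


def translate_query(query, dictionary):
    """Word-by-word translation; unknown words pass through unchanged."""
    return [dictionary.get(w, w) for w in query.lower().split()]


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k that holds a relevant document."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / k
    return total / len(relevant_ids)


if __name__ == "__main__":
    dictionary = build_dictionary(TOY_PAIRS)
    print(translate_query("bishaan qulqulluu", dictionary))
    # Hypothetical ranking returned by some retrieval engine for one query.
    print(average_precision(["d3", "d1", "d2", "d4"], {"d1", "d4"}))
```

Passing unknown words through untranslated is a design choice of this sketch; it lets proper nouns shared by both languages still match English documents. Averaging `average_precision` over all test queries would give a single figure of the kind reported above, although the abstract does not state exactly how its values were computed.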
Description: A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Scie
URI: http://hdl.handle.net/123456789/3515
Appears in: Thesis - Information Science

Files in This Item:

File: DANIEL BEKELE.pdf
Size: 1.06 MB
Format: Adobe PDF

Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.

 
