Afaan Oromo-English Cross-Lingual Information Retrieval (CLIR): A Corpus Based Approach
Date
2011-06
Publisher
Addis Ababa University
Abstract
The goal of Cross-Language Information Retrieval (CLIR) is to give users access
to information written in a language different from that of their queries: a user issues
a query in one language and retrieves documents in another. This is achieved by designing
a system in which a query in one language can be compared with documents in another.
Afaan Oromo is one of the major languages widely spoken and used in Ethiopia.
Although Afaan Oromo has a large number of speakers, little research effort has been
devoted to making English documents available to Afaan Oromo speakers. This study is,
therefore, an attempt to develop an Afaan Oromo-English CLIR system that enables
Afaan Oromo native speakers to access and retrieve the vast online information sources
available in English by writing queries in their native language.
This study describes the development of a corpus-based CLIR system that uses word-based
query translation for the Afaan Oromo-English language pair, and the evaluation of the
system on a corpus of test documents and queries prepared for this purpose.
The approach requires parallel documents; these were collected from Bible chapters,
legal documents, and some available religious documents.
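One way a word-level bilingual dictionary could be derived from sentence-aligned parallel text is sketched below. The abstract does not state which alignment or scoring method the study used, so the Dice co-occurrence score, the `build_dictionary` function, the `min_score` threshold, and the toy sentence pairs are all assumptions made for illustration, not the author's method or data.

```python
from collections import Counter
from itertools import product


def build_dictionary(pairs, min_score=0.5):
    """Map each source word to its best-scoring target word.

    pairs: iterable of (source_sentence, target_sentence) strings that are
    assumed to be translations of each other (sentence-aligned).
    The score is the Dice coefficient over sentence-level co-occurrence.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)              # sentences containing each source word
        tgt_count.update(t_words)              # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentences containing both words

    best = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])
        # Keep only the strongest candidate per source word.
        if dice >= min_score and dice > best.get(s, ("", 0.0))[1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}


# Toy, hand-made pairs for illustration only (not from the study's corpus).
toy_pairs = [
    ("mana guddaa", "big house"),
    ("mana", "house"),
    ("bishaan", "water"),
    ("bishaan guddaa", "big water"),
]
toy_dictionary = build_dictionary(toy_pairs)
```

On a real corpus, stop-word removal, stemming, and a much larger collection would matter far more than the particular association score chosen here.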
The system is evaluated through both monolingual and bilingual retrieval runs. In
the monolingual run, Afaan Oromo queries are given to the system and Afaan Oromo
documents are retrieved, while in the bilingual run the Afaan Oromo queries are
translated into English and then given to the system to retrieve English documents. For the
bilingual run, the Afaan Oromo queries are translated into their English equivalents
using a bilingual dictionary constructed from the collected parallel corpora.
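The bilingual run described above can be illustrated with a minimal sketch of word-by-word query translation. The function name `translate_query`, the sample dictionary entries, and the choice to pass untranslatable words through unchanged are assumptions for illustration; the abstract does not say how the system handles out-of-dictionary words.

```python
def translate_query(query, dictionary):
    """Translate a query word by word using a bilingual dictionary.

    Words missing from the dictionary are kept as-is (an assumption here;
    this is a common fallback because such words are often proper nouns).
    """
    translated = []
    for word in query.lower().split():
        translated.append(dictionary.get(word, word))
    return " ".join(translated)


# Hypothetical dictionary entries, standing in for one built from parallel text.
sample_dictionary = {"mana": "house", "bishaan": "water"}

english_query = translate_query("Mana bishaan", sample_dictionary)
```

The translated query would then be submitted to an ordinary monolingual English retrieval engine, which is what makes the monolingual run a natural baseline for comparison.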
The performance of the system was measured by recall and precision. In the first phase of
the experimentation, maximum average precision values of 0.421 and 0.304 were
obtained for the Afaan Oromo and English documents, respectively. The second phase of
experimentation performed slightly better than the first: maximum average precision values
of 0.468 and 0.316 were obtained for the Afaan Oromo and English documents,
respectively. Therefore, with large and cleaned parallel Afaan Oromo-English
document collections, it is possible to develop CLIR for the language pair.
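For readers unfamiliar with the measures, the sketch below shows how precision, recall, and average precision are commonly computed for a single query. The abstract does not specify the exact formula behind its "maximum average precision" figures, so the standard non-interpolated average precision is assumed here, and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Non-interpolated average precision over a ranked result list."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked output and relevance judgments for one query.
ranked_docs = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2", "d5"}

p, r = precision_recall(ranked_docs, relevant_docs)   # 2 of 4 retrieved are relevant
ap = average_precision(ranked_docs, relevant_docs)    # (1/2 + 2/4) / 3
```

Averaging `average_precision` over all test queries gives a single figure per run, which is the usual basis for comparing a monolingual run against a bilingual one.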
Keywords
Cross-Lingual Information Retrieval (CLIR)