Probabilistic Tigrigna-Amharic Cross Language Information Retrieval (CLIR)
Date
2013-06
Publisher
Addis Ababa University
Abstract
Amharic is the official working language of the Federal Democratic Republic of Ethiopia.
Tigrigna, on the other hand, is the working language of Tigray, one of the regional
states of Ethiopia. Furthermore, Tigrigna is an official working language of the neighboring
country Eritrea.
Because a considerable amount of information is being produced rapidly and continuously in
Amharic, experimenting on the applicability of a cross-language information retrieval
system for Tigrigna-Amharic is important: academicians, offices, researchers, and
other individuals can benefit from it. This study is, therefore, an attempt to develop a
Tigrigna-Amharic Cross-Lingual Information Retrieval (CLIR) system that enables native
Tigrigna speakers to access and retrieve online information sources available in Tigrigna
and Amharic by writing queries in their own (native) language.
A dictionary-based approach is employed to translate query terms from Tigrigna
to Amharic. To this end, a Tigrigna-Amharic machine-readable dictionary is used.
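Dictionary-based query translation can be illustrated with a minimal sketch. The dictionary entries below are placeholder strings, not real Tigrigna or Amharic words, and the function name translate_query is hypothetical. Keeping untranslated terms unchanged is an assumption made here for illustration; the abstract does not say how the study handles out-of-dictionary terms.

```python
from typing import Dict, List

# Hypothetical toy dictionary: placeholder strings stand in for real
# Tigrigna headwords and their Amharic translations.
TI_AM_DICT: Dict[str, List[str]] = {
    "ti_term_a": ["am_term_a1", "am_term_a2"],
    "ti_term_b": ["am_term_b"],
}


def translate_query(terms: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Replace each source term with all of its dictionary translations.

    Terms without an entry are passed through unchanged (an assumption of
    this sketch, e.g. for proper names).
    """
    translated: List[str] = []
    for term in terms:
        translated.extend(dictionary.get(term, [term]))
    return translated


amharic_query = translate_query(["ti_term_a", "ti_name", "ti_term_b"], TI_AM_DICT)
print(amharic_query)
```

Emitting every listed translation keeps recall high but can add ambiguous senses to the query, which is a known trade-off of dictionary-based translation.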
Because it can handle the uncertain nature of information retrieval, the
probabilistic information retrieval model is employed. Both indexing and searching modules
are constructed. These modules include text operations such as tokenization,
normalization, stemming, and stop word removal for both the Tigrigna and Amharic
languages.
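The text operations and a probabilistic ranking step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system built in the study. The stop word list, suffix list, and character map are toy stand-ins for real Tigrigna and Amharic resources. The term weight is the common Robertson/Sparck Jones form without relevance information; the abstract does not state which probabilistic weighting was actually used.

```python
import math
import re
from collections import defaultdict

# Hypothetical resources; real Tigrigna and Amharic lists would replace these.
STOP_WORDS = {"the"}
SUFFIXES = ("s",)
CHAR_MAP = str.maketrans({"z": "s"})  # stand-in for merging character variants


def stem(token):
    """Strip the first matching suffix, keeping at least one character."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = re.findall(r"\w+", text)                    # tokenization
    tokens = [t.translate(CHAR_MAP) for t in tokens]     # normalization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [stem(t) for t in tokens]                     # stemming


def build_index(docs):
    """Indexing module: map each processed term to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index


def rank(query, index, n_docs):
    """Searching module: sum a probabilistic term weight over matching query terms."""
    scores = defaultdict(float)
    for term in set(preprocess(query)):
        postings = index.get(term, set())
        n = len(postings)
        # Weight with no relevance information: log((N - n + 0.5) / (n + 0.5)).
        weight = math.log((n_docs - n + 0.5) / (n + 0.5))
        for doc_id in postings:
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "alpha betas gamma",
    "d2": "the alpha delta",
    "d3": "epsilon zeta",
    "d4": "eta iota",
}
index = build_index(docs)
ranked = rank("betas alpha", index, len(docs))
print(ranked)
```

Because the same preprocess function is applied to documents and queries, a stemming error affects both sides of the match, which is one way a weak stemmer can lower retrieval performance.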
The performance of the system after user relevance feedback is measured using recall,
precision, and F-measure. The system registered an average recall of 84% and 93%, an
average precision of 75% and 64%, and an average F-measure of 79% and 73% for Tigrigna and
Amharic, respectively. Although the performance of the system is greatly affected by
the stemmer, the result obtained is encouraging.
Finally, it is recommended that the performance of the CLIR system be
improved by designing a good stemmer for both languages.
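For reference, the three evaluation measures named above can be computed per query as in the sketch below. The document IDs are made up for illustration and are not results from the study; the function name precision_recall_f1 is hypothetical, and the balanced F-measure (F1) is assumed since the abstract does not specify a weighting.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

When results are averaged over many queries, the mean of the per-query F-measures generally differs from the F-measure computed from the mean precision and mean recall.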
Keywords
Amharic is the official working language of the Federal Democratic Republic of Ethiopia