Bidirectional English-Amharic Machine Translation: An Experiment Using Constrained Corpus

Date

2013-03

Publisher

Addis Ababa University

Abstract

Natural language processing is the field concerned with the ability of computers to generate and interpret natural language. Machine translation is a subfield of natural language processing that investigates the use of computer software to translate text or speech from one natural language to another. Ethiopia needs to keep pace with the language technologies being pursued elsewhere. The purpose of this study is therefore to develop a bidirectional English-Amharic machine translation system using a constrained corpus. The work follows the statistical machine translation approach. To this end, two corpora were collected and prepared: the first consists of simple sentences and the second of complex sentences. Two language models were developed, one for Amharic and one for English, to support translation in both directions. Translation models were built that assign a probability that a given source-language text generates a target-language text. A decoder was used to search for the shortest path, and the expectation-maximization algorithm was used to align words in the correct order. Experiments were carried out on the dataset separately for the simple sentences and for the complex sentences, and the results were recorded. For the simple sentences, the BLEU score averaged 82.22% accuracy for English to Amharic and 90.59% for Amharic to English; with the manual questionnaire method, the accuracy was 91% for English to Amharic and 97% for Amharic to English. For the complex sentences, the BLEU score was approximately 73.38% for English to Amharic and 84.12% for Amharic to English, and the questionnaire method gave 87% for English to Amharic and 89% for Amharic to English.
These results show that the difference between the BLEU score and the questionnaire method is small, so both methods can be used as references. With a very large and carefully examined corpus, better translation could be achieved, since more words would be available in the corpus, with a higher probability of a particular word preceding another.
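
To make the word-alignment step mentioned in the abstract more concrete, the following is a minimal sketch of expectation-maximization training for word-translation probabilities. The abstract does not say which alignment model was used; IBM Model 1 (without a NULL word) is assumed here because it is the simplest EM-based alignment model. The function name train_ibm_model1, the variable names, and the toy corpus are all hypothetical, and the target-side tokens (t_small, t_house, and so on) are made-up placeholders, not real Amharic and not data from the study.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with EM, IBM Model 1 style (assumed model, no NULL word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability that source word e
    translates to target word f.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all

        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current t(f | e).
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-normalize the expected counts per source word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    return t


# Hypothetical toy corpus: placeholder target tokens, not real Amharic.
pairs = [
    (["small", "house"], ["t_small", "t_house"]),
    (["small", "book"], ["t_small", "t_book"]),
    (["big", "book"], ["t_big", "t_book"]),
]

t = train_ibm_model1(pairs)
```

In this toy run, "house" only ever appears next to "small", yet the probability mass for "house" shifts toward "t_house" over the iterations, because "t_small" is better explained by "small", which co-occurs with it in two sentences. A real system would train on a full parallel corpus and combine these probabilities with a language model in the decoder.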

Keywords

Bidirectional; English-Amharic; Machine Translation
