Bidirectional Amharic-Afaan Oromo Machine Translation Using Hybrid Approach

Date

3/3/2020

Publisher

Addis Ababa University

Abstract

Machine translation is the area of Natural Language Processing (NLP) that focuses on automatically producing a target-language text from a source-language text. It is a multidisciplinary field, and the problem has been approached from several points of view, including linguistics and statistics. Hybrid methods combine the best properties of two or more machine translation approaches, and it has become popular to add rules to statistical machine translation systems. In this study, a bidirectional Amharic-Afaan Oromo machine translation system using a hybrid approach has been developed. The system has four components: sentence reordering, a language model, a translation model, and a decoder. Sentence reordering is a pre-processing step that uses Part of Speech (POS) tags to make the structure of the source-language sentence more similar to that of the target language, which better guides the statistical engine. Since no POS tagger is publicly available for either Amharic or Afaan Oromo, a tagged corpus was prepared manually. The linguistic background and nature of the two languages were studied in order to design reordering rules for different types of Amharic and Afaan Oromo phrases and sentences. Because the system is bidirectional, language models (built with the IRSTLM tool) and translation models (built with GIZA++) were developed for both Amharic and Afaan Oromo. A decoder uses the translation and language models to find the best translation in the target language (Amharic or Afaan Oromo) for a given sentence in the source language (Afaan Oromo or Amharic). To check the accuracy of the system, two experiments were conducted using two different approaches. The first experiment used a purely statistical approach and achieved BLEU scores of 89.39% for Amharic to Afaan Oromo and 80.33% for Afaan Oromo to Amharic. The second experiment used the hybrid approach and achieved BLEU scores of 91.56% for Amharic to Afaan Oromo and 82.24% for Afaan Oromo to Amharic. The results show that the hybrid approach is slightly better than the statistical approach, improving BLEU by about 2 points in each direction.
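
The POS-based reordering step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's actual rule set: the function name `swap_adjacent`, the tag names `ADJ`, `N`, and `V`, the single adjective-noun swap rule, and the English placeholder glosses standing in for source-language tokens.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, POS tag)


def swap_adjacent(tagged: List[Tagged], first: str, second: str) -> List[Tagged]:
    """Swap every adjacent pair tagged (first, second) into (second, first).

    Hypothetical rule shape: one pass, left to right, no overlapping swaps.
    """
    out = list(tagged)  # copy so the input sentence is left unchanged
    i = 0
    while i < len(out) - 1:
        if out[i][1] == first and out[i + 1][1] == second:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


# Placeholder glosses, not real Amharic or Afaan Oromo tokens.
sentence = [("big", "ADJ"), ("house", "N"), ("bought", "V")]
reordered = swap_adjacent(sentence, "ADJ", "N")
tokens = [tok for tok, _ in reordered]
```

In a pipeline of this kind, the reordered source sentences would be passed to the statistical engine in place of the originals, so that word alignment sees source and target sentences with more similar word order.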

Keywords

Machine Translation, Statistical Machine Translation, Hybrid Machine Translation, Reordering Rule
