Geez to Amharic Machine Translation


Date

2018-05-05

Publisher

Addis Ababa University

Abstract

Natural Language Processing (NLP) is the field concerned with enabling computers to analyze, understand, and derive meaning from human language in a smart and useful way. Machine Translation (MT), one of the applications of NLP, is the use of computers to translate from one natural language, such as Geez, into another, such as Amharic. Natural languages may use different word orders when forming sentences: Geez follows both Subject + Verb + Object (SVO) and Verb + Subject + Object (VSO) order, whereas Amharic follows only Subject + Object + Verb (SOV) order. Aligning each Geez word with the correct Amharic word is therefore of paramount importance for translation quality.

The purpose of this study is to develop a hybrid Geez to Amharic Machine Translation system that serially couples rule-based Geez word reordering with a standard Statistical Machine Translation (SMT) system. The proposed system is composed of two main components: a Rule Based Geez Corpus Preprocessor and a Baseline SMT. The Rule Based Preprocessor takes a manually Part of Speech (POS) tagged Geez corpus and produces another corpus of reordered Geez sentences whose structure matches that of Amharic sentences. It reads all sentences from the input file and processes them one by one: for each sentence it determines the POS pattern and then applies the corresponding reordering rule. Once every sentence has been processed, the output corpus, together with the Amharic corpus, is supplied as input to the Baseline SMT. The Decoder of the Baseline SMT then performs the actual translation of Geez sentences into Amharic using the Amharic language model and the translation model.
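The preprocessing loop described above (read each tagged sentence, determine its POS pattern, apply the matching reordering rule) might be sketched as follows. This is a minimal illustration, not the thesis implementation: the `word/TAG` input format, the tag names `N` and `V`, the two rules in the hypothetical `REORDER_RULES` table, and the choice to drop the tags in the output are all assumptions, and the tokens in the usage example are placeholders rather than real Geez words.

```python
# Hypothetical rule table (assumption): maps a Geez POS pattern to the
# index order that produces Amharic-like SOV order. Tags are illustrative.
REORDER_RULES = {
    ("N", "V", "N"): (0, 2, 1),  # SVO -> SOV
    ("V", "N", "N"): (1, 2, 0),  # VSO -> SOV
}


def parse_tagged(line):
    """Split a 'word/TAG word/TAG ...' line into a word list and a tag tuple."""
    pairs = [tok.rsplit("/", 1) for tok in line.split()]
    words = [word for word, _ in pairs]
    tags = tuple(tag for _, tag in pairs)
    return words, tags


def reorder_sentence(line, rules=REORDER_RULES):
    """Reorder one tagged sentence; sentences with no matching rule keep their order."""
    words, tags = parse_tagged(line)
    order = rules.get(tags)  # the POS pattern selects the rule
    if order is None:
        return " ".join(words)
    return " ".join(words[i] for i in order)


def preprocess_corpus(lines, rules=REORDER_RULES):
    """Process every non-empty sentence of the tagged corpus one by one."""
    return [reorder_sentence(line, rules) for line in lines if line.strip()]


if __name__ == "__main__":
    corpus = ["subj/N verb/V obj/N", "verb/V subj/N obj/N"]
    for sentence in preprocess_corpus(corpus):
        print(sentence)  # both lines print "subj obj verb"
```

A table keyed on the full POS pattern keeps each rule explicit and easy to audit; a real rule set would need many more patterns (or pattern fragments) to cover the corpus.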
The translation quality of the proposed system is evaluated using the BLEU evaluation metric and compared with that of the Baseline SMT. Two experiments were conducted: one to test the Baseline SMT and the other to test the proposed system. The Baseline SMT was tested with Geez and Amharic corpora without POS tags, whereas the proposed system was tested with the POS-tagged Geez corpus and an Amharic corpus without POS tags. In these tests the Baseline SMT scored a BLEU of 72%, and the proposed system outperformed it by 4 percentage points, scoring 76%, owing to the reordering rules applied to the Geez corpus.
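For reference, BLEU as it is commonly defined combines clipped (modified) n-gram precisions for n = 1 to 4 through a geometric mean and multiplies the result by a brevity penalty. The sketch below assumes one reference translation per sentence and whitespace tokenization; the abstract does not state which BLEU implementation or settings were used, so this is an illustration of the metric, not the evaluation tool behind the reported scores. It returns a value between 0 and 1, which corresponds to the percentages above after multiplying by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (assumption)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["a b c d e"], ["a b c d f"]))  # 0.2 ** 0.25
```

In the usage line, the unigram to 4-gram precisions are 4/5, 3/4, 2/3 and 1/2, whose product is 0.2; the lengths are equal, so the score is the fourth root of 0.2 with no brevity penalty.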

Keywords

Geez to Amharic Machine Translation, Hybrid Machine Translation, Rule Based Word Reordering, Statistical Machine Translation, Part of Speech Tagging