Optimal Alignment for Bi-directional Afaan Oromo-English Statistical Machine Translation

Date

2017-06

Publisher

A.A.U

Abstract

Statistical machine translation is an approach that relies mainly on a parallel corpus, and the alignment of that corpus is crucial to translation performance. Alignment quality is a common problem for statistical machine translation because, if sentences are misaligned, translation performance becomes poor. This study explores the effect of word-level, phrase-level and sentence-level alignment on bidirectional Afaan Oromo-English statistical machine translation. The corpus was collected from different sources: the criminal code, the FDRE constitution, Megleta Oromia and the Holy Bible. To make the corpus suitable for the system, preprocessing tasks such as true casing, sentence splitting and sentence merging were applied. A total of 6400 simple and complex sentences were used to train and test the system, in a 9:1 ratio for training and testing, respectively. For the language model we used 19300 monolingual sentences for English and 12200 for Afaan Oromo. We used Moses for Mere Mortals for the translation process; MGIZA++, Anymalign and hunalign for alignment; and IRSTLM for the language model. After preparing the corpus, different experiments were conducted. The experimental results show that the best performance, BLEU scores of 47% and 27%, was registered using phrase-level alignment with a maximum phrase length of 16 for Afaan Oromo-English and English-Afaan Oromo translation, respectively. This amounts to an average of 37% across the two translation directions. The reason for this score is that the longer phrases in the phrase-level aligned corpus capture word correspondence. This shows that alignment has a great effect on the accuracy and quality of statistical machine translation from Afaan Oromo to English and the reverse.
During machine translation, the alignment of text in multiple languages can have different correspondences: one-to-one, one-to-many, many-to-one and many-to-many. In this study, many-to-many alignment is a major challenge at the phrase level that needs further investigation.
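
As an illustration of the 9:1 training/testing ratio mentioned in the abstract, the following is a minimal sketch of how a sentence-aligned parallel corpus of 6400 pairs could be divided. It is not the thesis's actual procedure: the function name `split_parallel_corpus`, the shuffling step, the fixed seed and the placeholder sentences are all assumptions made for this example.

```python
import random


def split_parallel_corpus(src_sentences, tgt_sentences,
                          train_parts=9, test_parts=1, seed=0):
    """Split a sentence-aligned corpus into train and test pairs.

    Hypothetical helper: the shuffle and the seed are assumptions,
    not details taken from the thesis.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target must have the same number of sentences")

    # Keep each source sentence attached to its translation while shuffling.
    pairs = list(zip(src_sentences, tgt_sentences))
    random.Random(seed).shuffle(pairs)

    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(pairs) * train_parts // (train_parts + test_parts)
    return pairs[:cut], pairs[cut:]


# Placeholder data standing in for the 6400 Afaan Oromo-English pairs.
oromo = [f"om sentence {i}" for i in range(6400)]
english = [f"en sentence {i}" for i in range(6400)]

train, test = split_parallel_corpus(oromo, english)
print(len(train), len(test))
```

Splitting pairs rather than the two language files separately keeps every source sentence with its translation, which matters because a misaligned pair in either set would distort training or evaluation.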

Description

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

Keywords

Afaan Oromo, Phrase level alignment, Sentence level alignment
