Title: Optimal Alignment for Bi-directional Afaan Oromo-English Statistical Machine Translation
Contributors: Dr. Million Meshesha
Solomon, Yitayew
Keywords: SMT;Word Level Alignment;Phrase Level Alignment;Sentence Level Alignment
Issue Date: Jun-2017
Publisher: Addis Ababa University
Abstract: Statistical machine translation (SMT) is an approach that relies mainly on a parallel corpus for translation, and the alignment of that corpus is crucial to translation performance. Alignment quality is a common problem in SMT because misaligned sentences degrade translation performance. This study explores the effect of word-level, phrase-level and sentence-level alignment on bidirectional Afaan Oromo-English statistical machine translation. The corpus was collected from different sources: the criminal code, the FDRE constitution, Megleta Oromia and the Holy Bible. To make the corpus suitable for the system, preprocessing tasks such as true casing, sentence splitting and sentence merging were applied. A total of 6400 simple and complex sentences were used to train and test the system, split in a 9:1 ratio for training and testing, respectively. For the language models, 19300 monolingual sentences were used for English and 12200 for Afaan Oromo. The system used Moses for Mere Mortals for the translation process; MGIZA++, Anymalign and hunalign for alignment; and IRSTLM for language modeling. After the corpus was prepared, different experiments were conducted. The results show that the best performance, BLEU scores of 47% and 27%, was achieved using phrase-level alignment with a maximum phrase length of 16 for Afaan Oromo-English and English-Afaan Oromo translation, respectively. This amounts to an average score of 37% across the two translation directions. The reason for this score is that the longer phrases in the phrase-level aligned corpus handle word correspondence. This shows that alignment has a great effect on the accuracy and quality of statistical machine translation from Afaan Oromo to English and the reverse.
During machine translation, the alignment of text across languages can exhibit different correspondences: one-to-one, one-to-many, many-to-one and many-to-many. In this study, many-to-many alignment was a major challenge at the phrase level and needs further investigation.
Description: A Thesis Submitted in Partial Fulfillment of the Requirement for the Degree of Masters of Science in Information Science
Appears in Collections: Thesis - Information Science

Files in This Item:
File                  Size   Format
Yitayew Solomon.pdf   2 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.