Morpheme-Based Bi-Directional Ge’ez-Amharic Machine Translation

Kassa, Tadesse

dc.contributor.advisor	Meshesha, Million (PhD)
dc.contributor.author	Kassa, Tadesse
dc.date.accessioned	2019-07-29T12:24:23Z
dc.date.accessioned	2023-11-18T12:47:25Z
dc.date.available	2019-07-29T12:24:23Z
dc.date.available	2023-11-18T12:47:25Z
dc.date.issued	2018-10-04
dc.description.abstract	This study explores the effect of using the morpheme as the translation unit in bi-directional Ge’ez-Amharic machine translation. Using the word as the translation unit is problematic in statistical machine translation between two morphologically rich languages such as Ge’ez and Amharic. At the word level, data scarcity and the lack of a well-prepared corpus are challenges for under-resourced languages, and the many surface forms of a single word are difficult to manage, non-specific and inconsistent. At the morpheme level, the sub-parts of words are specific, easier to manage and consistent across many words of the same class. To conduct the experiments, a parallel corpus was collected from online sources, including the Old Testament of the Holy Bible and anaphora (Kidase). The corpus also includes manually prepared bitext from Wedase Maryam, Anketse Berhane, yewedesewa melahekete, Kidan and Liton. To make the corpus suitable for the system, preprocessing tasks such as tokenization, cleaning and normalization were carried out. The data set contains a total of 13,833 simple and complex sentences, of which 90% were used for training and 10% for testing. The language models for both languages were built from 12,450 parallel sentences. For both the statistical and rule-based approaches, we used Moses for translation, MGIZA++ for word and morpheme alignment, Morfessor and rules for morphological segmentation, and IRSTLM for language modeling. After the prototype and the corpus were prepared, different experiments were conducted. Morpheme-based translation performed better, achieving BLEU scores of 15.14% for Ge’ez to Amharic and 16.15% for Amharic to Ge’ez. Compared with word-level translation, this is an average improvement of 6.77% for Ge’ez to Amharic and 7.73% for Amharic to Ge’ez.
These results show that morpheme-level translation performs better than word-level translation. Using the morpheme as the translation unit, we therefore conducted further experiments with unsupervised and rule-based morpheme segmentation. Rule-based segmentation outperformed unsupervised segmentation, with an average BLEU score difference of 0.6% for Ge’ez to Amharic and 1.27% for Amharic to Ge’ez. Alignments of Amharic and Ge’ez text showed one-to-one, one-to-many, many-to-one and many-to-many correspondences. In this study, many-to-many alignment was the major challenge, so further research is needed to handle many-to-many alignment, word order and the morphology of the two languages.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/18688
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	SMT	en_US
dc.subject	Morpheme Level Alignment	en_US
dc.subject	Morfessor	en_US
dc.subject	Amharic	en_US
dc.subject	Geez	en_US
dc.title	Morpheme-Based Bi-Directional Ge’ez-Amharic Machine Translation	en_US
dc.type	Thesis	en_US
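
Two details in the abstract can be made concrete with a small sketch: the 90%/10% split of the 13,833 sentence pairs, and the four alignment types (one-to-one, one-to-many, many-to-one, many-to-many). The Python below is an illustration under stated assumptions, not the thesis's implementation. The abstract does not say whether the split was random or sequential, so the seeded shuffle is an assumption. The alignment function assumes links are written as space-separated "source-target" index pairs, the format commonly produced by Moses-style symmetrized alignments. The function names split_corpus and classify_alignment are hypothetical.

```python
import random
from collections import defaultdict


def split_corpus(pairs, train_fraction=0.9, seed=0):
    """Shuffle sentence pairs and split them into train and test sets.

    Assumption: a seeded random shuffle; the abstract does not describe
    how the 90%/10% split was actually drawn.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def classify_alignment(line):
    """Label each connected group of alignment links in one sentence pair.

    `line` holds links such as "0-0 1-1 1-2" (source index, target index).
    Returns a list of (source_indices, target_indices, label) tuples, where
    the label counts source units first, e.g. "one-to-many" means one
    source unit linked to several target units.
    """
    src_to_tgt = defaultdict(set)
    tgt_to_src = defaultdict(set)
    for link in line.split():
        s, t = (int(x) for x in link.split("-"))
        src_to_tgt[s].add(t)
        tgt_to_src[t].add(s)

    seen = set()
    groups = []
    for start in sorted(src_to_tgt):
        if start in seen:
            continue
        # Depth-first walk over the bipartite link graph to collect every
        # source and target unit reachable from this source unit.
        srcs, tgts = set(), set()
        stack = [start]
        while stack:
            s = stack.pop()
            if s in srcs:
                continue
            srcs.add(s)
            for t in src_to_tgt[s]:
                if t not in tgts:
                    tgts.add(t)
                    stack.extend(tgt_to_src[t])
        seen |= srcs
        label = "{}-to-{}".format(
            "one" if len(srcs) == 1 else "many",
            "one" if len(tgts) == 1 else "many",
        )
        groups.append((sorted(srcs), sorted(tgts), label))
    return groups


if __name__ == "__main__":
    # Integer stand-ins for the 13,833 sentence pairs.
    train, test = split_corpus(range(13833))
    print(len(train), len(test))

    for group in classify_alignment("0-0 1-1 1-2 2-3 3-3 4-4 4-5 5-5"):
        print(group)
```

With a 0.9 training fraction, 13,833 pairs round to 12,450 training pairs, which matches the number of sentences the abstract reports for language modeling, and leaves 1,383 pairs for testing. The same alignment classification applies whether the indexed units are words or morphemes.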

Files

Original bundle

Name: Tadesse Kassa 2018.pdf
Size: 4.37 MB
Format: Adobe Portable Document Format

Collections

Information Sciences