Morpheme-Based Bi-Directional Ge’ez-Amharic Machine Translation

dc.contributor.advisor: Meshesha, Million (PhD)
dc.contributor.author: Kassa, Tadesse
dc.date.accessioned: 2019-07-29T12:24:23Z
dc.date.accessioned: 2023-11-18T12:47:25Z
dc.date.available: 2019-07-29T12:24:23Z
dc.date.available: 2023-11-18T12:47:25Z
dc.date.issued: 2018-10-04
dc.description.abstract: This study explores the effect of using the morpheme as the translation unit in bi-directional Ge’ez-Amharic machine translation. Using the word as the translation unit is problematic in statistical machine translation (SMT) between two morphologically rich languages such as Ge’ez and Amharic. At the word level, data scarcity and the lack of a well-prepared corpus are challenges for under-resourced languages. Moreover, a single word can appear in many surface forms, which are hard to manage and lack consistency. At the morpheme level, the sub-parts of words are specific, easier to manage, and consistent across many words of the same class. To conduct the experiments, a parallel corpus was collected from online sources, including the Old Testament of the Holy Bible and the anaphora (Kidase). The corpus also includes manually prepared bitext from Wedase Maryam, Anketse Berhane, Yewedesewa Melahekete, Kidan and Liton. To make the corpus suitable for the system, preprocessing tasks such as tokenization, cleaning and normalization were performed. The dataset contains a total of 13,833 simple and complex sentences, of which 90% were used for training and 10% for testing. The language models for both languages were built from 12,450 parallel sentences. For both the statistical and rule-based approaches, we used Moses for translation, MGIZA++ for word and morpheme alignment, Morfessor and rules for morphological segmentation, and IRSTLM for language modeling. After the prototype and the corpus were prepared, several experiments were conducted. Morpheme-based translation performed better, with BLEU scores of 15.14% from Ge’ez to Amharic and 16.15% from Amharic to Ge’ez. Compared with word-level translation, this is an average improvement of 6.77% for Ge’ez-Amharic and 7.73% for Amharic-Ge’ez.
These results show that morpheme-level translation outperforms word-level translation. We therefore kept the morpheme as the translation unit and conducted further experiments comparing unsupervised and rule-based morphological segmentation. Rule-based segmentation outperformed unsupervised segmentation by an average of 0.6% BLEU for Ge’ez to Amharic and 1.27% for Amharic to Ge’ez. Alignments between Amharic and Ge’ez text show one-to-one, one-to-many, many-to-one and many-to-many correspondences; in this study, many-to-many alignment was the major challenge. Further research is therefore needed to handle many-to-many alignment, word order, and the morphology of the two languages. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/18688
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: SMT (en_US)
dc.subject: Morpheme Level Alignment (en_US)
dc.subject: Morfessor (en_US)
dc.subject: Amharic (en_US)
dc.subject: Geez (en_US)
dc.title: Morpheme-Based Bi-Directional Ge’ez-Amharic Machine Translation (en_US)
dc.type: Thesis (en_US)
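The abstract distinguishes one-to-one, one-to-many, many-to-one and many-to-many correspondences between Ge’ez and Amharic. The following is a minimal Python sketch, not the thesis's own code, showing one way to count these types. It assumes alignments in the "i-j" pair format that Moses writes for symmetrized word alignments. Links are grouped into connected components, and each component is labeled by how many source and target tokens it covers. The function names and the sample alignment are hypothetical.

```python
from collections import Counter, defaultdict


def parse_alignment(line: str) -> list[tuple[int, int]]:
    """Parse an alignment line of 'i-j' pairs, e.g. '0-0 1-1 1-2'.

    Each pair links source token i to target token j (0-based).
    """
    links = []
    for pair in line.split():
        src, tgt = pair.split("-")
        links.append((int(src), int(tgt)))
    return links


def classify_alignment(links: list[tuple[int, int]]) -> Counter:
    """Count one-to-one, one-to-many, many-to-one and many-to-many units.

    Links are grouped into connected components with a small union-find;
    each component is labeled by how many source and target tokens it spans.
    """
    parent: dict[tuple[str, int], tuple[str, int]] = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Source and target positions live in separate namespaces ("s" / "t").
    for s, t in links:
        union(("s", s), ("t", t))

    # Collect the source and target positions covered by each component.
    components = defaultdict(lambda: (set(), set()))
    for s, t in links:
        src_set, tgt_set = components[find(("s", s))]
        src_set.add(s)
        tgt_set.add(t)

    counts = Counter()
    for src_set, tgt_set in components.values():
        if len(src_set) == 1 and len(tgt_set) == 1:
            counts["one-to-one"] += 1
        elif len(src_set) == 1:
            counts["one-to-many"] += 1
        elif len(tgt_set) == 1:
            counts["many-to-one"] += 1
        else:
            counts["many-to-many"] += 1
    return counts


# Hypothetical alignment for a 6-token source and 6-token target sentence.
example = parse_alignment("0-0 1-1 1-2 2-3 3-3 4-4 4-5 5-5")
print(classify_alignment(example))
```

On this hypothetical input, each of the four types occurs once. The last component, where source tokens 4 and 5 are linked to target tokens 4 and 5, is the many-to-many case that the abstract identifies as the main difficulty. Treating connected components as alignment units is a common convention. It is assumed here for illustration.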

Files

Original bundle
Name: Tadesse Kassa 2018.pdf
Size: 4.37 MB
Format: Adobe Portable Document Format