Geez to Amharic Machine Translation

dc.contributor.advisor: Libise, Mulugeta (PhD)
dc.contributor.author: Abel, Biruk
dc.date.accessioned: 2019-08-19T10:20:38Z
dc.date.accessioned: 2023-11-29T04:06:01Z
dc.date.available: 2019-08-19T10:20:38Z
dc.date.available: 2023-11-29T04:06:01Z
dc.date.issued: 2018-05-05
dc.description.abstract: Natural Language Processing (NLP) is the set of methods by which computers analyze, understand, and derive meaning from human language. Machine Translation (MT), one of the applications of NLP, is the use of computers to translate from one natural language, such as Geez, to another, such as Amharic. Natural languages may follow different word orders in sentence formation. For example, Geez follows both Subject-Verb-Object (SVO) and Verb-Subject-Object (VSO) order, while Amharic follows only Subject-Object-Verb (SOV) order, so aligning each Geez word with the right Amharic word is of paramount importance for translation quality. The purpose of this study is to develop a hybrid Geez to Amharic machine translation system that serially couples rule-based Geez word reordering with a standard Statistical Machine Translation (SMT) system. The proposed system is composed of two main components: a Rule Based Geez Corpus Preprocessor and a Baseline SMT. The Rule Based Preprocessor takes the manually Part of Speech (POS) tagged Geez corpus and produces another corpus of reordered Geez sentences whose structure is similar to that of Amharic sentences. It reads all sentences from the input file and, for each sentence in turn, determines its POS pattern and applies the corresponding reordering rule. After every sentence has been processed, the output corpus is supplied, along with the Amharic corpus, as input to the Baseline SMT. The Decoder of the Baseline SMT then translates Geez sentences into Amharic sentences using the Amharic language model and the translation model.
The translation quality of the proposed system is evaluated using the BLEU metric and compared with that of the Baseline SMT. Two experiments were conducted: one to test the Baseline SMT and the other to test the proposed system. The Baseline SMT was tested with Geez and Amharic corpora that had no POS tags, while the proposed system was tested with a POS-tagged Geez corpus and an Amharic corpus with no POS tags. Based on the test results, the Baseline SMT scored a BLEU of 72% and the proposed system scored 76%, outscoring the baseline by 4 percentage points owing to the reordering rules applied to the Geez corpus. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/18800
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Geez to Amharic Machine Translation (en_US)
dc.subject: Hybrid Machine Translation (en_US)
dc.subject: Rule Based Word Reordering (en_US)
dc.subject: Statistical Machine Translation (en_US)
dc.subject: Part of Speech Tagging (en_US)
dc.title: Geez to Amharic Machine Translation (en_US)
dc.type: Thesis (en_US)

Files

Original bundle
Name: Biruk Abel 2018.pdf
Size: 2.75 MB
Format: Adobe Portable Document Format