English-Tigrigna Factored Statistical Machine Translation

dc.contributor.advisor: Yifiru, Martha (Dr.)
dc.contributor.author: Tsegaye, Tariku
dc.date.accessioned: 2018-11-11T14:02:46Z
dc.date.accessioned: 2023-11-18T12:43:53Z
dc.date.available: 2018-11-11T14:02:46Z
dc.date.available: 2023-11-18T12:43:53Z
dc.date.issued: 2014-06-08
dc.description.abstract: In this paper, English-to-Tigrigna translation was conducted using the statistical machine translation approach. A total of 17,649 sentence pairs were used as a bilingual corpus to develop, train, and test the translation system. Experiments were conducted using Moses with three types of corpus: baseline, segmented, and factored, the last of which integrates linguistic knowledge at the word level. Preliminary preprocessing tasks, namely sentence-level segmentation and tokenization, were performed using programs written in Python. In addition, considerable manual cleaning was done where the preprocessing required the researcher's judgment. After preprocessing, morphological segmentation, stemming, and POS tagging were performed to prepare the factored corpora. The performance of the system was then tested using the BLEU metric. The results show that segmentation contributed to overall performance: the segmented system performed better than the baseline phrase-based system. When compared against the same segmented reference, the BLEU score for the segmented system is 22.65%, a 1.61% increase over the baseline system, which has a BLEU score of 21.04%. The factored system showed a decrease of 6.15% from the segmented system and 4.53% from the baseline system. The researcher believes that the low performance of the factored system is attributable to the attached POS tags, since the tagger was trained on a small, manually tagged corpus prepared by the researcher. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14145
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Machine Translation; English to Tigrigna translation [en_US]
dc.title: English-Tigrigna Factored Statistical Machine Translation [en_US]
dc.type: Thesis [en_US]
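
The abstract describes a factored corpus that attaches word-level linguistic knowledge (stems and POS tags) to each token. The following is a minimal sketch of how such a line could be assembled, assuming the pipe-separated factor convention used by Moses (surface|stem|POS). The function name `to_factored`, the English sample tokens, and the tag labels are hypothetical illustrations and are not taken from the thesis.

```python
def to_factored(tokens, stems, tags, sep="|"):
    """Join parallel surface/stem/POS lists into one factored corpus line.

    Each output token has the form surface|stem|POS. The three input lists
    must be aligned, with one entry per word in the sentence.
    """
    if not (len(tokens) == len(stems) == len(tags)):
        raise ValueError("factor lists must have the same length")
    return " ".join(sep.join(factors) for factors in zip(tokens, stems, tags))


if __name__ == "__main__":
    # Hypothetical sample sentence; the tags are illustrative only.
    line = to_factored(
        ["the", "houses", "are", "big"],
        ["the", "house", "be", "big"],
        ["DT", "NNS", "VBP", "JJ"],
    )
    print(line)  # the|the|DT houses|house|NNS are|be|VBP big|big|JJ
```

Keeping the factors in parallel lists makes a misaligned stemmer or tagger output fail loudly, which matters here because the abstract attributes the factored system's lower score to the quality of the attached POS tags.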

Files

Original bundle
Name:
Tariku Tsegaye.pdf
Size:
1.51 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Plain Text