English-Tigrigna Factored Statistical Machine Translation






Addis Ababa University


In this paper, English-to-Tigrigna translation was carried out using a statistical machine translation approach. A total of 17,649 sentence pairs were used as a bilingual corpus to develop, train, and test the translation system. Experiments were conducted using Moses with three types of corpus: a baseline corpus, a segmented corpus, and a factored corpus that integrates linguistic knowledge at the word level. Preliminary preprocessing tasks, namely sentence-level segmentation and tokenization, were performed using programs written in Python. In addition, considerable manual cleaning was done where preprocessing required the researcher's judgment. After preprocessing, morphological segmentation, stemming, and POS tagging were performed to prepare the factored corpora. The performance of the system was then tested using the BLEU metric. The results show that segmentation contributed to overall performance: the segmented system outperformed the baseline phrase-based system. When evaluated against the same segmented reference, the segmented system achieved a BLEU score of 22.65, an increase of 1.61 points over the baseline system's score of 21.04. The factored system showed a decrease of 6.15 points from the segmented system and 4.53 points from the baseline system. The researcher believes that the low performance of the factored system is attributable to the attached POS tags, since the tagger was trained on a small, manually tagged corpus prepared by the researcher.
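As an illustration of what a factored corpus looks like, the following is a minimal sketch of how per-token annotations could be joined into the `|`-separated token format that Moses reads for factored training. The function name `to_factored`, the factor order (surface form, stem, POS tag), and the English example tokens and tags are assumptions made for illustration; they are not taken from the thesis, whose actual scripts, factor set, and tag set are not described here.

```python
def to_factored(surface, stems, pos_tags, sep="|"):
    """Join parallel per-token annotations into one factored line.

    Hypothetical helper: each output token has the form
    surface|stem|POS, with tokens separated by single spaces.
    """
    if not (len(surface) == len(stems) == len(pos_tags)):
        raise ValueError("all factor lists must have the same length")

    for factor in (*surface, *stems, *pos_tags):
        # A factor must be non-empty and must not contain the separator
        # or whitespace, otherwise the line could not be split back
        # into tokens and factors unambiguously.
        if not factor or sep in factor or any(ch.isspace() for ch in factor):
            raise ValueError(f"invalid factor value: {factor!r}")

    return " ".join(sep.join(parts) for parts in zip(surface, stems, pos_tags))


# Illustrative English tokens and tags (assumed, not from the thesis corpus).
line = to_factored(
    ["the", "houses"],
    ["the", "house"],
    ["DT", "NNS"],
)
print(line)  # the|the|DT houses|house|NNS
```

Keeping every factor aligned token by token matters because a single missing stem or tag would shift all later annotations on that line, which is one way a weak tagger or segmenter can degrade a factored system.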



Machine Translation; English to Tigrigna translation