English-Tigrigna Factored Statistical Machine Translation

Tsegaye, Tariku

Files

Tariku Tsegaye.pdf (1.51 MB)

Date

2014-06-08

Authors

Tsegaye, Tariku

Publisher

Addis Ababa University

Abstract

In this paper, English-to-Tigrigna translation was carried out using the statistical machine translation (SMT) approach. A total of 17,649 sentence pairs were used as a bilingual corpus to develop, train, and test the translation system. Experiments were conducted with Moses using three types of corpus: a baseline corpus, a segmented corpus, and a factored corpus that integrates linguistic knowledge at the word level. Preliminary preprocessing, namely sentence-level segmentation and tokenization, was performed using programs written in Python. In addition, considerable manual cleaning was done where preprocessing required the researcher's judgment. After preprocessing, morphological segmentation, stemming, and POS tagging were performed to prepare the factored corpora. The performance of the system was then evaluated using the BLEU metric. The results show that segmentation contributed to overall performance: the segmented system outperformed the baseline phrase-based system. When evaluated against the same segmented reference, the segmented system achieved a BLEU score of 22.65%, an increase of 1.61 points over the baseline score of 21.04%. The factored system scored 6.15 points lower than the segmented system and 4.53 points lower than the baseline. The researcher attributes the low performance of the factored system to the attached POS tags, since the tagger was trained on a small, manually tagged corpus prepared by the researcher.
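
As an illustration of what a factored corpus line can look like, the following is a minimal Python sketch that joins a surface form, a stem, and a POS tag into the pipe-separated token format that Moses reads for factored models. It is not the thesis's own code: the function name `to_factored`, the choice and order of factors, and the English sample tokens, stems, and tags are all assumptions made for illustration.

```python
def to_factored(tokens, stems, tags, sep="|"):
    """Join parallel factor lists into one factored line.

    Each output token has the form surface|stem|POS. The factor order
    is an assumption for illustration, not taken from the thesis.
    """
    if not (len(tokens) == len(stems) == len(tags)):
        raise ValueError("factor lists must have the same length")
    for factor in (*tokens, *stems, *tags):
        # The separator and whitespace would corrupt the factored format.
        if sep in factor or any(ch.isspace() for ch in factor):
            raise ValueError(f"invalid character in factor: {factor!r}")
    return " ".join(sep.join(parts) for parts in zip(tokens, stems, tags))


# Hypothetical sample sentence; stems and tags are hand-written placeholders.
tokens = ["the", "boys", "played"]
stems = ["the", "boy", "play"]
tags = ["DT", "NNS", "VBD"]

factored_line = to_factored(tokens, stems, tags)
print(factored_line)  # the|the|DT boys|boy|NNS played|play|VBD
```

Because every factor is attached to its token, an error in any one factor travels with that token into training, which is in line with the abstract's explanation that a POS tagger trained on a small corpus may have hurt the factored system.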

Keywords

Machine Translation; English to Tigrigna translation

URI

http://etd.aau.edu.et/handle/12345678/14145

Collections

Information Sciences
