A Bidirectional Tigrigna – English Statistical Machine Translation

Hailegebreal, Mulubrhan

Files

Mulubirhan hailegebriel_2017.pdf (2.94 MB)

Date

2017-06-04

Authors

Hailegebreal, Mulubrhan

Publisher

Addis Ababa University

Abstract

Machine Translation (MT) is a task in Natural Language Processing (NLP) in which automatic systems translate text from one (source) language to another (target) language while preserving the meaning of the source. Because documents need to be translated between Tigrigna and English, a mechanism for doing so is required. This study therefore explores the possibility of developing a Tigrigna – English statistical machine translation system and of improving translation quality by applying linguistic information. An experimental quantitative research method is used. To achieve the objective, corpora were collected from different domains, classified into five sets, and prepared in a format suitable for the development process. Three sets of experiments were conducted: baseline (a phrase-based machine translation system), morph-based (based on morphemes obtained with an unsupervised method), and post-processed segmented (based on morphemes obtained by post-processing the output of the unsupervised segmenter). The systems are built with Moses, a free statistical machine translation framework that automatically trains a translation model from a parallel corpus. Since the system is bidirectional, four language models were developed: one for English and three for Tigrigna (one each for the baseline, morph-based, and post-processed experiments). Translation models, which assign the probability that a given source-language text generates a target-language text, were built, and a decoder that searches for the shortest path is used. The BLEU score is used to evaluate the performance of each set of experiments. The post-processed experiment using corpus II outperformed the others, with BLEU scores of 53.35 % for Tigrigna – English and 22.46 % for English – Tigrigna translation.
Because of data scarcity, this research focuses on segmenting prepositions and conjunctions. Future research should therefore aim to further improve the BLEU score by applying semi-supervised segmentation to cover the remaining linguistic information.
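
For readers unfamiliar with the metric, the following is a minimal Python sketch of how a corpus-level BLEU score can be computed from tokenized output. It is an illustration only, not the scoring script used in the thesis: the function names `ngrams` and `corpus_bleu` are hypothetical, and it assumes one reference per sentence, uniform weights over 1- to 4-grams, and no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU, one reference per candidate, uniform weights.

    candidates, references: lists of token lists, aligned by position.
    Returns a score in [0, 1]; multiply by 100 for the percentage form.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # candidate n-gram counts, per n-gram order
    cand_len = ref_len = 0

    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in cand_counts.items()
            )
            total[n - 1] += sum(cand_counts.values())

    if cand_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any zero precision gives a score of 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty lowers the score of output shorter than the reference.
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    hypothesis = ["the cat sat on the mat".split()]
    reference = ["the cat sat on a mat".split()]
    print(f"BLEU = {100 * corpus_bleu(hypothesis, reference):.2f} %")
```

Because the score depends on exact n-gram overlap, it is sensitive to how text is tokenized or segmented, so scores are comparable only when the output and the reference are scored at the same segmentation level.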

Keywords

Machine translation, Statistical Machine Translation, Segmentation, Tigrigna, English, Bidirectional Tigrigna – English Machine Translation

URI

http://etd.aau.edu.et/handle/12345678/14121

Collections

Information Sciences
