Geez To Amharic Automatic Machine Translation: A Statistical Approach

Date

2015-05-05

Publisher

Addis Ababa University

Abstract

Machine Translation (MT) is the task of automatically translating text from one natural language to another. MT is essential for many applications, including multilingual information retrieval and speech-to-speech translation. This thesis addresses the problem of automatically translating Geez text into Amharic text using a statistical approach. Geez is a classical South Semitic language attested since the early 4th century in many inscriptions, including historical, medical, religious, and other texts. Today Geez remains in use only as a spoken and liturgical language of the Ethiopian Orthodox Tewahedo Church. Amharic, by contrast, is one of the most widely spoken languages in Ethiopia and the official working language of the Federal Government of Ethiopia, with about 30 million native and non-native speakers. Machine translation of Geez documents into Amharic is therefore of great importance, because it would give Amharic users easy access to the invaluable indigenous knowledge encoded in Geez. The thesis investigates the application of a corpus-based machine translation approach to translating Geez documents into Amharic. The experiments use Statistical Machine Translation (SMT), an approach that requires a large volume of parallel documents in Geez and Amharic. They were conducted using Moses (a statistical machine translation toolkit), the GIZA++ word alignment toolkit, and the IRSTLM language modeling toolkit on 12,840 parallel bilingual sentences, and achieved an average BLEU score of 8.26 under 10-fold cross-validation. With a sufficiently large parallel Geez-Amharic corpus and a language synthesizing tool, it should be possible to develop a better translation system for this language pair.
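The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the folds were formed, so the round-robin assignment below is an assumption, and the names `k_fold_splits` and `average_score` are hypothetical helpers rather than part of Moses, GIZA++, or IRSTLM. Training and scoring each fold are left to the toolkit.

```python
from typing import Iterator, List, Sequence, Tuple

# One parallel sentence pair: (Geez sentence, Amharic sentence).
Pair = Tuple[str, str]


def k_fold_splits(
    pairs: Sequence[Pair], k: int = 10
) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (train, test) partitions; each pair is held out exactly once.

    Assumption: folds are assigned round-robin by sentence index. The
    thesis may have used a different scheme (e.g. a shuffled split).
    """
    if k < 2 or k > len(pairs):
        raise ValueError("k must be between 2 and the number of sentence pairs")
    for fold in range(k):
        # Pair i belongs to the test set of fold (i mod k),
        # so fold sizes differ by at most one.
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        yield train, test


def average_score(fold_scores: Sequence[float]) -> float:
    """Arithmetic mean of per-fold scores, e.g. one BLEU score per fold."""
    return sum(fold_scores) / len(fold_scores)
```

Under this scheme, 12,840 sentence pairs with k = 10 give ten folds that each train on 11,556 pairs and test on the remaining 1,284. The reported figure would then correspond to the mean of the ten per-fold BLEU scores.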

Keywords

Geez To Amharic; A Statistical Approach
