Incorporating Linguistic features in bi-directional Amharic - English Statistical Machine Translation
Date
2019-02
Publisher
Addis Ababa University
Abstract
In this study, linguistic features of words are incorporated to improve the accuracy of
bidirectional Amharic-English SMT using factored phrase-based translation models. With this
approach, we use not only the most common linguistic features, such as the POS tag and
lemma of a word, but also the morphemes of a token, employing a factored model to improve the
quality of translations.
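
As an illustration of what a factored representation looks like, the sketch below joins several annotations of one token into a single factored token, using the `|` separator found in Moses-style factored corpora. The factor order (surface form, lemma, POS tag, morpheme segmentation), the helper names, and the English example tokens are assumptions made for illustration; they are not taken from the study.

```python
from typing import Iterable, Sequence

FACTOR_SEP = "|"  # separator between factors in a Moses-style factored token


def to_factored_token(factors: Sequence[str]) -> str:
    """Join the factors of one token, rejecting values that would break the format."""
    for factor in factors:
        if not factor or FACTOR_SEP in factor or any(ch.isspace() for ch in factor):
            raise ValueError(f"invalid factor: {factor!r}")
    return FACTOR_SEP.join(factors)


def to_factored_sentence(tokens: Iterable[Sequence[str]]) -> str:
    """Render a sentence as space-separated factored tokens."""
    return " ".join(to_factored_token(token) for token in tokens)


# Hypothetical annotations: (surface, lemma, POS, morpheme segmentation)
example = [
    ("the", "the", "DT", "the"),
    ("houses", "house", "NNS", "house+s"),
]
print(to_factored_sentence(example))
```

Each factor can then be used by the translation model on its own or in combination with the others, which is what allows lemma, POS, and morpheme information to contribute alongside the surface word.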
Experiments were carried out on a recent data set, a corpus prepared at Addis Ababa
University under a thematic research project to collect parallel corpora for bilingual
English-Ethiopian language pairs and make them available to the research community. We used
the Amharic-English segment of the corpus, a religious (Bible) text of 30,646 sentence pairs.
Results show that using morpheme segments on the Amharic side in combination with the lemma
and POS tag of a word improves the BLEU scores of the translation significantly. The best
results were obtained with the factored phrase-based model, trained on the same data used for
an ordinary baseline system, increasing the BLEU score from 9.52 to 15.84 for English to
Amharic translation. We also improve the accuracy from a BLEU score of 23.12 to 25.48 for
Amharic to English translation.
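
To put the reported scores side by side, the short sketch below computes the absolute and relative BLEU gains for each direction. Only the four BLEU values come from the abstract; the function name and the rounding choices are illustrative assumptions.

```python
def bleu_gain(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in BLEU points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


# BLEU scores as reported in the abstract: (baseline, factored model)
reported = {
    "English to Amharic": (9.52, 15.84),
    "Amharic to English": (23.12, 25.48),
}

for direction, (baseline, improved) in reported.items():
    absolute, relative = bleu_gain(baseline, improved)
    print(f"{direction}: +{absolute} BLEU ({relative}% relative)")
```

The gain is larger in the English to Amharic direction, where the baseline score is lowest.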
To summarize, this study introduces an approach of applying morpheme segments on the
Amharic data and lemmatization on the English data to build an English to Amharic statistical
machine translation system. The results of this study are compared to the benchmark systems
built with the same data sets and are found to be significantly higher.