Word Sequence Prediction for Amharic Language
Date
2014-10
Publisher
Addis Ababa University
Abstract
The importance of computers and handheld devices in today's world is undeniable. Text is entered into these devices using word processing programs as well as other techniques, and text prediction is one technique that facilitates data entry. Word sequence prediction, the main focus of this study, is the task of predicting the word a user intends to type based on context information. Word prediction can serve as a stepping stone for further research and can support various linguistic applications such as handwriting recognition, text entry on mobile phones and PDAs, and assistive technology for people with disabilities. Although Amharic is spoken by a large population, little significant work has been done on word sequence prediction for the language. In this study, an Amharic word sequence prediction model is developed using statistical methods and linguistic rules. Statistical models for roots or stems and for morphological properties of words, such as aspect, voice, tense, and affixes, are constructed from the training corpus. In addition, morphological features such as gender, number, and person are captured from the user's input to ensure grammatical agreement among words. First, root or stem words are suggested using the root or stem statistical models. Then, morphological features for the suggested roots or stems are predicted using statistical information on voice, tense, aspect, and affixes together with the grammatical agreement rules of the language. Predicting morphological features is essential for Amharic because of its high morphological complexity; this step is unnecessary for less inflected languages, where all word forms can be stored in a dictionary. Finally, surface words are generated from the proposed roots or stems and morphological features. The model is evaluated with the developed prototype, using keystroke savings (KSS) as the metric.
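The two-stage process described above (suggest a root or stem, then choose its morphological features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the training corpus is assumed to be already analyzed into (stem, features) pairs, stems are suggested from stem bi-gram counts, non-agreement features such as aspect are taken from the most frequent analysis seen with the stem, and gender, number, and person are simply copied from the subject the user typed. The class and method names and the Latin transliterations are hypothetical. The final step, generating the surface word from the stem and features, would require a morphological generator and is not shown.

```python
from collections import Counter, defaultdict

# Features that must agree with the subject, as listed in the abstract.
AGREEMENT = ("gender", "number", "person")


class StemFeaturePredictor:
    """Hypothetical two-stage predictor: suggest a stem, then fill in its features."""

    def __init__(self, sentences):
        # Each sentence is a list of (stem, features) pairs, assumed to come
        # from a prior morphological analysis of the training corpus.
        self.stem_bigrams = defaultdict(Counter)    # previous stem -> Counter(next stem)
        self.other_features = defaultdict(Counter)  # stem -> Counter(non-agreement features)
        for sentence in sentences:
            stems = [stem for stem, _ in sentence]
            for prev, nxt in zip(stems, stems[1:]):
                self.stem_bigrams[prev][nxt] += 1
            for stem, feats in sentence:
                # Store features such as aspect or voice as a hashable, order-independent key.
                key = tuple(sorted((k, v) for k, v in feats.items() if k not in AGREEMENT))
                self.other_features[stem][key] += 1

    def suggest_stems(self, prev_stem, k=3):
        # Step 1: the k most frequent stems seen after prev_stem in training.
        return [stem for stem, _ in self.stem_bigrams.get(prev_stem, Counter()).most_common(k)]

    def predict_features(self, stem, subject_features):
        # Step 2a: the most frequent non-agreement features seen with this stem.
        counts = self.other_features.get(stem, Counter())
        features = dict(counts.most_common(1)[0][0]) if counts else {}
        # Step 2b: copy gender, number, and person from the subject the user typed.
        for name in AGREEMENT:
            if name in subject_features:
                features[name] = subject_features[name]
        return features


# Toy corpus with hypothetical Latin transliterations.
corpus = [
    [("lij", {"gender": "m", "number": "sg", "person": "3"}), ("wede", {}), ("bet", {}),
     ("hed", {"aspect": "perfective", "gender": "m", "number": "sg", "person": "3"})],
    [("temari", {"gender": "f", "number": "sg", "person": "3"}), ("wede", {}), ("bet", {}),
     ("hed", {"aspect": "perfective", "gender": "f", "number": "sg", "person": "3"})],
]
predictor = StemFeaturePredictor(corpus)
stem = predictor.suggest_stems("bet")[0]
print(stem, predictor.predict_features(stem, {"gender": "m", "number": "pl", "person": "3"}))
# prints: hed {'aspect': 'perfective', 'gender': 'm', 'number': 'pl', 'person': '3'}
```

Separating the stem from its features keeps the statistics compact: counts are shared across all inflected forms of a stem, which is the motivation the abstract gives for predicting features separately in a highly inflected language.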
In our experiments, predictions from a hybrid of the bi-gram and tri-gram models achieved higher KSS than either the bi-gram or the tri-gram model alone. Therefore, combining statistical methods with linguistic rules shows good potential for Amharic word sequence prediction.
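The abstract does not specify how the hybrid combines the bi-gram and tri-gram models or exactly how keystrokes are counted. The sketch below assumes linear interpolation, scoring each candidate as `lam * P_tri + (1 - lam) * P_bi` from relative frequencies, and the common definition of KSS as the percentage of keystrokes saved relative to typing every character plus a space. It further assumes that accepting a correct suggestion costs one key and inserts the trailing space. For simplicity it works on whole surface words rather than the stems and features the thesis predicts. `HybridNgram`, `simulate_keystrokes`, `keystroke_savings`, `lam`, and the toy transliterated words are all hypothetical.

```python
from collections import Counter, defaultdict

BOS = "<s>"  # sentence-start padding token (an assumption of this sketch)


class HybridNgram:
    """Word predictor that linearly interpolates trigram and bigram estimates."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam                   # weight on the trigram estimate
        self.bi = defaultdict(Counter)   # w2 -> Counter(next word)
        self.tri = defaultdict(Counter)  # (w1, w2) -> Counter(next word)
        for sentence in sentences:
            words = [BOS, BOS] + sentence
            for w2, w3 in zip(words[1:], words[2:]):
                self.bi[w2][w3] += 1
            for w1, w2, w3 in zip(words, words[1:], words[2:]):
                self.tri[(w1, w2)][w3] += 1

    @staticmethod
    def _prob(counts, word):
        # Relative frequency; 0.0 for an unseen context.
        total = sum(counts.values())
        return counts[word] / total if total else 0.0

    def suggest(self, w1, w2, prefix="", k=3):
        bi = self.bi.get(w2, Counter())
        tri = self.tri.get((w1, w2), Counter())
        candidates = [w for w in set(bi) | set(tri) if w.startswith(prefix)]

        def score(w):
            return self.lam * self._prob(tri, w) + (1 - self.lam) * self._prob(bi, w)

        # Highest score first; ties broken alphabetically for determinism.
        return sorted(candidates, key=lambda w: (-score(w), w))[:k]


def simulate_keystrokes(model, words, k=3):
    """Return (keystrokes without prediction, keystrokes with prediction)."""
    without = with_pred = 0
    for i, word in enumerate(words):
        without += len(word) + 1  # every letter plus a space
        w1 = words[i - 2] if i >= 2 else BOS
        w2 = words[i - 1] if i >= 1 else BOS
        for typed in range(len(word) + 1):
            if word in model.suggest(w1, w2, word[:typed], k):
                with_pred += typed + 1  # letters typed + one key to accept (adds the space)
                break
        else:
            with_pred += len(word) + 1  # never suggested: typed in full
    return without, with_pred


def keystroke_savings(without, with_pred):
    """KSS as a percentage of the keystrokes needed without prediction."""
    return 100.0 * (without - with_pred) / without


train = [["lij", "wede", "bet", "hede"],
         ["temari", "wede", "bet", "hedech"],
         ["lij", "wede", "bet", "hede"]]
model = HybridNgram(train, lam=0.7)
without, with_pred = simulate_keystrokes(model, ["temari", "wede", "bet", "hedech"], k=1)
print(without, with_pred, round(keystroke_savings(without, with_pred), 1))
# prints: 23 10 56.5
```

Moving `lam` toward 0 or 1 weights the bi-gram or tri-gram estimate more heavily, which is one way to set up a bi-gram vs. tri-gram vs. hybrid comparison like the one reported above. The toy test sentence overlaps the training data, so its KSS is illustrative only.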
Keywords: Hornmorph, Keystroke Saving, Natural Language Processing, Parts-of-Speech, Word Prediction