Word Sequence Prediction for Amharic
Date
2011-02-04
Publisher
Addis Ababa University
Abstract
Word prediction is a popular machine learning task that consists of predicting the next word in a sequence of words. The literature shows that word sequence prediction can play an important role in real-life applications, including electronic data entry. Word prediction means guessing which word comes next based on the current context, and it is the main focus of this study. Even though Amharic is spoken by a large population, little work has been done on word sequence prediction for the language. Previous work on word prediction shows that statistical methods alone are not sufficient for highly inflected languages, which also need syntactic information.
In this study, we developed an Amharic word sequence prediction model following the design science research methodology, using statistical methods based on a Hidden Markov Model. We trained the model on around 138,000 phrases, incorporating detailed part-of-speech (POS) tags. The experiments used bigram and trigram models with window sizes of two, five, and seven. We also discuss the efficacy of POS tags in Amharic word sequence prediction.
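To make the bigram idea concrete, the following is a minimal sketch of a count-based next-word predictor. It is not the model built in this study: it omits the Hidden Markov Model and the part-of-speech tags, and the function names `train_bigram` and `predict_next`, the `prefix` argument, and the suggestion limit `k` are hypothetical names introduced only for illustration. The placeholder tokens stand in for real training phrases.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            counts[prev_word][next_word] += 1
    return counts


def predict_next(counts, prev_word, prefix="", k=5):
    """Return up to k followers of prev_word, most frequent first.

    Only candidates starting with the already typed prefix are kept.
    k is an assumed suggestion limit for this sketch.
    """
    followers = counts.get(prev_word, Counter())
    ranked = [w for w, _ in followers.most_common() if w.startswith(prefix)]
    return ranked[:k]


# Toy corpus of placeholder tokens (not data from the study).
toy_counts = train_bigram(["a b c", "a b d", "a b c"])
suggestions = predict_next(toy_counts, "b")
```

A trigram variant would key the counts on the previous two words instead of one, and a POS-aware variant could additionally condition on the tag of the preceding word.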
Evaluation was performed on the developed model using keystroke savings (KSS) as the metric. In our experiments, the bigram model with detailed part-of-speech (POS) tags achieved a higher KSS and performed slightly better than those without POS tags. Therefore, a statistical approach with detailed POS tags and a window size of five has good potential for Amharic word sequence prediction.
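The abstract does not state how KSS is computed. The sketch below assumes the commonly used definition, the percentage of keystrokes saved relative to typing every character, and simulates a user who accepts a suggestion with a single extra keystroke as soon as the intended word appears. The names `keystroke_savings`, `count_keystrokes`, and `suggest`, the limit `k`, and the decision to ignore spaces are all assumptions of this illustration, not details taken from the study.

```python
def keystroke_savings(total_chars, keys_pressed):
    """KSS in percent: share of keystrokes saved versus typing everything."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return (total_chars - keys_pressed) / total_chars * 100.0


def count_keystrokes(words, suggest, k=5):
    """Simulate typing words with a predictor and count key presses.

    suggest(prev_word, prefix) must return a ranked list of candidates.
    A word is accepted with one selection key once it appears among the
    first k suggestions; otherwise it is typed in full. Spaces are ignored.
    """
    keys = 0
    prev_word = None
    for word in words:
        for typed in range(len(word)):
            if word in suggest(prev_word, word[:typed])[:k]:
                keys += typed + 1  # characters typed so far plus one selection key
                break
        else:
            keys += len(word)  # never suggested: every character is typed
        prev_word = word
    return keys


# Hypothetical predictor that only ever knows the placeholder word "abc".
def toy_suggest(prev_word, prefix):
    return [w for w in ["abc"] if w.startswith(prefix)]


toy_words = ["abc", "abd"]
toy_keys = count_keystrokes(toy_words, toy_suggest)
toy_kss = keystroke_savings(sum(len(w) for w in toy_words), toy_keys)
```

In this toy run the first word is accepted before any character is typed and the second is typed in full, so four key presses replace six characters.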
Keywords
Word Sequence Prediction, Parts of Speech, N-Gram