Word Sequence Prediction for Afaan Oromo


Date

2018-03-03

Publisher

Addis Ababa University

Abstract

Data entry is a core aspect of human-computer interaction. Text prediction is one method of entering data into computers and other handheld electronic devices. It is the process of guessing the words that are likely to follow a given text segment and displaying a list of the most probable words for that position. Word sequence prediction assists physically disabled individuals who have typing difficulties, increases typing speed by reducing the number of keystrokes, helps with spelling and error detection, and also supports speech recognition and handwriting recognition. Although Afaan Oromo is one of the major languages widely spoken and written in Ethiopia, no research had been conducted on word sequence prediction for it. As a result, Afaan Oromo users do not enjoy the core benefits of word sequence prediction. In this study, a word sequence prediction model is designed and developed. We used the bi- and tri-word statistics and the bi- and tri-POS tag statistics of the language. First, the training corpus and the user input are tokenized and then morphologically analyzed. Next, using the training corpus, a word statistics model is built for root or stem words, and a POS tag statistics model is built for roots or stems together with their tags (noun, verb, adjective, pronoun, adverb, conjunction, and so on). The most probable root or stem words are then suggested. Finally, lexical words are synthesized from the proposed root or stem words. The designed model is evaluated using the developed prototype, with Keystroke Saving (KSS) as the performance measure. According to the evaluation, the first system, which uses word statistics only, achieved 20.5% KSS, and the second system, which combines syntactic categories with word statistics, achieved 22.5% KSS. Therefore, statistical and linguistic rules show good potential for word sequence prediction in Afaan Oromo.
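To make the word-statistics part of the approach and the KSS measure concrete, here is a minimal sketch in Python. It is not the thesis implementation: the class and function names (`NGramPredictor`, `keystroke_saving`), the backoff order (trigram, then bigram, then unigram), the toy corpus, and the keystroke cost model are all illustrative assumptions. It also omits the morphological analysis, the stem-level prediction, the POS tag model, and the synthesis step. KSS uses the common definition KSS = (KT - KE) / KT × 100, where KT counts keystrokes without prediction and KE counts keystrokes with prediction. The abstract does not give the formula, so this definition is assumed here as well.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Hypothetical word-level trigram/bigram/unigram predictor with simple backoff."""

    def __init__(self, sentences):
        self.uni = Counter()
        self.bi = defaultdict(Counter)
        self.tri = defaultdict(Counter)
        for sent in sentences:
            tokens = ["<s>", "<s>"] + list(sent)  # pad so the first words have a context
            for i in range(2, len(tokens)):
                w2, w1, w = tokens[i - 2], tokens[i - 1], tokens[i]
                self.uni[w] += 1
                self.bi[w1][w] += 1
                self.tri[(w2, w1)][w] += 1

    def suggest(self, w2, w1, prefix="", k=3):
        """Up to k words starting with `prefix`: trigram matches first, then bigram, then unigram."""
        suggestions = []
        tables = (self.tri.get((w2, w1), Counter()), self.bi.get(w1, Counter()), self.uni)
        for table in tables:
            for word, _count in table.most_common():  # highest count first
                if word.startswith(prefix) and word not in suggestions:
                    suggestions.append(word)
                    if len(suggestions) == k:
                        return suggestions
        return suggestions


def keystroke_saving(model, sentence, k=3):
    """KSS = (KT - KE) / KT * 100 under an assumed cost model: one keystroke per
    character, one for the space after a word, and one to accept a listed
    suggestion (accepting is assumed to insert the space as well)."""
    kt = sum(len(word) + 1 for word in sentence)  # keystrokes without prediction
    ke = 0  # keystrokes with prediction
    context = ["<s>", "<s>"]
    for word in sentence:
        cost = len(word) + 1  # fallback: type the whole word plus the space
        for typed in range(len(word)):
            if word in model.suggest(context[-2], context[-1], word[:typed], k):
                cost = typed + 1  # characters typed so far + 1 to accept
                break
        ke += cost
        context.append(word)
    return 100.0 * (kt - ke) / kt


if __name__ == "__main__":
    # Toy corpus of short token sequences, for illustration only.
    corpus = [["ani", "mana", "deema"], ["ani", "mana", "jira"], ["isaan", "mana", "deemu"]]
    model = NGramPredictor(corpus)
    print(model.suggest("ani", "mana"))  # ['deema', 'jira', 'deemu']
    print(round(keystroke_saving(model, ["ani", "mana", "jira"], k=1), 1))  # 71.4
```

The toy evaluation scores a sentence that also appears in the training data, so its KSS is far higher than a realistic held-out figure such as the 20.5% and 22.5% reported above. A faithful reproduction would train on one part of the corpus and measure KSS on unseen text.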

Keywords

Word Prediction, Statistical Language Modeling, POS Tagging, Keystroke Saving