Word Sequence Prediction Model for Tigrigna Language

Date

6/9/2020

Publisher

Addis Ababa University

Abstract

Data entry is an important aspect of human-computer interaction. It can be performed using a keyboard or by other means. Writing text for work, study, or communication is a frequent and time-consuming activity for most computer users. Word prediction systems can improve data entry performance. Word prediction is the task of forecasting the words that are expected to follow a given fragment of text. Word prediction software is mainly used to reduce keystrokes, especially for people with disabilities, people with limited language proficiency, people who make frequent spelling errors, and non-native users. A large volume of Tigrigna documents is being written and made available on the Internet. In this study, we designed and developed a word sequence prediction model for the Tigrigna language. The model uses n-gram statistics based on two Markov language models, one for tags and one for words, which are built from a manually tagged corpus and the grammatical rules of the language. The model is evaluated using precision as the performance metric. In our evaluation, on average 85% of words are predicted correctly using sequences of two tags, and 81.5% using sequences of three tags. Word prediction using sequences of two tags therefore performs better than prediction using sequences of three tags.
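To make the two-model idea concrete, the following is a minimal sketch and not the thesis implementation. It assumes a tag model that conditions on a single previous tag, a word model that ranks words by frequency within each tag, and a precision defined as the share of positions where the actual word appears among the top k suggestions. The function names (train, predict_next, precision), the sentence-start marker "<s>", the tag labels, and the English toy corpus are all hypothetical placeholders; the grammatical rules mentioned in the abstract are not modeled.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train(tagged_sentences):
    """Count tag transitions and per-tag word frequencies."""
    tag_bigrams = defaultdict(Counter)   # previous tag -> counts of next tag
    words_by_tag = defaultdict(Counter)  # tag -> counts of words carrying it
    for sentence in tagged_sentences:
        prev_tag = START
        for word, tag in sentence:
            tag_bigrams[prev_tag][tag] += 1
            words_by_tag[tag][word] += 1
            prev_tag = tag
    return tag_bigrams, words_by_tag


def predict_next(prev_tag, tag_bigrams, words_by_tag, k=3):
    """Rank candidate words by P(tag | prev_tag) * P(word | tag)."""
    next_tags = tag_bigrams.get(prev_tag, Counter())
    total_next = sum(next_tags.values())
    scores = {}
    for tag, tag_count in next_tags.items():
        total_words = sum(words_by_tag[tag].values())
        for word, word_count in words_by_tag[tag].items():
            score = (tag_count / total_next) * (word_count / total_words)
            scores[word] = scores.get(word, 0.0) + score
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def precision(test_sentences, tag_bigrams, words_by_tag, k=3):
    """Share of positions where the actual word is among the top-k suggestions."""
    hits = total = 0
    for sentence in test_sentences:
        prev_tag = START
        for word, tag in sentence:
            suggestions = predict_next(prev_tag, tag_bigrams, words_by_tag, k)
            hits += word in suggestions
            total += 1
            prev_tag = tag  # assumes gold tags are available at test time
    return hits / total if total else 0.0


# Toy tagged corpus (English stand-in, illustrative only).
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
tag_bigrams, words_by_tag = train(corpus)
suggestions = predict_next("DET", tag_bigrams, words_by_tag, k=2)
```

On this toy corpus, the suggestions after a DET tag rank "dog" ahead of "cat", because "dog" is the more frequent noun. A real system would also need smoothing for unseen tag sequences and a tagger for the text typed so far.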

Keywords

Word Prediction, Natural Language Processing, Statistical Language Modelling, POS Tagging, Precision
