Word Sequence Prediction for Amharic Language

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Tensou, Tigist
dc.date.accessioned: 2018-06-26T05:28:06Z
dc.date.accessioned: 2023-11-04T12:23:55Z
dc.date.available: 2018-06-26T05:28:06Z
dc.date.available: 2023-11-04T12:23:55Z
dc.date.issued: 2014-10
dc.description.abstract: The significance of computers and handheld devices in today's world is undeniable. Text is entered into these devices using word processing programs as well as other techniques, and text prediction is one technique that facilitates data entry. Word sequence prediction, the main focus of this study, is the task of predicting the words a user intends to type based on context information. Word prediction can serve as a stepping stone for further research and can support various linguistic applications such as handwriting recognition, text entry on mobile phones and PDAs, and assistive technology for people with disabilities. Even though Amharic is used by a large population, no significant work has been done on word sequence prediction for the language. In this study, an Amharic word sequence prediction model is developed using statistical methods and linguistic rules. Statistical models for root or stem words and for morphological properties of words, such as aspect, voice, tense, and affixes, are constructed from the training corpus. In addition, morphological features such as gender, number, and person are captured from the user's input to ensure grammatical agreement among words. First, root or stem words are suggested using the root or stem statistical models. Then, morphological features for the suggested root or stem words are predicted using statistical information on voice, tense, aspect, and affixes together with the grammatical agreement rules of the language. Predicting morphological features is essential in Amharic because of its high morphological complexity; this step is not required in less inflected languages, where all word forms can be stored in a dictionary. Finally, surface words are generated from the proposed root or stem words and morphological features. The model is evaluated with a developed prototype, using keystroke savings (KSS) as the metric.
According to our experiment, prediction using a hybrid of bi-gram and tri-gram models achieves higher KSS than either the bi-gram or the tri-gram model alone. Therefore, statistical methods combined with linguistic rules show good potential for word sequence prediction in Amharic. Keywords: Hornmorph, Keystroke Saving, Natural Language Processing, Parts-of-Speech, Word Prediction [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/3382
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Hornmorph [en_US]
dc.subject: Keystroke Saving [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Parts-Of-Speech [en_US]
dc.subject: Word Prediction [en_US]
dc.title: Word Sequence Prediction for Amharic Language [en_US]
dc.type: Thesis [en_US]
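The abstract reports that a hybrid of bi-gram and tri-gram models yields the highest keystroke savings. The sketch below is only an illustration of those two ideas under stated assumptions, not the thesis's implementation. It assumes the "hybrid" is a linear interpolation of trigram and bigram probabilities with a hypothetical weight `lam`, works on surface tokens only (the thesis instead predicts root or stem words and then generates surface forms from predicted morphological features), and assumes a hypothetical input model in which selecting a suggestion costs one keystroke that also inserts the trailing space. KSS is computed as (keystrokes without prediction minus keystrokes with prediction) divided by keystrokes without prediction, times 100. The class and function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


class HybridNgramPredictor:
    """Hypothetical interpolated bigram/trigram next-word predictor (sketch)."""

    def __init__(self, sentences, lam=0.7):
        # lam: assumed interpolation weight given to the trigram estimate.
        self.lam = lam
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        self.trigrams = defaultdict(Counter)
        for sent in sentences:
            toks = ["<s>", "<s>"] + sent  # pad so every word has a 2-word history
            for i in range(2, len(toks)):
                w2, w1, w = toks[i - 2], toks[i - 1], toks[i]
                self.unigrams[w] += 1
                self.bigrams[w1][w] += 1
                self.trigrams[(w2, w1)][w] += 1

    def _prob(self, w2, w1, w):
        # Maximum-likelihood bigram and trigram estimates, linearly interpolated.
        bi = self.bigrams.get(w1, Counter())
        tri = self.trigrams.get((w2, w1), Counter())
        p_bi = bi[w] / sum(bi.values()) if bi else 0.0
        p_tri = tri[w] / sum(tri.values()) if tri else 0.0
        return self.lam * p_tri + (1 - self.lam) * p_bi

    def predict(self, history, prefix="", k=3):
        # Rank vocabulary words matching the typed prefix; ties are broken by
        # unigram frequency, then alphabetically, so results are deterministic.
        w2, w1 = (["<s>", "<s>"] + history)[-2:]
        cands = [w for w in self.unigrams if w.startswith(prefix)]
        cands.sort(key=lambda w: (-self._prob(w2, w1, w), -self.unigrams[w], w))
        return cands[:k]


def keystroke_savings(predictor, sentences, k=3):
    """KSS (%) = (keys without prediction - keys with prediction) / keys without * 100."""
    total_plain = 0
    total_pred = 0
    for sent in sentences:
        for i, word in enumerate(sent):
            plain = len(word) + 1  # every character plus one space
            cost = plain
            # Type one more character at a time until the word is suggested.
            for n in range(len(word) + 1):
                if word in predictor.predict(sent[:i], word[:n], k):
                    cost = n + 1  # n typed characters + 1 assumed selection key
                    break
            total_plain += plain
            total_pred += cost
    return 100.0 * (total_plain - total_pred) / total_plain


# Toy usage with placeholder tokens (not Amharic data).
train = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
model = HybridNgramPredictor(train)
print(model.predict(["the"], "", k=2))  # ['cat', 'dog']
kss = keystroke_savings(model, [["the", "cat", "ran"]], k=1)
print(f"KSS = {kss:.1f}%")  # KSS = 66.7%
```

Interpolation keeps the bigram estimate available when a trigram context was never seen in training, which is one plausible reason a hybrid can outperform either model alone; the linear scan over the vocabulary in `predict` is kept for clarity and would be replaced by a prefix index in a real prototype.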

Files

Original bundle
Name: Tigist Tensou.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text