Design and Development of Part-of-speech Tagger for Kafi-noonoo Language
dc.contributor.advisor | Assabie, Yaregal (PhD) | |
dc.contributor.author | Mekuria, Zelalem | |
dc.date.accessioned | 2018-06-26T13:24:14Z | |
dc.date.accessioned | 2023-11-04T12:22:35Z | |
dc.date.available | 2018-06-26T13:24:14Z | |
dc.date.available | 2023-11-04T12:22:35Z | |
dc.date.issued | 2013-11 | |
dc.description.abstract | A part-of-speech (POS) tagger is a program that reads text in a given language and assigns a part of speech, such as noun, verb, or adjective, to each word and other token in the text. Several POS taggers are available on the web for different languages, including Amharic, Oromifa and Tigrigna. However, these taggers cannot be applied directly to the Kafi-noonoo language. This thesis therefore presents research on a Kafi-noonoo part-of-speech tagger. To develop the tagger, the study employed a hybrid approach, i.e. an HMM tagger combined with a rule-based tagger, operating at sentence level. A part-of-speech tagger for a language has many uses: it can serve as input to a full parser; it can be used in a text-to-speech system to correct pronunciation; it can be used for surface linguistic analysis; it can be used as a pre-processing step by researchers developing higher-level NLP applications; and it provides a way of learning the language by revealing its word categories and grammatical constructions. For training and testing, 354 untagged Kafi-noonoo sentences were collected from two genres and annotated using an incremental corpus preparation approach. In addition, 34 part-of-speech tags were identified for tagging. After word class information was assigned to each word in the sentences, both the HMM and the rule-based taggers were trained on 90% of the tagged sentences. Training yields lexical and transitional probabilities for the statistical component of the hybrid tagger and a set of transformation rules for its rule-based component. Based on these probabilities and transformation rules, the hybrid tagger (a combination of the HMM and rule-based taggers) assigns the most suitable word class to each word in untagged Kafi-noonoo text. The performance of the prototypes, i.e. the HMM, rule-based and hybrid taggers, was tested in different experiments. The HMM tagger and the rule-based tagger with a unigram initial-state tagger achieve 77.19% and 61.88% accuracy, respectively, whereas the hybrid tagger improves the accuracy to 80.47%. Key words: Part of speech tagger, HMM, Rule-based, Hybrid tagger and Transformation rules | en_US |
dc.identifier.uri | http://etd.aau.edu.et/handle/123456789/3752 | |
dc.language.iso | en | en_US |
dc.publisher | Addis Ababa University | en_US |
dc.subject | Part of Speech Tagger | en_US |
dc.subject | HMM | en_US |
dc.subject | Rule-Based | en_US |
dc.subject | Hybrid Tagger and Transformation Rules | en_US |
dc.title | Design and Development of Part-of-speech Tagger for Kafi-noonoo Language | en_US |
dc.type | Thesis | en_US |
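
The abstract describes training the statistical component on 90% of the tagged sentences to obtain lexical and transitional probabilities. A minimal sketch of that step follows, assuming plain relative-frequency estimates without smoothing; this record does not state how the thesis estimates them. The function names, the `<s>` sentence-start marker, and the placeholder tokens are hypothetical and are not taken from the thesis or its corpus.

```python
from collections import Counter, defaultdict


def split_corpus(tagged_sentences, train_fraction=0.9):
    """Split a list of tagged sentences into training and test portions."""
    cut = int(len(tagged_sentences) * train_fraction)
    return tagged_sentences[:cut], tagged_sentences[cut:]


def normalize(table):
    """Turn nested counts into conditional relative frequencies."""
    probabilities = {}
    for condition, counter in table.items():
        total = sum(counter.values())
        probabilities[condition] = {k: c / total for k, c in counter.items()}
    return probabilities


def train_hmm(tagged_sentences):
    """Estimate transitional P(tag | previous tag) and lexical P(word | tag).

    Assumption: unsmoothed maximum-likelihood estimates; "<s>" is a
    hypothetical marker for the start of a sentence.
    """
    transition_counts = defaultdict(Counter)
    lexical_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        previous_tag = "<s>"
        for word, tag in sentence:
            transition_counts[previous_tag][tag] += 1
            lexical_counts[tag][word] += 1
            previous_tag = tag
    return normalize(transition_counts), normalize(lexical_counts)


# Placeholder tokens and tags, not real Kafi-noonoo data.
toy_corpus = [
    [("w1", "N"), ("w2", "V")],
    [("w1", "N"), ("w3", "V")],
]
transition, lexical = train_hmm(toy_corpus)
```

With probabilities of this kind, an HMM tagger typically picks the tag sequence that maximizes the product of transitional and lexical probabilities over the sentence, for example with the Viterbi algorithm; the transformation rules of the rule-based component are learned separately.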