Design and Development of Part-of-speech Tagger for Kafi-noonoo Language

Mekuria, Zelalem

dc.contributor.advisor	Assabie, Yaregal(PhD)
dc.contributor.author	Mekuria, Zelalem
dc.date.accessioned	2018-06-26T13:24:14Z
dc.date.accessioned	2023-11-04T12:22:35Z
dc.date.available	2018-06-26T13:24:14Z
dc.date.available	2023-11-04T12:22:35Z
dc.date.issued	2013-11
dc.description.abstract	A part-of-speech (POS) tagger is a program that reads text in a given language and assigns a part of speech, such as noun, verb or adjective, to each word and other token in the text. Several POS taggers are available on the web for different languages, including Amharic, Oromifa and Tigrigna. However, these taggers cannot be applied directly to the Kafi-noonoo language. This thesis therefore presents research on a Kafi-noonoo part-of-speech tagger. To develop the tagger, the study employed a hybrid approach that combines an HMM tagger and a rule-based tagger at the sentence level. A part-of-speech tagger for a language has many uses: it can serve as input to a full parser; it can be used in a text-to-speech system to correct pronunciation; it can be used for surface linguistic analysis; it can serve as a pre-processing step for researchers developing higher-level NLP applications; and it provides a way of learning the language by revealing its word categories and grammatical constructions. For training and testing, 354 untagged Kafi-noonoo sentences were collected from two genres and annotated using an incremental corpus preparation approach. In addition, 34 part-of-speech tags were identified for tagging. After word class information was assigned to each word in the sentences, both the HMM and rule-based taggers were trained on 90% of the tagged sentences. Training generates lexical and transitional probabilities for the statistical component of the hybrid tagger and a set of transformation rules for its rule-based component. Based on these probabilities and transformation rules, the hybrid tagger assigns the most suitable word class to each word in untagged Kafi-noonoo text. The performance of the prototypes, i.e. the HMM, rule-based and hybrid taggers, was tested in different experiments. The HMM tagger and the rule-based tagger with a unigram initial-state tagger achieve 77.19% and 61.88% accuracy respectively, whereas the hybrid tagger improves the accuracy to 80.47%. Key words: Part of speech tagger, HMM, Rule-based, Hybrid tagger, Transformation rules	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/3752
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Part of Speech Tagger	en_US
dc.subject	HMM	en_US
dc.subject	Rule-Based	en_US
dc.subject	Hybrid Tagger and Transformation Rules	en_US
dc.title	Design and Development of Part-of-speech Tagger for Kafi-noonoo Language	en_US
dc.type	Thesis	en_US
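
The abstract describes the statistical component as an HMM tagger driven by lexical and transitional probabilities. As an illustration only, the following minimal Python sketch shows how such probabilities could be estimated from tagged sentences and used for Viterbi decoding. It is not the thesis implementation: the bigram model, the add-one smoothing, the function names (train_hmm, log_prob, viterbi) and the toy English sentences with DET/NOUN/VERB tags are all assumptions made for the example, and the rule-based (transformation rule) component is not covered.

```python
from collections import Counter, defaultdict
import math

START = "<s>"  # hypothetical sentence-start marker


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word (lexical) emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    return transitions, emissions


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption here)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions):
    """Return the most probable tag sequence under a bigram HMM."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    vocab = {w for counts in emissions.values() for w in counts}
    n_words = len(vocab) + 1  # one extra slot for unseen words

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{
        t: (log_prob(transitions[START], t, n_tags)
            + log_prob(emissions[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data (English placeholders, not Kafi-noonoo corpus data).
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = train_hmm(train)
print(viterbi(["the", "cat", "runs"], transitions, emissions))
```

In a hybrid setup like the one the abstract outlines, the tag sequence produced by a decoder of this kind would then be passed to the rule-based component for correction; how the thesis combines the two is described in the full text, not in this sketch.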

Files

Original bundle

Name:: Zelalem Mekuria.pdf
Size:: 1.75 MB
Format:: Adobe Portable Document Format

Collections

Computer Science