Design and Development of Part-of-speech Tagger for Kafi-noonoo Language

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Mekuria, Zelalem
dc.date.accessioned: 2018-06-26T13:24:14Z
dc.date.accessioned: 2023-11-04T12:22:35Z
dc.date.available: 2018-06-26T13:24:14Z
dc.date.available: 2023-11-04T12:22:35Z
dc.date.issued: 2013-11
dc.description.abstract: A part-of-speech (POS) tagger is a program that reads text in a given language and assigns a part of speech, such as noun, verb, or adjective, to each word and other token in the text. Several POS taggers are available on the web for different languages, including Amharic, Oromifa and Tigrigna. However, these taggers cannot be applied directly to the Kafi-noonoo language. This thesis therefore presents research on a Kafi-noonoo part-of-speech tagger. To develop the tagger, the study employed a hybrid approach that combines a hidden Markov model (HMM) tagger and a rule-based tagger at sentence level. A part-of-speech tagger for a language has many uses: it can serve as input to a full parser; it can be used in a text-to-speech system to correct pronunciation; it can be used for surface linguistic analysis; it can serve as a pre-processing step for researchers developing higher-level NLP applications; and it provides a way of learning the language by revealing its word categories and grammatical constructions. For training and testing, 354 untagged Kafi-noonoo sentences were collected from two genres and annotated using an incremental corpus preparation approach. In addition, 34 part-of-speech tags were identified for tagging. After word class information was assigned to each word in the sentences, both the HMM and the rule-based taggers were trained on 90% of the tagged sentences. Training produces lexical and transition probabilities for the statistical component of the hybrid tagger and a set of transformation rules for its rule-based component. Based on these probabilities and transformation rules, the hybrid tagger (the combination of the HMM and rule-based taggers) assigns the most suitable word class to each word of a given untagged Kafi-noonoo text. The performance of the prototypes, i.e. the HMM, rule-based and hybrid taggers, was tested in different experiments. The HMM tagger and the rule-based tagger with a unigram initial-state tagger achieve 77.19% and 61.88% accuracy, respectively, whereas the hybrid tagger improves the accuracy to 80.47%. Key words: Part of speech tagger, HMM, Rule-based, Hybrid tagger and Transformation rules [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/3752
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Part of Speech Tagger [en_US]
dc.subject: HMM [en_US]
dc.subject: Rule-Based [en_US]
dc.subject: Hybrid Tagger and Transformation Rules [en_US]
dc.title: Design and Development of Part-of-speech Tagger for Kafi-noonoo Language [en_US]
dc.type: Thesis [en_US]
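
The abstract describes a hybrid tagger in which an HMM supplies lexical and transition probabilities and a rule-based component applies transformation rules. The following is only a minimal sketch of that general idea, not the thesis implementation: the function names, the add-one smoothing, the bigram model, the rule format (from_tag, to_tag, prev_tag), and the English placeholder corpus are all assumptions made for illustration, since the record does not include the Kafi-noonoo corpus or the 34-tag tagset.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]: transition counts
    emit = defaultdict(Counter)   # emit[tag][word]: lexical (emission) counts
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag (an assumption of this sketch)
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under the counts in `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # best[i][tag] = (log score of best path ending in tag at position i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            column[t] = (score + log_prob(emit[t], words[i], n_words), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def apply_rules(tags, rules):
    """Apply rules of the form (from_tag, to_tag, prev_tag) to a tag sequence."""
    out = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(out)):
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
    return out


# Placeholder English corpus and rule; both are hypothetical illustration data.
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("cat", "N"), ("runs", "V")],
]
model = train_hmm(corpus)
hmm_tags = viterbi(["a", "dog", "sleeps"], model)
hybrid_tags = apply_rules(hmm_tags, [("V", "N", "DET")])
```

In this sketch the rule-based step post-corrects the HMM output, which is one common way to combine the two components; the record does not state how the thesis combines them.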

Files

Original bundle
Name: Zelalem Mekuria.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format