Part-Of-Speech Tagging For Afaan Oromo Language

dc.contributor.advisor: Meshesha, Million (PhD)
dc.contributor.author: Mamo, Getachew
dc.date.accessioned: 2018-11-28T11:38:16Z
dc.date.accessioned: 2023-11-29T04:56:59Z
dc.date.available: 2018-11-28T11:38:16Z
dc.date.available: 2023-11-29T04:56:59Z
dc.date.issued: 2009-01
dc.description.abstract [en_US]: Most natural language processing systems include a part-of-speech (POS) tagger as a separate module in their architecture. In particular, POS tagging is important for developing parsers, machine translators, speech recognizers and search engines. Tagging is the process of assigning part-of-speech tags to the words of a text so that contextual information can be obtained from the word labels. The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language. After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopts the Hidden Markov Model (HMM) approach and implements unigram and bigram models with the Viterbi algorithm. HMM is a statistical approach, used in this study to tag the Afaan Oromo words of a given corpus with their parts of speech. The unigram model is used to analyze word ambiguity in the language, while the bigram model is used to perform contextual analysis of words. A manually annotated sample corpus of 159 sentences (1,621 words in total) is used for training and testing. The corpus is collected from different public Afaan Oromo newspapers and bulletins to make the sample balanced. A database of lexical probabilities (LexProb) and transitional probabilities (TransProb) is built from this annotated corpus; the tagger learns from these two sets of probabilities and uses them to tag the sequence of words in a sentence. The Java programming language is used to develop the tagger prototype based on the Viterbi algorithm with unigram and bigram models, and to compute both the lexical and the transitional probabilities. The performance of the prototype Afaan Oromo tagger is evaluated using ten-fold cross-validation. The results show an accuracy of 87.58% for the unigram model and 91.97% for the bigram model. Based on the experimental analysis, concluding remarks and recommendations are presented.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/14636
dc.language.iso [en_US]: en
dc.publisher [en_US]: Addis Ababa University
dc.subject [en_US]: Part-of-Speech Tagging
dc.title [en_US]: Part-Of-Speech Tagging For Afaan Oromo Language
dc.type [en_US]: Thesis
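
The abstract describes learning LexProb and TransProb tables from an annotated corpus and then tagging with a unigram model and a bigram Viterbi model. The Java sketch below illustrates that pipeline under stated assumptions: the class and method names (HmmPosTagger, train, lexProb, transProb, unigramTag, viterbi) are hypothetical, the add-one smoothing of transitions and the 1e-6 floor for unseen word/tag pairs are assumptions (the abstract does not say how unseen events are handled), and the three-sentence training corpus uses placeholder tokens and simplified tags rather than the thesis's Afaan Oromo data or tagset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of an HMM POS tagger: LexProb, TransProb, unigram baseline, bigram Viterbi. */
public class HmmPosTagger {
    private static final String START = "<s>";
    // Assumption: floor for P(word | tag) when the pair never occurs in training.
    private static final double UNSEEN_EMISSION = 1e-6;

    private final Map<String, Map<String, Integer>> emissions = new HashMap<>();   // tag -> word -> count
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>(); // prevTag -> tag -> count
    private final Map<String, Map<String, Integer>> lexicon = new HashMap<>();     // word -> tag -> count
    private final Map<String, Integer> tagCounts = new HashMap<>();
    private final TreeSet<String> tags = new TreeSet<>();

    /** Counts tags, word/tag pairs and tag bigrams; each sentence is a list of {word, tag} pairs. */
    public void train(List<List<String[]>> corpus) {
        for (List<String[]> sentence : corpus) {
            String prev = START;
            for (String[] pair : sentence) {
                String word = pair[0], tag = pair[1];
                tags.add(tag);
                tagCounts.merge(tag, 1, Integer::sum);
                emissions.computeIfAbsent(tag, k -> new HashMap<>()).merge(word, 1, Integer::sum);
                lexicon.computeIfAbsent(word, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
                transitions.computeIfAbsent(prev, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
                prev = tag;
            }
        }
    }

    /** LexProb: maximum-likelihood P(word | tag), floored for unseen pairs. */
    double lexProb(String word, String tag) {
        int c = emissions.getOrDefault(tag, Map.of()).getOrDefault(word, 0);
        return c == 0 ? UNSEEN_EMISSION : (double) c / tagCounts.get(tag);
    }

    /** TransProb: add-one smoothed P(tag | prev); prev may be the start symbol. */
    double transProb(String prev, String tag) {
        Map<String, Integer> row = transitions.getOrDefault(prev, Map.of());
        int total = 0;
        for (int c : row.values()) total += c;
        return (row.getOrDefault(tag, 0) + 1.0) / (total + tags.size());
    }

    /** Unigram baseline: most frequent training tag per word, overall most frequent tag for unknown words. */
    public List<String> unigramTag(List<String> words) {
        String fallback = argMax(tagCounts);
        List<String> out = new ArrayList<>();
        for (String w : words) {
            Map<String, Integer> counts = lexicon.get(w);
            out.add(counts == null ? fallback : argMax(counts));
        }
        return out;
    }

    // Highest count wins; iterating a TreeMap breaks ties alphabetically.
    private static String argMax(Map<String, Integer> counts) {
        String best = null;
        int max = -1;
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            if (e.getValue() > max) { max = e.getValue(); best = e.getKey(); }
        }
        return best;
    }

    /** Bigram Viterbi decoding, using log probabilities to avoid underflow. */
    public List<String> viterbi(List<String> words) {
        if (words.isEmpty()) return List.of();
        List<String> tagList = new ArrayList<>(tags);
        int n = words.size(), k = tagList.size();
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        for (int j = 0; j < k; j++) {
            score[0][j] = Math.log(transProb(START, tagList.get(j)))
                    + Math.log(lexProb(words.get(0), tagList.get(j)));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double emit = Math.log(lexProb(words.get(i), tagList.get(j)));
                score[i][j] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(transProb(tagList.get(p), tagList.get(j))) + emit;
                    if (s > score[i][j]) { score[i][j] = s; back[i][j] = p; }
                }
            }
        }
        int best = 0;
        for (int j = 1; j < k; j++) if (score[n - 1][j] > score[n - 1][best]) best = j;
        String[] path = new String[n];
        for (int i = n - 1; i >= 0; i--) { // follow back-pointers from the last word
            path[i] = tagList.get(best);
            best = back[i][best];
        }
        return List.of(path);
    }

    public static void main(String[] args) {
        // Toy corpus with placeholder tokens; not data from the thesis.
        List<List<String[]>> corpus = List.of(
                List.of(new String[]{"a", "PRON"}, new String[]{"b", "N"}, new String[]{"c", "V"}),
                List.of(new String[]{"a", "PRON"}, new String[]{"b", "N"}, new String[]{"c", "V"}),
                List.of(new String[]{"d", "N"}, new String[]{"b", "V"}));
        HmmPosTagger tagger = new HmmPosTagger();
        tagger.train(corpus);
        List<String> sentence = List.of("d", "b");
        System.out.println("unigram: " + tagger.unigramTag(sentence)); // [N, N]
        System.out.println("bigram:  " + tagger.viterbi(sentence));    // [N, V]
    }
}
```

On the toy input "d b", the unigram baseline gives the ambiguous token b its most frequent training tag (N), while the bigram decoder uses the N to V transition seen in training to choose V. This mirrors, in miniature, why the abstract reports higher accuracy for the bigram model, though the numbers here say nothing about the thesis's actual results.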

Files

Original bundle
Name: Getachew Mamo.pdf
Size: 515.44 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text