Part of Speech Tagger for Afaan Oromo Language Using Transformational Error Driven Learning (TEL) Approach

dc.contributor.advisor Midekso, Dida (PhD)
dc.contributor.author Hussen, Mohammed
dc.date.accessioned 2018-06-21T13:28:29Z
dc.date.accessioned 2023-11-04T12:22:29Z
dc.date.available 2018-06-21T13:28:29Z
dc.date.available 2023-11-04T12:22:29Z
dc.date.issued 2010-02
dc.description.abstract The purpose of this research is to develop a part-of-speech (POS) tagger for Afaan Oromo using the Transformational Error-driven Learning (TEL) approach and to compare it with another approach. Most natural language processing systems use a POS tagger as one of their components. It is especially important for developing parsers, machine translators, speech recognizers, and search engines. Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify possible tags. Based on this review, a tagset of 18 tags was identified and applied to 223 sentences (1708 words) for the experiment. The study customized Brill's transformational error-driven learning tagger for Afaan Oromo. Some templates in the original Brill tagger were modified to fit the morphological nature of Afaan Oromo. After the training data was analyzed for its appropriateness using learning curve analysis, the study used 10-fold cross-validation for the experiment. A further experiment was conducted to determine what percentage of the training data to allot to the contextual and lexical rule learners. The best accuracy was achieved when 35% of the training data was used for the contextual rule learner and 65% for the lexical rule learner. This shows the dominance of morphological rules over contextual rules for the language. Modifying the templates of the Brill tagger yielded an improvement of 2.44 percentage points: the modified tagger achieved 80.08% accuracy, whereas the original tagger achieved 77.64%. The errors of the modified tagger were also analyzed with a confusion matrix to guide further improvements.
The results obtained with both the original and the modified Brill tagger were compared with the Hidden Markov Model (HMM) approach (unigram and bigram). The comparison shows that the Brill tagger is considerably better than the HMM in all cases for Afaan Oromo: HMM accuracy is 70.63% for the bigram approach and 68.08% for the unigram approach, whereas the original Brill tagger achieves 77.64% and the modified Brill tagger 80.08%. Keywords: Natural Language Processing, Parts of Speech Tagging, Brill Tagger, Transformational Error-driven Learning, Hidden Markov Model, Bigram, N-Gram. en_US
dc.identifier.uri http://etd.aau.edu.et/handle/123456789/2775
dc.language.iso en en_US
dc.publisher Addis Ababa University en_US
dc.subject Natural Language Processing; Parts of Speech Tagging; Brill Tagger; Transformational Error Driven Learning; Hidden Markov Model; Bigram; N-Gram en_US
dc.title Part of Speech Tagger for Afaan Oromo Language Using Transformational Error Driven Learning (TEL) Approach en_US
dc.type Thesis en_US
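
The abstract describes a Brill-style tagger that starts from an initial tagging and then learns transformation rules that correct its errors. The following is a minimal sketch of that idea, not the thesis implementation: it covers only contextual rules, uses a single hypothetical template ("change tag A to B when the previous tag is Z"), and runs on an invented English toy corpus with Penn-style tags rather than the thesis's Afaan Oromo data or its 18-tag tagset. All function names are illustrative assumptions.

```python
from collections import Counter, defaultdict


def build_lexicon(train):
    """Initial-state tagger: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in train:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag) to a tag sequence."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(predicted, gold):
    """Number of positions where the predicted tag differs from gold."""
    return sum(p != g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))


def learn_rules(train, max_rules=10):
    """Greedily pick the rule that removes the most errors, then repeat."""
    lexicon = build_lexicon(train)
    gold = [[tag for _, tag in s] for s in train]
    current = [[lexicon[word] for word, _ in s] for s in train]
    rules = []
    for _ in range(max_rules):
        # Candidate rules are instantiated from the current errors only.
        candidates = {
            (cur[i], ref[i], cur[i - 1])
            for cur, ref in zip(current, gold)
            for i in range(1, len(cur))
            if cur[i] != ref[i]
        }
        best_rule, best_errors = None, count_errors(current, gold)
        for rule in sorted(candidates):
            errors = count_errors([apply_rule(c, rule) for c in current], gold)
            if errors < best_errors:
                best_rule, best_errors = rule, errors
        if best_rule is None:  # no rule reduces the error count any further
            break
        rules.append(best_rule)
        current = [apply_rule(c, best_rule) for c in current]
    return lexicon, rules


def tag(words, lexicon, rules, default="NN"):
    """Tag new words: lexicon lookup first, then learned rules in order."""
    tags = [lexicon.get(w, default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Invented toy corpus (assumption): "race" is mostly a noun, but a verb after "to".
TRAIN = [
    [("the", "DT"), ("race", "NN")],
    [("a", "DT"), ("race", "NN")],
    [("to", "TO"), ("race", "VB")],
]

LEXICON, RULES = learn_rules(TRAIN)
print(RULES)                                  # [('NN', 'VB', 'TO')]
print(tag(["to", "race"], LEXICON, RULES))    # ['TO', 'VB']
```

A full Brill tagger also learns lexical (morphological) rules for unknown words and uses many more templates; the template modifications reported in the abstract concern that part of the design, which this sketch leaves out.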

Files

Original bundle
Name:
Mohammed-Hussen.pdf
Size:
344.79 KB
Format:
Adobe Portable Document Format