Part of Speech Tagger for Afaan Oromo Language Using Transformational Error Driven Learning (TEL) Approach

Hussen, Mohammed

dc.contributor.advisor	Midekso, Dida (PhD)
dc.contributor.author	Hussen, Mohammed
dc.date.accessioned	2018-06-21T13:28:29Z
dc.date.accessioned	2023-11-04T12:22:29Z
dc.date.available	2018-06-21T13:28:29Z
dc.date.available	2023-11-04T12:22:29Z
dc.date.issued	2010-02
dc.description.abstract	The purpose of this research is to develop a part-of-speech tagger for Afaan Oromo using the Transformational Error-driven Learning (TEL) approach and to compare it with another approach. Most natural language processing systems use a part-of-speech (POS) tagger as one of their components. It is especially important for developing parsers, machine translators, speech recognizers, and search engines. The Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify possible tags. On this basis, a tagset of 18 tags was identified and applied to 223 sentences (1708 words) for the experiment. The study customized the Brill transformational error-driven learning tagger for Afaan Oromo. Some templates in the original Brill tagger were modified to fit the morphology of Afaan Oromo. After the training data was checked for appropriateness using learning curve analysis, the study used 10-fold cross-validation for the experiment. A further experiment determined what percentage of the training data should go to the contextual rule learner and what percentage to the lexical rule learner. The best accuracy was achieved when 35% of the training data went to the contextual rule learner and 65% to the lexical rule learner. This shows the dominance of morphological rules over contextual rules for the language. Modifying the templates of the Brill tagger improved accuracy by 2.44 percentage points: the modified tagger achieved 80.08% accuracy, whereas the original tagger achieved 77.64%. The errors of the modified tagger were also analyzed with a confusion matrix to guide further improvements.
The results of both the original and the modified Brill tagger were compared with a Hidden Markov Model (HMM) approach (bigram and unigram). The comparison shows that the Brill tagger is considerably more accurate than the Hidden Markov Model in all cases for Afaan Oromo: the HMM achieved 70.63% accuracy with the bigram approach and 68.08% with the unigram approach, whereas the original Brill tagger achieved 77.64% and the modified Brill tagger 80.08%. Keywords: Natural Language Processing, Parts of Speech Tagging, Brill Tagger, Transformational Error-driven Learning, Hidden Markov Model, Bigram, N-Gram.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/2775
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Natural Language Processing; Parts of Speech Tagging; Brill Tagger; Transformational Error Driven Learning; Hidden Markov Model; Bigram; N-Gram	en_US
dc.title	Part of Speech Tagger for Afaan Oromo Language Using Transformational Error Driven Learning (TEL) Approach	en_US
dc.type	Thesis	en_US
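
Illustrative note: the transformational error-driven learning idea named in the abstract can be sketched roughly as follows. This is a minimal toy, not the thesis's implementation or the Brill tagger itself. It assumes a single hypothetical rule template ("change tag A to tag B when the previous tag is C"), and the tokens, tag names, and lexicon are invented placeholders rather than Afaan Oromo data or the thesis's tagset.

```python
def initial_tags(words, lexicon, default="NN"):
    """Baseline tagging: look each word up in a lexicon, else use a default tag."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag) across the sequence.

    Contexts are read from the input tags, so changes made by this pass
    do not trigger further changes within the same pass.
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags, gold):
    """Count positions where the current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(tags, gold, max_rules=10):
    """Greedily pick the rule that removes the most errors; stop when none helps."""
    learned = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            (tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags))
            if tags[i] != gold[i]
        }
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = errors(tags, gold) - errors(apply_rule(tags, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break
        tags = apply_rule(tags, best)
        learned.append(best)
    return learned, tags


# Hypothetical toy data: placeholder tokens and tags, invented for illustration.
words = ["a", "b", "c", "a", "b"]
lexicon = {"a": "DT", "b": "NN", "c": "NN"}
gold = ["DT", "NN", "VB", "DT", "NN"]

start = initial_tags(words, lexicon)
learned, final_tags = learn_rules(start, gold)
print(learned, final_tags)
```

In this toy the learner proposes rules only where the current tagging disagrees with the gold tags, which is the "error-driven" part; a real tagger of this kind would use many more templates, including lexical (morphological) ones of the sort the abstract says were modified.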

Files

Original bundle

Name: Mohammed-Hussen.pdf
Size: 344.79 KB
Format: Adobe Portable Document Format

Collections

Computer Science