Part-of-Speech Tagger for Afaan Oromo Language Using Transformational Error-Driven Learning (TEL) Approach


Date

2010-02

Publisher

Addis Ababa University

Abstract

The purpose of this research is to develop a part-of-speech (POS) tagger for Afaan Oromo using the Transformational Error-driven Learning (TEL) approach and to compare it with another approach. Most natural language processing systems use a POS tagger as one of their components; it is especially important for developing parsers, machine translation systems, speech recognizers and search engines. Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify possible tags. Based on this review, a tagset of 18 tags was identified and applied to 223 sentences (1,708 words) for the experiment. The study customized the Brill transformational error-driven learning tagger for Afaan Oromo, modifying some of the templates of the original Brill tagger to fit the morphological nature of the language. After the appropriateness of the training data was checked with learning-curve analysis, the experiments were run using 10-fold cross-validation. A further experiment determined how the training data should be divided between the contextual and the lexical rule learners. The best accuracy was achieved when 35% of the training data went to the contextual rule learner and 65% to the lexical rule learner, which indicates that morphological (lexical) rules dominate contextual rules for this language. Modifying the templates improved accuracy by 2.44 percentage points over the original Brill tagger: the modified tagger reached 80.08% accuracy, compared with 77.64% for the original. The errors of the modified tagger were also analyzed with a confusion matrix to guide further improvements.
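The core loop of Brill-style transformation-based (error-driven) learning can be illustrated with a minimal sketch. It assumes a single hypothetical contextual template ("change tag A to B when the previous tag is Z"), a most-frequent-tag initial-state annotator with a default tag N for unknown words, and simultaneous rule application. The toy corpus, its placeholder English words and the tags D, N, P, V are invented for illustration and are not Afaan Oromo data or the thesis tagset; the lexical (morphological) rule learner and the thesis's modified templates are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, gold_tag) pairs. Words and tags
# (D, N, P, V) are placeholders, not Afaan Oromo data or the thesis tagset.
TRAIN = [
    [("the", "D"), ("run", "N")],
    [("they", "P"), ("run", "V")],
    [("we", "P"), ("run", "V")],
    [("the", "D"), ("dog", "N")],
]

def most_frequent_tags(corpus):
    """Initial-state annotator: each known word gets its most frequent training tag."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def initial_tag(words, lexicon, default="N"):
    # Unknown words fall back to a default tag (an assumption of this sketch).
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Hypothetical contextual template: change from_tag to to_tag when the previous
    tag is prev_tag. Conditions are checked against the tags before this pass."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(corpus, lexicon, max_rules=10):
    """Greedy error-driven learning: keep the rule with the largest net error reduction."""
    gold = [[t for _, t in s] for s in corpus]
    current = [initial_tag([w for w, _ in s], lexicon) for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Instantiate candidate rules only from the errors that remain.
        candidates = {
            (tags[i], gs[i], tags[i - 1])
            for tags, gs in zip(current, gold)
            for i in range(1, len(tags))
            if tags[i] != gs[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = sum(errors(tags, gs) - errors(apply_rule(tags, rule), gs)
                       for tags, gs in zip(current, gold))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no candidate reduces the error count any further
            break
        rules.append(best_rule)
        current = [apply_rule(tags, best_rule) for tags in current]
    return rules

def tag(words, lexicon, rules):
    """Tag new text: initial-state annotation, then the learned rules in order."""
    tags = initial_tag(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

lexicon = most_frequent_tags(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)                                # [('V', 'N', 'D')]
print(tag(["the", "run"], lexicon, rules))  # ['D', 'N']
```

Here the initial annotator tags "run" as V everywhere (its most frequent tag), and the learner finds the single rule that repairs the remaining error from context. The thesis's split of training data between contextual and lexical learners would correspond to feeding different portions of the corpus to this kind of loop and to a separate lexical learner.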
The results of both the original and the modified Brill tagger were compared with a Hidden Markov Model (HMM) approach using bigram and unigram models. The comparison shows that the Brill tagger outperforms the HMM in all cases for Afaan Oromo: the HMM achieved 70.63% accuracy with the bigram model and 68.08% with the unigram model, whereas the original Brill tagger achieved 77.64% and the modified Brill tagger 80.08%.
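The difference between the two HMM baselines can be illustrated with a minimal sketch. The unigram model here picks, for each word independently, the tag maximizing P(t)P(w|t); the bigram model uses Viterbi decoding over P(t_i|t_{i-1})P(w_i|t_i). Add-one smoothing, the start symbol "<s>" and the toy corpus with placeholder tags D, N, P, V are assumptions of this sketch; the record does not state how the thesis's HMM was smoothed or trained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus with placeholder words and tags (not Afaan Oromo data).
TRAIN = [
    [("the", "D"), ("run", "N")],
    [("they", "P"), ("run", "V")],
    [("we", "P"), ("run", "V")],
    [("the", "D"), ("dog", "N")],
]

def train_hmm(corpus):
    """Count tag, transition and emission frequencies from (word, tag) sentences."""
    tag_counts, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for sent in corpus:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sent:
            tag_counts[tag] += 1
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return tag_counts, trans, emit, vocab

def log_emit(model, tag, word):
    tag_counts, _, emit, vocab = model
    # Add-one smoothing over the training vocabulary (assumption of this sketch).
    return math.log((emit[tag][word] + 1) / (tag_counts[tag] + len(vocab)))

def log_trans(model, prev, tag):
    tag_counts, trans, _, _ = model
    # Add-one smoothing over the tagset (assumption of this sketch).
    return math.log((trans[prev][tag] + 1) / (sum(trans[prev].values()) + len(tag_counts)))

def unigram_tag(model, words):
    """Tag each word independently with argmax_t P(t) * P(w | t)."""
    tag_counts = model[0]
    total = sum(tag_counts.values())
    tags = sorted(tag_counts)
    return [max(tags, key=lambda t: math.log(tag_counts[t] / total) + log_emit(model, t, w))
            for w in words]

def viterbi_tag(model, words):
    """Bigram HMM decoding: best tag sequence under transition and emission scores."""
    if not words:
        return []
    tags = sorted(model[0])
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {t: (log_trans(model, "<s>", t) + log_emit(model, t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        step = {}
        for t in tags:
            score, path = max(
                ((s + log_trans(model, p, t), pth) for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[t] = (score + log_emit(model, t, w), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

model = train_hmm(TRAIN)
print(unigram_tag(model, ["the", "run"]))  # ['D', 'V']
print(viterbi_tag(model, ["the", "run"]))  # ['D', 'N']
```

On this toy data the unigram model tags "run" after "the" as V because V is its more frequent tag, while the bigram decoder uses the D-to-N transition to choose N. This only illustrates what context adds; it does not reproduce the thesis's accuracy figures.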

Keywords

Natural Language Processing; Part-of-Speech Tagging; Brill Tagger; Transformational Error-Driven Learning; Hidden Markov Model; Bigram; N-Gram
