Part of Speech Tagger for Afaan Oromo Language Using Transformational Error Driven Learning (TEL) Approach
Date
2010-02
Publisher
Addis Ababa University
Abstract
The purpose of this research is to develop a part-of-speech tagger for Afaan Oromo using the
Transformational Error-driven Learning (TEL) approach and to compare it with a Hidden Markov Model (HMM) approach.
Most natural language processing systems use a part-of-speech (POS) tagger as one of their
components. POS tagging is especially important for building parsers, machine
translators, speech recognizers, and search engines.
The Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the
language and to identify possible tags. Based on this review, a tagset of 18 tags was identified and applied to
223 sentences (1,708 words) for the experiment.
The study customized the Brill transformational error-driven learning tagger for Afaan Oromo. Some
templates in the original Brill tagger were modified to fit the morphological nature of Afaan Oromo. After
the adequacy of the training data was checked with learning-curve analysis, the study used 10-fold
cross-validation for the experiment. In addition, an experiment was conducted to determine what
percentage of the training data to allocate to the contextual and lexical rule learners. The best accuracy was
achieved when 35% of the training data went to the contextual rule learner and 65% to the lexical rule
learner. This indicates that morphological (lexical) rules dominate contextual rules for the language.
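The greedy rule-learning loop behind a Brill-style (TEL) tagger can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it uses a most-frequent-tag initial annotator, a single hypothetical contextual template ("change tag A to B when the previous tag is Z"), and an invented English toy corpus with placeholder tags. The lexical (morphological) rule learner and the modified Afaan Oromo templates are omitted. At each step the sketch keeps the candidate rule with the largest net gain (errors fixed minus correct tags broken) and stops when no rule helps.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, gold_tag) pairs; not the thesis's data or tagset.
TRAIN = [
    [("the", "DT"), ("run", "NN")],
    [("a", "DT"), ("run", "NN")],
    [("my", "DT"), ("run", "NN")],
    [("to", "TO"), ("run", "VB")],
    [("to", "TO"), ("run", "VB")],
]

def initial_tags(corpus):
    """Initial-state annotator: give each word its most frequent training tag."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for word, gold in sent:
            counts[word][gold] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return [[lexicon[w] for w, _ in sent] for sent in corpus]

def prev_tag(tags, i):
    return tags[i - 1] if i > 0 else "<S>"

def score(rule, corpus, tagging):
    """Net gain of rule (from_tag, to_tag, prev_tag): errors fixed minus correct tags broken."""
    a, b, z = rule
    gain = 0
    for sent, tags in zip(corpus, tagging):
        for i, (_, gold) in enumerate(sent):
            if tags[i] == a and prev_tag(tags, i) == z:
                if gold == b:
                    gain += 1
                elif gold == a:
                    gain -= 1
    return gain

def apply_rule(rule, tagging):
    """Apply one contextual rule; conditions are read from the tagging before the pass."""
    a, b, z = rule
    return [[b if t == a and prev_tag(tags, i) == z else t for i, t in enumerate(tags)]
            for tags in tagging]

def learn_rules(corpus, max_rules=10):
    tagging = initial_tags(corpus)
    rules = []
    for _ in range(max_rules):
        # Instantiate the template at every current error.
        candidates = {
            (tags[i], gold, prev_tag(tags, i))
            for sent, tags in zip(corpus, tagging)
            for i, (_, gold) in enumerate(sent)
            if tags[i] != gold
        }
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: score(r, corpus, tagging))
        if score(best, corpus, tagging) <= 0:
            break
        rules.append(best)
        tagging = apply_rule(best, tagging)
    return rules, tagging

def accuracy(corpus, tagging):
    pairs = [(g, t) for sent, tags in zip(corpus, tagging) for (_, g), t in zip(sent, tags)]
    return sum(g == t for g, t in pairs) / len(pairs)

rules, tagging = learn_rules(TRAIN)
print(rules, accuracy(TRAIN, tagging))
```

On this toy data the initial annotator tags every "run" as NN, and the single learned rule retags NN as VB after TO.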
Modifying the templates of the Brill tagger yielded an improvement of about 2.44 percentage points over the
original Brill tagger: the modified tagger reached 80.08% accuracy, whereas the original tagger reached 77.64%.
Errors of the modified tagger were also analyzed with a confusion matrix to guide further improvements.
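A confusion matrix for a tagger counts, for each gold tag, how often each tag was predicted. The diagonal cells are correct decisions and the off-diagonal cells are the errors worth inspecting. The minimal Python sketch below illustrates the idea; the tag names and the gold/predicted sequences are hypothetical placeholders, not results from this study.

```python
from collections import Counter

def confusion_matrix(gold_tags, predicted_tags):
    """Count (gold, predicted) pairs over token-aligned tag sequences."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("sequences must be aligned token by token")
    return Counter(zip(gold_tags, predicted_tags))

def top_confusions(matrix, k=3):
    """Most frequent off-diagonal cells, i.e. the commonest tagging errors."""
    errors = Counter({cell: n for cell, n in matrix.items() if cell[0] != cell[1]})
    return errors.most_common(k)

def print_matrix(matrix):
    tags = sorted({t for cell in matrix for t in cell})
    print("gold\\pred " + " ".join(f"{t:>4}" for t in tags))
    for g in tags:
        # Counter returns 0 for cells that never occurred.
        print(f"{g:>9} " + " ".join(f"{matrix[(g, p)]:>4}" for p in tags))

# Hypothetical example sequences.
gold = ["NN", "VB", "NN", "JJ", "VB", "VB"]
pred = ["NN", "NN", "NN", "NN", "NN", "VB"]
m = confusion_matrix(gold, pred)
print_matrix(m)
print(top_confusions(m, k=1))
```

Accuracy is the diagonal total divided by the grand total, so the same matrix also recovers the overall score.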
The results of both the original and the modified Brill tagger were compared with a
Hidden Markov Model (HMM) approach (bigram and unigram). The comparison shows that the Brill
tagger outperforms the HMM in all cases for Afaan Oromo: HMM accuracy is 70.63% for the bigram
approach and 68.08% for the unigram approach, whereas the original Brill tagger without modification
achieves 77.64% and the modified Brill tagger 80.08%.
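For reference, the two HMM baselines can be sketched roughly as below. This is an illustrative assumption, not the configuration evaluated in this study: the bigram model uses add-one smoothing and Viterbi decoding in log space, the "unigram" baseline is taken to mean assigning each word its most frequent training tag, and the toy corpus and tags are invented. On this toy data the bigram model uses the preceding tag to resolve an ambiguous word that the unigram baseline always tags the same way.

```python
import math
from collections import Counter, defaultdict

START = "<S>"

# Hypothetical toy corpus of (word, tag) pairs.
TRAIN = [
    [("the", "DT"), ("run", "NN")],
    [("a", "DT"), ("run", "NN")],
    [("my", "DT"), ("run", "NN")],
    [("to", "TO"), ("run", "VB")],
    [("to", "TO"), ("run", "VB")],
]

def train_hmm(corpus):
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in corpus:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    vocab = {w for c in emit.values() for w in c}
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log probability."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    trans, emit, tags, vocab = model
    if not words:
        return []
    T, V = len(tags), len(vocab) + 1  # +1 reserves probability mass for unseen words
    # best[t] = (log score, path) of the best tag sequence ending in tag t
    best = {t: (log_prob(trans[START], t, T) + log_prob(emit[t], words[0], V), [t])
            for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            p = max(tags, key=lambda q: best[q][0] + log_prob(trans[q], t, T))
            s = best[p][0] + log_prob(trans[p], t, T) + log_prob(emit[t], word, V)
            new_best[t] = (s, best[p][1] + [t])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

def unigram_tagger(corpus):
    counts, overall = defaultdict(Counter), Counter()
    for sent in corpus:
        for w, t in sent:
            counts[w][t] += 1
            overall[t] += 1
    default = overall.most_common(1)[0][0]  # fallback tag for unknown words
    return lambda words: [counts[w].most_common(1)[0][0] if w in counts else default
                          for w in words]

model = train_hmm(TRAIN)
unigram = unigram_tagger(TRAIN)
print(viterbi(["to", "run"], model), unigram(["to", "run"]))
```

Here the unigram baseline tags "run" as NN everywhere, while the bigram model prefers VB after TO because the transition probability outweighs the emission preference.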
Keywords: Natural Language processing, parts of speech tagging, Brill Tagger,
Transformational Error driven Learning, Hidden Markov Model, Bigram, N-Gram.