Improving Brill’s Tagger Lexical and Transformation Rule for Afaan Oromo Language
Date
2013-02
Publisher
Addis Ababa University
Abstract
Natural Language Processing (NLP), a discipline within Artificial Intelligence (AI), is concerned with human-like processing of language. Its ultimate goal, to parse and understand language, has not yet been fully achieved. For this reason, much NLP research has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech (POS) tagging, or simply tagging. The lack of a standard POS tagger for Afaan Oromo is a major obstacle for researchers working on machine translation, spell checking, dictionary compilation, and automatic sentence parsing and construction.
Although several studies have addressed POS tagging for Afaan Oromo, tagger performance has not yet improved sufficiently. This thesis therefore develops an Afaan Oromo POS tagger that improves the lexical and transformation rules of Brill’s tagger using a sufficiently large training corpus. The Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify a possible tagset. As a result, 26 broad tags were identified, and 17,473 words from around 1,100 sentences, containing 6,750 distinct words, were tagged for training and testing; 258 of these sentences were taken from previous work. Transformation-based error-driven learning was adapted for Afaan Oromo POS tagging. Several experiments were conducted with the rule-based approach, holding out 20% of the data for testing, and the results were compared with the previously adapted Brill’s tagger. The previously adapted tagger achieves an accuracy of 89.8%, whereas the improved tagger achieves 95.6%, an improvement of 5.8 percentage points.
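For readers unfamiliar with transformation-based error-driven learning, the following is a minimal sketch of the general idea, not the thesis implementation. It assumes a most-frequent-tag lexicon as the initial-state tagger and a single rule template ("change tag A to B when the previous tag is P"). The toy English corpus, the tag names, and all function names are hypothetical and chosen only for illustration; the thesis itself uses an Afaan Oromo corpus and a richer set of lexical and contextual rule templates.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Build a most-frequent-tag lexicon (assumed initial-state tagger)."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Fallback tag for unknown words: the most frequent tag overall.
    default = Counter(t for s in tagged_sents for _, t in s).most_common(1)[0][0]
    return lexicon, default

def initial_tags(words, lexicon, default):
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Rule (a, b, p): change tag a to b when the previous tag is p."""
    a, b, p = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == p:
            out[i] = b
    return out

def learn_rules(tagged_sents, lexicon, default, max_rules=10):
    """Greedily pick the rule that fixes the most errors, then repeat."""
    gold = [[t for _, t in s] for s in tagged_sents]
    current = [initial_tags([w for w, _ in s], lexicon, default)
               for s in tagged_sents]
    rules = []
    for _ in range(max_rules):
        # Propose candidate rules only from the current tagging errors.
        candidates = set()
        for cur, g in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != g[i]:
                    candidates.add((cur[i], g[i], cur[i - 1]))
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = 0
            for cur, g in zip(current, gold):
                new = apply_rule(cur, rule)
                gain += sum(n == x for n, x in zip(new, g))
                gain -= sum(c == x for c, x in zip(cur, g))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        rules.append(best)
        current = [apply_rule(c, best) for c in current]
    return rules

def tag(words, lexicon, default, rules):
    tags = initial_tags(words, lexicon, default)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))

def accuracy(tagged_sents, lexicon, default, rules):
    correct = total = 0
    for sent in tagged_sents:
        pred = tag([w for w, _ in sent], lexicon, default, rules)
        correct += sum(p[1] == g[1] for p, g in zip(pred, sent))
        total += len(sent)
    return correct / total

# Hypothetical toy corpus: "run" is a noun after a determiner, a verb after "to".
corpus = [
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("run", "NN"), ("started", "VBD")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("run", "VB")],
    [("we", "PRP"), ("like", "VBP"), ("to", "TO"), ("run", "VB")],
]

lexicon, default = train_baseline(corpus)
rules = learn_rules(corpus, lexicon, default)
baseline_acc = accuracy(corpus, lexicon, default, [])
final_acc = accuracy(corpus, lexicon, default, rules)
print(rules)                      # learned transformation rules
print(baseline_acc, final_acc)    # accuracy before and after applying them
```

The sketch scores rules on the training data only, to stay short. An evaluation like the one described above would instead learn the rules on one portion of the corpus and report accuracy on a held-out portion (20% in the thesis).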
The results indicate that the size of the training corpus, the rule-generating system in the lexical rule learner, and, in addition, the use of an Afaan Oromo HMM tagger as the initial-state tagger have a significant effect on the improvement of the tagger. Because few ready-made standard corpora exist, manually tagging the corpus for this work was challenging; it is therefore recommended that a standard corpus be prepared.
Keywords: Afaan Oromo, POS tagger, NLP, Brill’s Tagger