Development Of Part Of Speech Tagger Using Hybrid
dc.contributor.advisor | Teferra (PhD), Solomon | |
dc.contributor.author | Emiru, Getachew | |
dc.date.accessioned | 2018-11-09T10:19:35Z | |
dc.date.accessioned | 2023-11-18T12:43:55Z | |
dc.date.available | 2018-11-09T10:19:35Z | |
dc.date.available | 2023-11-18T12:43:55Z | |
dc.date.issued | 2016-10-05 | |
dc.description.abstract | Part-of-speech (POS) tagging is a subtask of Natural Language Processing (NLP) that is essential for many other NLP applications. It is the process of assigning to each word the POS tag that describes how the word is used in a sentence. Different researchers have developed POS taggers for Afaan Oromoo using rule-based and HMM approaches separately, but with limited accuracy. In this thesis, a POS tagger for Afaan Oromoo was developed using a hybrid approach that combines the rule-based and HMM approaches. The transformation-based learner, a rule-based tagger, tags words using rules (transformations) induced directly from the training corpus without human intervention or expert knowledge. The HMM tagger tags words by finding the most probable tag path for a given sequence of words. The hybrid Afaan Oromoo POS tagger developed in this thesis uses the HMM tagger as the initial annotator and Brill's tagger as a corrector based on a fixed threshold value. NLTK 3.0.2 and Python 3.4.3 were used for the implementation and experiments. To minimize data requirements and the cost of data preparation, a bootstrapping method was used. A total of 1517 sentences, collected from Afaan Oromoo news agencies and media, were used to train and test the model: 85% for training and the remaining 15% for testing. The three taggers (HMM, rule-based, and hybrid) were evaluated on the same training and test sets and achieved accuracies of 91.9%, 96.4%, and 98.3%, respectively. In conclusion, the hybrid tagger shows a clear improvement in performance over the individual taggers. To further increase performance, future work should use training data with wider domain coverage and morphologically segmented words. | en_US |
dc.identifier.uri | http://etd.aau.edu.et/handle/12345678/14045 | |
dc.language.iso | en | en_US |
dc.publisher | Addis Ababa University | en_US |
dc.subject | Hybrid Tagger, Afaan Oromoo, Artificial Intelligence (AI), part of speech tagging | en_US |
dc.title | Development Of Part Of Speech Tagger Using Hybrid | en_US |
dc.type | Thesis | en_US |