College of Natural and Computational Sciences
The Application of Decision Tree for Part of Speech (Pos) Tagging For Amharic
Addis Ababa University, 2009-09
Kebede, Gebeyehu; Abebe, Ermias (PhD)

Automatic understanding of natural languages requires a set of language processing tools. A POS tagger, which assigns the proper part of speech (noun, verb, adjective, etc.) to each word in a sentence, is one of these tools. This study investigates the possibility of applying a decision tree based POS tagger to Amharic. The tagger was developed using the J48 decision tree classifier, which is Weka's implementation of the C4.5 algorithm. A corpus developed by the ELRC annotation team supplied the data for training and testing the models. The corpus comprises 1,065 news documents totaling 210,000 words. A sample of about 800 sentences was selected and used for model development and evaluation. The dataset was preprocessed in line with the requirements of the Weka data mining tool. To support the decision tree classification models, a table containing contextual and orthographic information was constructed semi-automatically and used as the training and testing dataset. The tags of the words to the left and right of each word serve as contextual information. In addition, orthographic information about the word, such as its first and last character, its prefix and suffix, and whether it contains a numeric digit, is included in the table to help tag the word. Performance tests were conducted at various stages using the 10-fold cross-validation test option. Experimental results show that only the tags of the two words immediately to the left and to the right provide useful contextual information; context beyond a window of two adds noise rather than useful information. Overall, including ambiguous and unknown words, an accuracy of 84.9% was obtained using 10-fold cross-validation.
Although the accuracy obtained in this study is encouraging, further study is recommended to improve it to a level suitable for practical implementation.
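The feature table described in the abstract can be illustrated with a minimal Python sketch. It is not the study's actual preprocessing, which was semi-automatic and targeted Weka. The function names, the column names, the "NONE" padding value, the two-character prefix and suffix length, and the toy tokens and tags are all assumptions made for illustration; only the kinds of features (neighboring tags within a window of two, first and last character, prefix, suffix, presence of a digit) come from the abstract.

```python
def orthographic_features(word, affix_len=2):
    """Orthographic columns for one non-empty token.

    affix_len is an assumed value; the abstract does not state the
    prefix or suffix length used in the study.
    """
    return {
        "first_char": word[0],
        "last_char": word[-1],
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        "has_digit": any(ch.isdigit() for ch in word),
    }


def build_rows(tagged_sentence, window=2):
    """Turn one tagged sentence into rows of a feature table.

    tagged_sentence is a list of (word, tag) pairs. Neighbor tags are
    read from the annotation itself, as in a training table; positions
    outside the sentence get the placeholder "NONE" (an assumption).
    """
    tags = [tag for _, tag in tagged_sentence]
    rows = []
    for i, (word, tag) in enumerate(tagged_sentence):
        row = orthographic_features(word)
        for k in range(1, window + 1):
            row[f"left_tag_{k}"] = tags[i - k] if i - k >= 0 else "NONE"
            row[f"right_tag_{k}"] = tags[i + k] if i + k < len(tags) else "NONE"
        row["tag"] = tag  # class label the decision tree would predict
        rows.append(row)
    return rows


# Hypothetical placeholder tokens and tags, not taken from the corpus.
sentence = [("abc", "N"), ("de9", "NUM"), ("fgh", "V")]
rows = build_rows(sentence)
for row in rows:
    print(row)
```

Each row is one training instance: the class label is the word's own tag, and the other columns are the attributes a decision tree learner such as J48 would split on. Setting window=2 mirrors the reported finding that context beyond two neighboring tags added noise.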