Title: Automatic Amharic Text News Classification
Other Titles: A Neural Networks Approach
Contributors: Dr. Million Meshesha; Kelemework, Worku
Keywords: Neural Networks Approach; Amharic Text
Issue Date: Sep-2009
Publisher: Addis Abeba University
Abstract: Text classification is one of the methods used to organize the massive amount of available textual information in a meaningful context so that the information can be used to the fullest. Automatic text classification is the preferred method for classifying large volumes of information. Research on automatic classification is flourishing for other languages, whereas research on automatic Amharic text classification is in its infancy, and very few attempts have been made so far. This study contributes to automatic Amharic text classification. Before the classifier is constructed, the data is preprocessed to make it ready for the learning algorithm: Amharic characters with the same sound are mapped to one common form; word variants are stemmed; and stop words, punctuation marks and numbers are removed. A Document Frequency (DF) threshold is then applied to select the features of news items. Two weighting schemes, Term Frequency (TF) and Term Frequency by Inverse Document Frequency (TF*IDF), are used to weight the features in news documents and to construct a news-by-features matrix, which is fed to the learning algorithm. This study considers a neural network learning method called Learning Vector Quantization (LVQ) to assess its suitability for automatic Amharic text news classification. The study finds that the TF weighting scheme outperforms the TF*IDF weighting scheme by 3.54 percentage points on average. Using TF weights, accuracies of 94.81%, 61.61% and 70.08% are obtained in the three-, six- and nine-category experiments respectively, with an average accuracy of 75.5%. For the same experiments, TF*IDF weights yield accuracies of 69.63%, 78.22% and 68.03%, with an average accuracy of 71.96%. Previous research on Amharic text classification shows that accuracy decreases consistently as the number of categories increases.
The results of this study show that accuracy does not depend on the number of news items and categories considered; rather, accuracy is determined by representing each category with a sufficient number of subclasses. Therefore, finding the optimum number of subclasses is the major direction for further research on Amharic text news classification using LVQ.
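The pipeline summarized in the abstract (DF-threshold feature selection, TF or TF*IDF weighting, then LVQ) can be illustrated with a minimal sketch. This is not the thesis implementation. The English tokens stand in for already preprocessed Amharic terms. The threshold `min_df=2`, the IDF form `log(N / df)`, the basic LVQ1 update rule, the learning rate, and all function names are assumptions chosen for illustration. Each category may be given several prototypes, which is one possible reading of the "subclasses" the abstract mentions.

```python
import math
from collections import Counter

def select_features(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (DF threshold)."""
    df = Counter(term for doc in docs for term in set(doc))
    features = sorted(t for t, n in df.items() if n >= min_df)
    return features, df

def weight_matrix(docs, features, df, scheme="tf"):
    """Build a documents-by-features matrix with TF or TF*IDF weights."""
    n_docs = len(docs)
    rows = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "tf":
            rows.append([float(tf[t]) for t in features])
        else:  # assumed IDF form: log(N / df)
            rows.append([tf[t] * math.log(n_docs / df[t]) for t in features])
    return rows

def nearest(protos, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda k: sum((p - xi) ** 2 for p, xi in zip(protos[k], x)))

def train_lvq1(X, y, protos, proto_labels, lr=0.1, epochs=20):
    """Basic LVQ1: pull the winning prototype toward x if labels match, else push away."""
    protos = [list(p) for p in protos]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in zip(X, y):
            k = nearest(protos, x)
            sign = 1.0 if proto_labels[k] == label else -1.0
            protos[k] = [p + sign * rate * (xi - p) for p, xi in zip(protos[k], x)]
    return protos

def predict(protos, proto_labels, x):
    """Assign x the label of its nearest prototype."""
    return proto_labels[nearest(protos, x)]

# Hypothetical toy corpus: tokenized, stemmed, stop words already removed.
docs = [
    ["goal", "match", "team"],
    ["team", "match", "coach"],
    ["market", "price", "bank"],
    ["bank", "price", "loan"],
]
labels = ["sport", "sport", "economy", "economy"]

features, df = select_features(docs, min_df=2)
tf_rows = weight_matrix(docs, features, df, scheme="tf")
tfidf_rows = weight_matrix(docs, features, df, scheme="tfidf")

# One prototype per category here; more per category would model subclasses.
proto_labels = ["sport", "economy"]
protos = train_lvq1(tf_rows, labels, [tf_rows[0], tf_rows[2]], proto_labels)

new_doc = ["match", "team", "price"]
new_vec = weight_matrix([new_doc], features, df, scheme="tf")[0]
predicted = predict(protos, proto_labels, new_vec)
```

In this toy run only the four terms that appear in at least two documents survive feature selection, and the new document is assigned to whichever category owns its nearest prototype. Swapping `tf_rows` for `tfidf_rows` in the training call gives the second weighting scheme compared in the abstract.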
Appears in Collections: Thesis - Information Science

Files in This Item:
File: Worku Kelemework.pdf
Size: 709.75 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.