A Semi-Supervised Approach for Amharic News Classification
Date
2012-06
Publisher
Addis Ababa University
Abstract
Text classification is receiving growing attention, and there is an increasing need for
classification techniques that are automatic, fast, and accurate while requiring the least
human interaction. Many supervised and unsupervised learning techniques for data
classification exist in the literature. Semi-supervised learning lies halfway between
supervised and unsupervised learning: in addition to unlabeled data, the algorithm is
provided with some supervision information, but not necessarily for every example.
The paper explored semi-supervised text classification applied to different types of
vectors generated from Amharic text documents. The research used 3,154 news articles.
To obtain good results, the documents were prepared and preprocessed before classification.
The Weka package was used to classify the preprocessed data. Machine learning
techniques, namely the Expectation Maximization clustering algorithm combined with the
Naïve Bayes, Hyperpipe, and RBF Network classification algorithms, were used to
categorize the Amharic news items.
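
The idea of combining Expectation Maximization with Naïve Bayes can be illustrated with a minimal Python sketch. This is not the Weka pipeline used in the research, whose exact configuration the abstract does not give. It shows one common formulation, assumed here for illustration: a multinomial Naïve Bayes model is first fitted on the labeled documents, then EM alternates between estimating soft class memberships for the unlabeled documents and refitting the model on both sets. The function names (`train_nb`, `posterior`, `em_naive_bayes`, `classify`), the smoothing value, and the toy English tokens are all hypothetical placeholders, not data or settings from the study.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes from token lists and soft class weights."""
    class_mass = {c: alpha for c in classes}
    word_mass = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            share = w.get(c, 0.0)  # how strongly this document belongs to c
            class_mass[c] += share
            for t in tokens:
                word_mass[c][t] += share
    total = sum(class_mass.values())
    log_prior = {c: math.log(class_mass[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((word_mass[c][t] + alpha) / denom)
                       for t in vocab}
    return log_prior, log_like


def posterior(tokens, model, classes):
    """Return P(class | tokens); tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = {c: log_prior[c]
              + sum(log_like[c][t] for t in tokens if t in log_like[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


def em_naive_bayes(labeled, unlabeled, iterations=10):
    """Semi-supervised Naive Bayes: labeled (tokens, label) pairs plus unlabeled token lists."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    lab_w = [{y: 1.0} for _, y in labeled]  # labeled documents keep hard labels
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        # E-step: soft labels for the unlabeled documents
        soft = [posterior(d, model, classes) for d in unlabeled]
        # M-step: refit on labeled (hard) plus unlabeled (soft) documents
        model = train_nb(lab_docs + unlabeled, lab_w + soft, classes, vocab)
    return model, classes


def classify(tokens, model, classes):
    """Pick the class with the highest posterior probability."""
    post = posterior(tokens, model, classes)
    return max(classes, key=lambda c: post[c])


# Toy placeholder data (hypothetical, not from the Amharic corpus).
labeled = [(["goal", "match", "team"], "sport"),
           (["bank", "market", "price"], "economy")]
unlabeled = [["team", "coach", "match"], ["price", "inflation", "market"],
             ["coach", "goal"], ["inflation", "bank"]]
model, classes = em_naive_bayes(labeled, unlabeled)
print(classify(["coach"], model, classes))
```

In this toy run the word "coach" never appears in a labeled document; the model can only associate it with a class through the unlabeled documents, which is the benefit semi-supervised learning aims for when labeled data are scarce.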
Classifier accuracy was higher when the number of classes was smaller. The best results
were obtained with four classes (83.44%, 82.8%, and 82.4% for the Naïve Bayes, Hyperpipe,
and RBF Network classifiers, respectively), and the lowest performance was observed with
10 classes (55.42%, 57.26%, and 51.9%, respectively). This research indicated that
Naïve Bayes is more applicable to semi-supervised categorization of Amharic news items.
Keywords: Text categorization, semi-supervised machine learning, Naïve Bayes, Hyperpipe,
and RBF Networks