A Semi-Supervised Approach for Amharic News Classification

Date

2012-06

Publisher

Addis Ababa University

Abstract

Text classification is receiving growing attention, and there is an increasing need for techniques that classify text automatically, quickly, and accurately with minimal human intervention. Many supervised and unsupervised learning techniques for data classification exist in the literature. Semi-supervised learning lies halfway between the two: in addition to unlabeled data, the algorithm is given some supervision information, but not necessarily for every example. This research explored semi-supervised text classification applied to different types of vectors generated from Amharic text documents. A total of 3,154 news articles were used. To obtain good results, the documents were first prepared and preprocessed, and the Weka package was then used to classify the preprocessed data. The Expectation Maximization clustering algorithm was combined with the Naïve Bayes, Hyperpipe, and RBF Network classification algorithms to categorize the Amharic news items. Classifier accuracy was higher when the number of classes was smaller. The best results were obtained with four classes, where the Naïve Bayes, Hyperpipe, and RBF Network classifiers reached 83.44%, 82.8%, and 82.4%, respectively; the lowest performance was observed with 10 categories (55.42%, 57.26%, and 51.9%, respectively). The results indicate that Naïve Bayes is more applicable to semi-supervised categorization of Amharic news items.
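The general idea of combining Expectation Maximization with Naïve Bayes on partly labeled text can be illustrated with a minimal sketch. It is not the Weka pipeline used in this research: it assumes a multinomial Naïve Bayes model trained with soft labels, a fixed number of EM iterations, and a tiny set of placeholder English tokens standing in for preprocessed Amharic terms. All function names and data below are hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes from soft labels.
    docs: list of token lists; weights: list of {class: probability} dicts."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            p = w.get(c, 0.0)
            if p == 0.0:
                continue
            prior[c] += p
            for t in tokens:
                counts[c][t] += p  # fractional count for soft labels
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like

def posterior(tokens, model, classes):
    """Return P(class | tokens); tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
              for c in classes}
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def semi_supervised_nb(labeled, unlabeled, iterations=10):
    """EM loop: labeled documents keep hard labels, unlabeled ones get soft labels."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    lab_w = [{y: 1.0} for _, y in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)  # start from labeled data only
    for _ in range(iterations):
        unl_w = [posterior(d, model, classes) for d in unlabeled]              # E-step
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)  # M-step
    return model, classes

def predict(tokens, model, classes):
    post = posterior(tokens, model, classes)
    return max(post, key=post.get)

# Placeholder data (assumption): two labeled and four unlabeled "documents".
labeled = [(["goal", "match", "team"], "sport"),
           (["bank", "market", "price"], "economy")]
unlabeled = [["team", "coach", "match"], ["coach", "league", "goal"],
             ["market", "trade", "bank"], ["trade", "export", "price"]]

model, classes = semi_supervised_nb(labeled, unlabeled)
print(predict(["coach", "league"], model, classes))
print(predict(["trade", "export"], model, classes))
```

In this toy setup the words "coach", "league", "trade", and "export" never appear in a labeled document, so any class signal they carry comes from the unlabeled documents alone, which is the benefit semi-supervised learning aims for when labeled news items are scarce.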

Keywords

Text categorization, semi-supervised machine learning, Naïve Bayes, Hyperpipe, RBF Networks
