AAU-ETD :: Browsing by Author "Mulugeta, Wondwossen"

Hierarchical Amharic News Text Classification
(Addis Ababa University, 2010-07) Kumilachew, Alemu; Mulugeta, Wondwossen
Advances in present-day technology enable the production of huge amounts of information. Retrieving useful information from these huge collections requires proper organization and structuring, and automatic text classification is an inevitable solution in this regard. However, the present approach focuses on flat classification, where each topic is treated as a separate class. This is inadequate when there are a large number of classes and a huge number of relevant features are needed to distinguish between them. This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of Amharic news text. The approach uses the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. An experiment was conducted on categorical data collected from the Ethiopian News Agency (ENA), using SVM to assess the performance of the hierarchical classifiers on Amharic news text. The findings show that the accuracy of flat classification decreases as the number of classes and documents (features) increases. Moreover, the accuracy of the flat classifier decreases as the number of top features increases; its peak accuracy was 68.84% when the top 3 features were used. With hierarchical classification, the performance of the classifiers increases as we move down the hierarchy. The maximum accuracy achieved was 90.37% at level 3 (the last level) of the category tree. Moreover, in contrast to the flat classifier, the accuracy of the hierarchical classifiers increases as the number of top features increases. The peak accuracy was 89.06%, using the level-three classifier when the top 15 features were used. Furthermore, the performance of the flat and hierarchical classifiers is compared using the same test data. The comparison shows that using the hierarchical structure during classification resulted in a significant improvement of 29.42% in exact-match precision compared with a flat classifier.
Keywords: Automatic Text Classification, Flat Classification, Hierarchical Classification, Support Vector Machine
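The decomposition described in this abstract, one simpler classification problem per node of the topic tree, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `NodeClassifier`, `train_hierarchy`, and `classify`, the English toy documents, and the two-level category tree are hypothetical, and the per-node model is a simple word-overlap scorer standing in for the SVM used in the thesis.

```python
from collections import Counter, defaultdict


class NodeClassifier:
    """Stand-in local classifier (word-overlap score). NOT the thesis's SVM."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        return self

    def predict(self, doc):
        words = doc.split()

        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) or 1  # guard against empty training text
            return sum(counts[w] for w in words) / total

        # sorted() makes tie-breaking deterministic
        return max(sorted(self.word_counts), key=score)


def train_hierarchy(samples):
    """Train one classifier per internal node of the category tree.

    samples: list of (text, path), where path is a tuple of labels from the
    top level down to the leaf category.
    """
    grouped = defaultdict(lambda: ([], []))
    for text, path in samples:
        for depth in range(len(path)):
            docs, labels = grouped[path[:depth]]  # node identified by its prefix
            docs.append(text)
            labels.append(path[depth])            # child to choose at this node
    return {prefix: NodeClassifier().fit(docs, labels)
            for prefix, (docs, labels) in grouped.items()}


def classify(models, text):
    """Walk down the tree, letting each node's classifier pick the next child."""
    path = ()
    while path in models:
        path += (models[path].predict(text),)
    return path


# Hypothetical toy data; the thesis used Amharic news items from ENA.
samples = [
    ("football match goal", ("sport", "football")),
    ("marathon runner race", ("sport", "athletics")),
    ("bank interest loan", ("economy", "finance")),
    ("coffee export trade", ("economy", "trade")),
]
models = train_hierarchy(samples)
print(classify(models, "runner wins race"))
```

Each node only has to separate its own children, so the root model distinguishes two broad topics while the lower models distinguish closely related subtopics. A flat classifier would instead have to separate all leaf categories at once.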
OCR For Special Type of Handwritten Amharic Text ("Yekum Tsifet") Neural Network Approach
(Addis Ababa University, 2004-06) Mulugeta, Wondwossen; Negi, Atul (PhD); Tadesse, Nigussie (PhD)
Verbal and written communications, which are integral components of human society, have been transformed by the development of the respective communication devices. With the swift development of processing devices, it became both necessary and possible to digitize printed information items by means of Optical Character Recognition (OCR). Although most world languages benefit from this technology, the application of character recognition technology to Amharic text is recent, and it is still in its infancy where handwriting recognition is concerned. This study is an attempt to develop a recognition engine for Amharic handwritten text written in a special writing style called "Yekum Tsihuf" (የቁም ጽሁፍ). Before mechanical and electronic text processors were introduced in Ethiopia, information was recorded by hand on natural materials, animal skin being the dominant one. Those handwritten documents, written in this writing style, hold vital information about history, tradition, religion, nature, and so on, which makes an undeniable contribution to current and future studies. The availability of this information in electronic form would greatly help preservation and communication. In this study, handwritten character recognition with an Artificial Neural Network implementation is attempted for the 231 main characters of the Amharic language. The training and test data sets were produced by scribes trained to write text in this writing style. The study used various techniques at each phase, from digitization to recognition. Preprocessing methods such as image binarization, character segmentation, and size normalization, as well as neural network recognition, were implemented using the Visual C++ .NET and MATLAB programming environments. While a segmentation rate of 95.96% was attained using a stage-by-stage segmentation algorithm, recognition rates ranging from 98.8% to 20.3% were obtained for different test cases. Based on the findings and the knowledge acquired during experimentation, topics for future research are also identified.
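Two of the preprocessing steps named in this abstract, image binarization and size normalization, can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which used Visual C++ .NET and MATLAB. The fixed threshold of 128, the nearest-neighbour resampling, the 4 x 4 target size, and the function names `binarize`, `crop_to_ink`, and `normalize_size` are all assumptions chosen for illustration.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed global threshold is an assumption; the abstract does not say
    which binarization method was used.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary):
    """Trim empty margins so the glyph fills its bounding box."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def normalize_size(binary, size=4):
    """Resample to size x size by nearest neighbour (an assumed method)."""
    h, w = len(binary), len(binary[0])
    return [[binary[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


# Hypothetical 4 x 6 grayscale patch containing a small L-shaped stroke.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  10, 255, 255, 255, 255],
    [255,  10,  10, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
glyph = normalize_size(crop_to_ink(binarize(gray)))
for row in glyph:
    print("".join("#" if px else "." for px in row))
```

Normalizing every segmented character to the same grid gives a recognizer a fixed-length input regardless of how large the scribe wrote the character.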