A Generalized Approach to Optical Character Recognition (OCR) of Amharic Texts
Date
2000-05
Publisher
Addis Ababa University
Abstract
Research in Optical Character Recognition (OCR) is currently popular because of its
potential applications in banks, post offices, insurance companies, and other governmental
and non-governmental organizations. Other application areas include library automation and
natural language processing.
As Amharic is the working language of Ethiopia and is used as a means of communication by
most governmental and non-governmental organizations, there is a huge volume of documents
and document processing that could benefit from an OCR system. For this reason, research
on Amharic OCR systems has recently been undertaken at SISA. The present research
continues that work, with the aim of improving the performance of the system under
investigation at SISA in recognizing characters written in different font types.
To this end, a feature-based approach was considered after thoroughly studying the features
of Amharic characters. Algorithms for thinning and feature extraction were reviewed from
the literature. An attempt was made to implement some of these algorithms in order to
evaluate their performance on Amharic text printed in different typefaces. Previously
implemented algorithms for segmentation (stage-by-stage segmentation) and feature
extraction/detection (tree-based topological feature extraction technique) were
incorporated, with some modifications, to complete the Amharic OCR system. The system was
then tested on sample Amharic documents from actual cases (written in Agafari, Washra and
Visual Geez), and the test results obtained for each case are reported. Recommendations
are also made to highlight areas of further research that would improve the current work
and incorporate other features into the Amharic OCR system.
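The abstract does not say which thinning algorithms were reviewed or implemented. As an illustration only of what a thinning step does before topological features are extracted, the following is a minimal sketch of the widely cited Zhang-Suen algorithm, which is an assumption here and not necessarily the method used in the thesis. The function name `zhang_suen_thin` and the input convention (a list of rows of 0/1 values, 1 = ink, with a one-pixel background border) are hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1, 1 = ink) towards one-pixel-wide strokes.

    Assumes the outermost rows and columns are background (0); they are
    never examined or changed. The input image is not modified.
    """
    img = [list(row) for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9 in Zhang-Suen notation: clockwise, starting directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    ink = sum(p)  # number of ink neighbours
                    # Count 0 -> 1 transitions around the pixel (circular).
                    transitions = sum(
                        1 for i in range(8)
                        if p[i] == 0 and p[(i + 1) % 8] == 1
                    )
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        edge_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        edge_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= ink <= 6 and transitions == 1 and edge_ok:
                        to_clear.append((r, c))
            # Delete marked pixels only after the whole sub-iteration scan.
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    return img
```

A stroke that is already one pixel wide passes through unchanged, while thicker strokes lose boundary pixels on each pass. Thinning of this kind is one way to reduce the stroke-weight differences between typefaces such as Agafari, Washra and Visual Geez before features are compared.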
Keywords
Information Science