A Generalized Approach to Optical Character Recognition (OCR) of Amharic Texts

Date

2000-05

Publisher

Addis Ababa University

Abstract

These days, research in Optical Character Recognition (OCR) is popular for its application potential in banks, post offices, insurance companies, and other governmental and non-governmental organizations. Other application areas include library automation and natural language processing. As Amharic is the working language of Ethiopia and is used as a means of communication by most governmental and non-governmental organizations, there is a huge collection of documents, and a large amount of document processing, that could benefit from an OCR system. To this end, research on an Amharic OCR system has recently been undertaken at SISA. The present research continues that work, with the aim of improving the performance of the system under investigation at SISA in recognizing characters written in different font types. A feature-based approach was therefore considered after a thorough study of the features of Amharic characters. Algorithms for thinning and feature extraction were reviewed from the literature. An attempt was made to implement some of these algorithms in order to assess their performance on Amharic text printed in different typefaces. Previously implemented algorithms for segmentation (stage-by-stage segmentation) and feature extraction/detection (a tree-based topological feature extraction technique) are incorporated, with some modification, to complete the Amharic OCR system. The system is then tested on sample Amharic documents from actual cases (written in Agafari, Washra and Visual Geez), and the test results obtained for each case are reported. Recommendations are also made to highlight areas of further research, so as to improve the current work and incorporate other features into the Amharic OCR system.

Keywords

Information Science
