A Generalized Approach to Optical Character Recognition (OCR) Amharic Texts

Meshesha, Million

Date

2000-05

Authors

Meshesha, Million

Publisher

Addis Ababa University

Abstract

These days, research in Optical Character Recognition (OCR) is popular because of its application potential in banks, post offices, insurance companies, and other governmental and non-governmental organizations. Other application areas include library automation and natural language processing. Since Amharic is the working language of Ethiopia and is used as a means of communication by most governmental and non-governmental organizations, there is a huge collection of documents whose processing could benefit from an OCR system. Accordingly, research on an Amharic OCR system has recently been undertaken at SISA. The present research continues that work, with the aim of improving the performance of the system under investigation at SISA in recognizing characters printed in different font types. To this end, a feature-based approach was adopted after a thorough study of the features of Amharic characters. Algorithms for thinning and feature extraction were reviewed from the literature, and some of them were implemented to assess their performance on Amharic text printed in different typefaces. Previously implemented algorithms for segmentation (stage-by-stage segmentation) and for feature extraction/detection (a tree-based topological feature extraction technique) are incorporated, with some modification, to complete the Amharic OCR system. The system is then tested on sample Amharic documents from actual cases (printed in the Agafari, Washra, and Visual Geez fonts), and the test results obtained for each case are reported. Recommendations are also drawn to highlight areas for further research that could improve the current work and add other features to the Amharic OCR system.

Keywords

Information Science

URI

http://etd.aau.edu.et/handle/12345678/21826

Collections

Information Sciences