Optical Character Recognition of Typewritten Amharic Text

Date

1999-05

Publisher

Addis Ababa University

Abstract

Optical Character Recognition (OCR) is an area of research in which a system accepts a document image and converts it into ASCII code so that it is easy to store, retrieve, and process further. OCR helps convert the bulk of information available on paper into an electronically processable format without human intervention, saving time, money, and labor. Recently, Optical Character Recognition for the Amharic script has become an area of research interest. Some progress has been made in recognizing characters of a specific type style, font, and font size. All trials in this regard have used very high-quality laser printouts on white paper. In reality, however, most Amharic documents that need to be converted into machine-readable format are typewritten and on non-white paper. In this study, an attempt is made to explore the possibilities of developing an OCR system for typewritten Amharic text. To this end, the features of typewritten Amharic characters are thoroughly studied. Some algorithms for noise removal and segmentation are reviewed and implemented to assess their performance on typewritten Amharic text. An algorithm previously implemented for the recognition of Amharic characters is modified to incorporate the specific features of typewritten Amharic characters, and the segmentation and noise removal algorithms are integrated with it. The resulting system is tested on typewritten Amharic documents, and the test results are presented. Recommendations are also made on issues to be investigated further in order to develop a typewritten Amharic OCR system with better performance.

Keywords

Information Science
