Optical Character Recognition of Typewritten Amharic Text
Date
1999-05
Publisher
Addis Ababa University
Abstract
Optical Character Recognition (OCR) is an area of research in which a system accepts a
document image and converts it into ASCII code so that it is easy to store, retrieve,
and process further. OCR helps to convert the bulk of information available on paper into an
electronically processable format without human intervention, saving time, money, and
labor. Recently, Optical Character Recognition for the Amharic script has become an area of
research interest. Some progress has been made in recognizing characters of a
specific type style, font, and font size. All the trials in this regard used very high quality
laser printouts on white paper. In reality, however, most Amharic documents
that need to be converted into machine-readable format are typewritten and on non-white
paper. In this study, an attempt is made to explore the possibilities of developing an OCR system for
typewritten Amharic text. To this end, features of the typewritten Amharic characters are
thoroughly studied. Some algorithms for noise removal and segmentation are reviewed. These
algorithms are implemented to evaluate their performance on typewritten Amharic text. A previously
implemented algorithm for recognition of Amharic characters is modified to incorporate the
specific features of typewritten Amharic characters. The segmentation and the noise removal
algorithms are integrated with this algorithm. The result is tested on typewritten Amharic
documents, and test results are presented. Recommendations are also made to point out
issues to be investigated further for the development of a typewritten Amharic OCR system with
better performance.
Keywords
Information Science