Recognition of Formatted Amharic Text Using Optical Character Recognition (OCR) Techniques
Date
1998-05
Publisher
Addis Ababa University
Abstract
In our age, information is the driving force behind every human endeavor.
Information in computer-processable format is especially valuable since it can be stored,
manipulated, and transferred with a minimum of labor and financial cost. To this end,
information on paper and in other documents should be converted to computer-processable
format. Character recognition systems have been developed for some time now.
Scripts such as Latin, Arabic, Kanji, and Cyrillic have received a significant amount of
research in the area, while other scripts such as Amharic and Kannada have received little
attention. The testing of OCR techniques on the Amharic script is a recent phenomenon. Worku
Alemu, a 1997 graduate of SISA, was able to adapt an OCR algorithm for the Amharic
script. Without applying pre- and post-processing techniques to detect and correct errors,
the combination of the segmentation and recognition algorithms he used yielded a
significant accuracy level for laser-printed text in 12-point size and normal type style
of Washra font (the main test case). However, his algorithm could not recognize text written in different font sizes
and styles (such as italics and outline). The current work attempts to extend his work by
introducing some pre-processing techniques so that his algorithm recognizes text written
in different sizes and styles.
Keywords
Information Science