Hand-Written Amharic Character Recognition: The Case of Postal Addresses
Date
2003-06
Publisher
Addis Ababa University
Abstract
Researchers are currently attracted to the area of Optical Character
Recognition (OCR), primarily because of the challenging nature of the
research and, secondly, because of its industrial importance in
applications such as reading machines for the blind, postal address
interpretation, bank courtesy amount processing, hand-filled form
processing, and the like.
Research in the area of Amharic OCR systems has been ongoing since 1997.
Attempts have been made at adopting an algorithm for the Amharic language,
incorporating preprocessing techniques into the adopted algorithm, and
generalizing the system so that it recognizes typewritten as well as
handwritten characters.
A sufficient amount of work has been done in preprocessing areas such as
segmentation and noise removal. However, simplifying feature extraction
and alleviating the problems of high-dimensional input still require
considerable additional research in order to arrive at a system that
society can use to solve real-world problems.
To this end, line fitting is applied to Amharic optical character
recognition, using simple geometric calculations to determine features that
represent and describe each character as uniquely and precisely as possible.
The image of a segmented character, normalized to 32x32 pixels, is
divided into 16 smaller squares of 8x8 pixels. The least squares technique
is then applied to fit a linear model to the distribution of foreground
pixels, and three features are extracted from each smaller square.
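The zoning and line-fitting step described above can be sketched roughly as
follows. This is a minimal illustration, not the thesis implementation: the
abstract does not say which three features are taken from each square, so
the triple used here (angle of the fitted line, its intercept, and the
foreground pixel density) is an assumption, as are the function names and
the 0/1 pixel encoding.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = foreground (ink), 0 = background

SIZE, ZONE = 32, 8  # 32x32 character image, 8x8 zones -> 16 zones


def zone_features(points: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Fit y = slope * x + intercept to the foreground pixels of one zone.

    Returns a hypothetical feature triple (angle, intercept, density);
    the actual three features used in the thesis are not stated.
    """
    n = len(points)
    if n == 0:
        return (0.0, 0.0, 0.0)  # empty zone: nothing to fit
    density = n / (ZONE * ZONE)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        # All pixels share one column (or there is a single pixel):
        # the slope is undefined, so report a vertical line by convention.
        return (math.pi / 2, 0.0, density)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (math.atan(slope), intercept, density)


def extract_features(image: Image) -> List[float]:
    """Return 16 zones x 3 features = 48 values for a 32x32 binary image."""
    if len(image) != SIZE or any(len(row) != SIZE for row in image):
        raise ValueError("expected a 32x32 image")
    features: List[float] = []
    for zr in range(0, SIZE, ZONE):        # top row of each zone
        for zc in range(0, SIZE, ZONE):    # left column of each zone
            points = [(c, r)
                      for r in range(ZONE) for c in range(ZONE)
                      if image[zr + r][zc + c]]
            features.extend(zone_features(points))
    return features
```

With three values per square, a character is described by 48 numbers rather
than 1024 raw pixels, which is one way to read the abstract's concern with
high-dimensional input.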
Finally, a feed-forward neural network trained with the backpropagation
algorithm is applied to the handwriting of three individuals, using a
cross-validation technique as well as a separate test set. The results are
presented in tables and confusion matrices.
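As an illustration of the cross-validation protocol mentioned above, the
following sketch splits a data set into k folds and averages the accuracy
over them. The number of folds and how the samples were ordered are not
given in the abstract, so both are assumptions here. The classifier is
passed in as a hypothetical callable named train_and_predict, which stands
in for the neural network.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def cross_validate(
    samples: Sequence,
    labels: Sequence,
    k: int,
    train_and_predict: Callable[[list, list, list], list],
) -> float:
    """Mean accuracy over k folds.

    train_and_predict(train_x, train_y, test_x) is a hypothetical hook:
    it should train a fresh model and return one predicted label per test_x.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predicted = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

The folds are contiguous, so the caller would normally shuffle the samples
first. Otherwise a fold may contain only one writer or one character class.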
Relevant conclusions are drawn, and recommendations are put forward to
indicate directions for future work in the area.
combining the shape of the letters so as to form written words) [Plamondon
and Srihari, 2000].
Handwriting, since it entails an individualistic skill and consists of
artificial graphical marks on a surface, is still a challenge in pattern
recognition. The success of handwritten optical character recognition
systems is attributed to the availability of machine learning techniques
[LeCun et al., 1998]. However, the availability of machine learning
techniques alone is not enough to solve the problems of offline OCR
systems, and some of these problems remain far from being solved.
Since 1951, a year marked by the invention of GISMO, a robot reader-writer,
many OCR systems have been developed because of the advantages they
provide in overcoming repetitive and labor-intensive tasks
[Srihari & Lam, 1996]. At present, hundreds of OCR systems are
commercially available, and they have become less expensive, faster, and
more reliable thanks to cheaper electronic components and extensive
research in the area [Yaregal, 2002].
Technically, handwriting recognition systems comprise procedures such as
document scanning, binarization, segmentation, feature extraction,
recognition, and possibly post-processing [Million, 2000; De Lesa, 2001].
As Dereje noted in 1999, OCR systems are highly influenced by factors such
as the mode of writing, the condition of the input, the quality of the
paper, and the presence of extraneous marks. To improve the performance of
OCR systems, various preprocessing tasks such as noise removal, skew
detection and correction, and slant correction have been applied to printed
and typewritten scripts. Structural features have also been used, partly to
increase the versatility of OCR systems [Yaregal, 2002].
In addition to the problems of machine-printed and typewritten scripts,
handwriting recognition faces further difficulties introduced by the great
inconsistency of writing styles and handwriting instruments.
Keywords
Information Science