Hand-Written Amharic Character Recognition: The Case of Postal Addresses

Date

2003-06

Publisher

Addis Ababa University

Abstract

Researchers are currently attracted to optical character recognition (OCR), primarily because of the challenging nature of the research and secondly because of its industrial importance in areas such as reading machines for the blind, postal address interpretation, bank courtesy amount processing, hand-filled form processing, and the like. Research on Amharic OCR systems has been ongoing since 1997. Attempts have been made to adapt recognition algorithms to the Amharic language, to incorporate preprocessing techniques into the adapted algorithms, and to generalize the system so that it recognizes typewritten as well as handwritten characters. A substantial amount of work has been done on preprocessing, such as segmentation and noise removal. However, simplifying feature extraction and alleviating the problems of high-dimensional input still require considerable further research before a system emerges that society can use to solve real-world problems. To this end, line fitting is applied to Amharic optical character recognition, using simple geometric calculations to determine features that represent and describe each character as uniquely and precisely as possible. The image of a segmented character, normalized to 32x32 pixels, is divided into 16 smaller squares of 8x8 pixels. The least squares technique is then applied to fit a linear model to the distribution of foreground pixels, and three features are extracted from each smaller square. Finally, a feed-forward neural network trained with the backpropagation algorithm is applied to the handwriting of three individuals, using cross-validation as well as a separate test set, and the results are presented in tables and confusion matrices. Relevant conclusions are drawn, and recommendations are made to indicate directions for further work in the area.
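The zoning and line-fitting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not say which three features are taken from each square, so the sketch assumes foreground pixel density plus the cosine and sine of twice the fitted line's angle, both computed from the least squares slope. The function names `zone_features` and `extract_features` are hypothetical.

```python
import numpy as np

SIZE = 32  # side of the normalized character image (from the abstract)
ZONE = 8   # side of each smaller square (from the abstract)


def zone_features(zone):
    """Return three features for one binary zone.

    Assumed feature set (not specified in the abstract):
    foreground density, cos(2*theta), sin(2*theta), where theta is the
    angle of the least squares line y = a + b*x through the foreground pixels.
    """
    rows, cols = np.nonzero(zone)
    n = rows.size
    if n == 0:
        return (0.0, 0.0, 0.0)      # empty zone: no stroke information
    density = n / zone.size
    if n < 2:
        return (density, 0.0, 0.0)  # a single pixel has no orientation
    x = cols.astype(float)
    y = rows.astype(float)
    dx = x - x.mean()
    dy = y - y.mean()
    sxx = np.sum(dx * dx)
    sxy = np.sum(dx * dy)
    if sxx == 0.0:
        # All pixels share one column: vertical stroke, slope undefined.
        # theta = 90 degrees gives cos(2*theta) = -1 and sin(2*theta) = 0.
        return (density, -1.0, 0.0)
    b = sxy / sxx                    # ordinary least squares slope
    cos2 = (1.0 - b * b) / (1.0 + b * b)
    sin2 = (2.0 * b) / (1.0 + b * b)
    return (density, cos2, sin2)


def extract_features(char_img):
    """Split a 32x32 binary character image into 16 zones of 8x8 pixels
    and concatenate three features per zone into a 48-element vector."""
    img = np.asarray(char_img) > 0
    if img.shape != (SIZE, SIZE):
        raise ValueError("expected a 32x32 image")
    feats = []
    for r in range(0, SIZE, ZONE):
        for c in range(0, SIZE, ZONE):
            feats.extend(zone_features(img[r:r + ZONE, c:c + ZONE]))
    return np.array(feats, dtype=float)
```

With 16 squares and three features each, the classifier input has 48 values instead of the 1024 raw pixels, which is one way to read the abstract's concern with high-dimensional input. Expressing the slope through the sine and cosine of the doubled angle keeps the features bounded and avoids an infinite slope for vertical strokes; that choice is an assumption of this sketch.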
combining the shape of the letters so as to form written words) [Plamondon and Srihari, 2000]. Handwriting, since it entails an individualistic skill and consists of artificial graphical marks on a surface, is still a challenge in pattern recognition. The success of handwritten optical character recognition systems is attributed to the availability of machine learning techniques [LeCun et al., 1998]. However, machine learning techniques alone cannot solve the problems of offline OCR systems, and some of these problems remain far from solved. Since 1951, the year marked by the invention of GISMO, a robot reader-writer, many OCR systems have been developed because they help overcome repetitive and labor-intensive tasks [Srihari & Lam, 1996]. At present, hundreds of OCR systems are commercially available, and they have become less expensive, faster, and more reliable thanks to cheaper electronic components and extensive research in the area [Yaregal, 2002]. Technically, handwriting recognition systems comprise procedures such as document scanning, binarization, segmentation, feature extraction, recognition, and possibly post-processing [Million, 2000; De Lesa, 2001]. As Dereje noted in 1999, OCR systems are strongly influenced by factors such as the mode of writing, the condition of the input, the quality of the paper, and the presence of extraneous marks. To improve the performance of OCR systems, various preprocessing tasks such as noise removal, skew detection and correction, and slant correction have been applied to printed and typewritten scripts. Effort has also been made to use structural features, partly to increase the versatility of OCR systems [Yaregal, 2002].
In addition to the problems of machine-printed and typewritten scripts, handwriting recognition faces further difficulties arising from the great variability of writing styles and writing instruments.

Keywords

Information Science