OCR For Special Type of Handwritten Amharic Text ("Yekum Tsifet"): Neural Network Approach

Date

2004-06

Publisher

Addis Ababa University

Abstract

Verbal and written communication, which are integral components of human society, have been transformed by the development of the respective communication devices. With the swift development of processing devices, digitizing printed information by means of Optical Character Recognition (OCR) became possible. Although most world languages benefit from this technology, the application of character recognition to Amharic text is recent, and it is still in its infancy where handwriting recognition is concerned. This study is therefore an attempt to develop a recognition engine for handwritten Amharic text written in a special writing style called "Yekum Tsihuf" (የቁም ጽሁፍ).
Before mechanical and electronic text processors were introduced in Ethiopia, information was recorded by hand on natural materials, animal skin being the dominant one. Those handwritten documents, written in this writing style, hold vital information about history, tradition, religion, nature, etc., which makes an undeniable contribution to current and future studies. The availability of this information in electronic form would greatly help its preservation and communication.
In this study, handwritten character recognition with an Artificial Neural Network implementation is attempted for the 231 main characters of the Amharic language. The training and test data sets were produced by scribes trained to write text in this writing style. The study used various techniques at each phase, from digitization to recognition. Preprocessing (image binarization, character segmentation, and size normalization) and neural network recognition were implemented using the Visual C++ .NET and MATLAB programming environments.
A stage-by-stage segmentation algorithm attained a segmentation rate of 95.96%, while recognition rates ranged from 20.3% to 98.8% across the different test cases. Based on the findings and the knowledge acquired during the experimentation, topics for future research are also identified.
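
The preprocessing steps named in the abstract (binarization and size normalization of an already segmented character) could be sketched roughly as follows. This is only an illustrative Python sketch, not the thesis's Visual C++ .NET or MATLAB code: the fixed threshold of 128, the 16x16 target grid, the nearest-neighbour resampling, and all function names are assumptions made here, and the stage-by-stage segmentation step is not reproduced.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels, values 0..255


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0.

    The fixed threshold is an assumption; the abstract does not say
    which binarization method the study used.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary: Image) -> Image:
    """Trim empty rows and columns around a single segmented character."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def normalize_size(binary: Image, size: int = 16) -> Image:
    """Resample to a size x size grid using nearest-neighbour lookup."""
    h, w = len(binary), len(binary[0])
    return [[binary[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_feature_vector(gray: Image, size: int = 16) -> List[int]:
    """Binarize, crop, normalize, and flatten into a network input vector."""
    norm = normalize_size(crop_to_ink(binarize(gray)), size)
    return [px for row in norm for px in row]
```

Under these assumptions, each character image becomes a fixed-length 0/1 vector (256 values for a 16x16 grid) that a neural network classifier over the 231 character classes could take as input.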

Keywords

Information Science
