Optical Character Recognition of Amharic Text: An Integrated Approach

Assabie, Yaregal

Date

2002-06

Authors

Assabie, Yaregal

Publisher

Addis Ababa University

Abstract

Optical Character Recognition (OCR) is an area of research and development in which a system is built to recognize the text in document images. Cultural considerations and the enormous volume of documents have motivated OCR development around the world. In contrast to other scripts, OCR development for Amharic characters started only recently, at SISA. Some progress has been made in recognizing specific font styles, sizes, and types, but recognition accuracy drops as the font style, size, or type changes. The purpose of this study is, therefore, to explore the possibility of developing a versatile OCR system that is independent of the size of Amharic characters. To this end, different preprocessing and pattern recognition techniques have been reviewed. Since the segmentation algorithm used by previous studies in the area works well, it is incorporated in this study with some modifications. Template matching, statistical, syntactic/structural, and neural network approaches are found to be the most commonly used pattern recognition techniques, and the pros and cons of each are reviewed. To combine their advantages, a hybrid system of syntactic/structural and neural network approaches is implemented. The syntactic/structural approach enables the developed OCR system to extract the primitive structures of characters and generate a unique pattern for each character, which the neural network then uses. The neural network classifies/recognizes the generated patterns and can also make predictions for new cases. The network takes the output of the syntactic/structural approach as its input. With this procedure, the neural network is trained with the VG2000 Agazian font at sizes 10 and 12. The performance of the developed system is tested with documents written in the VG2000 Agazian font at sizes 8, 12, and 14.
The results showed that, with minor differences, the developed OCR system classifies/recognizes test cases of different font sizes with roughly the same level of accuracy. Based on the results, further research areas are also recommended.
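The hybrid pipeline in the abstract (structural primitives in, neural classification out) can be illustrated with a minimal Python sketch. It assumes a hypothetical vocabulary of stroke primitives and three vertical zones (the abstract does not name the thesis's actual primitives), uses placeholder class labels rather than real Amharic characters, and substitutes a single-layer multi-class perceptron for the thesis's neural network, whose architecture is not described here. Segmentation and primitive extraction are assumed to have already happened.

```python
# Hypothetical primitive vocabulary and zone layout: the abstract does not list
# the thesis's actual primitives, so these names are illustrative only.
PRIMITIVES = ["vertical", "horizontal", "left_diag", "right_diag", "loop", "appendage"]
ZONES = ["top", "middle", "bottom"]


def encode(primitives):
    """Map (primitive, zone) pairs from the structural stage to a fixed-length 0/1 pattern.

    Only the presence of each primitive in each zone is recorded, never pixel
    measurements, so two renderings of a character at different point sizes give
    the same pattern whenever the same primitives are extracted from both.
    """
    present = set(primitives)  # duplicates and extraction order do not matter
    return [1 if (p, z) in present else 0 for p in PRIMITIVES for z in ZONES]


class Perceptron:
    """Multi-class perceptron: one weight vector plus a bias per character class."""

    def __init__(self, n_inputs, labels):
        self.labels = list(labels)
        # The last entry of each weight vector is the bias.
        self.w = {c: [0.0] * (n_inputs + 1) for c in self.labels}

    def score(self, c, x):
        w = self.w[c]
        # zip stops at len(x), so the bias w[-1] is added separately.
        return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

    def predict(self, x):
        return max(self.labels, key=lambda c: self.score(c, x))

    def train(self, data, epochs=50):
        for _ in range(epochs):
            errors = 0
            for x, y in data:
                # Strongest competing class; update unless the true class wins strictly.
                rival = max((c for c in self.labels if c != y),
                            key=lambda c: self.score(c, x))
                if self.score(rival, x) >= self.score(y, x):
                    errors += 1
                    for i, xi in enumerate(x):
                        self.w[y][i] += xi
                        self.w[rival][i] -= xi
                    self.w[y][-1] += 1
                    self.w[rival][-1] -= 1
            if errors == 0:  # every training pattern is separated with a margin
                break


# Made-up primitive descriptions for three placeholder classes.
training = [
    (encode([("vertical", "middle"), ("horizontal", "bottom"), ("loop", "top")]), "char_a"),
    (encode([("vertical", "middle"), ("left_diag", "bottom"), ("right_diag", "bottom")]), "char_b"),
    (encode([("loop", "middle"), ("appendage", "top")]), "char_c"),
]

model = Perceptron(len(PRIMITIVES) * len(ZONES), ["char_a", "char_b", "char_c"])
model.train(training)

# A new case: char_a with its loop primitive missing.
print(model.predict(encode([("vertical", "middle"), ("horizontal", "bottom")])))  # → char_a
```

In this sketch, size independence comes entirely from the encoding step: because the pattern records only which primitives appear in which zone, the classifier never sees point size. Whether the thesis's patterns work the same way is not stated in the abstract.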

Keywords

Information Science

URI

http://etd.aau.edu.et/handle/12345678/21846

Collections

Information Sciences