The Application of OCR Techniques to the Amharic Script

Addis Ababa University


It is becoming increasingly important to have information available for examination and manipulation in digital format, and Optical Character Recognition (OCR) is being recognized as one of the valuable instruments in this respect. OCR systems take optical images of handwritten or printed material and, by recognizing the characters that make up the material, automatically convert the text into digital format for further processing and manipulation, thereby bypassing the labour-intensive, error-prone, and time-consuming process of keying. While the use and application of OCR systems seem to be well developed for languages that use scripts based on Latin, Chinese, Japanese, and Bangla, to mention but a few, there has not as yet been any effort in this direction for the Amharic language. This study is an attempt to approach the development of an Amharic OCR system by drawing on experience elsewhere: it investigates the extent to which OCR algorithms suggested for other scripts would apply to recognizing Amharic characters. To this end, algorithms suggested for use in other languages are reviewed from published literature. The Amharic writing system is described in terms of size, shape, style, etc. Algorithms of general applicability to Amharic character recognition are selected. Experimentation with step-by-step segmentation and recognition based on topological features of Amharic characters is presented. Recommendations are also made to further the experiment and enhance the performance and applicability of the selected algorithms.



Information Science