Recognition of Formatted Amharic Text Using Optical Character Recognition (OCR) Techniques

Addis Ababa University


In our age, information is the driving force behind every human endeavor. Information in computer-processable form is especially valuable, since it can be stored, manipulated, and transferred with a minimum of labor and financial cost. To gain these benefits, information held on paper and in other documents must be converted into computer-processable form, and character recognition systems have been developed for this purpose for some time now. Scripts such as Latin, Arabic, Kanji, and Cyrillic have received a significant amount of research in this area, while others, such as Amharic and Kannada, have received little attention. The testing of OCR techniques on the Amharic script is a recent phenomenon. Worku Alemu, a 1997 graduate of SISA, adapted an OCR algorithm for the Amharic script. Even without any pre- or post-processing techniques to detect and correct errors, the combination of segmentation and recognition algorithms he used achieved a significant level of accuracy on laser printouts of text in the normal type style of the Washrag font at 12-point size (the main test case). However, his algorithm could not recognize text written in other font sizes and styles (such as italic and outline). The current work attempts to extend his work by introducing pre-processing techniques that enable his algorithm to recognize text written in different sizes and styles.
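The abstract does not specify which pre-processing techniques are used. One common way to make a recognizer less sensitive to font size is to crop each segmented character to the bounding box of its ink and rescale it to a fixed grid. This gives the recognition stage the same input dimensions regardless of the original point size. The following is a minimal sketch under stated assumptions: glyphs are binary images stored as lists of rows of 0/1, the function names `crop_to_ink` and `normalize_size` are hypothetical, and nearest-neighbour resampling is chosen for simplicity. It does not handle style variations such as italic or outline.

```python
def crop_to_ink(glyph):
    """Crop a binary glyph (list of rows of 0/1) to the bounding box of its ink.

    Hypothetical helper for illustration; a blank or empty glyph is returned unchanged.
    """
    if not glyph or not glyph[0]:
        return glyph
    rows = [i for i, row in enumerate(glyph) if any(row)]
    cols = [j for j in range(len(glyph[0])) if any(row[j] for row in glyph)]
    if not rows:
        return glyph  # no ink pixels: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in glyph[rows[0]:rows[-1] + 1]]


def normalize_size(glyph, out_h=32, out_w=32):
    """Rescale a binary glyph to out_h x out_w using nearest-neighbour sampling.

    Hypothetical helper; the 32x32 default grid is an assumed size, not from the source.
    """
    in_h = len(glyph)
    in_w = len(glyph[0]) if in_h else 0
    if in_h == 0 or in_w == 0:
        raise ValueError("glyph must be non-empty")
    result = []
    for r in range(out_h):
        src_r = r * in_h // out_h  # source row that maps onto output row r
        row = []
        for c in range(out_w):
            src_c = c * in_w // out_w  # source column that maps onto output column c
            row.append(glyph[src_r][src_c])
        result.append(row)
    return result


# The same square shape printed at two sizes normalizes to the same grid.
small = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
large = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)] for r in range(8)]
norm_small = normalize_size(crop_to_ink(small), 4, 4)
norm_large = normalize_size(crop_to_ink(large), 4, 4)
print(norm_small == norm_large)
```

Cropping before rescaling matters here: without it, the surrounding white margin would scale along with the character, and the same character at different point sizes would still produce different normalized images.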



Information Science