Optical Character Recognition of Amharic Text: An Integrated Approach
dc.contributor.advisor | Teferri, Dereje (PhD) | |
dc.contributor.advisor | Meshesha, Million (PhD) | |
dc.contributor.author | Assabie, Yaregal | |
dc.date.accessioned | 2020-06-25T08:29:59Z | |
dc.date.accessioned | 2023-11-18T12:46:40Z | |
dc.date.available | 2020-06-25T08:29:59Z | |
dc.date.available | 2023-11-18T12:46:40Z | |
dc.date.issued | 2002-06 | |
dc.description.abstract | Optical Character Recognition (OCR) is an area of research and development in which a system is made to recognize text in document images. Cultural considerations and an enormous flood of documents have motivated the development of OCR across the world. Unlike for other scripts, OCR development for Amharic characters started only recently, at SISA. Some progress has been made in recognizing specific font styles, font sizes, and font types, but recognition accuracy falls as the font style, size, or type changes. The purpose of this study is, therefore, to explore the possibility of developing a versatile OCR system that is independent of the size of Amharic characters. To this end, different preprocessing and pattern recognition techniques were reviewed. Since the segmentation algorithm used by previous studies in the area works well, it is incorporated in this study with some modifications. Template matching, statistical, syntactic/structural, and neural network approaches were found to be the most commonly used pattern recognition techniques, and the pros and cons of each were reviewed. To combine their advantages, a hybrid system of syntactic/structural and neural network approaches is implemented. The syntactic/structural approach enables the developed OCR system to extract the primitive structures of characters and generate a unique pattern for each character. The neural network takes these patterns as input, classifies them, and can also predict new cases. With this procedure, the neural network is trained with the VG2000 Agazian font at sizes 10 and 12. The performance of the developed system is tested with documents written in the VG2000 Agazian font at sizes 8, 12, and 14. 
The results showed that, with minor differences, the developed OCR system recognizes test cases of different font sizes with roughly the same level of accuracy. Based on the results, further research areas are also recommended. | en_US |
dc.identifier.uri | http://etd.aau.edu.et/handle/12345678/21846 | |
dc.language.iso | en | en_US |
dc.publisher | Addis Ababa University | en_US |
dc.subject | Information Science | en_US |
dc.title | Optical Character Recognition of Amharic Text: An Integrated Approach | en_US |
dc.type | Thesis | en_US |