Character Recognition of Bilingual Amharic-Latin Printed Documents

dc.contributor.advisor: Menore, Tekeba (Mr.)
dc.contributor.author: Abeto, Alemu
dc.date.accessioned: 2019-06-23T05:54:46Z
dc.date.accessioned: 2023-11-04T15:14:40Z
dc.date.available: 2019-06-23T05:54:46Z
dc.date.available: 2023-11-04T15:14:40Z
dc.date.issued: 2018-11
dc.description.abstract: Optical character recognition (OCR) is a system that automatically converts captured images of handwritten, typewritten, or printed text documents into machine-encoded text. More than 80 languages are spoken in Ethiopia, and they are written in either Amharic script or adopted Latin script. In such an environment, a document meant to reach a larger cross section of people has to combine text in different languages written in Amharic and/or Latin characters. To prepare the dataset, several documents were collected from different sources for both script types. Character images were collected for 231 Amharic characters and 52 English characters (with capital and small letters merged). In total, 49,087 character images across 257 character classes were prepared to train and test the system. A randomly selected 80% of the dataset was used to train the system, and the remaining 20% was used to test its accuracy. The steps in developing the character recognition system are data acquisition, image binarization, noise removal, skew correction, character segmentation, feature extraction, and character classification. A number of algorithms were implemented to develop the proposed OCR system. This research work discusses the process of developing an OCR system for bilingual Amharic and Latin script using a Convolutional Neural Network (CNN), which serves as the feature extraction and character classification model. In the experiments, a classification accuracy of 99.20% was obtained when the number of neurons was 256 and an adaptive learning rate was used. In the character segmentation stage, an average accuracy of 98.85% was achieved for clear sample documents and 95.86% for unclear sample documents. The overall recognition accuracy is therefore 98.06% and 95.09%, respectively.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/18541
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.subject: Bilingual OCR
dc.subject: CNN, Neural Network
dc.subject: Convolutional Neural Network
dc.subject: Ethiopic OCR
dc.title: Character Recognition of Bilingual Amharic-Latin Printed Documents
dc.type: Thesis
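
The abstract reports separate accuracies for the segmentation and classification stages and then an overall recognition accuracy, without stating how the overall figure was derived. The short sketch below assumes the overall accuracy is the product of the two stage accuracies. That rule is an assumption, not something the record states, although it does reproduce the reported 98.06% and 95.09%. The function and variable names are illustrative only.

```python
# Stage accuracies reported in the abstract, in percent.
CLASSIFICATION_ACC = 99.20
SEGMENTATION_ACC = {"clear": 98.85, "unclear": 95.86}


def overall_accuracy(segmentation_pct, classification_pct):
    """Combine two stage accuracies (in percent) into one overall figure.

    Assumption: a character is recognized correctly only if it is both
    segmented and classified correctly, so the stage accuracies multiply.
    """
    return segmentation_pct * classification_pct / 100.0


# Overall accuracy per document quality, rounded to two decimals.
overall = {
    kind: round(overall_accuracy(seg, CLASSIFICATION_ACC), 2)
    for kind, seg in SEGMENTATION_ACC.items()
}

for kind, value in overall.items():
    print(f"{kind} documents: {value:.2f}%")
```

Under this assumption, a drop in segmentation accuracy on unclear documents carries over almost directly into the overall figure, because the classification accuracy is the same in both cases.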

Files

Original bundle
Name: Abeto Alemu.pdf
Size: 2.94 MB
Format: Adobe Portable Document Format