Character Recognition of Bilingual Amharic-Latin Printed Documents

Abeto, Alemu

dc.contributor.advisor	Menore, Tekeba (Mr.)
dc.contributor.author	Abeto, Alemu
dc.date.accessioned	2019-06-23T05:54:46Z
dc.date.accessioned	2023-11-04T15:14:40Z
dc.date.available	2019-06-23T05:54:46Z
dc.date.available	2023-11-04T15:14:40Z
dc.date.issued	2018-11
dc.description.abstract	Optical character recognition (OCR) is a system that automatically converts captured images of handwritten, typewritten, or printed text documents into machine-encoded text. More than 80 languages are spoken in Ethiopia, and they are written in either the Amharic script or an adopted Latin script. In such an environment, reaching a larger cross section of people requires documents that combine text in different languages written in Amharic and/or Latin characters. To prepare the dataset, several documents were collected from different sources for both script types. Character images were collected for 231 Amharic characters and 52 English letters (capital and small letters merged). In total, 49,087 character images covering 257 character classes were prepared to train and test the system. A randomly selected 80% of the dataset was used to train the system, and the remaining 20% was used to test its accuracy. The steps in developing a character recognition system are data acquisition, image binarization, noise removal, skew correction, character segmentation, feature extraction, and character classification. A number of algorithms were implemented to develop the proposed OCR system. This research work discusses the process of developing an OCR system for bilingual Amharic and Latin script using a Convolutional Neural Network (CNN), which serves as the feature extraction and character classification model. In the experiments, a classification accuracy of 99.20% was obtained with 256 neurons and an adaptive learning rate. In the character segmentation stage, an average accuracy of 98.85% was achieved for clear sample documents and 95.86% for unclear sample documents. The overall recognition accuracy is therefore 98.06% and 95.09%, respectively.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/18541
dc.language.iso	en_US	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Bilingual OCR	en_US
dc.subject	CNN, Neural Network	en_US
dc.subject	Convolutional Neural Network	en_US
dc.subject	Ethiopic OCR	en_US
dc.title	Character Recognition of Bilingual Amharic-Latin Printed Documents	en_US
dc.type	Thesis	en_US
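
The abstract does not say how the overall recognition figures were derived. They are consistent with multiplying the classification accuracy by the segmentation accuracy, on the assumption that a character is recognized only when both stages succeed. The sketch below checks that arithmetic; the function and variable names are illustrative and are not taken from the thesis.

```python
# Assumption: overall accuracy = classification accuracy x segmentation accuracy.
# The abstract reports the figures but does not state this formula explicitly.

def overall_accuracy(classification_acc: float, segmentation_acc: float) -> float:
    """Combine two stage accuracies (given as fractions) into one fraction."""
    return classification_acc * segmentation_acc

# Figures reported in the abstract, expressed as fractions.
CLASSIFICATION_ACC = 0.9920
SEGMENTATION_ACC = {"clear": 0.9885, "unclear": 0.9586}

# Overall accuracy per document type, as a percentage rounded to two decimals.
overall = {
    kind: round(overall_accuracy(CLASSIFICATION_ACC, seg) * 100, 2)
    for kind, seg in SEGMENTATION_ACC.items()
}

for kind, pct in overall.items():
    print(f"{kind}: {pct:.2f}%")
```

Under this assumption the products round to the 98.06% and 95.09% stated in the abstract.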

Files

Original bundle

Name: Abeto Alemu.pdf
Size: 2.94 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Plain Text

Collections

Computer Engineering