Bilingual Script Identification for Optical Character Recognition of Amharic and English Printed Document

dc.contributor.advisor: Teferi, Dereje (PhD)
dc.contributor.author: Abebe, Sertse
dc.date.accessioned: 2018-11-30T07:42:21Z
dc.date.accessioned: 2023-11-29T04:56:48Z
dc.date.available: 2018-11-30T07:42:21Z
dc.date.available: 2023-11-29T04:56:48Z
dc.date.issued: 2011-06
dc.description.abstract: Optical character recognition (OCR) is a document image analysis technique that recognizes the textual content of documents so that it can be archived in soft copy for different purposes. The technique converts a given image of text into the most probable matching characters of a given language's script. A line of a multilingual document page may contain words in different languages. To recognize such a page, it is necessary to identify the different scripts before running an individual OCR system. In this paper, a system that distinguishes Amharic and English scripts in a document image is presented. The system addresses the language identification problem at the word level. Before the feature values of each word image are extracted, preprocessing activities such as noise removal, binarization, segmentation, and size and style normalization are performed. The maximum horizontal projection profiles from three selected regions, the extent of the word image, and the ratio of the number of connected components to the word-image width are the features used to discriminate between the two scripts. A Support Vector Machine algorithm is applied to classify new word images. The proposed algorithm is tested on a significant number of words with various font styles and sizes. The results obtained are promising. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/14739
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Bilingual Script Identification [en_US]
dc.title: Bilingual Script Identification for Optical Character Recognition of Amharic and English Printed Document [en_US]
dc.type: Thesis [en_US]
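
The three feature groups named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the split into three equal horizontal bands, the normalization by image width, the use of 8-connectivity, and the definition of extent as foreground pixels divided by the area of a tightly cropped word image are all assumptions made for illustration. The resulting five-value vector is the kind of input that would then be passed to a Support Vector Machine classifier, which is not shown here.

```python
from collections import deque

def horizontal_projection(img):
    """Count foreground (1) pixels in each row of a binary word image."""
    return [sum(row) for row in img]

def max_profile_per_region(img, regions=3):
    """Largest row sum inside each of `regions` equal horizontal bands,
    divided by the image width (band layout is an assumption)."""
    height, width = len(img), len(img[0])
    profile = horizontal_projection(img)
    feats = []
    for r in range(regions):
        band = profile[r * height // regions:(r + 1) * height // regions]
        feats.append(max(band) / width if band else 0.0)
    return feats

def extent(img):
    """Foreground pixels divided by image area; assumes the image is
    already cropped to the word's bounding box."""
    height, width = len(img), len(img[0])
    return sum(horizontal_projection(img)) / (height * width)

def count_components(img):
    """Count 8-connected foreground components with a BFS flood fill."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not img[y][x] or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def word_features(img):
    """Five-value feature vector: three band maxima, extent, and the
    connected-component count per pixel of width."""
    width = len(img[0])
    return max_profile_per_region(img) + [extent(img),
                                          count_components(img) / width]

# Hypothetical 6x8 binary word image (1 = ink), used only as a toy input.
WORD = [
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
]

features = word_features(WORD)
```

The sketch uses only the standard library so that each step stays visible; a real pipeline would more likely rely on an image-processing library for labeling and on preprocessed, size-normalized word images as described in the abstract.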

Files

Original bundle
Name:
Sertse Abebe.pdf
Size:
1.27 MB
Format:
Adobe Portable Document Format