|Title: ||BILINGUAL SCRIPT IDENTIFICATION FOR OPTICAL CHARACTER RECOGNITION OF AMHARIC AND ENGLISH PRINTED DOCUMENT|
|Authors: ||SERTSE, ABEBE|
|Advisors: ||Dr. Dereje Teferi|
|Keywords: ||Information science|
|Copyright: ||Jun-2011 |
|Date Added: ||28-Jul-2012 |
|Abstract: ||Optical character recognition (OCR) is a document image analysis technique that recognizes the informative
content of text documents so that they can be archived in soft copy for different purposes. The
technique converts a given image of text to its most probable matching
character in the script of a given domain language.
A line of a multilingual document page may contain words in different languages. To
recognize such a document page, it is necessary to identify the different scripts before
running an individual OCR system.
In this paper, a system that distinguishes Amharic and English scripts in a
document image is presented. The system addresses the language identification problem at
the word level. To extract the important feature values from the word images of the scripts,
preprocessing activities such as noise removal, binarization, segmentation, and size and style
normalization are performed. The maximum horizontal projection profiles from
three selected regions, the extent of the word image, and the ratio of the number of connected
components to the word-image width are the important feature values for discriminating the
scripts of the two languages.
A Support Vector Machine algorithm is applied to classify new word-image instances. The
proposed algorithm is tested with a significant number of words in various font styles
and sizes. The results obtained are promising and encouraging.|
|Description: ||A Thesis Submitted to the School of Graduate Studies of Addis
Ababa University in Partial Fulfillment of the Requirements for the
Degree of Master of Science in Information Science|
|Appears in:||Thesis - Information Science|
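The word-level features named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions: the word image is taken to be already binarized (1 = ink, 0 = background) and cropped to its bounding box, the "three selected regions" are taken to be the top, middle, and bottom thirds of the image, "extent" is taken to be the share of ink pixels in the bounding box, and connected components use 8-connectivity. The function names are hypothetical.

```python
from collections import deque


def region_profile_maxima(img, regions=3):
    """Largest horizontal projection (ink pixels per row) in each band.

    Assumption: the bands are equal horizontal slices (top/middle/bottom).
    """
    profile = [sum(row) for row in img]
    h = len(profile)
    bounds = [i * h // regions for i in range(regions + 1)]
    return [max(profile[a:b], default=0) for a, b in zip(bounds, bounds[1:])]


def extent(img):
    """Ink pixels divided by image area (assumed definition of 'extent')."""
    h, w = len(img), len(img[0])
    return sum(sum(row) for row in img) / (h * w)


def count_components(img):
    """Count 8-connected groups of ink pixels with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def word_features(img):
    """Feature vector: three profile maxima, extent, components per column."""
    width = len(img[0])
    return region_profile_maxima(img) + [extent(img),
                                         count_components(img) / width]


# Toy 6x8 "word image" used only to exercise the functions.
toy = [
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(word_features(toy))  # [4, 2, 8, 0.375, 0.5]
```

A vector of this kind, computed for each labeled word image, is what a Support Vector Machine classifier would take as input. The kernel and parameters used in the thesis are not stated in the abstract.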