Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/3516

Advisors: Million Meshesha (Ph.D)
Keywords: Information science
Copyright: Jun-2011
Date Added: 30-Jul-2012
Publisher: AAU
Abstract: The ubiquity of digital computers and the growth of the Internet and the World Wide Web have produced a massive, worldwide explosion of information. Many types of information are uploaded to the Internet, including text documents, document images and other multimedia files. Document images facilitate office automation by preserving scanned documents in a document image database. However, retrieving information from a document image database is difficult for organizations because efficient retrieval schemes are lacking. To overcome this challenge, researchers have attempted recognition-based and recognition-free retrieval approaches. Recognition-based retrieval first applies optical character recognition (OCR) to convert document images into text and then performs text retrieval using search engines. The recognition-free approach, by contrast, searches and retrieves directly from document images, relying on image features. Because of the limitations of OCR systems, recognition-based retrieval is not effective. Hence, different researchers have attempted to develop document image retrieval systems without explicit recognition, including efforts to develop an effective Amharic document image retrieval system. As a continuation of that work, the current study explores and designs feature extraction and matching schemes that are insensitive to word variants, to differences in font type, size and style, and to degradation. To this end, eight feature extraction methods and four matching techniques are tested. Of the four matching schemes, dynamic time warping is insensitive to differences in font type, size and style. The eight feature extraction techniques are first tested individually for performance, and the features are then combined systematically following the best stepwise feature selection method. The results show that combined features perform better than individual features.
Using the best-performing matching algorithm, stemming is performed in the image domain to handle word variants, and promising experimental results are obtained for word variants. The explored matching, feature extraction and stemming techniques are integrated with the previous Amharic document image retrieval system and tested on noisy document images. According to the experiments, the current system outperforms the previous attempts. Finally, conclusions are drawn and recommendations are made for future investigation.
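
The abstract names dynamic time warping as the matching scheme that tolerates differences in font type, size and style. The following is a minimal sketch of textbook dynamic time warping in Python, not the thesis's implementation: the function name `dtw_distance`, the absolute-difference local cost, and the toy "column profile" sequences are all assumptions made for illustration, since this record does not describe the features or the DTW variant actually used.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping between two 1-D numeric sequences.

    Assumption: local cost is the absolute difference between elements.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical per-column feature values for word images (made-up numbers).
narrow = [0, 2, 5, 2, 0]                  # a word rendered in a narrow font
wide = [0, 0, 2, 2, 5, 5, 2, 2, 0, 0]     # same shape, stretched horizontally
other = [5, 5, 0, 0, 5]                   # a differently shaped word

print(dtw_distance(narrow, wide))   # stretched copy aligns at zero cost
print(dtw_distance(narrow, other))  # different shape gives a positive cost
```

Because the alignment may repeat elements of either sequence, a horizontally stretched version of the same profile still matches closely, which is one plausible reading of why such a scheme is less sensitive to font size and style than a fixed column-by-column comparison.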
Description: A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science
URI: http://hdl.handle.net/123456789/3516
Appears in: Thesis - Information Science

Files in This Item:

File: final edited1.pdf
Size: 980.34 kB
Format: Adobe PDF

Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.

