Restoration and Retrieval of Historical Amharic Document Images
Date
2014-06
Publisher
Addis Ababa University
Abstract
Many historical document image collections are now being scanned and made available over the Internet
or in digital libraries. However, effective access to these information sources is limited by the lack of
efficient retrieval schemes.
Existing methods for searching and retrieving document images are recognition-based (Optical Character
Recognition), recognition-free (Document Image Retrieval), or a combination of the two. These algorithms
analyze the global or local layout structure of document images and estimate the similarity between them.
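As a rough illustration of how a recognition-free system can estimate similarity without OCR, the sketch
below describes a binarized word image (1 = ink) by its column and row projection profiles, resampled to a
fixed length, and compares two descriptors with cosine similarity. The feature choice and the names
`profile_features` and `cosine_similarity` are illustrative assumptions, not the features used by the
Amharic Document Image Retrieval System.

```python
import numpy as np

def profile_features(word_img: np.ndarray, length: int = 32) -> np.ndarray:
    """Hypothetical descriptor: column and row ink profiles of a binary
    word image (1 = ink), each resampled to `length` points, L2-normalized."""
    img = np.asarray(word_img, dtype=float)
    cols = img.sum(axis=0)  # amount of ink in each column
    rows = img.sum(axis=1)  # amount of ink in each row

    def resample(profile: np.ndarray) -> np.ndarray:
        # Linear interpolation onto `length` evenly spaced positions, so
        # word images of different sizes yield vectors of equal length.
        x_old = np.linspace(0.0, 1.0, num=len(profile))
        x_new = np.linspace(0.0, 1.0, num=length)
        return np.interp(x_new, x_old, profile)

    feat = np.concatenate([resample(cols), resample(rows)])
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two descriptors (0.0 if either is empty)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))
```

Projection profiles are cheap to compute but sensitive to skew and to the kind of degradation discussed
below, which is one reason restoration before feature extraction matters.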
A few studies have developed a recognition-free document image retrieval system that extracts
information from document images using image features alone. Such systems are strongly affected by the
degradation found in historical documents, which results from paper aging, folding, or scanning. This
study integrates effective image restoration techniques to improve the system's effectiveness when
searching historical document images. It also improves the system's online search process by accepting
N query terms for retrieving relevant documents and by adding an image viewer, enhancing the interface
of the Amharic Document Image Retrieval System.
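One hypothetical way to handle an N-term query is sketched below: each query word image is compared with
every word image in a document, a term counts as matched when its best similarity reaches a threshold,
and documents are ranked by the number of matched terms and then by mean best similarity. The scoring
rule, the `threshold` value, and the name `rank_documents` are assumptions for illustration; the abstract
does not describe how the system combines multiple query terms.

```python
import numpy as np

def rank_documents(query_feats, doc_word_feats, threshold=0.9):
    """Rank documents for an N-term query (hypothetical scoring rule).

    query_feats:    N feature vectors, one per query word image.
    doc_word_feats: dict mapping doc_id -> sequence of word feature vectors.
    Returns (doc_id, matched_terms, mean_best_similarity) tuples, best first.
    Assumes no feature vector is all zeros.
    """
    q = np.asarray(query_feats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    results = []
    for doc_id, words in doc_word_feats.items():
        w = np.asarray(words, dtype=float)
        if w.size == 0:
            continue  # a document with no segmented words cannot match
        w = w / np.linalg.norm(w, axis=1, keepdims=True)
        best = (q @ w.T).max(axis=1)  # best cosine similarity per query term
        matched = int(np.count_nonzero(best >= threshold))
        if matched:
            results.append((doc_id, matched, float(best.mean())))
    # More matched terms first; ties broken by higher mean similarity.
    results.sort(key=lambda r: (-r[1], -r[2]))
    return results
```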
This study tests several image restoration techniques: the mathematical morphology operations dilation
and erosion and their combination, as well as Haar, Daubechies, and Symlet wavelet techniques. These
techniques are tested on historical documents as well as real-life documents.
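The morphological operations listed above, together with Otsu thresholding, can be sketched in plain
NumPy as follows. Otsu's method picks the global gray level that maximizes the between-class variance of
dark and light pixels; dilation and erosion use a square structuring element, and an opening followed by
a closing removes isolated specks and fills small gaps. The element size, the opening-then-closing order,
and the helper names are assumptions, not the configuration evaluated in this study; the wavelet
techniques are not sketched here.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's global threshold for an 8-bit grayscale image (values 0..255)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean of the dark class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0     # thresholds leaving a class empty
    return int(np.argmax(sigma_b))

def dilate(mask: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary dilation with an odd size x size square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    r = size // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(size):
        for dx in range(size):
            out |= padded[dy:dy + h, dx:dx + w]  # any ink in the window
    return out

def erode(mask: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary erosion with an odd size x size square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    r = size // 2
    # Pad with True so pixels outside the image do not erode the border.
    padded = np.pad(mask, r, mode="constant", constant_values=True)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(size):
        for dx in range(size):
            out &= padded[dy:dy + h, dx:dx + w]  # all ink in the window
    return out

def restore(gray: np.ndarray, size: int = 3) -> np.ndarray:
    """Binarize with Otsu (ink = dark pixels), then apply opening and closing."""
    gray = np.asarray(gray, dtype=np.uint8)
    ink = gray <= otsu_threshold(gray)
    opened = dilate(erode(ink, size), size)   # opening: drop small specks
    return erode(dilate(opened, size), size)  # closing: fill small gaps
```

The structuring element must be smaller than the stroke width of the script; otherwise the opening erases
thin strokes along with the noise.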
Performance analysis shows that the best result is obtained by combining mathematical morphology with
Otsu thresholding. Finally, the performance of the system is evaluated before and after integrating the
selected restoration techniques. An average overall F-measure of 87.02% is recorded on documents with low,
medium, and high levels of degradation, an improvement in retrieval effectiveness of 4.65% in F-measure.
These results are promising for designing a practical Amharic document image retrieval system. The major
challenges are the unavailability of a standardized corpus and the limited number of historical document
images in the dataset. Therefore, a standardized corpus should be prepared and used for experimentation in
similar future studies.
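The F-measure used to report retrieval effectiveness is conventionally the harmonic mean of precision P
and recall R, F = 2PR / (P + R). The sketch below computes it for one query and averages it over several
queries. Treating the reported figure as a per-query (macro) average of the balanced F-measure is an
assumption, since the abstract does not state how the average is taken; the function names are
hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Precision, recall, and balanced F-measure for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def mean_f_measure(runs):
    """Macro average of F over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    scores = [precision_recall_f(ret, rel)[2] for ret, rel in runs]
    return sum(scores) / len(scores)
```

For example, retrieving four documents of which two are among three relevant ones gives P = 0.5,
R = 2/3, and F = 4/7.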
Keywords
Retrieval of Historical Amharic Document Images