Amharic Document Image Retrieval without Explicit Recognition

No Thumbnail Available

Date

2009-06

Journal Title

Journal ISSN

Volume Title

Publisher

Addis Ababa University

Abstract

Retrieval of the stored information is a key issue. Especially image retrieval needs an emphasis, because the nature of the data is complex and difficult to retrieve. There are many problems to be studied in the area of image retrieval. From these, Document Image Retrieval is one of the issues that have to be given attention. Document retrieval can use either a textual-based retrieval system or an image-based retrieval system. Document image retrieval system can also be done in two ways: recognition-based document image retrieval or document image retrieval without explicit recognition. Currently, little has been done on the Amharic document retrieval systems. The Amharic text retrieval systems which are covered by the researchers considered limited Amharic documents that are available only in hardcopy format. The proposed system incorporates document images and user queries. The document image is preprocessed, segmented at word level and the feature of each word is extracted. Then the textual query is rendered to convert into an image query, preprocessed, segmented and the feature is extracted. The technique used for feature extraction considers the word shape analysis. The extracted feature of the image query is matched with the feature of the document images, at word level using Euclidean and cosine similarity measures. Finally relevant document images are retrieved in ranked order in response to the given query. To verify the validity of the approach proposed, experiment is carried out on 121 scanned Amharic documents that are selected from printed legal documents and news items. The data retrieval effectiveness is measured using retrieval measures such as precision, recall and F-Score. The experimental results confirmed the validity of the model for retrieving relevant document images from the collection of scanned document images.

Description

Keywords

Retrieval without Explicit Recognition

Citation