Amharic Document Image Retrieval Without Explicit Recognition
Date
2009-06
Publisher
Addis Ababa University
Abstract
Retrieval of stored information is a key issue. Image retrieval in particular needs
emphasis, because the nature of the data is complex and difficult to retrieve.
There are many problems to be studied in the area of image retrieval. Among these,
document image retrieval is one of the issues that must be given attention.
Document retrieval can use either a text-based retrieval system or an image-based
retrieval system. Document image retrieval can in turn be done in two
ways: recognition-based document image retrieval or document image retrieval
without explicit recognition.
Currently, little has been done on Amharic document retrieval systems. The
Amharic text retrieval systems studied by researchers so far have given only
limited consideration to Amharic documents that are available only in hardcopy format.
The proposed system incorporates document images and user queries. The
document image is preprocessed and segmented at word level, and the features of each
word are extracted. The textual query is then rendered as an image
query, preprocessed and segmented, and its features are extracted. The technique used
for feature extraction is based on word shape analysis. The extracted features of the
image query are matched against the features of the document images at word level, using
Euclidean and cosine similarity measures. Finally, relevant document images are
retrieved in ranked order in response to the given query.
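The matching and ranking step described above can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a fixed-length numeric feature vector; the abstract does not specify the actual word shape features. The function names, the dictionary layout, and the choice to score a document by its best-matching word under cosine similarity are illustrative assumptions, not the method used in the thesis.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, documents):
    """Rank documents by their best word-level match to the query.

    documents: hypothetical mapping of document id -> list of word
    feature vectors. Scoring by the single best-matching word is an
    assumption made for this sketch.
    """
    scored = []
    for doc_id, word_vecs in documents.items():
        if not word_vecs:
            continue  # a document with no segmented words cannot match
        best = max(cosine_similarity(query_vec, w) for w in word_vecs)
        scored.append((doc_id, best))
    # Highest similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Euclidean distance could be used in place of cosine similarity by ranking in ascending order of the smallest distance instead.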
To verify the validity of the proposed approach, an experiment was carried out on 121
scanned Amharic documents selected from printed legal documents and
news items.
Retrieval effectiveness is measured using standard retrieval measures such as
precision, recall and F-score.
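For reference, these measures can be computed from the set of retrieved documents and the set of relevant documents for a query. The sketch below assumes the balanced F-score (F1), since the abstract does not state which weighting was used; the function name and the document identifiers in the usage note are hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F1) for one query.

    retrieved: ids of documents returned by the system
    relevant:  ids of documents judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        # Harmonic mean of precision and recall (balanced F-score, assumed)
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, if four documents are retrieved and two of them are among three relevant documents, precision is 2/4 and recall is 2/3.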
The experimental results confirmed the validity of the model for retrieving relevant
document images from the collection of scanned document images.
Keywords
Information Science