Searching in Amharic Document Image Corpus

dc.contributor.advisor: Meshesha, Million (PhD)
dc.contributor.author: Gebretsadik, Abreham
dc.date.accessioned: 2018-11-23T14:17:38Z
dc.date.accessioned: 2023-11-29T04:57:19Z
dc.date.available: 2018-11-23T14:17:38Z
dc.date.available: 2023-11-29T04:57:19Z
dc.date.issued: 2010-07
dc.description.abstract: The introduction of the World Wide Web has made access to digital information easier than ever before. Many information providers have therefore started to digitize existing paper materials to enable access through networked information services. Document retrieval, which searches a collection for documents relevant to a user's query, has consequently become a central issue in information retrieval. There are two types of document retrieval: text retrieval and document image retrieval. Document image retrieval can be recognition-based or can work without explicit recognition. A considerable amount of research has been done on document image retrieval throughout the world, but little of it addresses Amharic document images. This study therefore deals with searching an Amharic document image corpus without explicit recognition. It aims at improving the efficiency and effectiveness of retrieval from a document image collection. To this end, an index file with an inverted file structure is created to store index terms after stopwords are removed and variant words are grouped together. Prefixes and suffixes of word variants are detected with a modified cosine similarity measure. Search results are displayed in ranked order based on TF*IDF weight, and the performance evaluation of the system shows promising results. However, issues related to feature extraction, word variation detection, and noise detection and removal remain to be solved. Accordingly, further research is recommended. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/14466
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: World Wide Web has made access to digital information [en_US]
dc.title: Searching in Amharic Document Image Corpus [en_US]
dc.type: Thesis [en_US]
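
The indexing and ranking steps named in the abstract (stopword removal, an inverted index, TF*IDF-ranked results) can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis matches word images without recognition and groups word variants, whereas this sketch assumes plain string tokens as stand-ins, skips variant grouping, and uses a hypothetical English stopword list. The names `STOPWORDS`, `build_index`, and `search` are illustrative only, and the weighting shown (raw term frequency times log inverse document frequency) is one common TF*IDF form, not necessarily the one used in the thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list; the Amharic stopwords used in the thesis
# are not given in the abstract.
STOPWORDS = {"the", "of", "and"}


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}.

    `docs` maps a document id to a list of tokens. Stopwords are skipped.
    """
    index = defaultdict(dict)
    for doc_id, words in docs.items():
        counts = Counter(w for w in words if w not in STOPWORDS)
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return (doc_id, score) pairs sorted by summed TF*IDF, best first."""
    scores = defaultdict(float)
    for term in query:
        postings = index.get(term, {})
        if not postings:
            continue  # term absent from the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, calling `build_index` on a small dictionary of token lists and then `search(index, len(docs), ["image"])` returns the documents containing "image", with those that use the term more often ranked first.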

Files

Original bundle
Name: Abreham Gebretsadik.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text