Ontology-Based Semantic Indexing for Amharic Text in Football Domain
Date
2013-03
Publisher
Addis Ababa University
Abstract
An enormous amount of data has been produced in electronic format in the Amharic language,
leading to an information explosion. This has created a major challenge for information
managers in processing information and providing it to users quickly and easily. Researchers
have therefore proposed several indexing methods for Amharic. However, these methods cannot
adequately capture the semantics of documents. In this research, an effort has been made to
build a semantic indexer for Amharic football news articles by applying a domain ontology.
The main purpose of the study is to construct an index that is embedded with the ontology so
as to minimize query processing time. Ontology development, document indexing, and query
processing are the core components of the study. The document indexing component is composed
of the Concept Tagger, Information Extraction, Concept Weighting, and Ontology Population
modules. The Concept Tagger module annotates documents with concepts from the ontology,
whereas the Information Extraction module identifies new individuals and determines the
relationships between concepts in the tagged documents. The Concept Weighting module
calculates weights for concepts and individuals using the domain ontology. The Ontology
Population module then adds these weights to the ontology.
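The tagging and weighting steps can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so relative concept frequency is used here purely as a stand-in. The lexicon, the English placeholder tokens, and the function names `tag_concepts` and `weight_concepts` are all hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical term-to-concept lexicon; a real system would derive this
# from the football domain ontology and use Amharic terms.
CONCEPT_LEXICON: Dict[str, str] = {
    "striker": "Player",
    "goalkeeper": "Player",
    "coach": "Coach",
    "stadium": "Venue",
}


def tag_concepts(tokens: List[str]) -> List[str]:
    """Annotate tokens with ontology concepts, skipping unknown tokens."""
    return [CONCEPT_LEXICON[t] for t in tokens if t in CONCEPT_LEXICON]


def weight_concepts(concepts: List[str]) -> Dict[str, float]:
    """Assumed weighting: each concept's share of all tagged occurrences."""
    counts = Counter(concepts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items()}


tokens = "the coach praised the striker and the goalkeeper".split()
tags = tag_concepts(tokens)
weights = weight_concepts(tags)
```

In this toy document, `Player` is tagged twice and `Coach` once, so `Player` receives the larger weight. In the described system such weights would be written back to the ontology rather than kept in a separate dictionary.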
The query processing component is built to test the performance of the indexer with user
queries. This component has Query Caching, Individual Creator, Document Retrieval, and
Document Ranking modules. Query caching registers pairs of original and tagged queries so
that the preprocessing and tagging modules need not run again whenever users pose the same
query. The Individual Creator module produces new individuals from queries and adds them to
the ontology. Finally, the Document Retrieval and Document Ranking modules retrieve and rank
documents according to their level of relevance. Concept reasoning, or inferencing, is the
main task in the document retrieval process.
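The query caching idea can be sketched as a lookup table keyed by the original query. This is only an illustration under assumptions: the class `QueryCache`, the `toy_tagger` function, and its small lexicon are hypothetical, and the study's actual preprocessing and tagging modules are not shown.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Stores (original query, tagged query) pairs to avoid re-tagging."""

    def __init__(self, tag_query: Callable[[str], List[str]]) -> None:
        self._tag_query = tag_query          # preprocessing + tagging step
        self._pairs: Dict[str, List[str]] = {}
        self.misses = 0                      # how often tagging actually ran

    def lookup(self, query: str) -> List[str]:
        # Run the tagger only the first time a query is seen.
        if query not in self._pairs:
            self.misses += 1
            self._pairs[query] = self._tag_query(query)
        return self._pairs[query]


def toy_tagger(query: str) -> List[str]:
    """Hypothetical stand-in for the preprocessing and tagging modules."""
    lexicon = {"goal": "Goal", "coach": "Coach"}
    return [lexicon[t] for t in query.lower().split() if t in lexicon]


cache = QueryCache(toy_tagger)
first = cache.lookup("coach goal")
second = cache.lookup("coach goal")  # served from the cache
```

The second lookup returns the stored tagged query without calling the tagger again, which is the saving the module is meant to provide.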
Precision, recall, and F-measure are used to evaluate the performance of the proposed system
and of a classical IR system, based on relevance judgments provided by experts. The results
show that the proposed semantic indexer performs better than the Lucene indexer used in the
classical IR system.
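For reference, the three evaluation measures can be computed from a retrieved set and an expert-judged relevant set as in the sketch below. The balanced F-measure (F1) is assumed, since the abstract does not specify the variant, and the document identifiers are made-up examples rather than data from the study.

```python
from typing import Set, Tuple


def precision_recall_f1(
    retrieved: Set[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one query."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r, f = precision_recall_f1({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

Here precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.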
Key Words: Semantic indexing, Football domain ontology, Rule-based information extraction,
Semantic information retrieval, Query processor, Concept tagging.