Concept-Based Automatic Amharic Document Categorization
Date
2009-01
Authors
Publisher
Addis Ababa University
Abstract
As the volume of available information continues to grow, so does interest in better
solutions for finding, filtering, and organizing these resources. Automatic text
categorization can play an important role in a wide variety of more flexible, dynamic, and
personalized information management tasks.
The process of automatic text categorization involves calculating similarities between documents
and categories using information extracted from the documents. In recent years, ontology-based
document categorization methods have been introduced to address the document classification problem.
Previous work on keyword-based document categorization does not consider the semantic
relationships between words. To resolve this problem, this
study proposes a framework that automatically categorizes Amharic documents into predefined
categories using knowledge represented in the News ontology. At the heart of the classification
system is the knowledge base, which enables the representation of different domain concepts.
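The limitation of keyword matching can be illustrated with a minimal sketch. It assumes cosine similarity over term-frequency vectors as the similarity measure, which the abstract does not specify, and it uses a hypothetical term-to-concept table (`TERM_TO_CONCEPT`) with English placeholder terms rather than the study's News ontology.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical term-to-concept table (illustrative assumption only).
TERM_TO_CONCEPT = {
    "physician": "doctor",
    "doctor": "doctor",
    "clinic": "hospital",
    "hospital": "hospital",
}


def to_concepts(terms):
    """Replace each term by its concept; unknown terms are kept as-is."""
    return Counter(TERM_TO_CONCEPT.get(t, t) for t in terms)


doc = ["physician", "clinic"]
category_profile = ["doctor", "hospital"]

# Keyword level: the two term lists share no surface forms.
keyword_score = cosine(Counter(doc), Counter(category_profile))
# Concept level: both lists map onto the same two concepts.
concept_score = cosine(to_concepts(doc), to_concepts(category_profile))
```

In this toy case the keyword-level score is zero because the document and the category profile share no words, while the concept-level score is close to one because related terms collapse onto the same concepts.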
During the classification process, all documents pass through pre-processing stages. Index
terms are then extracted from a given document and mapped onto their corresponding concepts in
the ontology. Finally, the document is classified into a predefined category based on the
weighted concepts.
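The stages described above (pre-processing, index-term extraction, concept mapping, and classification by weighted concepts) might be sketched roughly as follows. This is only an illustration under stated assumptions: the ontology entries, the stop list, the frequency-based weighting, and the function names are all hypothetical, and English placeholder terms stand in for Amharic ones. It is not the study's implementation.

```python
from collections import Counter

# Hypothetical toy ontology: index term -> (concept, category).
# These entries are illustrative assumptions, not the study's News ontology.
TOY_ONTOLOGY = {
    "goal": ("football", "Sport"),
    "striker": ("football", "Sport"),
    "marathon": ("athletics", "Sport"),
    "inflation": ("monetary_policy", "Economy"),
    "export": ("trade", "Economy"),
    "vaccine": ("immunization", "Health"),
}

STOPWORDS = {"the", "a", "of", "and", "in"}  # placeholder stop list


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def concept_weights(tokens, ontology=TOY_ONTOLOGY):
    """Map index terms to concepts; weight each concept by term frequency."""
    return Counter(ontology[tok][0] for tok in tokens if tok in ontology)


def classify(text, ontology=TOY_ONTOLOGY):
    """Return the category with the highest summed concept weight, or None."""
    concept_to_category = {concept: cat for concept, cat in ontology.values()}
    scores = Counter()
    for concept, weight in concept_weights(preprocess(text), ontology).items():
        scores[concept_to_category[concept]] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)
```

For example, `classify("The striker scored a goal in the marathon.")` accumulates all of its weight under the toy "Sport" category, while a text with no terms in the ontology returns `None`. A real system for Amharic would need language-specific pre-processing and a far richer ontology.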
With the help of the News domain ontology, this study categorizes a given Amharic document into a
specific predefined category. The study shows that using concepts for Amharic document
categorization results in 92.9% accuracy, which is a promising outcome.
Keywords: Ontology, Keyword-based, Concept-based text categorization, Knowledge representation.