Concept-Based Automatic Amharic Document Categorization

Date

2009-01

Publisher

Addis Ababa University

Abstract

As the volume of available information continues to grow, so does interest in better solutions for finding, filtering, and organizing these resources. Automatic text categorization can play an important role in a wide variety of more flexible, dynamic, and personalized information management tasks. The process of automatic text categorization involves calculating similarities between documents and categories using information extracted from the documents. In recent years, ontology-based document categorization methods have been introduced to address the limitations of existing document classifiers. Previous work on keyword-based document categorization does not consider the semantic relationships between words. To resolve these problems, this study proposes a framework that automatically categorizes Amharic documents into predefined categories using knowledge represented in the News ontology. At the heart of the classification system is the knowledge base, which enables the representation of different domain concepts. During the classification process, all documents pass through pre-processing stages. Index terms are then extracted from a given document and mapped onto their corresponding concepts in the ontology. Finally, the document is classified into a predefined category based on the weighted concepts. With the help of News domain ontologies, this study categorizes a given Amharic document into a specific predefined category. The study shows that using concepts for Amharic document categorization yields 92.9% accuracy, which is a promising outcome.
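
The pipeline described in the abstract (pre-processing, index term extraction, mapping terms to ontology concepts, and choosing a category from weighted concepts) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy `ONTOLOGY`, the English placeholder terms, the stopword list, and the frequency-based weighting are hypothetical and are not taken from the study, whose News ontology and weighting scheme are not given in this record.

```python
from collections import Counter

# Hypothetical toy ontology: category -> concept -> surface terms.
# The study's actual News ontology is not reproduced here.
ONTOLOGY = {
    "sport": {
        "football": {"goal", "match", "league"},
        "athletics": {"marathon", "runner"},
    },
    "economy": {
        "finance": {"bank", "loan", "inflation"},
        "trade": {"export", "import"},
    },
}

# Placeholder stopword list (assumption); a real system would use
# an Amharic stopword list and Amharic-specific normalization.
STOPWORDS = {"the", "a", "in", "of", "and", "to"}


def preprocess(text):
    """Tokenize on whitespace, strip punctuation, lowercase, drop stopwords."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def map_to_concepts(terms):
    """Count how often each (category, concept) pair is hit by an index term."""
    weights = Counter()
    for term in terms:
        for category, concepts in ONTOLOGY.items():
            for concept, surface_terms in concepts.items():
                if term in surface_terms:
                    weights[(category, concept)] += 1
    return weights


def categorize(text):
    """Return the category with the highest total concept weight, or None."""
    weights = map_to_concepts(preprocess(text))
    scores = Counter()
    for (category, _concept), weight in weights.items():
        scores[category] += weight
    if not scores:
        return None  # no index term matched any concept
    return max(scores, key=scores.get)
```

In this sketch the concept weight is a plain term count; a real system might instead weight concepts by term frequency, position in the ontology hierarchy, or both.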

Keywords

Ontology; Keyword-Based; Concept-Based Text Categorization; Knowledge Representation
