Amharic Named Entity Recognition Using A Hybrid Approach

Date

2014-08

Publisher

Addis Ababa University

Abstract

Named Entity Recognition (NER) is a subcomponent of information extraction (IE) that detects and classifies named entities (NEs). These include, among others, proper nouns representing person, location, and organization names, as well as dates, times, and measurements. NER has also been found to be vital for other NLP applications, such as Information Retrieval, Question Answering, Machine Translation, and Text Summarization, to mention a few. This research reports the performance of an Amharic NER (ANER) system built using the hybrid approach and different feature sets to detect and classify NEs of type person, location, and organization. Two state-of-the-art machine learning (ML) algorithms, namely decision tree and support vector machine (SVM), are used to investigate the performance of the hybrid ANER. This is the first research to use these ML algorithms for ANER and also the first to explore ANER using the hybrid approach. The rule-based component of the hybrid ANER is built using two rules that base their predictions on the presence of trigger words before and after NEs. The ML component is built using decision tree (J48) and SVM (libsvm). The hybrid ANER integrates the two components by using the NE class predicted by the rule-based component as a feature in the ML component. We conducted experiments with different feature sets to compare the performance of the hybrid approach with that of the pure ML approach. For both J48 and libsvm, the best-performing model did not use the rule-based feature; it used the POS feature together with the nominal flag feature, reaching an F-measure of 96.1% for J48 and 85.9% for libsvm. Based on these results, we conclude that the pure ML approach with the POS and nominal flag features outperformed the hybrid approach. This is because the rule-based component used in the experiments relies only on trigger words. Using rules prepared by linguists, together with gazetteers, may improve the rule-based component and consequently the hybrid ANER system.
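
The integration described in the abstract, where the class predicted by the trigger-word rules becomes one more feature for the ML component, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the trigger-word lists are English placeholders rather than the Amharic triggers used in the thesis, the function names `rule_based_class` and `token_features` are hypothetical, and the nominal flag is assumed to mark tokens whose POS tag is nominal. The sketch stops at feature extraction and does not reproduce the J48 or libsvm training.

```python
# Hypothetical trigger words (English placeholders, not the thesis's lists).
TRIGGERS_BEFORE = {"mr": "PERSON", "dr": "PERSON"}               # trigger precedes the NE
TRIGGERS_AFTER = {"university": "ORGANIZATION", "city": "LOCATION"}  # trigger follows the NE


def rule_based_class(tokens, i):
    """Guess an NE class for tokens[i] from the word before or after it."""
    if i > 0 and tokens[i - 1].lower() in TRIGGERS_BEFORE:
        return TRIGGERS_BEFORE[tokens[i - 1].lower()]
    if i + 1 < len(tokens) and tokens[i + 1].lower() in TRIGGERS_AFTER:
        return TRIGGERS_AFTER[tokens[i + 1].lower()]
    return "O"  # no trigger word nearby: not marked as a named entity


def token_features(tokens, pos_tags, i, use_rule_feature=True):
    """Build the feature dictionary for one token.

    With use_rule_feature=False this corresponds to the pure ML setting;
    with True, the rule-based prediction is added (the hybrid setting).
    """
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],
        # Assumed meaning of the nominal flag: the POS tag is a noun tag.
        "is_nominal": pos_tags[i].startswith("N"),
    }
    if use_rule_feature:
        feats["rule_class"] = rule_based_class(tokens, i)
    return feats


tokens = ["Mr", "Abebe", "visited", "Hawassa", "City"]
pos_tags = ["N", "N", "V", "N", "N"]
hybrid = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
pure_ml = [token_features(tokens, pos_tags, i, use_rule_feature=False)
           for i in range(len(tokens))]
```

In this sketch, dropping `rule_class` from the feature dictionary is the only difference between the hybrid and pure ML settings, which mirrors the comparison reported in the abstract.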

Keywords

Amharic Named Entity Recognition, Information Extraction, Decision tree, Support Vector Machine, Hybrid Named Entity Recognition System