Design of Amharic Anaphora Resolution Model


Date

2014-04

Publisher

Addis Ababa University

Abstract

Anaphora resolution is the process of finding the word or phrase to which an expression points back, where that word or phrase was introduced earlier in the text with a more descriptive expression than the one pointing back. The expression that refers back is called the anaphor, and the word or phrase it refers to is called the antecedent. Anaphora resolution is used as a component of NLP applications such as machine translation, information extraction, and question answering to improve their effectiveness. Building a complete anaphora resolution system that incorporates all linguistic information is complex and has not yet been achieved, because languages differ in nature and complexity. For Amharic, the task is even harder because of the language's rich morphology: in addition to independent anaphors, Amharic, unlike languages such as English, has anaphors embedded inside words (hidden anaphors). In this work, we propose an Amharic anaphora resolution model based on the knowledge-poor approach to anaphora resolution. This approach relies on low-level linguistic knowledge, such as morphology, and avoids the need for complex knowledge such as semantic analysis and world knowledge. The proposed model takes Amharic text as input and preprocesses it to tag words with their word classes and to mark various chunks. Anaphors, both independent and hidden, and candidate antecedents are then identified in the preprocessed data. The model handles both intrasentential and intersentential anaphors. Finally, the resolution step applies constraint and preference rules to identify the antecedent referred to by each anaphor. To evaluate the model, Amharic texts were collected from the Walta Information Center (WIC) and the Amharic Holy Bible and used as datasets. The collected dataset was divided into training and testing sets using 10-fold cross-validation.
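The constraint-then-preference resolution step can be illustrated with a small sketch. It is not the thesis's implementation: the `Mention` fields, the person/number/gender agreement constraint, the two-sentence search window, and the preference weights (subject +2, same sentence +1, ties broken by recency) are all hypothetical assumptions, and English glosses stand in for Amharic tokens. A hidden anaphor is modelled simply as the agreement features carried by a verb affix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """An anaphor or candidate antecedent with hypothetical agreement features."""
    text: str
    sentence: int          # index of the sentence containing the mention
    position: int          # token index within the whole text
    person: int            # 1, 2 or 3
    number: str            # "sg" or "pl"
    gender: str            # "m", "f", or "?" when unmarked
    is_subject: bool = False


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Constraint: person and number must match; gender must match unless unmarked."""
    if anaphor.person != candidate.person or anaphor.number != candidate.number:
        return False
    return "?" in (anaphor.gender, candidate.gender) or anaphor.gender == candidate.gender


def preference_score(anaphor: Mention, candidate: Mention) -> int:
    """Preference rules with illustrative weights: favour subjects, then same-sentence candidates."""
    score = 0
    if candidate.is_subject:
        score += 2
    if candidate.sentence == anaphor.sentence:
        score += 1  # intrasentential candidate
    return score


def resolve(anaphor: Mention, candidates: List[Mention], window: int = 2) -> Optional[Mention]:
    """Filter candidates with constraints, then pick the best by preference score."""
    eligible = [
        c for c in candidates
        if c.position < anaphor.position                 # antecedent precedes the anaphor
        and anaphor.sentence - c.sentence <= window      # within the sentence window
        and agrees(anaphor, c)
    ]
    if not eligible:
        return None
    # Highest score wins; ties go to the most recent (closest) candidate.
    return max(eligible, key=lambda c: (preference_score(anaphor, c), c.position))


# Usage: a hidden 3rd-person masculine singular subject marker on a verb in sentence 1.
candidates = [
    Mention("Abebe", sentence=0, position=0, person=3, number="sg", gender="m", is_subject=True),
    Mention("Almaz", sentence=0, position=2, person=3, number="sg", gender="f"),
    Mention("books", sentence=0, position=4, person=3, number="pl", gender="?"),
]
hidden = Mention("<3sg.m subject affix>", sentence=1, position=6, person=3, number="sg", gender="m")
print(resolve(hidden, candidates).text)  # prints "Abebe"
```

Separating hard constraints (which discard candidates) from soft preferences (which only rank the survivors) keeps each rule simple to add or reweight, which suits an approach that avoids semantic analysis and world knowledge.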
Based on the collected dataset, we achieved a success rate of 81.79% for the resolution of hidden anaphors and an accuracy of 70.91% for the resolution of independent anaphors.
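The evaluation setup (10-fold cross-validation with a per-anaphor success rate) can be sketched as follows. This is a hedged illustration: it assumes "success rate" means the fraction of anaphors whose predicted antecedent matches the gold-standard antecedent, a definition the abstract does not spell out, and the helpers `k_fold_splits`, `success_rate`, and `cross_validate` are hypothetical names. The folds here are contiguous and unshuffled; the thesis's actual split may differ.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 1 < k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    # The first n_items % k folds get one extra item so every item is tested exactly once.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def success_rate(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of anaphors whose predicted antecedent equals the gold antecedent."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cross_validate(items: Sequence[T], evaluate: Callable[[List[T], List[T]], float], k: int = 10) -> float:
    """Average a per-fold score; `evaluate(train, test)` is a hypothetical hook."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(items), k):
        train = [items[i] for i in train_idx]
        test = [items[i] for i in test_idx]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


print(success_rate(["Abebe", "Almaz", "books", "Abebe"], ["Abebe", "Abebe", "books", "Abebe"]))  # prints 0.75
```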

Keywords

Amharic Anaphora Resolution, Knowledge Poor Anaphora Resolution Approach, Hidden Anaphors
