Word Sense Disambiguation for Amharic Text: a Machine Learning Approach
Date
2010-06-09
Publisher
Addis Ababa University
Abstract
The theme of this thesis is Word Sense Disambiguation (WSD) for Amharic, which addresses the problem of automatically deciding the correct sense of an ambiguous word based on its surrounding context. WSD is essential for many applications, such as Machine Translation and Information Retrieval. For the purposes of this research, we report experiments on five selected ambiguous Amharic words.
A corpus-based approach to disambiguation is used, in which machine learning techniques are applied to a corpus of Amharic sentences to acquire disambiguation information automatically. A total of 1045 English sense examples for the five ambiguous words were collected from the British National Corpus (BNC) and translated into Amharic using a dictionary. The sense examples were manually annotated and preprocessed to make them ready for the experiments. The corpus-based approach suffers from the so-called knowledge acquisition bottleneck: it needs large quantities of sense examples to learn disambiguation rules. This is very challenging for linguistic resource-deficient languages like Amharic.
A Naive Bayes classifier from the Weka 3.62 package is employed in both the training and testing phases to perform supervised learning on the preprocessed dataset using 10-fold cross-validation. We evaluated the classifiers for the five ambiguous words and achieved accuracies in the range of 70% to 83%, which is very encouraging, but further experiments on other ambiguous words and with different approaches need to be conducted.
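The evaluation setup described above (a Naive Bayes classifier scored with 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the Weka implementation used in the thesis. It is a plain Python stand-in that assumes bag-of-words context features and add-one smoothing, neither of which is stated in the abstract. The function names and the toy data (English placeholder contexts for one hypothetical ambiguous word with two senses) are illustrative assumptions and are not drawn from the thesis corpus.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect sense counts and per-sense word counts from (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict(model, tokens):
    """Return the sense with the highest log-posterior (add-one smoothing, assumed)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # skip context words never seen in training
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def cross_validate(examples, k=10, seed=0):
    """Shuffle, split into k folds, and return overall held-out accuracy."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_naive_bayes(train)
        correct += sum(predict(model, toks) == sense for toks, sense in held_out)
    return correct / len(data)


# Hypothetical toy data: context words for two senses of one ambiguous word.
FINANCE = ["money", "deposit", "loan", "account"]
RIVER = ["water", "shore", "fish", "boat"]
TOY_EXAMPLES = (
    [([FINANCE[i % 4], FINANCE[(i + 1) % 4]], "finance") for i in range(10)]
    + [([RIVER[i % 4], RIVER[(i + 1) % 4]], "river") for i in range(10)]
)

if __name__ == "__main__":
    print("10-fold accuracy:", cross_validate(TOY_EXAMPLES, k=10))
```

In a setting like the one reported, one such model would be trained and evaluated separately for each of the five ambiguous words, so that each word gets its own accuracy figure.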
Keywords
Sense Disambiguation, Amharic Text, Machine Learning Approach