Hybrid Word Sense Disambiguation Approach for Afaan Oromo Words
Date
2016-06
Authors
Publisher
A.A.U
Abstract
Word Sense Disambiguation (WSD) is a task in the field of natural language processing (NLP)
whose aim is to find the appropriate sense of an ambiguous word in a particular context. In this
thesis we developed a prototype that offers the relevant meaning of an ambiguous word based on
its surrounding context. WSD is vital to applications such as Machine Translation,
Text Summarization, Question Answering and Information Retrieval.
The problem this research addresses is that word ambiguity in Afaan Oromo causes
difficulties in document retrieval, text preprocessing, document translation and grammar
analysis.
Thus the objective of this thesis is to design and test a hybrid system that finds the meaning of
words based on their surrounding contexts by combining an unsupervised approach with a
rule-based approach. This work therefore presents a WSD strategy that combines an unsupervised
approach, which exploits sense information in a corpus, with manually crafted rules. The idea
behind the approach is to overcome the bottleneck facing machine learning approaches, while the
hybrid method can improve accuracy and is suitable when training data is scarce. A fundamental
problem with corpus-based approaches is the sparseness of training contexts for an ambiguous
word when assigning appropriate senses. This makes our approach suitable for disambiguation in
languages that lack resources and sense definitions. In this work, the meaning and
context of a given word are captured using term co-occurrences within a defined window of
words. We conducted experiments to determine the optimal window size. We conclude that, for
extracting semantic contexts, window sizes of 1 and 2 words to the left and right of the
ambiguous word gave better results. The similar contexts of each sense of an ambiguous word
are clustered using hierarchical and partitional clustering, with each cluster representing a
unique sense. The ambiguous words in the test set have between two and five senses.
Partitional clustering (EM and K-means) yields significantly higher accuracy than
hierarchical clustering for context clustering. The results are encouraging given the
approach's low resource requirements. The system achieved an accuracy of 76.05% for the
unsupervised approach and 89.47% for the hybrid approach. Further experiments that extend this
work with different approaches are nevertheless needed for better performance.
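
The window-based context extraction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `context_window` and `context_vectors` and the placeholder tokens are assumptions made for illustration only, and no real Afaan Oromo data is used. The sketch collects the words within a fixed window to the left and right of each occurrence of an ambiguous word and counts them, producing one co-occurrence vector per occurrence. Vectors of this kind could then be passed to a clustering algorithm such as K-means.

```python
from collections import Counter


def context_window(tokens, index, size):
    """Return up to `size` tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def context_vectors(sentences, target, size):
    """Build one co-occurrence Counter per occurrence of `target`.

    Tokenisation is a plain lowercase whitespace split (an assumption;
    a real system would likely add normalisation and stop-word removal).
    """
    vectors = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                vectors.append(Counter(context_window(tokens, i, size)))
    return vectors


# Placeholder tokens only, not real Afaan Oromo text.
sentences = [
    "a b TARGET c d",
    "e TARGET f",
]

# Window of 2 words on each side, one of the sizes reported to work well.
vectors = context_vectors(sentences, "target", size=2)
```

With `size=1` the first occurrence would keep only `b` and `c`; with `size=2` it keeps `a`, `b`, `c` and `d`, so the window size directly controls how much context each vector carries.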
Description
A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of
the Requirements for the Degree of Master of Science in Information Science
Keywords
Afaan Oromo, Ambiguous Word, Disambiguation, Rule Based