Hybrid Word Sense Disambiguation Approach for Afaan Oromo Words

Date

2016-06

Publisher

A.A.U

Abstract

Word Sense Disambiguation (WSD) is a task in Natural Language Processing (NLP) whose goal is to find the appropriate sense of an ambiguous word in a particular context. In this thesis we developed a prototype that offers the relevant meaning of an ambiguous word based on its surrounding context. WSD is vital to applications such as Machine Translation, Text Summarization, Question Answering and Information Retrieval. The problem this research addresses is that word ambiguity in Afaan Oromo causes difficulties in document retrieval, text preprocessing, document translation and grammar analysis. The objective of this thesis is therefore to design and test a hybrid system that finds the meaning of words from their surrounding contexts by combining an unsupervised approach with a rule-based approach. Accordingly, this work presents a WSD strategy that combines an unsupervised approach, which exploits sense information in a corpus, with manually crafted rules. The idea behind the approach is to overcome the bottleneck that machine learning approaches face, while the hybrid method can improve accuracy and is suitable when training data is scarce. A fundamental problem with corpus-based approaches is the sparseness of the training contexts available for assigning appropriate senses to an ambiguous word. This makes our approach suitable for disambiguation in languages that lack resources and sense definitions. In this work, the meaning and context of a given word are captured using term co-occurrences within a defined window of words. We conducted experiments to determine the optimal window size. We conclude that windows of 1 and 2 words to the left and right of the ambiguous word achieved better results for extracting semantic contexts. The similar contexts of the senses of an ambiguous word are clustered using hierarchical and partitional clustering, with each cluster representing a unique sense.
The ambiguous words in the test set have between two and five senses. Partitional clustering (EM and K-means) yields significantly higher accuracy than hierarchical clustering for context clustering. The achieved result is encouraging despite the approach's low resource requirements. The system yields an accuracy of 76.05% for the unsupervised approach and 89.47% for the hybrid approach. Further experiments using different approaches that extend this work are still needed for better performance.
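
The window-based context extraction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `context_window` and `cooccurrence_counts`, the whitespace tokenization, and the placeholder tokens in the usage note are all assumptions made for illustration. It only shows how the words within 1 or 2 positions to the left and right of an ambiguous word could be collected and counted before any clustering step.

```python
from collections import Counter


def context_window(tokens, index, size):
    """Return the words within `size` positions left and right of tokens[index].

    The target word itself is excluded. Windows are truncated at the
    sentence boundaries rather than padded.
    """
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def cooccurrence_counts(sentences, target, size):
    """Count how often each word appears in the window around `target`.

    Tokenization here is plain lowercasing and whitespace splitting,
    which is an assumption; a real system would need language-specific
    preprocessing.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                counts.update(context_window(tokens, i, size))
    return counts
```

For example, with the placeholder sentences `["a b x c d", "e x c"]`, the target `"x"` and a window size of 1, `cooccurrence_counts` collects `b` and `c` from the first sentence and `e` and `c` from the second. Count vectors of this kind, one per occurrence of the ambiguous word, are the sort of input a clustering algorithm such as K-means could then group into senses.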

Description

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

Keywords

Afaan Oromo, Ambiguous Word, Disambiguation, Rule Based
