Amharic Word Sense Disambiguation Using WordNet
Date
2015-03
Publisher
Addis Ababa University
Abstract
Words can have more than one distinct meaning, and many words can be interpreted in multiple ways
depending on the context in which they occur. This phenomenon poses challenges to Natural Language
Processing (NLP) systems. Automatically identifying the intended meaning of a polysemous word in a
sentence, known as word sense disambiguation (WSD), is a fundamental NLP task. There have been many
efforts on WSD for English; however, comparatively little work has been done for Amharic. Many NLP
applications, such as Machine Translation, Information Retrieval, Question Answering, and Information
Extraction, require this semantic-level task.
In this thesis, a knowledge-based WSD method that employs Amharic WordNet is developed.
Knowledge-based Amharic WSD extracts knowledge from word definitions and from the relations among
words and senses. The proposed system consists of preprocessing, morphological analysis, and
disambiguation components, together with an Amharic WordNet database. Preprocessing prepares the
input sentence for morphological analysis, which reduces the various forms of a word to a single root
or stem. Amharic WordNet contains words along with their different meanings, synsets, and the semantic
relations among concepts. Finally, the disambiguation component identifies the ambiguous words in a
sentence and assigns each one its appropriate sense, using sense overlap and related words from
Amharic WordNet.
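The sense-overlap step can be illustrated with a simplified Lesk sketch. The keywords name the Lesk algorithm, but the exact variant used in the thesis is not described here, so this is an assumption. Each sense is modeled as a hypothetical `Sense` record holding a gloss and a list of related words, and the sketch picks the sense whose combined signature shares the most words with the context. The toy entries use English stand-ins rather than real Amharic WordNet data, and the context words are assumed to be already reduced to stems by morphological analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a word (hypothetical schema, not the Amharic WordNet format)."""
    sense_id: str
    gloss: list[str]                                    # definition words
    related: list[str] = field(default_factory=list)   # words from related synsets


def signature(sense: Sense) -> set[str]:
    # The sense "signature" is the set of gloss words plus related words.
    return set(sense.gloss) | set(sense.related)


def simplified_lesk(context: list[str], senses: list[Sense]) -> Sense | None:
    """Return the sense whose signature overlaps most with the context words.

    On a tie, the earliest sense in the list wins. Returns None if senses is empty.
    """
    ctx = set(context)
    best, best_overlap = None, -1
    for sense in senses:
        overlap = len(ctx & signature(sense))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


# Toy inventory with English stand-ins (hypothetical); a real system
# would load senses, glosses, and relations from Amharic WordNet.
SENSES = {
    "bank": [
        Sense("bank.n.01", ["financial", "institution", "money", "deposit"], ["loan", "account"]),
        Sense("bank.n.02", ["sloping", "land", "beside", "river"], ["shore", "water"]),
    ],
}

if __name__ == "__main__":
    chosen = simplified_lesk(["fished", "river", "water"], SENSES["bank"])
    print(chosen.sense_id)  # bank.n.02 (overlap {"river", "water"})
```

Ties fall back to the first listed sense. Ordering senses by frequency would turn that into a most-frequent-sense fallback, which is a design choice the abstract does not specify.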
We evaluated the knowledge-based Amharic WSD system using Amharic WordNet in two experiments. The
first evaluates the effect of using Amharic WordNet with and without the morphological analyzer, and
the second determines an optimal window size for Amharic WSD. With and without the morphological
analyzer, the system achieved accuracies of 57.5% and 80%, respectively. In the second experiment,
we found that a window of two words on each side of the ambiguous word is sufficient for Amharic WSD.
The test results show that the proposed WSD methods perform better than previous Amharic WSD methods.
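The window-size experiment and the accuracy measure can be made concrete with a small helper sketch. `tokenize`, `context_window`, and `accuracy` are hypothetical names. The tokenizer simply splits on whitespace, Ethiopic punctuation (U+1361 to U+1368), and common ASCII punctuation, which is an assumption about preprocessing rather than the thesis's actual component. With `n=2`, `context_window` returns up to two words on each side of the ambiguous word, the window size the experiment found sufficient.

```python
from __future__ import annotations

import re

# Split on whitespace, Ethiopic punctuation U+1361..U+1368
# (wordspace, full stop, comma, semicolon, colon, preface colon,
# question mark, paragraph separator), and common ASCII punctuation.
_SPLIT = re.compile(r"[\s\u1361-\u1368.,;:?!]+")


def tokenize(sentence: str) -> list[str]:
    """Hypothetical preprocessing step: split a sentence into word tokens."""
    return [tok for tok in _SPLIT.split(sentence) if tok]


def context_window(tokens: list[str], target_index: int, n: int = 2) -> list[str]:
    """Up to n tokens on each side of tokens[target_index], excluding the target."""
    left = tokens[max(0, target_index - n):target_index]
    right = tokens[target_index + 1:target_index + 1 + n]
    return left + right


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of ambiguous words whose predicted sense matches the gold sense."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    tokens = ["w1", "w2", "w3", "w4", "w5", "w6"]
    print(context_window(tokens, 2))   # ['w1', 'w2', 'w4', 'w5']
    print(accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 0.75
```

Varying `n` and re-running disambiguation over a sense-tagged test set is one way to reproduce a window-size comparison; near sentence boundaries the window is simply truncated rather than padded.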
Keywords: Natural Language Processing, Amharic WordNet, Word Sense Disambiguation,
Knowledge Based Approach, Lesk Algorithm